Hi everyone,
to speed up the calculation of injection efficiency in Diamond I am using pelegant. I have done a "crash" test with a large number of turns (6000) and 1000 particles, to determine the minimum number of turns needed to define this parameter. The puzzling result is that while (plain) elegant shows a plateau after ~2000 turns, pelegant exhibits a dramatic drop at about 1630 turns. I am investigating the reasons for this anomaly, but so far the only difference I can see between the two cases is the switch from elegant to pelegant for the tracking. Has anyone else seen this effect? Thanks.
Injection Efficiency with pelegant
Attachment: elegant_vs_pelegant_INJEFF.png
- Posts: 1959
- Joined: 19 May 2008, 09:33
- Location: Argonne National Laboratory
- Contact:
Re: Injection Efficiency with pelegant
This is definitely not supposed to happen. Can you post (or send me) the input files for Pelegant?
Also, you may want to be sure you have the latest version of Pelegant, 28.1.0.
--Michael
Re: Injection Efficiency with pelegant
Hi Michael, it took me some time to check the actual settings and do some tests:
A) These are the files I use in my calculation.
B) The plot I posted, with the dramatic change in injection efficiency after 1630 turns, was indeed produced with an old Pelegant version: 25.2.2.
Just to give an idea of the timing: 1500 turns (still with efficiency > 70%), 1000 particles and 100 nodes takes 2m40s. Pretty fast.
C) I have repeated the calculations with Pelegant 28.1.0. I understand there was a bug related to the center_on_orbit feature in the &track command; the version we are using is compiled from the 28.1.0 sources with the patch that fixes this bug. With this version of Pelegant I get:
n_nodes   CPU_time   N_turns   InjEff (%)
      2   34m24s         601   76.3
      5   10m48s         601   76.3
     10   09m18s         601   76.3
     20   09m40s         601   76.3
----------------------------------------------------------
     18   16m40s        1800   70.3
     36   19m47s        1800   70.3
----------------------------------------------------------
    400   5h45m         6000   69.3
What I see is that:
1 (good) - the injection efficiency is independent of the number of nodes used,
2 (good) - I can go beyond 1630 turns without any sudden drop in the injection efficiency curve,
3 (bad) - the CPU time does not depend much on the number of nodes. It does improve when I go from 2 to 5 to 10 nodes, but it seems to reach an optimum at around 10-15 nodes. In particular, with 400 nodes it seems to behave like plain elegant, with no benefit from parallelization. I would be grateful to know your findings, either as a general result or using the files I have attached.
Thanks very much
Marco
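To put numbers on the scaling in the 601-turn runs, the timings in the table can be turned into speedup and parallel efficiency relative to the 2-node run. This is only a sketch: the helper names (to_seconds, scaling_table) are made up for illustration, and the 2-node run is assumed to be a fair baseline.

```python
import re

def to_seconds(text):
    """Parse a duration such as '34m24s' or '5h45m' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r"(\d+)([hms])", text))

# Timings of the 601-turn runs, copied from the table in the post.
runs_601 = {2: "34m24s", 5: "10m48s", 10: "09m18s", 20: "09m40s"}

def scaling_table(runs):
    """Return (nodes, seconds, speedup, efficiency) rows.

    Speedup and efficiency are measured against the run with the
    fewest nodes, since no single-node timing is available.
    """
    base_nodes = min(runs)
    base_time = to_seconds(runs[base_nodes])
    rows = []
    for nodes in sorted(runs):
        seconds = to_seconds(runs[nodes])
        speedup = base_time / seconds
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, seconds, speedup, efficiency))
    return rows

for nodes, seconds, speedup, efficiency in scaling_table(runs_601):
    print(f"{nodes:>4d} nodes  {seconds:>5d} s  "
          f"speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
```

Relative to 2 nodes, the speedup levels off at about 3.7 by 10 nodes, and the 20-node run is slightly slower than the 10-node one, so its efficiency falls to roughly 0.36.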
Re: Injection Efficiency with pelegant
Marco,
Several factors can affect how performance scales with the number of cores. The Pelegant manual describes some of this:
- How many particles are being tracked: too few particles can result in poor scaling for larger numbers of cores. At some point, the overhead of starting up (system startup, loading the lattice, computing twiss parameters, etc.) dominates the whole process.
- Amount and frequency of I/O: WATCH elements are a particular concern here. Consider using the INTERVAL and FLUSH_INTERVAL controls to reduce the amount of I/O and how frequently files are updated.
- How many elements are included that require inter-processor communication (IPC). In your case, the WATCH, RFCA, CHARGE, and BUMPER elements all involve IPC.
- Load balance can get skewed, particularly when there are few particles per core. One core may lose several particles, while others lose none. The wall clock time in this case wouldn't reflect the reduced number of particles. You can try setting load_balancing_on=1 in the &run_setup command, but it might not help (load balancing requires IPC, so it can hurt performance in some cases).
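The interplay of these factors can be illustrated with a toy model. This is not how Pelegant accounts for time: the function name and all three cost constants below are hypothetical, chosen only to show how a fixed startup overhead plus an IPC cost that grows with the core count can cancel the gain from splitting a small number of particles.

```python
import math

def estimated_wall_time(n_particles, n_cores,
                        t_overhead=60.0,       # assumed fixed startup cost (s)
                        t_per_particle=1.0,    # assumed tracking cost per particle (s)
                        t_ipc_per_core=0.05):  # assumed IPC cost per core (s)
    """Toy estimate: overhead + busiest core's tracking time + IPC cost."""
    # The wall clock follows the core holding the most particles.
    particles_on_busiest_core = math.ceil(n_particles / n_cores)
    return (t_overhead
            + t_per_particle * particles_on_busiest_core
            + t_ipc_per_core * n_cores)

for n_particles in (1_000, 10_000):
    for n_cores in (10, 100, 400):
        t = estimated_wall_time(n_particles, n_cores)
        print(f"{n_particles:>6d} particles on {n_cores:>3d} cores: {t:7.1f} s")
```

With these made-up constants, going from 100 to 400 cores makes the 1000-particle case slower while the 10000-particle case still gains, which is the qualitative behavior described in the list above.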
Unfortunately, our systems are swamped right now so I can't do any quick scaling tests with your inputs.
--Michael
Re: Injection Efficiency with pelegant
Marco,
I did a scaling study for your input files. First, I removed the WATCH element with mode="coordinates" and substituted one with mode="parameters", just to avoid excessive I/O. I did runs with 1k and 10k particles. As I suspected, 1k particles is not sufficient to make good use of more than about 30 cores. With 10k particles, scaling is decent up to 256 cores.
Depending on the speed of your network and your file system, your scaling results may be better or worse than this.
--Michael
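One way to read these numbers is in terms of particles per core. The short calculation below uses only the figures quoted in this thread; the helper name is made up for illustration, and treating particles per core as the relevant quantity is an assumption rather than something stated in the Pelegant documentation.

```python
def particles_per_core(n_particles, n_cores):
    """Average number of particles each core tracks."""
    return n_particles / n_cores

cases = {
    "1k particles, ~30 cores (scaling limit)": (1_000, 30),
    "10k particles, 256 cores (scaling limit)": (10_000, 256),
    "1k particles, 400 nodes (Marco's long run)": (1_000, 400),
}

for label, (n_particles, n_cores) in cases.items():
    ratio = particles_per_core(n_particles, n_cores)
    print(f"{label}: {ratio:.1f} particles per core")
```

Both scaling limits correspond to roughly 30 to 40 particles per core, whereas the 400-node run with 1000 particles had only 2.5 particles per core. Two data points make this a rough rule of thumb at best.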