Is CWiggler element parallelizable

Moderators: cyao, michael_borland

wmliu
Posts: 20
Joined: 30 Sep 2008, 10:52

Is CWiggler element parallelizable

Post by wmliu » 01 Nov 2010, 12:34

Hi,
I'm simulating a helical undulator using CWIGGLER elements. I tried to use Pelegant with MPICH2 but couldn't see any performance improvement. My computer has a quad-core CPU and I launched Pelegant using mpiexec -n 4 Pelegant ....
Question: Is the CWIGGLER element parallelized? Is there any difference between the input file for a serial simulation and a parallel simulation?

Thanks,

wl

michael_borland
Posts: 1933
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory
Contact:

Re: Is CWiggler element parallelizable

Post by michael_borland » 01 Nov 2010, 13:18

CWIGGLER is parallelized. There's no need to change any input files in order to use the parallel version.
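For reference, a typical launch on a quad-core machine might look like the sketch below (the file name run.ele is a placeholder, not from this thread; the same .ele and .lte files work for both elegant and Pelegant):

```shell
# Serial run (hypothetical input file name):
elegant run.ele

# Parallel run with 4 MPI ranks; by default one rank acts as the
# master and the remaining ranks track particles:
mpiexec -n 4 Pelegant run.ele
```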

--Michael

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: Is CWiggler element parallelizable

Post by ywang25 » 01 Nov 2010, 18:28

wl,

Could you post or send your input files so we can do a performance analysis? One tip for getting reasonable performance is to run the simulation with a relatively large number of particles.
Please check the reference below if you want all 4 cores to track particles (by default, n_cores-1 cores are used for tracking):
http://www.aps.anl.gov/Accelerator_Syst ... gant.shtml
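As a rough sanity check (a simplified model, not Pelegant's actual scheduler): with the default master/slave layout only n_cores - 1 ranks track particles, so on a quad-core machine the best-case speedup is about 3x, not 4x. A sketch:

```python
def ideal_tracking_speedup(n_cores):
    """Best-case speedup when one MPI rank is a non-tracking master.

    Simplified model: assumes tracking dominates the run time and
    scales linearly; ignores I/O and communication overhead.
    """
    if n_cores < 2:
        return 1.0
    return float(n_cores - 1)

# On a quad-core machine the ceiling is ~3x:
print(ideal_tracking_speedup(4))  # -> 3.0
```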

Yusong

wmliu
Posts: 20
Joined: 30 Sep 2008, 10:52

Re: Is CWiggler element parallelizable

Post by wmliu » 02 Nov 2010, 00:08

Michael and Yusong

Thanks for your replies.
I realized that I'm still using version 21.0. My input file doesn't work with the latest version yet. I'll fix it and try the latest version to see if it makes any difference.

Wanming

wmliu
Posts: 20
Joined: 30 Sep 2008, 10:52

Re: Is CWiggler element parallelizable

Post by wmliu » 02 Nov 2010, 13:12

yusong

Please find the attached file. I still did not see any difference between the parallel and serial runs.

Thanks,

Wanming
Last edited by wmliu on 02 Nov 2010, 16:19, edited 1 time in total.

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: Is CWiggler element parallelizable

Post by ywang25 » 02 Nov 2010, 15:03

Wanming,

I tested your input files with 1000 and 10000 particles on 10 CPU cores. Pelegant is faster in both cases, although the scaling is not linear. For the simulation with 10000 particles, Pelegant finished in 00:07:12 with 10 cores (9 cores for tracking), while elegant took 00:30:36. Do you have another machine with a larger number of CPU cores for the performance test? Some system-related issues could be ruled out by running in a different environment.
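The timings above work out to roughly a 4.25x speedup on 9 tracking cores. A quick check:

```python
def to_seconds(hhmmss):
    # Parse an hh:mm:ss wall-clock string into seconds.
    h, m, s = (int(x) for x in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

serial = to_seconds("00:30:36")    # elegant, from the post above
parallel = to_seconds("00:07:12")  # Pelegant, 10 cores (9 tracking)
print(round(serial / parallel, 2))  # -> 4.25
```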

wmliu
Posts: 20
Joined: 30 Sep 2008, 10:52

Re: Is CWiggler element parallelizable

Post by wmliu » 02 Nov 2010, 16:20

Yusong,

Thanks. I'll try it on a different system.

Wanming

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: Is CWiggler element parallelizable

Post by ywang25 » 02 Nov 2010, 16:40

I did a performance study on another cluster, which does not share CPUs among different jobs, and the performance is very good. More than 95% of the time is spent on the CWIGGLER element.
Pelegant (8 cores, 7 cores for tracking): 00:04:10
elegant: 00:28:34

Yusong
Last edited by ywang25 on 03 Nov 2010, 08:26, edited 1 time in total.

wmliu
Posts: 20
Joined: 30 Sep 2008, 10:52

Re: Is CWiggler element parallelizable

Post by wmliu » 02 Nov 2010, 22:38

Yusong,

After I enabled DMASTER_READTITLE_ONLY in Makefile.OAG and rebuilt both SDDSlib and Pelegant, I'm now back on track. Is there any reason not to enable it by default?

Thanks,
Wanming

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: Is CWiggler element parallelizable

Post by ywang25 » 03 Nov 2010, 08:22

It depends on which file system you use. I tested with the default configuration on both the Lustre and GPFS parallel file systems. The time spent on I/O should be trivial for this test. You can comment out the "track" command and measure how much time is spent on I/O before and after enabling DMASTER_READTITLE_ONLY. That will give you some idea of whether the file system is the bottleneck.
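A sketch of that comparison (file names are placeholders; elegant command files use "!" for comments):

```shell
# Hypothetical: run_noTrack.ele is a copy of your .ele file with the
# track namelist commented out, e.g.
#   ! &track
#   ! &end
# Timing this run measures setup + I/O only:
time mpiexec -n 4 Pelegant run_noTrack.ele

# Compare against the full run to see what fraction of time is I/O:
time mpiexec -n 4 Pelegant run.ele
```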

Yusong
