gpu-elegant seems not to finish tracking

jcytsai
Posts: 41
Joined: 01 Oct 2012, 20:18

Post by jcytsai » 19 Oct 2020, 05:53

Hi,

I am running CentOS 7.6 with a graphics card properly installed, and I would like to use gpu-elegant for simulations. After the command "gpu-elegant filename.ele", gpu-elegant does not seem to finish tracking and terminates earlier than expected. The output files (for example *.twi, *.sig) are incomplete. The terminal shows:

gpuBaseInit: Using cuda device of compute capability 7.0.
gpuBaseInit: Particle kernels use 512 threads and 320 blocks.
gpuBaseInit: Reductions use 256 threads and 240 blocks.
gpuElegant: unknown n_comp=10, np=50000

Am I missing anything needed to run gpu-elegant? Or could anyone give me a hint on how I should look into this issue?

Thanks in advance!
Cheng-Ying

soliday
Posts: 368
Joined: 28 May 2008, 09:15

Post by soliday » 19 Oct 2020, 23:56

I think I found the problem. There are new elegant RPMs for RHEL7 on the software page now. If you get a chance, can you try it out and let me know if it works for you? I will try to release the fix for the other operating systems tomorrow or the day after.

Post by jcytsai » 20 Oct 2020, 07:57

Dear Dr. Soliday,

Thanks for the help. I installed elegant-2020.4.0-2.rhel.7.mpich.x86_64.rpm, and the "gpuElegant: unknown" message no longer shows up. However, I still cannot run a case successfully. For example, I ran LCLS (an example folder under elegantExamples) and it showed the following output:

-----------------------
gpuBaseInit: Using cuda device of compute capability 7.0.
gpuBaseInit: Particle kernels use 512 threads and 320 blocks.
gpuBaseInit: Reductions use 256 threads and 240 blocks.
20 Oct 20 08:53:24: This step establishes energy profile vs s (fiducial beam).
20 Oct 20 08:53:24: Rf phases/references reset.
Starting C#1 at s=0.000000e+00 to 0.000000e+00 m, pass 0, 10005 particles, memory 2355217 kB
Starting L0AWAKE#1 at s=0.000000e+00 to 0.000000e+00 m, pass 0, 10005 particles, memory 2355217 kB
Warning: The beam is shorter than 20*DT, where DT is the spacing of the wake points.
Depending on the longitudinal distribution and shape of the wake, this may produce poor results.
Consider using a wake with finer time spacing in WAKE elements.
warning: only 0 of 10005 particles were binned (WAKE)
consider setting n_bins=0 in WAKE definition to invoke autoscaling
0 particles transmitted, total effort of 0 particle-turns
0 multipole kicks done

Dumping output beam data...done.
Dumping centroid data...done.
Dumping sigma data...done.
Dumping final properties data...done.
Post-tracking output completed.
Tracking step completed ET: 00:00:01 CP: 0.56 BIO:0 DIO:0 PF:0 MEM:2355329

Saving lattice parameters to LCLS04NOV07.par...done.
Finished tracking.
End of input data encountered.
statistics: ET: 00:00:01 CP: 0.59 BIO:0 DIO:0 PF:0 MEM:2353254

=====================================================================================
Thanks for using gpu-elegant. Please cite the following references in your publications:
M. Borland, "elegant: A Flexible SDDS-Compliant Code for Accelerator Simulation,"
Advanced Photon Source LS-287, September 2000.
J. R. King, I. V. Pogorelov, M. Borland, R. Soliday, K. Amyx,
"Current status of the GPU-Accelerated version of elegant," Proc. IPAC15, 623 (2015).
If you use a modified version, please indicate this in all publications.
=====================================================================================

Is there anything I may be missing? I originally installed elegant from source (using the Build-AOP-RPMs script), but this time I installed the RPM file through yum.

Thanks!
Cheng-Ying

Post by soliday » 20 Oct 2020, 18:54

Okay, the LCLS example now runs to completion. There are new RPMs up for RHEL7 again. It turned out this was a problem with code I changed to get it to compile with CUDA 11.1. Right now the newest version of CUDA that can compile gpu-elegant is CUDA 10.2.

Post by soliday » 21 Oct 2020, 13:41

gpu-elegant has now been updated on all the operating systems we build it for.

Post by jcytsai » 23 Oct 2020, 07:56

Dear Dr. Soliday,

It runs without any error message, but it does not seem to complete successfully for the examples I tried in the elegantExamples folder. The output files (for example, *.sig) seem to be incomplete, and it always shows

0 particles transmitted, total effort of 0 particle-turns
0 multipole kicks done

I have excluded the elements that the GPU version does not support. The same examples produce complete outputs with elegant or Pelegant. Could you provide (or point out in the elegantExamples folder) an example that runs successfully with gpu-elegant?

Thanks very much for your help!
Cheng-Ying

Post by soliday » 23 Oct 2020, 10:08

Which examples are you having trouble with? I know gpu-elegant won't run with all of them, but it does run with LCLS and matching/beamSizeMatch1 and finishes with:
10005 particles transmitted, total effort of 10005 particle-turns
and
1000 particles transmitted, total effort of 1000 particle-turns

Not the 0 particles like you are seeing.
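
Because the run exits without an error message in this failure mode, one way to catch it is to scan the captured terminal output for the "particles transmitted" line. The following is a minimal sketch, assuming the log has been saved as text and that the line format matches what is quoted in this thread; the function names are hypothetical and not part of elegant.

```python
import re

# Pattern based on the log line quoted in this thread:
#   "10005 particles transmitted, total effort of 10005 particle-turns"
TRANSMITTED_RE = re.compile(r"^\s*(\d+) particles transmitted", re.MULTILINE)


def transmitted_counts(log_text):
    """Return the transmitted-particle count of every tracking step in the log."""
    return [int(n) for n in TRANSMITTED_RE.findall(log_text)]


def has_zero_transmission(log_text):
    """True if any tracking step reports that no particles were transmitted."""
    return any(n == 0 for n in transmitted_counts(log_text))


# Hypothetical captured output, shaped like the logs in this thread.
sample = """\
tracking 1000 particles
0 particles transmitted, total effort of 0 particle-turns
0 multipole kicks done
"""

if has_zero_transmission(sample):
    print("warning: a tracking step transmitted 0 particles")
```

A count of 0 can also be legitimate (for example, if the beam is really lost on an aperture), so a check like this only flags runs that deserve a closer look.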

Post by jcytsai » 23 Oct 2020, 19:58

Dear Dr. Soliday,

Thanks for pointing me to examples that I can use to test gpu-elegant on my side. I tried both LCLS and matching/beamSizeMatch1, but neither finished successfully. They run without any error message, but both report 0 particles transmitted. I attach the output below for your reference. I would appreciate any hint on how I could track down the problem. Thanks!

(for matching/beamSizeMatch1 example)
tracking step 1
generating bunch 1
tracking 1000 particles
gpuBaseInit: Using cuda device of compute capability 7.0.
gpuBaseInit: Particle kernels use 512 threads and 320 blocks.
gpuBaseInit: Reductions use 256 threads and 240 blocks.
23 Oct 20 20:45:39: This step establishes energy profile vs s (fiducial beam).
23 Oct 20 20:45:39: Rf phases/references reset.
0 particles transmitted, total effort of 0 particle-turns
0 multipole kicks done

Dumping sigma data...done.
Post-tracking output completed.


(for LCLS example)
tracking step 1
23 Oct 20 20:47:09: Starting to read beam from SDDS file.
File lcls_end_L0a_nominal.sdds opened and checked.
Read page 1 from file
0 particle ID slots per bunch
200001 rows in page 1
File lcls_end_L0a_nominal.sdds was used up and closed.
23 Oct 20 20:47:09: Done reading beam from SDDS file.
a total of 200001 data points were read

tracking 10005 particles
gpuBaseInit: Using cuda device of compute capability 7.0.
gpuBaseInit: Particle kernels use 512 threads and 320 blocks.
gpuBaseInit: Reductions use 256 threads and 240 blocks.
23 Oct 20 20:47:09: This step establishes energy profile vs s (fiducial beam).
23 Oct 20 20:47:09: Rf phases/references reset.
Starting C#1 at s=0.000000e+00 to 0.000000e+00 m, pass 0, 10005 particles, memory 2355218 kB
Starting L0AWAKE#1 at s=0.000000e+00 to 0.000000e+00 m, pass 0, 10005 particles, memory 2355218 kB
Warning: The beam is shorter than 20*DT, where DT is the spacing of the wake points.
Depending on the longitudinal distribution and shape of the wake, this may produce poor results.
Consider using a wake with finer time spacing in WAKE elements.
warning: only 0 of 10005 particles were binned (WAKE)
consider setting n_bins=0 in WAKE definition to invoke autoscaling
0 particles transmitted, total effort of 0 particle-turns
0 multipole kicks done

Dumping output beam data...done.
Dumping centroid data...done.
Dumping sigma data...done.
Dumping final properties data...done.
Post-tracking output completed.

Post by soliday » 26 Oct 2020, 13:42

Okay, I am hopeful that I have fixed it. Previously the build only compiled code for GPU devices of compute capability 5.2 and 6.0, and your GPU device has compute capability 7.0. I don't have the hardware to test this, but I was able to compile this latest release with support for all three compute capabilities. You will find the new RHEL7 elegant RPMs on the software page.
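
To illustrate why a 7.0 device could find no usable kernels in a build that targets only 5.2 and 6.0, here is a minimal sketch of the CUDA binary-compatibility rule: compiled device code for capability X.Y runs on a device with the same major version X and a minor version of at least Y. The sketch assumes the build embeds no PTX that the driver could JIT-compile for newer devices (the thread does not say whether that was the case), and the function names are hypothetical.

```python
def cubin_runs_on(device_cc, compiled_cc):
    """Binary compatibility: same major version, device minor >= compiled minor."""
    return device_cc[0] == compiled_cc[0] and device_cc[1] >= compiled_cc[1]


def has_native_code(device_cc, compiled_ccs):
    """True if at least one compiled target can run on the device."""
    return any(cubin_runs_on(device_cc, cc) for cc in compiled_ccs)


# Compute capabilities as (major, minor) tuples, taken from this thread.
old_build = [(5, 2), (6, 0)]          # targets of the earlier RPMs
new_build = [(5, 2), (6, 0), (7, 0)]  # targets of the latest RPMs
device = (7, 0)                       # reported by gpuBaseInit above

print(has_native_code(device, old_build))  # False
print(has_native_code(device, new_build))  # True
```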

Post by jcytsai » 26 Oct 2020, 19:46

Dear Dr. Soliday,

Thank you for the help. I installed the latest release, and LCLS and matching/beamSizeMatch1 (the two examples mentioned) now complete successfully. I then tested a simpler example of my own and, surprisingly, it did not work: it runs without any error message but ends with 0 particles transmitted.

I found that (at least) the KQUAD element does not seem to work in gpu-elegant. Neither LCLS nor matching/beamSizeMatch1 has a KQUAD in the lattice file, so they work well. To confirm, I modified fodo.lte in matching/beamSizeMatch1, replacing QUAD (q1h, q2h) with KQUAD, and ran it with gpu-elegant. It runs without any error message but ends with 0 particles transmitted. The latest manual notes that KQUAD is supported by the GPU version; I am not sure whether there is a bug specific to this compute capability (7.0) or some other issue.
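
For anyone who wants to reproduce the test, the QUAD to KQUAD swap can be scripted. This is only a sketch: it assumes each element is defined on one line of the form "name: QUAD, ...", and the sample lattice text and its L and K1 values are hypothetical, not the actual contents of fodo.lte.

```python
import re


def quad_to_kquad(lattice_text, names):
    """Change the element type from QUAD to KQUAD for the named elements only."""
    alternation = "|".join(re.escape(n) for n in names)
    # Match "name: QUAD" at the start of a line, case-insensitively.
    pattern = re.compile(
        rf"^(\s*(?:{alternation})\s*:\s*)QUAD\b",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(r"\1KQUAD", lattice_text)


# Hypothetical lattice fragment for illustration.
sample = """\
q1h: QUAD, L=0.1, K1=1.0
q2h: QUAD, L=0.1, K1=-1.0
d1: DRIF, L=1.0
"""

converted = quad_to_kquad(sample, ["q1h", "q2h"])
print(converted)
```

Running the original and the converted lattice with both elegant and gpu-elegant then gives a direct comparison of the transmitted particle counts.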

Anyway, gpu-elegant is a great speedup!

Thanks,
Cheng-Ying
