Dear Dr. Borland,
I'm using touschek_scatter to calculate the Touschek loss rate, but something always goes wrong when I run the job with Pelegant.
More specifically, when I use elegant for the calculation, my test job takes about 6 hours (with small values for n_simulated and other parameters). It works and gives reasonable results. But when I run Pelegant with the same parameters (I didn't change anything), the job doesn't seem to give any results. It seems to get stuck in some loop, or something like that.
For example, the information printed by the job (listed below) stays the same for more than one day.
//********printing information************//
Working on TEST#1 at s=0.000000e+00, n_simulated = 500
Setting up loss file
Before particle generation: ET: 00:00:01 CP: 0.69 BIO:0 DIO:0 PF:0 MEM:138353
After particle generation: ET: 00:00:01 CP: 0.70 BIO:0 DIO:0 PF:0 MEM:138353
12 of 500 particles selected for tracking
After particle selection: ET: 00:00:01 CP: 0.70 BIO:0 DIO:0 PF:0 MEM:138353
//***************************************//
My Pelegant command line is: mpirun -np 20 Pelegant touschek_B.ele
Do I need to add any other options to run Pelegant? Or might there be some trouble with parallel communication?
Thanks and regards,
Zhilong Pan
Pelegant bug for touschek_scatter?
Re: Pelegant bug for touschek_scatter?
Zhilong,
Sorry for the delayed reply. I was away from work.
I think the problem may just be that you are using too few particles. With 20 cores but only 12 particles, 8 cores have nothing to do. Ideally, this shouldn't cause a crash, but there may be a bug. Try increasing the number of particles. If that doesn't help, please upload (or email) the input files so I can check it.
--Michael
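To illustrate the point about idle cores, here is a minimal sketch that splits a particle count as evenly as possible across MPI ranks and counts how many ranks receive nothing. The function `particles_per_rank` is hypothetical and assumes a simple even split; it is not Pelegant's actual distribution scheme, which may differ (for example, by reserving one rank as a master).

```python
def particles_per_rank(n_particles, n_ranks):
    """Split n_particles as evenly as possible across n_ranks.

    Assumption: a plain even split, used only for illustration.
    """
    base, extra = divmod(n_particles, n_ranks)
    # The first `extra` ranks get one additional particle.
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


# Numbers from this thread: 12 selected particles on 20 ranks.
counts = particles_per_rank(12, 20)
idle_ranks = sum(1 for c in counts if c == 0)
print("idle ranks:", idle_ranks)

# Later run in this thread: 343 selected particles on 40 ranks.
counts_large = particles_per_rank(343, 40)
print("min/max per rank:", min(counts_large), max(counts_large))
```

Under this assumption, what matters for keeping every rank busy is the number of particles selected for tracking, not n_simulated itself.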
Re: Pelegant bug for touschek_scatter?
Dear Dr. Borland,
Thanks for your kind reply.
For touschek_scatter, yes, I tried more particles following your suggestion, and it works! So that should be the problem. Thanks!
For momentum aperture, the attachment is the file we use for the momentum aperture calculation.
Yes, we didn't set output_mode=1, so that could matter; I'm also checking that. Another thing: when we set load_balancing_on=1, everything seems fine. We tested with two different examples:
1. 4 nodes with 20 tasks per node (4*20 cores)
2. 2 nodes with 20 tasks per node (2*20 cores)
In the end, example 2 takes twice as long as example 1. But when we set load_balancing_on=0, the two examples take the same time. So does that option also matter for Pelegant?
Thanks and regards
Zhilong Pan
Attachment: touscat_A.ele (momentum aperture file)
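The timing comparison above can be expressed as a parallel efficiency. The sketch below is only an illustration: the helper `parallel_efficiency` and the run times of 1.0 and 2.0 hours are hypothetical, since the post gives only the ratio between the two runs, not the actual times.

```python
def parallel_efficiency(t_ref, n_ref, t_new, n_new):
    """Efficiency of a run on n_new cores relative to a reference run on n_ref cores.

    1.0 means ideal scaling: run time inversely proportional to core count.
    """
    return (t_ref * n_ref) / (t_new * n_new)


# Hypothetical run times in hours; only their ratio matters here.
# Case A: 40 cores take twice as long as 80 cores (ideal scaling).
print(parallel_efficiency(t_ref=2.0, n_ref=40, t_new=1.0, n_new=80))

# Case B: both core counts take the same time (no gain from the extra cores).
print(parallel_efficiency(t_ref=2.0, n_ref=40, t_new=2.0, n_new=80))
```

Case A corresponds to the behaviour reported with load_balancing_on=1, and case B to the behaviour reported with load_balancing_on=0.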
Re: Pelegant bug for touschek_scatter?
Zhilong,
The load_balancing_on parameter should not make any difference for the momentum_aperture command. I tested to be sure, and confirmed that the run times are unaffected.
Not sure what happened in your runs, but there must be another cause.
--Michael
Re: Pelegant bug for touschek_scatter?
Hi Michael,
Thanks so much! I will double check that again.
For touschek_scatter, I have changed the number of macro particles to 10000, and the number of selected particles is 343, well above the number of tasks (40). The trouble I mentioned before is still there: the job just stops, as shown below.
"""
Working on TEST#1 at s=0.000000e+00, n_simulated = 10000
Setting up loss file
Before particle generation: ET: 00:00:01 CP: 0.80 BIO:0 DIO:0 PF:0 MEM:142601
After particle generation: ET: 00:00:01 CP: 0.92 BIO:0 DIO:0 PF:0 MEM:142601
343 of 10000 particles selected for tracking
After particle selection: ET: 00:00:01 CP: 0.92 BIO:0 DIO:0 PF:0 MEM:142727
The balance is in bad status. The fastest time (id=6) is 2.117973e-01
The slowest time (id=11) is 3.464404e-01
We need redistribute the particles. The difference is 63.57 percent
Gathering particles to master from 39 processors
Finished gathering particles to master: ET: 00:00:01 CP: 1.56 BIO:0 DIO:0 PF:0 MEM:143564
"""
I have attached the file used to calculate the Touschek loss rate. Could you have a quick look to see whether there are any mistakes in it?
The momentum aperture was calculated with the touschek_A.ele file I attached before.
Thanks so much!
Zhilong Pan
Attachment: touscat_B.ele
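The percentage in the load-balance message in the log above is consistent with the difference between the slowest and fastest times taken relative to the fastest time. This formula is inferred from the printed numbers only, not taken from the Pelegant source, so treat it as an assumption.

```python
# Times copied from the log output in the post above.
fastest = 2.117973e-01  # id=6
slowest = 3.464404e-01  # id=11


def imbalance_percent(fastest, slowest):
    """Relative spread between slowest and fastest rank, in percent.

    Assumption: normalised to the fastest time, which reproduces the logged value.
    """
    return (slowest - fastest) / fastest * 100.0


print(f"{imbalance_percent(fastest, slowest):.2f} percent")  # 63.57 percent
```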
Re: Pelegant bug for touschek_scatter?
Zhilong,
To check this further, I would need your lattice file as well. You can email it to me if you'd rather not post it.
My gut feeling is that you are using far too few particles for an accurate Touschek scattering simulation. We would typically use ~1 million.
--Michael
Re: Pelegant bug for touschek_scatter?
Thanks Michael,
I will email the lattice file to you. Yes, the particle number we use is not the value needed to get the true Touschek loss rate. We just wanted to run some tests first, so we used such a small number.
Regards,
Zhilong