Occasional errors while simulating ion-effects

Posted: 07 Dec 2022, 06:25
by Siwei_Wang
Dear all,

When I run ion-effects simulations, I sometimes encounter the following error:

*** Error in `/dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant': realloc(): invalid old size: 0x0000000001c48870 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x2ac4b92f6474]
/lib64/libc.so.6(+0x84861)[0x2ac4b92fb861]
/lib64/libc.so.6(realloc+0x1d2)[0x2ac4b92fce12]
/dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant[0x68fd73]
(.....)
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ac4b9299555]
/dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant[0x40a890]
======= Memory map: ========
00400000-01265000 r-xp 00000000 00:2e 6731606190 /dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant
01464000-0146c000 r--p 00e64000 00:2e 6731606190 /dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant
0146c000-014d1000 rw-p 00e6c000 00:2e 6731606190 /dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant
(.....)

There are several observations:
1) This occurs more readily when I run the job on more cores, say all 40 cores of a whole node. When I reduce to 16 cores or fewer, it occurs less often.
With fewer cores (say, 10 or 16), there is still a chance of seeing this error: I submit the job to the cluster and it fails with the above error, then I submit the same job again and it succeeds.
2) This only happens with Pelegant. When I run the same job with serial elegant, there is no problem at all.
3) At first I thought it was a cluster problem, but I saw the same error when I submitted the job to another cluster. Exactly the same behavior is observed there.
4) I tried this script with different versions of ELEGANT (2021.4, 2022.1, 2022.2) and got the same error.

I have attached one of my scripts below. For my simulation, I'm using a single ILMATRIX element and one ion-effects element per turn. Could you help me look at it and see what settings I could use to avoid this error? The log files for the above error can be found in the 'logs' folder.

Many thanks,
Siwei

Re: Occasional errors while simulating ion-effects

Posted: 07 Dec 2022, 10:33
by soliday
Can you run Pelegant with no options, so that it prints the usage message containing the MPI version information, and send that to me at soliday@anl.gov?

Thanks,
--Bob Soliday

Re: Occasional errors while simulating ion-effects

Posted: 19 Dec 2022, 11:46
by Siwei_Wang
Dear Soliday,

I seem to have found a way of overcoming this problem. Currently, my pressure file has only two lines of data, corresponding to the start and end of the lattice. After I duplicated the pressure data at some intermediate points, making 11 lines of data in total, the problem seems to be gone. I could successfully run the job on 40 cores, where it previously failed. Very interesting, though.
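
The workaround above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script actually used: the function name densify_profile, the plain-list inputs, and the example values (a 100 m lattice at 1e-9 pressure units) are all hypothetical, and reading or writing the actual pressure file is left out. It uses linear interpolation between the existing points, which for a uniform profile is the same as duplicating the data at intermediate positions.

```python
def densify_profile(s, pressure, n_points=11):
    """Resample a pressure profile onto n_points evenly spaced positions.

    s        : strictly increasing positions along the lattice
    pressure : pressure value at each position in s
    Returns (new_s, new_pressure) as two lists of length n_points.
    Linear interpolation is an assumption of this sketch; with equal
    end-point values it simply repeats the same pressure.
    """
    if len(s) != len(pressure) or len(s) < 2:
        raise ValueError("need at least two matching (s, pressure) points")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if any(b <= a for a, b in zip(s, s[1:])):
        raise ValueError("s must be strictly increasing")

    start, end = s[0], s[-1]
    new_s = [start + (end - start) * i / (n_points - 1) for i in range(n_points)]

    new_pressure = []
    j = 0  # index of the original segment [s[j], s[j+1]] containing x
    for x in new_s:
        while j < len(s) - 2 and x > s[j + 1]:
            j += 1
        t = (x - s[j]) / (s[j + 1] - s[j])
        new_pressure.append(pressure[j] + t * (pressure[j + 1] - pressure[j]))
    return new_s, new_pressure


if __name__ == "__main__":
    # Hypothetical two-line profile: start and end of the lattice only.
    positions, values = densify_profile([0.0, 100.0], [1e-9, 1e-9], n_points=11)
    for pos, val in zip(positions, values):
        print(f"{pos:8.2f}  {val:.3e}")
```

The resulting 11 rows would then have to be written back in whatever format the pressure file uses, keeping the same columns as the original two rows.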

I'll do more tests to see whether the problem reappears.

Best regards,
Siwei