Occasional errors while simulating ion-effects


Siwei_Wang
Posts: 21
Joined: 27 Jun 2017, 07:28

Occasional errors while simulating ion-effects

Post by Siwei_Wang » 07 Dec 2022, 06:25

Dear all,

When I run ion-effects simulations, I sometimes encounter the following error:

*** Error in `/dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant': realloc(): invalid old size: 0x0000000001c48870 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x2ac4b92f6474]
/lib64/libc.so.6(+0x84861)[0x2ac4b92fb861]
/lib64/libc.so.6(realloc+0x1d2)[0x2ac4b92fce12]
/dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant[0x68fd73]
(.....)
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ac4b9299555]
/dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant[0x40a890]
======= Memory map: ========
00400000-01265000 r-xp 00000000 00:2e 6731606190 /dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant
01464000-0146c000 r--p 00e64000 00:2e 6731606190 /dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant
0146c000-014d1000 rw-p 00e6c000 00:2e 6731606190 /dls/physics/wsw/ELEGANT2022/2022/bin/Pelegant
(.....)

There are several observations:
1) This occurs more often when I run the job on more cores, e.g. on all 40 cores of a whole node. When I reduce to 16 cores or fewer, it occurs less often.
With fewer cores (say, 10 or 16), the error still appears occasionally: I submit the job to the cluster and it fails with the above error, then I submit the same job again and it succeeds.
2) This only happens with Pelegant. When I run the same job with serial elegant, there is no problem at all.
3) At first I thought it was a cluster problem, but I saw the same error when I submitted the job to another cluster, with exactly the same behavior.
4) I tried this script with different versions of ELEGANT (2021.4, 2022.1, 2022.2) and got the same error each time.

I have attached one of my scripts below. For my simulation, I'm using a single ILMATRIX element and one ion-effects element per turn. Could you help me look at it and see what settings I could use to avoid this error? The log files for the above error can be found in the 'logs' folder.

Many thanks,
Siwei
Attachments
iontest.zip
(495.04 KiB) Downloaded 55 times

soliday
Posts: 390
Joined: 28 May 2008, 09:15

Re: Occasional errors while simulating ion-effects

Post by soliday » 07 Dec 2022, 10:33

Can you run Pelegant with no options so that it prints the usage message, which contains the MPI version information, and send that to me at soliday@anl.gov?

Thanks,
--Bob Soliday

Siwei_Wang
Posts: 21
Joined: 27 Jun 2017, 07:28

Re: Occasional errors while simulating ion-effects

Post by Siwei_Wang » 19 Dec 2022, 11:46

Dear Soliday,

I seem to have found a way of overcoming this problem. Currently my pressure file has only two lines of data, corresponding to the start and end of the lattice. After I duplicated the pressure data at some intermediate points, making 11 lines of data in total, the problem seems to be gone. I could successfully run the job on 40 cores, where it previously failed. Very interesting, though...
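
For anyone who wants to try the same workaround, here is a minimal sketch of the idea in Python. It only generates the rows: it assumes the pressure data can be handled as plain (s, pressures) rows, and the function name `densify_pressure_rows`, the 100 m lattice length, and the 1e-9 pressure value are all hypothetical placeholders, not taken from the attached files. Reading and writing the actual pressure file is not shown. Note also that this has only been observed to make the error go away; why it helps is not understood.

```python
def densify_pressure_rows(s_start, s_end, p_start, p_end, n_rows=11):
    """Expand a two-row pressure table into n_rows evenly spaced rows.

    s_start, s_end : positions of the first and last row
    p_start, p_end : per-species pressures at those positions (same length)
    Pressures are linearly interpolated; if both ends are equal, the
    intermediate rows are plain duplicates, as in the workaround above.
    """
    if n_rows < 2:
        raise ValueError("need at least the start and end rows")
    if len(p_start) != len(p_end):
        raise ValueError("start and end rows must list the same species")

    rows = []
    for i in range(n_rows):
        t = i / (n_rows - 1)  # 0.0 at the start, 1.0 at the end
        s = s_start + t * (s_end - s_start)
        pressures = [a + t * (b - a) for a, b in zip(p_start, p_end)]
        rows.append((s, pressures))
    return rows


# Hypothetical example: one gas species, same pressure at both ends.
rows = densify_pressure_rows(0.0, 100.0, [1e-9], [1e-9])
for s, pressures in rows:
    print(s, pressures)
```

The resulting 11 rows would then replace the original two rows in the pressure file.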

I'll do more tests to see if I would see the same problem again.

Best regards,
Siwei
