Error when optimizing with Pelegant

Moderators: cyao, michael_borland

Björklund
Posts: 84
Joined: 19 May 2016, 07:14

Error when optimizing with Pelegant

Post by Björklund » 27 Apr 2018, 06:45

Hi,

I'm currently running a serial optimization and tracking ~60k particles in Pelegant. It runs well for a while, then crashes with the error message
SDDS_MPI_FlushBuffer(MPI_File_write_at failed): MPI_ERR_IO: input/output error
Error:
Unable to update page--file pointer is NULL (SDDS_UpdateBinaryPage)
with the last line repeating for hundreds of lines. Towards the bottom, there is another line reading
Problem writing SDDS table (dump_watch_particles)
I had a very similar crash when running the optimization in parallel as well.

Any idea what this could be related to? I can provide any further files needed.

Best regards
Jonas

michael_borland
Posts: 1927
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory

Re: Error when optimizing with Pelegant

Post by michael_borland » 27 Apr 2018, 08:14

Jonas,

I haven't seen an error like that before. Can you provide me with the input files?

What kind of filesystem are you using? (E.g., NFS, Lustre, GPFS, ...)

Finally, you might get some relief by playing with parameters of the &global_settings command, e.g., set mpi_io_force_file_sync=1 or usleep_mpi_io_kludge=10.
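For illustration, a minimal sketch of how these settings might look in the elegant input file, assuming the usual &command ... &end namelist syntax. The two parameter names and values are the ones suggested above; as noted later in this thread, they are only recognized by version 34.2.0 or newer. The two options can also be tried one at a time rather than together.

```
&global_settings
    mpi_io_force_file_sync = 1,
    usleep_mpi_io_kludge = 10
&end
```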

--Michael

Björklund
Posts: 84
Joined: 19 May 2016, 07:14

Re: Error when optimizing with Pelegant

Post by Björklund » 27 Apr 2018, 09:00

Michael,

I'm not exactly sure which partition is which on my machine, since they are not assigned any labels, but the file system type seems to be vfat. If it's not that, it's ext2 or the oddly named LVM2_member.

I have recently had problems with other (serial) runs on this machine, and I think I sent you my files then (they turned out to be mostly fine). I'll email you these specific ones too, although I have the feeling that the problem is not in them.

I should also mention that I'm running Ubuntu 16.04.4 LTS and using OpenMPI v1.10.2 rather than MPICH, although both MPI implementations are installed. I don't know a simple way to switch between them; I'm fairly new to both Linux and MPI.

I will give those parameters a shot too!

//Jonas

Björklund
Posts: 84
Joined: 19 May 2016, 07:14

Re: Error when optimizing with Pelegant

Post by Björklund » 27 Apr 2018, 09:09

Hi again,

elegant doesn't recognize the parameters you suggested. It lists the known variables for &global_settings, which match those in the manual, but those two are not among them.

//Jonas

michael_borland
Posts: 1927
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory

Re: Error when optimizing with Pelegant

Post by michael_borland » 27 Apr 2018, 11:30

Jonas,

You'll need the most recent update (34.2.0, from March 21) to get those parameters.

--Michael

Björklund
Posts: 84
Joined: 19 May 2016, 07:14

Re: Error when optimizing with Pelegant

Post by Björklund » 04 May 2018, 08:30

Hi again,

So, I finally got around to updating elegant to the latest version, and this alone seems to have solved my issue; I can't reproduce the error anymore. Maybe some file got corrupted; I previously had a problem with corruption of my rpn definitions file on that machine.

Anyway, thanks for the help as always, and have a nice weekend.

//Jonas

Björklund
Posts: 84
Joined: 19 May 2016, 07:14

Re: Error when optimizing with Pelegant

Post by Björklund » 07 May 2018, 01:41

Hi again,

It seems I spoke too soon: the problem is still there, but it took quite a bit longer to show up. I will try running with the two options in &global_settings, but it will take some time to verify whether they make any difference.

//Jonas

Björklund
Posts: 84
Joined: 19 May 2016, 07:14

Re: Error when optimizing with Pelegant

Post by Björklund » 07 May 2018, 09:07

Hi,

So now I have tried the usleep_mpi_io_kludge=10 option and got the error there too. With mpi_io_force_file_sync=1 the run is so slow that I don't expect to see a crash (if there is one) for many hours. With the first option it took a few hours to crash, and that run was significantly faster.

I have also noticed another thing, which is perhaps related: Pelegant doesn't converge on an optimization file that does converge with serial tracking. In this case, I'm optimizing in serial mode (with just &optimization_setup) and tracking in parallel, which, as far as I understand, is what happens when I run the serial optimization with Pelegant. I don't know whether Pelegant spits out different values for the statistical parameters that I'm using in the optimization, or what, but I will clean up my files and email them to you.

Even if this second issue is not directly related, I expect that if the optimization at least converges, the run will finish before the original error has time to show up.

//Jonas
