
MPI_Barrier

Posted: 06 May 2011, 15:53
by JoelFrederico
I've noticed there's a strange crash at the end of this simulation:

Code:

=====================================================================================
Thanks for using Pelegant.  Please cite the following references in your publications:
  M. Borland, "elegant: A Flexible SDDS-Compliant Code for Accelerator Simulation,"
  Advanced Photon Source LS-287, September 2000.
  Y. Wang and M. Borland, "Pelegant: A Parallel Accelerator Simulation Code for
  Electron Generation and Tracking," Proceedings of the 12th Advanced Accelerator
  Concepts Workshop, 2006.
If you use a modified version, please indicate this in all publications.
=====================================================================================
*** An error occurred in MPI_Barrier
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
*** An error occurred in MPI_Barrier
*** after MPI was finalized
[oak068:26519] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[oak068:26520] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 26518 on
node oak068 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

TID  HOST_NAME    COMMAND_LINE            STATUS            TERMINATION_TIME
==== ========== ================  =======================  ===================
0001 oak068     mympirun_wrapper  Exit (1)                 05/06/2011 13:47:48
0002 oak068     mympirun_wrapper  Exit (1)                 05/06/2011 13:47:48
It seems to have something to do with the run_control settings (possibly first_is_fiducial), although I can't figure out exactly what's triggering it. It doesn't seem to be a serious problem for the simulation itself, since the final data is still written correctly, but my guess is that the run isn't cleaning up properly at exit.
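
For what it's worth, this class of error usually means a collective call ran after MPI_Finalize, for example from a cleanup routine registered with atexit() that fires after main() has already finalized MPI. Here is a minimal sketch (not Pelegant's actual code) of both the failure mode and the MPI_Finalized guard that avoids it:

Code:

/* Minimal sketch of an MPI_Barrier-after-MPI_Finalize crash.
 * This is NOT Pelegant source; it only illustrates the error class
 * in the log above. */
#include <mpi.h>
#include <stdlib.h>

static void cleanup(void)
{
    int finalized = 0;

    /* Without this guard, the barrier below runs after MPI_Finalize
     * and aborts with "MPI_Barrier ... after MPI was finalized",
     * exactly like the output above. */
    MPI_Finalized(&finalized);
    if (!finalized)
        MPI_Barrier(MPI_COMM_WORLD);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    atexit(cleanup);   /* registered handler runs after main() returns */
    MPI_Finalize();
    return 0;          /* cleanup() now executes with MPI finalized */
}

If something like this is happening in the cleanup path, it would explain why the final data is written fine but the job still aborts at exit.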

http://www.stanford.edu/~joelfred/Peleg ... ier.tar.gz

Re: MPI_Barrier

Posted: 06 May 2011, 16:06
by ywang25
Joel,

It appears the memory is not handled properly somewhere in the code.
Can you check the file you uploaded? I downloaded it and got a 0-byte file.

Thanks,

Yusong

Re: MPI_Barrier

Posted: 06 May 2011, 16:14
by JoelFrederico
Sorry, went over my quota. It should be there now.

Re: MPI_Barrier

Posted: 11 May 2011, 08:56
by ywang25
Joel,

I tested your example on two clusters, both running Red Hat 4.1.2. One test used MVAPICH2 1.4.0rc1 over an InfiniBand network, and the other used MPICH2 1.2.1. The problem you described didn't show up in either case. I also ran the example under a memory debugger, and it didn't report any problems for this simple simulation.

In both tests, Pelegant was built natively on each cluster. I am afraid there could be a minor portability issue if the environment you ran in is not the same as the one where Pelegant was built.

Yusong

Re: MPI_Barrier

Posted: 17 May 2011, 18:16
by JoelFrederico
Thanks, Yusong.

Pelegant was compiled natively for this system; it uses NFS for the file system and Open MPI with InfiniBand for communication. Since it's not causing problems in the simulation as far as I can tell, I won't worry about it. The crash seems reproducible with the given settings, but I haven't invested much time in finding which setting triggers it. I'll let you know if it becomes a problem or if I find a pattern.

Joel