Errors: MPI_File_open failed / No such file or directory

arusanov
Posts: 8
Joined: 14 Jun 2009, 20:07
Location: National Synchrotron Radiation Research Center

Errors: MPI_File_open failed / No such file or directory

Post by arusanov » 19 Nov 2009, 03:45

Hi,

We are trying to run Pelegant on our Linux cluster at NSRRC (Taiwan), but we are encountering run-time errors.

SDDS and elegant were successfully installed from binary packages:
SDDSPython-2.5.1-1.i386.rpm
SDDSToolKit-2.6-1.fc5.x86_64.rpm
elegant-22.0.1-1.x86_64.rpm

We are using mpich2-1.1 and we want to run Pelegant on 3 nodes so far:
virus (master node), node28, node29

mpdtrace shows that the ring is running:
> mpdtrace
virus
node28
node29

I can successfully run the example program cpi:
> mpiexec -n 3 ./cpi
Process 0 of 3 is on virus
Process 2 of 3 is on node29
Process 1 of 3 is on node28
pi is approximately 3.1415926544231323, Error is 0.0000000008333392
wall clock time = 0.003630

So it seems there are no problems with mpd, but when I try to run Pelegant in a similar way, it produces two kinds of errors depending on where Pelegant is installed.

1) Pelegant is installed as /usr/bin/Pelegant only on the master node (virus) and is not accessible from node28 or node29:
> mpiexec -n 3 Pelegant ./tps79h2wake.ele
problem with execution of Pelegant on node28: [Errno 2] No such file or directory
problem with execution of Pelegant on node29: [Errno 2] No such file or directory
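(A small MPI test program along these lines can confirm, from every rank, whether a given path is visible on that node. This is only a sketch: the file name checkpath.c and the way it is compiled and launched are illustrative, not part of Pelegant or mpich2, and the test program itself has to live in a directory that every node can see, e.g. a shared home directory.)

/* checkpath.c -- report from every MPI rank whether a given path is accessible */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    if (argc > 1) {
        /* R_OK|X_OK: the path must be readable and executable from this node */
        if (access(argv[1], R_OK | X_OK) == 0)
            printf("rank %d on %s: %s is accessible\n", rank, host, argv[1]);
        else
            printf("rank %d on %s: %s is NOT accessible\n", rank, host, argv[1]);
    }

    MPI_Finalize();
    return 0;
}

Compile and run it from a shared directory, for example:
> mpicc checkpath.c -o checkpath
> mpiexec -n 3 ./checkpath /usr/bin/Pelegant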

2) Pelegant is installed as /home/yam03/usr/bin/Pelegant and is accessible from all three nodes: virus, node28, node29
> mpiexec -n 3 /home/yam03/usr/bin/Pelegant ./tps79h2wake.ele
This is elegant 22.0.1, Jun 3 2009, by M. Borland, W. Guo, V. Sajaev, Y. Wang, Y. Wu, and A. Xiao.
Parallelized by Y. Wang, H. Shang, and M. Borland.
Link date: Jun 3 2009 16:50:15
statistics: ET: 00:00:00 CP: 0.00 BIO:0 DIO:0 PF:0 MEM:1396
&run_setup
lattice = tps79h2wake.lte,
...
dumping bunch
tracking 200000 particles
MPI_File_open failed: Error message texts are not available
rank 2 in job 1 virus_56144 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 0 in job 1 virus_56144 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
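(Since the failure comes from MPI-IO rather than from the tracking itself, a stand-alone MPI_File_open test compiled with the same mpicc that launches the job can help narrow it down. This is only a sketch: the output file name mpiio_test.out is arbitrary and should sit in a directory shared by all nodes.)

/* mpiio_test.c -- try a collective MPI_File_open/close, independent of Pelegant */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_File fh;
    int rank, err, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* the default error handler for files is MPI_ERRORS_RETURN, so check err */
    err = MPI_File_open(MPI_COMM_WORLD, "mpiio_test.out",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY,
                        MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        printf("rank %d: MPI_File_open failed: %s\n", rank, msg);
    } else {
        printf("rank %d: MPI_File_open succeeded\n", rank);
        MPI_File_close(&fh);
    }

    MPI_Finalize();
    return 0;
}

> mpicc mpiio_test.c -o mpiio_test
> mpiexec -n 3 ./mpiio_test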

Could you please provide some help?

Thank you,
Andriy

michael_borland
Posts: 1927
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory
Contact:

Re: Errors: MPI_File_open failed / No such file or directory

Post by michael_borland » 19 Nov 2009, 09:23

I've never seen this particular problem, but I'd guess the issue is that all the nodes need access to the input files, because the master and the slaves all read the input files independently.

--Michael

arusanov
Posts: 8
Joined: 14 Jun 2009, 20:07
Location: National Synchrotron Radiation Research Center

Re: Errors: MPI_File_open failed / No such file or directory

Post by arusanov » 22 Nov 2009, 19:37

michael_borland wrote: I'd guess the issue is that all the nodes need access to the input files, because the master and slaves all read the input files independently.
Should all the nodes have access to the Pelegant executable file as well?

michael_borland
Posts: 1927
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory
Contact:

Re: Errors: MPI_File_open failed / No such file or directory

Post by michael_borland » 22 Nov 2009, 21:06

Yes, I think so.

--Michael

arusanov
Posts: 8
Joined: 14 Jun 2009, 20:07
Location: National Synchrotron Radiation Research Center

Re: Errors: MPI_File_open failed / No such file or directory

Post by arusanov » 22 Nov 2009, 21:17

OK, thank you very much for the advice.

arusanov
Posts: 8
Joined: 14 Jun 2009, 20:07
Location: National Synchrotron Radiation Research Center

Re: Errors: MPI_File_open failed / No such file or directory

Post by arusanov » 03 Dec 2009, 22:04

Dear Dr. Borland,

We report the successful installation of Pelegant on a Linux cluster (10 nodes, 64-bit, quad-core CPUs) at NSRRC.

Both mpich2 and Pelegant were compiled from source to ensure their mutual compatibility. The previously reported run-time errors have been eliminated:

1) problem with execution of Pelegant on node28: [Errno 2] No such file or directory
Solution: the executables of both mpich2 and Pelegant must be accessible from every node.

2) MPI_File_open failed: Error message texts are not available
Solution: SDDSlib must be built against the particular version of mpich2 that runs the job (a quick check along these lines is sketched below).
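(A quick way to verify that the mpich2 used at build time matches the one running the job is a small version-printing program compiled with the same mpicc that built SDDSlib and Pelegant. This is only a sketch; MPICH2_VERSION is a macro provided by mpich2's mpi.h and is guarded here in case a different MPI implementation is in use.)

/* mpiversion.c -- print the MPI standard version and, if available, the mpich2 release */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int major, minor;

    MPI_Init(&argc, &argv);
    MPI_Get_version(&major, &minor);
    printf("MPI standard version: %d.%d\n", major, minor);
#ifdef MPICH2_VERSION
    printf("mpich2 release: %s\n", MPICH2_VERSION);
#endif
    MPI_Finalize();
    return 0;
}

> mpicc mpiversion.c -o mpiversion
> ./mpiversion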

Thanks for the help.
Andriy Rusanov,
Ping-Jung Chou

michael_borland
Posts: 1927
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory
Contact:

Re: Errors: MPI_File_open failed / No such file or directory

Post by michael_borland » 04 Dec 2009, 13:38

Andriy and Ping,

That's great! Thanks for sharing your experience.

--Michael
