Errors: MPI_File_open failed / No such file or directory
Posted: 19 Nov 2009, 03:45
Hi,
We are trying to run Pelegant on our linux cluster at NSRRC (Taiwan), but some run-time errors appear.
SDDS and elegant were successfully installed from binary packages:
SDDSPython-2.5.1-1.i386.rpm
SDDSToolKit-2.6-1.fc5.x86_64.rpm
elegant-22.0.1-1.x86_64.rpm
We are using mpich2-1.1, and for now we want to run Pelegant on 3 nodes:
virus (master node), node28, node29
mpdtrace shows that the ring is running:
> mpdtrace
virus
node28
node29
I can successfully run the example program cpi:
> mpiexec -n 3 ./cpi
Process 0 of 3 is on virus
Process 2 of 3 is on node29
Process 1 of 3 is on node28
pi is approximately 3.1415926544231323, Error is 0.0000000008333392
wall clock time = 0.003630
So it seems there are no problems with mpd, but when I try to run Pelegant in a similar way, it produces two kinds of errors depending on where Pelegant is installed.
1) Pelegant is installed as /usr/bin/Pelegant only on the master node (virus) and is not accessible on node28 or node29:
> mpiexec -n 3 Pelegant ./tps79h2wake.ele
problem with execution of Pelegant on node28: [Errno 2] No such file or directory
problem with execution of Pelegant on node29: [Errno 2] No such file or directory
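I assume mpiexec has to find the Pelegant executable on every node, not only on virus. A minimal check of that (just a sketch, assuming mpiexec places one process on each host as in the cpi run above) would be:
> mpiexec -n 3 ls -l /usr/bin/Pelegant
On node28 and node29 this should then report that /usr/bin/Pelegant does not exist, which matches the errors above.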
2) Pelegant is installed as /home/yam03/usr/bin/Pelegant and is accessible from all three nodes (virus, node28, node29):
> mpiexec -n 3 /home/yam03/usr/bin/Pelegant ./tps79h2wake.ele
This is elegant 22.0.1, Jun 3 2009, by M. Borland, W. Guo, V. Sajaev, Y. Wang, Y. Wu, and A. Xiao.
Parallelized by Y. Wang, H. Shang, and M. Borland.
Link date: Jun 3 2009 16:50:15
statistics: ET: 00:00:00 CP: 0.00 BIO:0 DIO:0 PF:0 MEM:1396
&run_setup
lattice = tps79h2wake.lte,
...
dumping bunch
tracking 200000 particles
MPI_File_open failed: Error message texts are not available
rank 2 in job 1 virus_56144 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 0 in job 1 virus_56144 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
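Since MPI_File_open is used for writing the output files, I suspect the working directory also has to be readable and writable from every node. A minimal check of that (just a sketch, assuming the job is started from the directory containing tps79h2wake.ele; mpi_write_test is only a placeholder file name) would be:
> mpiexec -n 3 ls -ld $PWD
> mpiexec -n 3 touch $PWD/mpi_write_test
If either command fails on node28 or node29, the run directory is probably not shared across the nodes.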
Could you please provide some help?
Thank you,
Andriy