Example install of Pelegant from source ... and questions

Moderators: cyao, michael_borland

astecpete
Posts: 35
Joined: 24 Jul 2008, 04:01
Location: Daresbury Laboratory, UK

Example install of Pelegant from source ... and questions

Post by astecpete » 10 Jun 2009, 11:21

Hi,

I've been trying to build Pelegant (and SDDS). I thought it would be useful to post our experience. I have a question for you at the end. Here's what we did...

To build Pelegant on lancs2.nw-grid.ac.uk we needed to make the following changes from the instructions at http://www.aps.anl.gov/Accelerator_Syst ... inux.shtml ...

1. Build a shared version of LAPACK:
Download from here http://www.netlib.org/lapack/
Follow the instructions to build it shared here http://icl.cs.utk.edu/lapack-forum/view ... ?f=2&t=908
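For anyone following along, the shared build boils down to something like this (a minimal sketch assuming gfortran and the LAPACK 3.2.1 tarball; see the linked thread for the full details):

cd lapack-3.2.1
cp make.inc.example make.inc
# edit make.inc: add -fPIC to OPTS and NOOPT so the objects can go into a shared library
make blaslib lapacklib
# turn the static archives (default names with PLAT = _LINUX) into shared libraries
gfortran -shared -o libblas.so -Wl,--whole-archive blas_LINUX.a -Wl,--no-whole-archive
gfortran -shared -o liblapack.so -Wl,--whole-archive lapack_LINUX.a -Wl,--no-whole-archive -L. -lblas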

2. To build SDDS:
In epics/extensions/configure/os/CONFIG_SITE.linux-x86_64.linux-x86_64
add the lines
USR_CFLAGS+=-L/usr/lib64/curses -L/usr/lib64
USR_LDFLAGS+=-L/usr/lib64/curses -L/usr/lib64 -L/usr/local/lib64
and uncomment the lines
MOTIF_LIB=/usr/lib64
MOTIF_INC=/usr/include

Now run "make" in epics/extensions/configure.

Then edit epics/extensions/src/SDDS/SDDSlib/Makefile.OAG
and delete the line MPI=0 (!)

Then go to epics/extensions/lib/linux-x86_64 and do
ln -s /usr/lib64/libpng.a libpng.a

Then in /epics/extensions/src/SDDS/SDDSaps/sddsplots/Makefile.OAG
comment out the line
STATIC_LDFLAGS += -L/usr/lib

Now SDDS should build.
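As a quick smoke test (assuming the standard EPICS extensions layout, where the tools end up in epics/extensions/bin/linux-x86_64):

export PATH=$PATH:$HOME/epics/extensions/bin/linux-x86_64
sddsconvert    # with no arguments, the tool should print its usage message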

3. To build Elegant:

Go to oag/apps/configure/CONFIG
Change SHARED_LIBRARIES=NO to SHARED_LIBRARIES=YES

Go to oag/apps/configure/os/CONFIG_SITE.linux-x86_64.linux-x86_64
Change STATIC_BUILD=YES to STATIC_BUILD=NO
Change the last two lines so it can find the lapack you've just built...
LDLIBS_READLINE = -L/usr/lib64/curses -lreadline -lcurses
USR_LDFLAGS+= -L/home/dlphw/lapack-3.2.1

4. module load openmpi

5. On testing I find that Pelegant is unable to read SDDS distributions in ASCII format. Run sddsconvert -binary on the input distribution so it is read in binary format.
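For example (the file names here are hypothetical):

sddsconvert -binary bunch.sdds bunch-binary.sdds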

6. I now get
MPI_File_open failed: MPI_ERR_IO: input/output error
when Pelegant tries to write a file ... investigating


... that's what we did. It crashes when writing output files. Serial elegant works OK, though. Is it something to do with removing the MPI=0 line, or some static vs. shared library inconsistency? Or something else?

soliday
Posts: 390
Joined: 28 May 2008, 09:15

Re: Example install of Pelegant from source ... and questions

Post by soliday » 10 Jun 2009, 16:21

I see that a lot of the issues you are having are with the 64-bit paths. It would probably be useful to post what flavor of Linux you are using, so I can set up a test computer and try to fix the build rules without having to create symbolic links to work around these issues.

LAPACK and BLAS are available as prebuilt packages for most versions of Linux. But there is nothing wrong with building your own.

With this release we have added an MPI (parallel) implementation of the SDDS I/O routines. After building the SDDS tree from the top level, you need to go down to SDDS/SDDSlib and run:
make clean
make MPI=1

This will rebuild the directory with MPI enabled and create the libSDDSmpi.so library. You will probably first need to edit the Makefile.OAG file in this directory to set the MPI_PATH variable for your system.
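The MPI_PATH edit would look something like this (the directory here is just an example; point it at wherever your mpicc lives):

# in SDDS/SDDSlib/Makefile.OAG
MPI_PATH = /usr/local/openmpi/bin/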

Then just rebuild Pelegant and let me know if the problem is resolved.

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: Example install of Pelegant from source ... and questions

Post by ywang25 » 10 Jun 2009, 19:49

Pelegant does require the input SDDS file to be in binary format, partly because of the offset calculation used for parallel I/O. Pelegant is also designed for large-scale simulation, where binary data is more efficient and occupies less disk space.
Under the SDDS/SDDSlib directory, make clean; make MPI=1; are required to build the parallel SDDS library for Pelegant. It is not necessary to modify Makefile.OAG to set MPI_PATH if the MPI executables are already in your PATH.
Meanwhile, you can still download the 32-bit binary version of Pelegant and run it on your 64-bit Linux platform if you installed MPICH2 with its default configuration. The script below (or you can download it from my previous post in this discussion section) should do the job.

#!/bin/bash

# Script to install MPICH2 on Linux
# by Yusong Wang, 2009

MPICH2_version=1.1

install_dir=`pwd`

# Prompt until the user answers "yes" or "no"; the answer lands in $REPLY
check_input () {
    read
    while [ "$REPLY" != "yes" ] && [ "$REPLY" != "no" ]
    do
        echo "Please type yes or no"
        read
    done
}

# Install MPICH2 (Internet connection required)

echo -n "Do you want to install MPICH2 at ${install_dir} (yes/no)? "
check_input
if [ "$REPLY" = "yes" ]
then
    wget http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/${MPICH2_version}/mpich2-${MPICH2_version}.tar.gz
    tar xzf mpich2-${MPICH2_version}.tar.gz
    pushd mpich2-${MPICH2_version}
    ./configure --prefix=${install_dir}/mpich2-install 2>&1 | tee c.txt
    make 2>&1 | tee log.txt
    popd
    mpi_dir=${install_dir}/mpich2-${MPICH2_version}/bin
    export PATH=$PATH:${mpi_dir}
    echo "MPICH2 binaries have been installed at ${mpi_dir}"
else
    echo "You chose not to install MPICH2. Please make sure the MPI binaries are in your PATH"
    echo "Press \"Enter\" to continue."
    read
fi
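To use it, save the script to a file and run it (the file name here is just an example):

chmod +x install_mpich2.sh
./install_mpich2.sh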

astecpete
Posts: 35
Joined: 24 Jul 2008, 04:01
Location: Daresbury Laboratory, UK

Re: Example install of Pelegant from source ... and questions

Post by astecpete » 11 Jun 2009, 10:38

Hi - thanks for your help

I'm building on a set of clusters on the grid and won't have root access on any of them, so I don't really want to use RPMs. The administrators recommended building my own BLAS and LAPACK. They seem OK.

I've done a full reinstall of SDDS and Pelegant following your instructions, like this...

1. In epics/extensions/configure/os/CONFIG_SITE.linux-x86_64.linux-x86_64
add the lines at the end
USR_CFLAGS+=-L/usr/lib64/curses -L/usr/lib64
USR_LDFLAGS+=-L/usr/lib64/curses -L/usr/lib64 -L/usr/local/lib64 -L/home/dlphw/lapack-3.2.1
and uncomment the lines
MOTIF_LIB=/usr/lib64
MOTIF_INC=/usr/include
2. Make in epics/extensions/configure
3. Make in epics/extensions/src/SDDS
4. Make clean in epics/extensions/src/SDDS/SDDSlib, followed by make MPI=1
5. In oag/apps/configure/RELEASE.linux-x86_64
EPICS_BASE=/home/dlphw/epics/base
6. In oag/apps/configure/os/CONFIG_SITE.linux-x86_64.linux-x86_64 change
STATIC_BUILD to NO
LDLIBS_READLINE+= -L/usr/lib64/curses -lreadline -lcurses -L/home/dlphw/lapack-3.2.1 -llapack
7. Make in oag/apps/configure
8. Make in oag/apps/src/physics
9. Make Pelegant in oag/apps/src/elegant.

I still have the same runtime error:
MPI_File_open failed: MPI_ERR_IO: input/output error

On this machine uname -a says:
Linux umbra 2.6.22.17-0.1-default #1 SMP 2008/02/10 20:01:04 UTC x86_64 x86_64 x86_64 GNU/Linux
The MPI module I have loaded is openmpi/1.2.7, and I built everything with it loaded. Won't it just build against this version?
The SGE queues are set up for openmpi. There is also an mpich2 parallel environment, but the sysadmins say it's broken and have disabled it.

soliday
Posts: 390
Joined: 28 May 2008, 09:15

Re: Example install of Pelegant from source ... and questions

Post by soliday » 11 Jun 2009, 16:44

I downloaded openmpi-1.2.7 and built it using the default options, then built Pelegant against it. I am also seeing issues with it under openmpi, but the error message is different:

[max15:25834] *** An error occurred in MPI_Bcast
[max15:25834] *** on communicator MPI_COMM_WORLD
[max15:25834] *** MPI_ERR_ROOT: invalid root
[max15:25834] *** MPI_ERRORS_ARE_FATAL (goodbye)

It looks like I probably didn't install openmpi correctly. I'll try again on another system to see if I have any better luck.

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: Example install of Pelegant from source ... and questions

Post by ywang25 » 14 Jun 2009, 22:56

I have successfully built and run Pelegant (22.01) with SDDS (2.6) on my dual-core Linux laptop with the latest version of openmpi (1.3.2). Both the result and the speedup are reasonable.

A couple of things you can check:
1) "In general, Open MPI requires that its executables are in your PATH on every node that you will run on and if Open MPI was compiled as dynamic libraries (which is the default), the directory where its libraries are located must be in your LD_LIBRARY_PATH on every node."
This can be easily checked with the hello_c example from the Open MPI distribution (see the sketch after this list).

2) Open MPI uses the MPICH2 parallel I/O implementation (ROMIO), but even the most recent release (1.3.2) still ships ROMIO from MPICH2 1.0.7, while MPICH2 itself has already released version 1.1. openmpi/1.2.7 presumably uses an even older version of the parallel I/O implementation. An update of openmpi is recommended if this is what causes the problem.

3) Make sure the build environment (for both SDDS and Pelegant) and the runtime environment use the same version of the MPI implementation. The script below checks this:
#!/bin/bash
which mpicc              # compiler wrapper the build picks up
echo $LD_LIBRARY_PATH    # runtime library search path
which Pelegant           # which Pelegant binary is on the PATH
ldd `which Pelegant`     # shared libraries it actually resolves


Run the above script on both the head node (where you build Pelegant) and a compute node (where you run Pelegant, by submitting the script as a job) and make sure the output is consistent. In particular, check that Pelegant points to the right versions of the shared libraries, especially the SDDS and MPI libraries.
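For point 1, the hello_c example shipped in the Open MPI source tree is a convenient check (a sketch; the path depends on where you unpacked Open MPI):

cd openmpi-1.3.2/examples
mpicc hello_c.c -o hello_c
mpirun -np 2 ./hello_c    # each rank should print a hello message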

soliday
Posts: 390
Joined: 28 May 2008, 09:15

Re: Example install of Pelegant from source ... and questions

Post by soliday » 15 Jun 2009, 15:40

I have built and run Pelegant with openmpi-1.2.7.
I was not able to make a static version, but if your system is similar to our Fedora 5 64-bit system you may be able to use it.
It is at http://www.aps.anl.gov/asd/oag/download ... nmpi-1.2.7

When I look at what libraries it links to I see:

[soliday@max1 O.linux-x86_64]$ ldd Pelegant_64bit_openmpi-1.2.7
libz.so.1 => /usr/lib64/libz.so.1 (0x0000003c30000000)
liblapack.so.3 => /usr/lib64/atlas/liblapack.so.3 (0x0000003c50e00000)
libblas.so.3 => /usr/lib64/atlas/libblas.so.3 (0x0000003c50500000)
libgfortran.so.1 => /usr/lib64/libgfortran.so.1 (0x0000003c50300000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003c31c00000)
libreadline.so.5 => /usr/lib64/libreadline.so.5 (0x0000003c30200000)
libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x0000003c39000000)
librt.so.1 => /lib64/librt.so.1 (0x0000003c35d00000)
libmpi_cxx.so.0 => not found
libmpi.so.0 => not found
libopen-rte.so.0 => not found
libopen-pal.so.0 => not found
libdl.so.2 => /lib64/libdl.so.2 (0x0000003c2fe00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003c33d00000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003c36500000)
libm.so.6 => /lib64/libm.so.6 (0x0000003c2fc00000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003c50000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003c2f900000)
/lib64/ld-linux-x86-64.so.2 (0x0000003c2f700000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003c4fe00000)

The 4 libraries that are not found are the openmpi libraries. You can either set LD_LIBRARY_PATH as Yusong suggested, or just run the openmpi-1.2.7/bin/mpirun command, which will add this environment variable on the fly.
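For example (the Open MPI install path and the input file name are placeholders):

/path/to/openmpi-1.2.7/bin/mpirun -np 4 Pelegant_64bit_openmpi-1.2.7 run.ele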

astecpete
Posts: 35
Joined: 24 Jul 2008, 04:01
Location: Daresbury Laboratory, UK

Re: Example install of Pelegant from source ... and questions

Post by astecpete » 17 Jun 2009, 03:40

Thanks very much to you both for your help on this! Yusong - your trick of submitting ldd to the nodes was really useful.

Unfortunately (or perhaps fortunately) the system I was on is now down for upgrading. I suspect my problems may have been due to inconsistencies between the libraries seen from the head node and those seen from the compute nodes.

However, I've done another setup on a similar cluster and hit exactly this problem: some libraries couldn't be seen from the compute nodes. In this case they were curses and termcap. I've temporarily "fixed" this by copying them over to where I keep the LAPACK libraries, which can be seen from the compute nodes. I now have a working install!!! YAY
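For the record, the temporary "fix" was something like this (using the LAPACK directory from my earlier posts, which the compute nodes can see):

cp /usr/lib64/libcurses* /usr/lib64/libtermcap* /home/dlphw/lapack-3.2.1/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dlphw/lapack-3.2.1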

I also informed the sysadmins so they can make the proper libraries visible from the nodes, but I got a reply that I can't pretend to fully understand ...
There's something rather wrong with this. You surely shouldn't be
linking against curses and ncurses -- presumably they will fight over
the terminal. [I didn't think there actually was a free curses
implementation off of DOS, so I'm puzzled anyway.] I think the only
reason termcap is on the head (assuming we're talking lv3) is for a
semi-proprietary program I didn't know was there, and would be inclined
to get rid of.

I reckon you should expurgate curses and link against acml for the
linear algebra. I don't think installing termcap is the right thing to
do, and the system will be blown away before too long, at which point
things probably need relinking anyway.
Do you know what he's talking about on the curses/ncurses issue?
