SCRIPT causing hangs

Moderators: cyao, michael_borland

JoelFrederico
Posts: 60
Joined: 05 Aug 2010, 11:32
Location: SLAC National Accelerator Laboratory

SCRIPT causing hangs

Post by JoelFrederico » 25 Apr 2011, 17:47

I've noticed a problem brought on by using the SCRIPT element. I was testing to see if I could use a more complicated script later, so I just used a simple copy script with some output to make sure it was reaching different parts of it and accepting commands:

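#!/bin/bash
# Print the input and output file names handed over by the SCRIPT element
echo "$1" "$2"
# Copy the input particle file to the output file unchanged
cp "$1" "$2"
echo "I'm here!"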

SCRIPTEL: SCRIPT,COMMAND="./test.sh %i %o"

The code hangs in different ways. In the simple drift sim I created, it hangs at the end, before it's supposed to output. In this case, no temporary files appear:

http://stanford.edu/~joelfred/script.tar.gz

In a more complicated sim, it seems to test the script when reading the ele file. Sometimes it fails to find the output files here, even though an "ls" shows the files it's looking for. Sometimes it passes this section and simply hangs when it gets to the script element. I'm running on an NFS cluster, and I'm wondering if this has something to do with elegant not waiting long enough for the file to appear? In both cases, temporary files appear in the directory.

http://stanford.edu/~joelfred/drift.tar.gz

Of course, as always, things are fine in regular elegant.

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: SCRIPT causing hangs

Post by ywang25 » 26 Apr 2011, 08:13

Joel,

The SCRIPT element is not fully supported in the current version of Pelegant, primarily because scripts can be used in many different ways during a simulation and each situation needs to be handled specifically. If your script performs I/O operations, things get more complicated, because multiple processors may write to the same file. If you can streamline how the script is used, it is possible to work out a solution for a particular case.

Also, NFS file systems have a known delay before a newly generated file can be read. You can try adding a "sleep" command to work around it.
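A minimal sketch of that "sleep" workaround, assuming the script simply polls until the file shows up. The function name wait_for_file and the default retry count and delay are illustrative choices, not anything taken from elegant or Pelegant:

```shell
#!/bin/bash
# wait_for_file PATH [TRIES] [DELAY]
# Poll until PATH exists and is non-empty, or give up after TRIES attempts.
# TRIES and DELAY are illustrative defaults, not values recommended by elegant.
wait_for_file() {
    local path=$1 tries=${2:-30} delay=${3:-1}
    local i
    for ((i = 0; i < tries; i++)); do
        # -s is true only when the file exists and has a size greater than zero
        if [ -s "$path" ]; then
            return 0
        fi
        sleep "$delay"
    done
    echo "timed out waiting for $path" >&2
    return 1
}
```

In a script like test.sh this could be called as wait_for_file "$1" || exit 1 before the cp. Note that a non-empty file is not necessarily a completely written one, so this only narrows the window rather than closing it.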

Yusong

JoelFrederico
Posts: 60
Joined: 05 Aug 2010, 11:32
Location: SLAC National Accelerator Laboratory

Re: SCRIPT causing hangs

Post by JoelFrederico » 29 Apr 2011, 18:58

Yusong,

Thanks so much for your explanation, particularly about NFS. I may be a little confused still. When you say I/O in the script could make it complicated to write to the same file with multiple processors, does that mean that it's possible for the SCRIPT element to get called more than once for each run through the simulation?

I was under the impression that all processes had to catch up and be in the same place at the SCRIPT. Then Pelegant would gather the particles into one file for the script to process. The script would then run on one machine on the one input file. It would then output a file and terminate. Pelegant would resume and read the output file and distribute particles as appropriate to the processes and the simulation would continue. Is this not what (basically) happens?

It was sounding like Pelegant runs a script for each process, but I'm unclear how that would work.

A bit of clarification would be helpful!

Joel

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: SCRIPT causing hangs

Post by ywang25 » 02 May 2011, 09:11

JoelFrederico wrote: Yusong,

Thanks so much for your explanation, particularly about NFS. I may be a little confused still. When you say I/O in the script could make it complicated to write to the same file with multiple processors, does that mean that it's possible for the SCRIPT element to get called more than once for each run through the simulation?

Yes. In the current implementation, all the slave processors execute the script simultaneously.

JoelFrederico wrote: I was under the impression that all processes had to catch up and be in the same place at the SCRIPT. Then Pelegant would gather the particles into one file for the script to process. The script would then run on one machine on the one input file. It would then output a file and terminate. Pelegant would resume and read the output file and distribute particles as appropriate to the processes and the simulation would continue. Is this not what (basically) happens?

That is how it worked before parallel I/O was added. Now that Pelegant is integrated with SDDS parallel I/O, the memory bottleneck on a single compute node no longer exists, so Pelegant can be used to track a very large number of particles. Gathering them onto one processor could cause memory problems. From an efficiency point of view, frequent gathering and scattering during tracking would also hurt performance. The ultimate solution would be to let every core run the script on its own share of the particles, but whether that works depends on the script the user chooses.

JoelFrederico wrote: It was sounding like Pelegant runs a script for each process, but I'm unclear how that would work.

It would be ideal if the script could also run in parallel. It might be possible to add an option that lets users choose to run the SCRIPT element on one processor (after gathering), but that seems to offer no advantage over running the script outside the Pelegant program.
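As long as every processor runs the script at the same time, one defensive pattern is to have each invocation write to its own temporary file and rename it into place only when it is complete. The sketch below assumes the invocations might share an output path, which this thread does not confirm; safe_copy and the .script_tmp prefix are made-up names:

```shell
#!/bin/bash
# safe_copy SRC DST
# Copy SRC to a private temporary file next to DST, then rename it into place,
# so that a concurrent reader never sees a half-written DST.
safe_copy() {
    local src=$1 dst=$2 tmp
    # mktemp creates a uniquely named file, so simultaneous callers do not collide
    tmp=$(mktemp "$(dirname "$dst")/.script_tmp.XXXXXX") || return 1
    if cp "$src" "$tmp"; then
        # A rename within one directory replaces DST in a single step
        mv -f "$tmp" "$dst"
    else
        # Clean up the partial temporary file if the copy failed
        rm -f "$tmp"
        return 1
    fi
}
```

This does not make the script itself parallel. It only keeps simultaneous copies from corrupting each other's output, and the last one to finish wins.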

Yusong

JoelFrederico
Posts: 60
Joined: 05 Aug 2010, 11:32
Location: SLAC National Accelerator Laboratory

Re: SCRIPT causing hangs

Post by JoelFrederico » 02 May 2011, 12:54

Thanks for the detail. I'm sure most applications don't need to have access to the full phase space, but I have an application I'm interested in. I'd like to try to optimize on what I've defined as the "core emittance" of our beam. We can use quads and sextupoles to try to make the geometric contribution to emittance better. However, I'd like to optimize based on the emittance excluding the tails. The tails are complex, and so it's not as easy as just doing a CLEAN or anything like that. I really need access to the full transverse phase space from one single script. So it would be very useful for me to be able to ask Pelegant to gather the particles into one file and have my script run on one core.

Wouldn't the performance still be better? You would be limited to the speed of your slowest processor. But since that processor is simulating fewer particles, it seems like it would still run faster than serial elegant.

Also, don't you need to gather particles together for effects like CSR, where what matters is the entire beam distribution and you can't simulate it just knowing each subset of particles?

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: SCRIPT causing hangs

Post by ywang25 » 02 May 2011, 13:31

Joel,

Does the example "http://stanford.edu/~joelfred/drift.tar.gz" have the same file I/O pattern as the application you plan to run? If so, I can use it as an example to develop a solution for this case. It would be something like this: Pelegant dumps an output file in parallel -> processes the file with the script on one core -> Pelegant reads the result file in parallel and continues the simulation. There would just be no performance gain for the SCRIPT element itself.

In CSR, all the processors have the whole distribution, but not the full phase space. Depending on the characteristics of the distribution, the communication overhead can be reduced significantly by sharing only the non-empty bins of the distribution across all the processors.

Yusong

JoelFrederico
Posts: 60
Joined: 05 Aug 2010, 11:32
Location: SLAC National Accelerator Laboratory

Re: SCRIPT causing hangs

Post by JoelFrederico » 02 May 2011, 13:39

Yusong,

ywang25 wrote: Does the example "http://stanford.edu/~joelfred/drift.tar.gz" have the same file I/O operation pattern with the application you plan to do?

Sort of. The file I/O in drift.tar.gz should be generic: it shouldn't matter whether the script gets an incomplete or a complete dump. But it's just a test case. It is set up only to copy the particle file, which is the simplest script I could imagine.

I would like to ultimately use the script to call a binary (C++, compiled MATLAB, Python, whatever) that uses the SDDS routines to read in a complete particle distribution in 6-D, operate on it, and return a complete particle distribution. So your description is correct:

ywang25 wrote: Pelegant dumps an output file in parallel -> processes the file with the script on one core -> Pelegant reads the result file in parallel and continues the simulation.

And thanks for the clarification on CSR, and for responding so quickly. It's very helpful!
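For the one-core step in that workflow, a wrapper along these lines is one possibility. It is only a sketch: PROCESSOR is a hypothetical placeholder for whatever binary reads and writes the particle file, and it defaults to cp so the example runs as written.

```shell
#!/bin/bash
# run_processor SRC DST
# Run one external program on the complete particle file and check its result.
# PROCESSOR is a hypothetical placeholder for the user's own binary; it
# defaults to cp so that this sketch runs as-is.
run_processor() {
    local src=$1 dst=$2
    local processor=${PROCESSOR:-cp}
    # Refuse to start if the input file is missing or unreadable
    if [ ! -r "$src" ]; then
        echo "input file $src is missing or unreadable" >&2
        return 1
    fi
    # Pass the input and output paths through to the external program
    "$processor" "$src" "$dst" || return 1
    # Fail if the program did not leave a non-empty output file behind
    if [ ! -s "$dst" ]; then
        echo "no output written to $dst" >&2
        return 1
    fi
}
```

A script used as the SCRIPT command could end with run_processor "$1" "$2", so that a missing input or an empty result shows up as an error message and a non-zero exit status rather than as a silent hang later on.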

Joel

ywang25
Posts: 52
Joined: 10 Jun 2008, 19:48

Re: SCRIPT causing hangs

Post by ywang25 » 02 Mar 2012, 09:55

Joel,

The SCRIPT element is supported in the new Pelegant 25. It is going to be processed by the Master CPU.

Yusong
Last edited by ywang25 on 02 Mar 2012, 16:49, edited 1 time in total.

JoelFrederico
Posts: 60
Joined: 05 Aug 2010, 11:32
Location: SLAC National Accelerator Laboratory

Re: SCRIPT causing hangs

Post by JoelFrederico » 02 Mar 2012, 16:47

Yusong,

That's great, thanks!

Joel
