An error with parallel optimization

A.V.Bogomyagkov
Posts: 18
Joined: 29 Jan 2021, 14:07

Post by A.V.Bogomyagkov » 14 Feb 2021, 09:38

Hello,

I am trying to optimize (increase the area of) the on-momentum 4D dynamic aperture with just two variables and two covariables.
I ran the following on Windows 10 with elegant 2020.5 (Dec 11 2020):

Code: Select all

mpiexec -n 10 pelegant run-da-opt-4d.ele > results/run-da-opt-4d.log

In the %s.opt file you can see that several optimization steps completed, and then an error was issued after one more step.

The error message is

Code: Select all

job aborted:
[ranks] message
[0] terminated
[1] fatal error
Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(sbuf=0x000000A9B18E79B0, rbuf=0x000000A9B18E79C0, count=1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD) failed
Message truncated; 2296 bytes received but buffer size is 12
[2-9] terminated
---- error analysis -----
[1] on S1-33-BOGOM2
mpi has detected a fatal error and aborted pelegant
---- error analysis -----

What am I doing wrong?

My next question: after this simple test, I want to optimize the dynamic aperture at several energy deviations with about a hundred variables. How do I access the DaArea term at several energy points?

Anton
Attachments
Optimization.zip
(231.88 KiB)

Post by A.V.Bogomyagkov » 19 Feb 2021, 21:48

Is there any hope of a reply and a fix, or should I abandon the idea of running Pelegant on Windows?

Anton

michael_borland
Posts: 1927
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory

Post by michael_borland » 25 Feb 2021, 15:23

Anton,

I think the problem is that you are using parallel_optimization_setup instead of optimization_setup. The DA procedure in elegant uses parallel resources. When you try to invoke it from parallel_optimization_setup, it gets confused; the hybrid simplex method in particular will try to run many parallel simplex optimizations, each of which uses a single core.

If you just use optimization_setup, it should work and will use parallel resources for the DA computation. Note that each line is tracked in sequence, so the maximum number of cores that will be productively used is set by the nx parameter (41 in your input files).
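
As a rough illustration of the core-count remark above, here is a minimal Python sketch. It assumes the cap is exactly nx per line, as stated in the reply; the helper name productive_cores is hypothetical and not part of elegant, and any ranks Pelegant may reserve for coordination are ignored.

```python
def productive_cores(n_ranks, nx):
    """Ranks that can do useful tracking while one DA line is scanned.

    Assumption: lines are tracked one after another, so at most nx
    ranks have work at any time (nx = 41 in the input files discussed).
    """
    if n_ranks < 1 or nx < 1:
        raise ValueError("n_ranks and nx must be positive")
    return min(n_ranks, nx)


if __name__ == "__main__":
    nx = 41  # value quoted in the reply for these input files
    for n_ranks in (10, 16, 64):
        used = productive_cores(n_ranks, nx)
        print(f"{n_ranks} ranks: {used} productive, {n_ranks - used} idle")
```

Under this assumption, the 10 and 16 ranks used in this thread are all below the cap, and requesting more than 41 ranks would not speed up the DA evaluation.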

--Michael

Post by A.V.Bogomyagkov » 25 Feb 2021, 22:06

Thank you Michael,

I use parallel_optimization_setup because I want to try other methods of optimization in the future, e.g. "swarm".

I am confused.
From the description of "parallel_optimization_setup"
type: setup command (for Pelegant only).
function: define overall parallel optimization parameters and methods
method — May be one of “genetic”, “hybridsimplex” or “swarm”.
I understand that
1. "parallel_optimization_setup" offers other methods that "optimization_setup" does not use.
2. I need to use pelegant.
So, if I want to use multicore optimization I need to use "parallel_optimization_setup" with pelegant only;
if I want to use "swarm" method I need to use "parallel_optimization_setup" with pelegant only.
Am I right?

I did try changing the method to swarm

Code: Select all

&parallel_optimization_setup
    mode = "minimize", method = "swarm",
    population_log = %s.pop,
    print_all_individuals = 1,
    target = 0,
    log_file = "%s.opt"
&end

and ran the code with Pelegant. The expected number of processes started, but after a few iterations all of them dropped to 0% CPU usage and simply hung, with no further output in the %s.opt or .log files and no error message. I had to kill them after waiting several hours.

I did try "optimization_setup" with elegant and it worked, but there was only one process on one core, and it took two days with just two variables on an AMD Ryzen 9 3950X 16-core processor with 64 GB RAM. Or should I perhaps try running "optimization_setup" with Pelegant?

Do I understand you correctly that even with elegant "find_aperture" will use several cores anyway? Or do I need to run it with pelegant?
Is it the same with "tune_footprint"?

Does it mean that I can use "optimization_setup" with elegant, which means one core optimization, but "find_aperture" and "tune_footprint" will use several cores to calculate the target function?

I want to optimize the 5D dynamic aperture, that is, the transverse DA at fixed energy deviations. Can you advise how to do it, given that "find_aperture" only finds the DA for on-momentum particles? The other way is to find the aperture in x and delta for a given initial y, and optimize that area.

The final goal is an optimized 6D aperture, but I do not understand how to do it and can't find an appropriate function in the manual.

Respectfully,
Anton

Post by michael_borland » 25 Feb 2021, 23:10

Anton,

The DA algorithm can use either a single core (elegant) or many cores (Pelegant). However, in Pelegant, it is multi-core only. So when you want to optimize the DA, you should use Pelegant to get faster DA evaluation, but you need to use an optimization method that doesn't in itself use parallel resources. That means using optimization_setup in Pelegant. The same goes for anything obtained from FMA. When the basic computation is parallel, you should just use optimization_setup in Pelegant. Unfortunately, that means parallel optimization algorithms like particle swarm and hybrid simplex are not available for such applications.

If you want to use a genetic algorithm and use parallel resources for the computations, we have a geneticOptimizer script that allows doing this. It submits jobs to evaluate the fitness function. Each job can use as many cores as you like and run whatever programs you like (e.g., Pelegant). To find it, look for "MOGA optimizer for rings" on our software page.
https://www.aps.anl.gov/Accelerator-Ope ... s/Software
This provides much more flexibility in what you optimize and allows using bigger resources (for faster results) than using elegant's internal optimizer.

--Michael

Post by A.V.Bogomyagkov » 25 Feb 2021, 23:32

Thank you Michael,

Now it is clear.

What would you advise about the following?

I want to optimize the 5D dynamic aperture, that is, the transverse DA at fixed energy deviations. Can you advise how to do it, given that "find_aperture" only finds the DA for on-momentum particles? The other way is to find the aperture in x and delta for a given initial y, and optimize that area.

The final goal is an optimized 6D aperture, but I do not understand how to do it and can't find an appropriate function in the manual.

Can I use a MALIGN element to set dp and then find the DA? And if so, how do I use the DaArea variable at different values of dp?

Anton

Post by A.V.Bogomyagkov » 26 Feb 2021, 04:33

Michael,

I tried what you explained. Pelegant started with 16 processes, found the initial DA, and then just froze during the optimization, as can be seen in the resource monitor (Screenshot 2021-02-26 162353.png) and in results/run-da-opt-4d.log.

Code: Select all

mpiexec -n 16 pelegant run-da-opt-4d.ele > results/run-da-opt-4d.log
with

Code: Select all

&optimization_setup
    mode = "minimize", method = "simplex",
    target = 0,
    tolerance = 1e-12,
    n_passes = 3,
    n_evaluations = 30,
    n_restarts = 0,
    verbose = 1,
    log_file = "%s.opt"
&end

I used elegant 2020.4.0 (Oct 21 2020), but the result is the same with elegant 2020.5.

Anton
Attachments
Optimization-2.zip
(283.61 KiB)

Post by michael_borland » 01 Mar 2021, 15:47

Anton,

I'm not able to reproduce this problem. It seems to run as expected. Is it possible for you to try this on a non-Windows computer?

--Michael

Post by A.V.Bogomyagkov » 02 Mar 2021, 23:14

Michael,

I tried on Ubuntu Linux.
Here is a screenshot of the error message (image.png), along with the log and output files.

Anton
Attachments
Optimization-3.zip
(238.08 KiB)
image.png

Post by A.V.Bogomyagkov » 17 Mar 2021, 22:41

Hello,

I tried to run Pelegant on

OS: Scientific Linux 7.9
Linux en002.binp.gcf 3.10.0-1160.15.2.el7.x86_64 #1 SMP Tue Feb 2 08:13:55 CST 2021 x86_64 x86_64 x86_64 GNU/Linux

MPI:
openmpi-1.10.7-5.el7.x86_64
openmpi-devel-1.10.7-5.el7.x86_64

The result is the same. Running
elegant run-da-opt-4d.ele > results/run-da-opt-4d.log
works: it finishes the optimization and saves the new lattice.
Pelegant, however, starts and then freezes after several optimization iterations, and no new lattice is saved.

The log file shows no errors; the process just hangs and does nothing.
The same happens on Windows 10 and Ubuntu.

Here is the tail of the log file at the point where the process froze:

Determining reference trajectory for CSBEND MBDS2.1#8 at s=6.204984e+02
** Starting 7-line aperture search
DA area is 6.354589e-06
** Starting 7-line aperture search
DA area is 3.505011e-06
** Starting 7-line aperture search
DA area is 3.153602e-06
** Starting 7-line aperture search
DA area is 5.943329e-06
** Starting 7-line aperture search
DA area is 7.728071e-06
** Starting 7-line aperture search
DA area is 6.622016e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 5.643222e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 5.760818e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 6.339684e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 6.301518e-06
** Starting 7-line aperture search
DA area is 6.301933e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 6.248984e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 6.291907e-06
** Starting 7-line aperture search
DA area is 1.875784e-06
** Starting 7-line aperture search
DA area is 3.387569e-06
** Starting 7-line aperture search
DA area is 5.244089e-06
** Starting 7-line aperture search
DA area is 2.579718e-06
** Starting 7-line aperture search
DA area is 3.169127e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 4.083242e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 3.687216e-06
** Starting 7-line aperture search
DA area is 3.349078e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 3.666178e-06
** Starting 7-line aperture search
DA area is 3.534813e-06
** Starting 7-line aperture search
DA area is 3.802689e-06
** Starting 7-line aperture search
DA area is 3.504473e-06
** Starting 7-line aperture search
DA area is 3.005793e-06
** Starting 7-line aperture search
DA area is 6.354589e-06
** Starting 7-line aperture search
DA area is 6.354589e-06
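
For anyone comparing such logs between runs, here is a minimal Python sketch that pulls the "DA area is" values out of a log tail like the one above and counts the "undefined slopes" warnings. The function name summarize_da_log is hypothetical, the line patterns are assumed from this excerpt only, and the sample text is just a few lines copied from it.

```python
import re

# Line patterns assumed from the log excerpt in this thread.
AREA_LINE = re.compile(r"^DA area is\s+(\S+)\s*$")
WARNING_TEXT = "undefined slopes"


def summarize_da_log(log_text):
    """Collect DA areas and warning counts from an elegant/Pelegant log."""
    areas = []
    warnings = 0
    for raw in log_text.splitlines():
        line = raw.strip()
        match = AREA_LINE.match(line)
        if match:
            areas.append(float(match.group(1)))
        elif WARNING_TEXT in line:
            warnings += 1
    return {
        "evaluations": len(areas),
        "best_area": max(areas) if areas else None,
        "last_area": areas[-1] if areas else None,
        "warnings": warnings,
    }


SAMPLE = """\
** Starting 7-line aperture search
DA area is 6.354589e-06
** Starting 7-line aperture search
DA area is 7.728071e-06
** Starting 7-line aperture search
Warning: particle acquired undefined slopes when integrating through kick multipole
DA area is 5.643222e-06
"""

if __name__ == "__main__":
    print(summarize_da_log(SAMPLE))
```

To apply it to a real run, read the file first, for example with open("results/run-da-opt-4d.log").read(), and pass the text to summarize_da_log. Comparing the number of evaluations logged by elegant and by Pelegant shows at which evaluation the parallel run stopped producing output.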

Anton
Attachments
Optimization-4.zip
(240.89 KiB)
