MPI version for version 2020.5.0

Post by Teresia » 26 Jan 2021, 12:24

Hi,

We have tried to build Elegant version 2020.5.0 at Diamond, but have run into some odd issues that we think are related to MPI. So far the problem only seems to occur when running with ion_effects. The simulation starts and everything looks fine for 1000 turns or so, but after more turns entire bunches suddenly vanish from the beam without any apparent physical reason (their coordinates simply become zero) and the total number of particles drops. Eventually the job stops without finishing, yet it still exits with code 0, so the cluster believes the run succeeded. It does, however, print an error of the form:

mlx5: cs05r-sc-com01-04.diamond.ac.uk: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000006 00000000 00000000 00000000
00000000 00008813 08020d3b ac994ed3
[[56410,1],39][btl_openib_component.c:3645:handle_wc] from cs05r-sc-com01-04 to: cs05r-sc-com01-10 error polling LP CQ with status REMOTE ACCESS ERROR status number 10 for wr_id 3f8a180 opcode -546311451 vendor error 136 qp_idx 1

Currently we have built it with OpenMPI 3.1.4, but we also have other versions available, implementing both earlier and later MPI standards. We have not seen this problem with previous Elegant versions, so we are wondering which OpenMPI version is recommended for this release, in case that is the cause of our problem.
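
In case it helps, this is roughly how we launch the job (a minimal sketch: Pelegant is the MPI build of elegant, run.ele stands in for our actual input file, the process count is arbitrary, and the BTL exclusion on the last line is only a diagnostic we have been considering, not a confirmed fix):

# Report which OpenMPI build is on the path
mpirun --version

# Normal launch of the MPI build of elegant
mpirun -np 64 Pelegant run.ele

# Diagnostic: exclude the openib BTL named in the error above so OpenMPI
# falls back to TCP, to test whether the InfiniBand transport is implicated
mpirun --mca btl ^openib -np 64 Pelegant run.ele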

Best regards,

Teresia
