Extract SDDS Page Efficiently

JoelFrederico
Posts: 60
Joined: 05 Aug 2010, 11:32
Location: SLAC National Accelerator Laboratory

Post by JoelFrederico » 20 Sep 2012, 16:26

Hi all,

I'd like to know if, on a fundamental level, it's possible to extract a page from an SDDS file efficiently. I'm running large simulations again, and all the code I've seen does an SDDS_ReadPage to move through the pages one-by-one until it gets to the desired location. Is there an easy way to specify a page to start at and then read the page?

- Joel

michael_borland
Posts: 2015
Joined: 19 May 2008, 09:33
Location: Argonne National Laboratory

Re: Extract SDDS Page Efficiently

Post by michael_borland » 20 Sep 2012, 17:21

Joel,

Unfortunately, there isn't a way to do this. It's something we should add.

You might get some benefit from the following trick: instead of using SDDS_ReadPage(), use SDDS_ReadPageSparse() and set the interval parameter to a large value. This will speed up the process of reading the unwanted pages. Once you reach the page you are interested in, set the interval to 1.

--Michael

Re: Extract SDDS Page Efficiently

Post by JoelFrederico » 21 Sep 2012, 17:23

Thanks Michael,

Is there documentation for how to use these sddsdata functions? SDDS_ReadPageSparse takes fileIndex, sparse_interval, and sparse_offset. What do they mean? I can get fileIndex (in Python that's (sddsclassvariable).index), but I don't know what to put for sparse_interval or sparse_offset.

Joel

Re: Extract SDDS Page Efficiently

Post by michael_borland » 27 Sep 2012, 09:50

Joel,

Alas, there isn't any documentation of that specific routine.

Code:

int32_t SDDS_ReadPageSparse(SDDS_DATASET *SDDS_dataset, uint32_t mode,
                            int32_t sparse_interval,
                            int32_t sparse_offset);

mode is unused and is present for future expansion.
sparse_interval is an integer greater than 1 giving the interval between rows that are read into memory.
sparse_offset is an integer greater than 0 giving the offset in rows to the first row that will be stored.

Hope this helps.

--Michael

Re: Extract SDDS Page Efficiently

Post by JoelFrederico » 27 Sep 2012, 17:42

Michael,

I don't think I understand. I have a 1000-page file. If I run:

Code:

# (load page and initialize class)
page=sddsdata.ReadPageSparse(a.index,10,3)
if page != 1:
    sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)

I get:

Code:

Error:
Unable to read rows--failure reading string (SDDS_ReadBinaryRows)

If I leave off the PrintErrors call, it seems to do exactly the same thing ReadPage does.

Also, loosely related: it occasionally reads 47914655154177L instead of 1L from the file. Any ideas what's going on with that? It's intermittent too - it's usually just the first page, but sometimes it's every single page. (This is a parameter file I'm testing on that's for use with loading parameters, the occurrence is supposed to be 1 for all of the rows.)

soliday
Posts: 408
Joined: 28 May 2008, 09:15

Re: Extract SDDS Page Efficiently

Post by soliday » 28 Sep 2012, 16:45

When using this in python the code would look like:

Code:

skipToPage = 6
page = 0
# Skim past the unwanted pages: a huge interval with offset 0 keeps only the first row of each page.
while page < skipToPage - 1:
    page = sddsdata.ReadPageSparse(a.index, 99999999, 0)
    if page == 0:
        sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)
    if page == -1:
        break
if page != -1:
    # An interval of 1 reads every row of the target page.
    page = sddsdata.ReadPageSparse(a.index, 1, 0)
    while page > 0:
        # Store this page, then keep reading any pages that follow it.
        for i in range(numberOfParameters):
            a.parameterData[i].append(sddsdata.GetParameter(a.index, i))
        for i in range(numberOfColumns):
            a.columnData[i].append(sddsdata.GetColumn(a.index, i))
        page = sddsdata.ReadPage(a.index)
    if page == 0:
        sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)
# Close the SDDS file.
if sddsdata.Terminate(a.index) != 1:
    sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)
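
The same skip-then-read pattern can be wrapped in a function. This is a minimal sketch, assuming the reader is passed in as a callable that takes the same arguments as sddsdata.ReadPageSparse in the code above and follows the same return convention (page number on success, 0 on error, -1 at end of file). The names seek_to_page, SKIP_INTERVAL and FakeReader are hypothetical and not part of sddsdata; the fake reader only exists so the logic can be run without an SDDS file.

```python
SKIP_INTERVAL = 99999999  # large interval so only one row per skipped page is kept


def seek_to_page(read_page_sparse, file_index, target_page):
    """Advance to target_page (1-based) and read it in full.

    Returns the page number that was read, or -1 if the file ends first.
    """
    page = 0
    while page < target_page - 1:
        page = read_page_sparse(file_index, SKIP_INTERVAL, 0)
        if page == 0:
            raise IOError("sparse read failed while skipping pages")
        if page == -1:
            return -1
    page = read_page_sparse(file_index, 1, 0)  # interval 1: keep every row
    if page == 0:
        raise IOError("read failed on the target page")
    return page


class FakeReader(object):
    """Stand-in for the real reader: counts pages and records the intervals used."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.current = 0
        self.intervals = []

    def __call__(self, file_index, sparse_interval, sparse_offset):
        if self.current >= self.n_pages:
            return -1  # end of file
        self.current += 1
        self.intervals.append(sparse_interval)
        return self.current


reader = FakeReader(n_pages=10)
print(seek_to_page(reader, 0, 6))  # page number reached
print(reader.intervals)            # five sparse skips, then one full read
```

With the real module the call would presumably look like seek_to_page(sddsdata.ReadPageSparse, a.index, 6), followed by the GetParameter and GetColumn calls shown above. Note that the skipped pages are still read from disk; the saving comes only from not storing their rows.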

Re: Extract SDDS Page Efficiently

Post by JoelFrederico » 28 Sep 2012, 16:57

Thanks for the code, I'll see if I can make it work.

Question: So (sparse_offset >= 0)? Not (sparse_offset > 0)? Is this the same with sparse_interval?

Re: Extract SDDS Page Efficiently

Post by soliday » 01 Oct 2012, 10:15

JoelFrederico wrote: Question: So (sparse_offset >= 0)? Not (sparse_offset > 0)? Is this the same with sparse_interval?

Sparse interval has to be 1 or greater. A value of 1 means there really is no 'sparsing' going on, because it is reading every row.

Sparse offset has to be 0 or greater. 0 means it will read the first row.

So in my example code, it will read the first row of every page but then skip the rest of it. It must read at least one row, otherwise it will bomb, which is why having the sparse offset greater than the number of rows will cause problems.
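
These rules can be illustrated with a small pure-Python model. It is only a sketch of the row selection as described in this thread, not the library's own code: sparse_rows is a hypothetical helper, and it assumes the selection behaves like a slice that starts at the offset and steps by the interval.

```python
def sparse_rows(n_rows, sparse_interval, sparse_offset):
    """Model which row indices a sparse read keeps (hypothetical helper, not in sddsdata)."""
    if sparse_interval < 1:
        raise ValueError("sparse_interval must be 1 or greater")
    if sparse_offset < 0:
        raise ValueError("sparse_offset must be 0 or greater")
    return list(range(n_rows))[sparse_offset::sparse_interval]


# Interval 1, offset 0: every row is kept, so there is no sparsing.
print(sparse_rows(5, 1, 0))
# Huge interval, offset 0: only the first row of the page is kept.
print(sparse_rows(5, 99999999, 0))
# A one-row page with offset 3 selects no rows at all.
print(sparse_rows(1, 10, 3))
```

The last call mirrors the earlier ReadPageSparse(a.index,10,3) attempt on a parameter file: if each page of that file holds a single row, an offset of 3 leaves nothing to read, which matches the warning about offsets larger than the number of rows.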

Re: Extract SDDS Page Efficiently

Post by JoelFrederico » 01 Oct 2012, 13:29

Ohhhhh, awesome, this is exactly the information I was looking for. Thanks! I think this can be called closed now.

Re: Extract SDDS Page Efficiently

Post by JoelFrederico » 01 Oct 2012, 17:25

Okay, too quick to speak. I don't think this will be a problem for me now because I'm not looking at integers, but it's strange. I'm including the code you'll need to replicate the problem.

The first run through, I believe before it compiles .pyc files, it has errors reading the second column in pages 3, 4, and 7. (It should always read as 1.) Subsequent runs have errors on the first page. I don't really know what's going on.

Code:

joelfred@noric05 error$ ./script.py 
Attempt to load pages 1-10

Page number 1.
[[['EOFFSET']], [[1L]], [['DE']], [['0.000000e+00\r']]]

Page number 2.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.508067e-03\r']]]

Page number 3.
[[['EOFFSET']], [[1L]], [['DE']], [['1.155152e-03\r']]]

Page number 4.
[[['EOFFSET']], [[47944719925249L]], [['DE']], [['-1.099028e-02\r']]]

Page number 5.
[[['EOFFSET']], [[1L]], [['DE']], [['1.996521e-03\r']]]

Page number 6.
[[['EOFFSET']], [[1L]], [['DE']], [['7.943676e-03\r']]]

Page number 7.
[[['EOFFSET']], [[47944719925249L]], [['DE']], [['4.454010e-03\r']]]

Page number 8.
[[['EOFFSET']], [[1L]], [['DE']], [['3.903766e-03\r']]]

Page number 9.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.027400e-03\r']]]

Page number 10.
[[['EOFFSET']], [[47944719925249L]], [['DE']], [['7.756908e-04\r']]]
joelfred@noric05 error$ ./script.py 
Attempt to load pages 1-10

Page number 1.
[[['EOFFSET']], [[47051366727681L]], [['DE']], [['0.000000e+00\r']]]

Page number 2.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.508067e-03\r']]]

Page number 3.
[[['EOFFSET']], [[1L]], [['DE']], [['1.155152e-03\r']]]

Page number 4.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.099028e-02\r']]]

Page number 5.
[[['EOFFSET']], [[1L]], [['DE']], [['1.996521e-03\r']]]

Page number 6.
[[['EOFFSET']], [[1L]], [['DE']], [['7.943676e-03\r']]]

Page number 7.
[[['EOFFSET']], [[1L]], [['DE']], [['4.454010e-03\r']]]

Page number 8.
[[['EOFFSET']], [[1L]], [['DE']], [['3.903766e-03\r']]]

Page number 9.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.027400e-03\r']]]

Page number 10.
[[['EOFFSET']], [[1L]], [['DE']], [['7.756908e-04\r']]]

Attachment: error.tar.gz (3.97 KiB)
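
One observation about the bad values, offered as arithmetic rather than a diagnosis: each of them is 1 plus a multiple of 2^32, so the low 32 bits still hold the expected 1 and only the upper bits differ. The sketch below checks this for the three values quoted in this thread. What puts the extra bits there (for example, a 32-bit value being widened to 64 bits somewhere along the way) is an assumption that nothing in the thread confirms.

```python
# Values copied from the posts above; each of them should have been 1.
bad_values = [47914655154177, 47944719925249, 47051366727681]

for value in bad_values:
    low = value & 0xFFFFFFFF   # lower 32 bits
    high = value >> 32         # everything above the lower 32 bits
    print(value, low, high)

# Collect just the lower 32 bits of each value.
low_words = [value & 0xFFFFFFFF for value in bad_values]
print(low_words)
```

If the pattern holds for other files, masking with 0xFFFFFFFF would recover the intended value as a stopgap, though it would not explain why the upper bits are set.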