Extract SDDS Page Efficiently

Posted: 20 Sep 2012, 16:26
by JoelFrederico
Hi all,

I'd like to know if, on a fundamental level, it's possible to extract a page from an SDDS file efficiently. I'm running large simulations again, and all the code I've seen does an SDDS_ReadPage to move through the pages one-by-one until it gets to the desired location. Is there an easy way to specify a page to start at and then read the page?

- Joel

Re: Extract SDDS Page Efficiently

Posted: 20 Sep 2012, 17:21
by michael_borland
Joel,

Unfortunately, there isn't a way to do this. It's something we should add.

You might get some benefit from the following trick: instead of using SDDS_ReadPage(), use SDDS_ReadPageSparse() and set the interval parameter to a large value. This will speed up the process of reading the unwanted pages. Once you reach the page you are interested in, set the interval to 1.

--Michael

Re: Extract SDDS Page Efficiently

Posted: 21 Sep 2012, 17:23
by JoelFrederico
Thanks Michael,

Is there documentation for how to use these sddsdata functions? SDDS_ReadPageSparse takes fileIndex, sparse_interval, and sparse_offset. What do they mean? I can get fileIndex; in Python that's (sddsclassvariable).index. But I don't know what to put for sparse_interval or sparse_offset.

Joel

Re: Extract SDDS Page Efficiently

Posted: 27 Sep 2012, 09:50
by michael_borland
Joel,

Alas, there isn't any documentation of that specific routine.

Code:

int32_t SDDS_ReadPageSparse(SDDS_DATASET *SDDS_dataset, uint32_t mode,
                         int32_t sparse_interval,
                         int32_t sparse_offset)
mode is unused and is present for future expansion.
sparse_interval is an integer greater than 1 giving the interval between rows that are read into memory.
sparse_offset is an integer greater than 0 giving the offset in rows to the first row that will be stored.

Hope this helps.

--Michael

Re: Extract SDDS Page Efficiently

Posted: 27 Sep 2012, 17:42
by JoelFrederico
Michael,

I don't think I understand. I have a 1000-page file. If I run:

Code:

(load page and initialize class)
page=sddsdata.ReadPageSparse(a.index,10,3)
if page != 1:
    sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)
I get:

Code:

Error:
Unable to read rows--failure reading string (SDDS_ReadBinaryRows)
If I leave off the PrintErrors, it seems to do the exact same thing ReadPage does.

Also, loosely related: it occasionally reads 47914655154177L instead of 1L from the file. Any ideas what's going on with that? It's intermittent too - it's usually just the first page, but sometimes it's every single page. (This is a parameter file I'm testing on that's for use with loading parameters, the occurrence is supposed to be 1 for all of the rows.)

Re: Extract SDDS Page Efficiently

Posted: 28 Sep 2012, 16:45
by soliday
In Python, the code would look like this:

Code:

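# Assumes `a` is an already-initialized SDDS object whose file is open for input,
# and that numberOfParameters and numberOfColumns are already known.
skipToPage = 6
page = 0
# Skip the unwanted pages: a huge interval with offset 0 stores only the
# first row of each page, so the skipped pages are read as cheaply as possible.
while page < skipToPage - 1:
    page = sddsdata.ReadPageSparse(a.index, 99999999, 0)
    if page == 0:
        sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)
    if page == -1:
        # End of file reached before the requested page
        break
if page != -1:
    # Read the requested page in full (interval 1 keeps every row)
    page = sddsdata.ReadPageSparse(a.index, 1, 0)
    while page > 0:
        for i in range(numberOfParameters):
            a.parameterData[i].append(sddsdata.GetParameter(a.index, i))
        for i in range(numberOfColumns):
            a.columnData[i].append(sddsdata.GetColumn(a.index, i))
        # Continue with the remaining pages, read normally
        page = sddsdata.ReadPage(a.index)
    if page == 0:
        sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)
# Close the SDDS file
if sddsdata.Terminate(a.index) != 1:
    sddsdata.PrintErrors(a.SDDS_EXIT_PrintErrors)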
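
The skip-then-read pattern above could also be wrapped in a small helper. What follows is only a sketch under stated assumptions: `skip_to_page`, `read_sparse`, and `FakeReader` are hypothetical names, not part of sddsdata. `read_sparse(interval, offset)` stands in for `sddsdata.ReadPageSparse(a.index, interval, offset)`, and the return convention (page number on success, 0 on error, -1 at end of file) is taken from the code above. The fake reader is there only so the control flow can be tried without an SDDS file.

```python
def skip_to_page(read_sparse, target_page, skip_interval=99999999):
    """Advance to target_page (1-based) and read it in full.

    read_sparse(interval, offset) is a hypothetical callable standing in for
    sddsdata.ReadPageSparse(a.index, interval, offset). It is assumed to
    return the page number on success, 0 on error, and -1 at end of file.
    Returns the page number read, or -1 if the file has fewer pages.
    """
    if target_page < 1:
        raise ValueError("target_page must be 1 or greater")
    page = 0
    # Skip pages 1 .. target_page-1, storing only the first row of each.
    while page < target_page - 1:
        page = read_sparse(skip_interval, 0)
        if page == 0:
            raise IOError("error while skipping pages")
        if page == -1:
            return -1
    # Interval 1, offset 0: every row of the wanted page is stored.
    page = read_sparse(1, 0)
    if page == 0:
        raise IOError("error reading page %d" % target_page)
    return page


class FakeReader(object):
    """Stand-in for a file with n_pages pages; records every call made."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.current = 0
        self.calls = []

    def __call__(self, interval, offset):
        self.calls.append((interval, offset))
        if self.current >= self.n_pages:
            return -1
        self.current += 1
        return self.current


reader = FakeReader(10)
result = skip_to_page(reader, 6)
print(result, reader.calls)
```

With a real file, one would pass something like `lambda interval, offset: sddsdata.ReadPageSparse(a.index, interval, offset)` as `read_sparse` and then fetch the parameters and columns as in the loop above.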

Re: Extract SDDS Page Efficiently

Posted: 28 Sep 2012, 16:57
by JoelFrederico
Thanks for the code, I'll see if I can make it work.

Question: So (sparse_offset >= 0)? Not (sparse_offset > 0)? Is this the same with sparse_interval?

Re: Extract SDDS Page Efficiently

Posted: 01 Oct 2012, 10:15
by soliday
JoelFrederico wrote: Question: So (sparse_offset >= 0)? Not (sparse_offset > 0)? Is this the same with sparse_interval?
Sparse interval has to be 1 or greater. A value of 1 means there really is no 'sparsing' going on, because it is reading every row.

Sparse offset has to be 0 or greater. 0 means it will read the first row.

So in my example code, it will read the first row of every page but then skip the rest of it. It must read at least one row, otherwise it will bomb, which is why having the sparse offset greater than the number of rows will cause problems.
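
To make the two parameters concrete, the rows kept from a single page can be modelled in plain Python. This is only an illustration of the description above, not the library's implementation: `rows_kept` is a hypothetical helper, row indices are assumed to be 0-based, and the "at least one row" check mirrors the failure described here rather than the actual error the library reports.

```python
def rows_kept(n_rows, sparse_interval, sparse_offset):
    """Model which 0-based row indices of one page would be stored.

    Hypothetical helper; it only mirrors the rules described in this thread.
    """
    if sparse_interval < 1:
        raise ValueError("sparse_interval must be 1 or greater")
    if sparse_offset < 0:
        raise ValueError("sparse_offset must be 0 or greater")
    # Start at the offset, then keep every sparse_interval-th row.
    kept = list(range(sparse_offset, n_rows, sparse_interval))
    if not kept:
        # Assumption: this is the case described as "it will bomb".
        raise ValueError("offset is past the last row, so no row would be stored")
    return kept


every_row = rows_kept(10, 1, 0)          # interval 1: no sparsing at all
first_only = rows_kept(10, 99999999, 0)  # huge interval: only the first row
tenth_from_3 = rows_kept(10, 10, 3)      # interval 10, offset 3: only row 3

try:
    rows_kept(1, 10, 3)                  # one-row page, offset 3: nothing left
    single_row_ok = True
except ValueError:
    single_row_ok = False

print(every_row, first_only, tenth_from_3, single_row_ok)
```

In this model, an offset of 3 on a page with only one row leaves nothing to store, which is the situation to avoid.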

Re: Extract SDDS Page Efficiently

Posted: 01 Oct 2012, 13:29
by JoelFrederico
Ohhhhh, awesome, this is exactly the information I was looking for. Thanks! I think this can be called closed now.

Re: Extract SDDS Page Efficiently

Posted: 01 Oct 2012, 17:25
by JoelFrederico
Okay, too quick to speak. I don't think this will be a problem for me now because I'm not looking at integers, but it's strange. I'm including the code you'll need to replicate the problem.

The first run through, I believe before it compiles .pyc files, it has errors reading the second column in pages 3, 4, and 7. (It should always read as 1.) Subsequent runs have errors on the first page. I don't really know what's going on.

Code:

joelfred@noric05 error$ ./script.py 
Attempt to load pages 1-10

Page number 1.
[[['EOFFSET']], [[1L]], [['DE']], [['0.000000e+00\r']]]

Page number 2.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.508067e-03\r']]]

Page number 3.
[[['EOFFSET']], [[1L]], [['DE']], [['1.155152e-03\r']]]

Page number 4.
[[['EOFFSET']], [[47944719925249L]], [['DE']], [['-1.099028e-02\r']]]

Page number 5.
[[['EOFFSET']], [[1L]], [['DE']], [['1.996521e-03\r']]]

Page number 6.
[[['EOFFSET']], [[1L]], [['DE']], [['7.943676e-03\r']]]

Page number 7.
[[['EOFFSET']], [[47944719925249L]], [['DE']], [['4.454010e-03\r']]]

Page number 8.
[[['EOFFSET']], [[1L]], [['DE']], [['3.903766e-03\r']]]

Page number 9.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.027400e-03\r']]]

Page number 10.
[[['EOFFSET']], [[47944719925249L]], [['DE']], [['7.756908e-04\r']]]
joelfred@noric05 error$ ./script.py 
Attempt to load pages 1-10

Page number 1.
[[['EOFFSET']], [[47051366727681L]], [['DE']], [['0.000000e+00\r']]]

Page number 2.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.508067e-03\r']]]

Page number 3.
[[['EOFFSET']], [[1L]], [['DE']], [['1.155152e-03\r']]]

Page number 4.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.099028e-02\r']]]

Page number 5.
[[['EOFFSET']], [[1L]], [['DE']], [['1.996521e-03\r']]]

Page number 6.
[[['EOFFSET']], [[1L]], [['DE']], [['7.943676e-03\r']]]

Page number 7.
[[['EOFFSET']], [[1L]], [['DE']], [['4.454010e-03\r']]]

Page number 8.
[[['EOFFSET']], [[1L]], [['DE']], [['3.903766e-03\r']]]

Page number 9.
[[['EOFFSET']], [[1L]], [['DE']], [['-1.027400e-03\r']]]

Page number 10.
[[['EOFFSET']], [[1L]], [['DE']], [['7.756908e-04\r']]]
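
One observation about the odd values, offered as a hint rather than a diagnosis: each of the three bad numbers in this thread (47914655154177, 47944719925249, 47051366727681) has a low 32-bit word equal to 1, which is the expected value, and differs from 1 only in the upper 32-bit word. That pattern would be consistent with a 32-bit integer being returned through a 64-bit one whose upper half was never set, but that cause is an assumption and is not confirmed anywhere here. The check itself is plain arithmetic; `split_words` is just an illustrative helper name.

```python
# The three unexpected values reported in this thread.
suspect_values = [47914655154177, 47944719925249, 47051366727681]

LOW_32_MASK = 0xFFFFFFFF


def split_words(value):
    """Return (upper 32 bits, lower 32 bits) of a non-negative integer."""
    return value >> 32, value & LOW_32_MASK


for value in suspect_values:
    high, low = split_words(value)
    print("%d -> high word 0x%X, low word %d" % (value, high, low))
```

If the cause really is along those lines, masking with `0xFFFFFFFF` would recover the intended value, though that would only hide the underlying problem.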