(problems with MFSD interface in NCSA HDF)
From marsa Thu Aug 17 13:53:48 1995
To: [email protected]
Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
write a few SDS's in a single file and will only allow us to open a few
files at once.]
VERSION:
HDF3.3 Release 4
USER:
Robert Marsa
(512) 471-4700
[email protected]
AREA:
netcdf/libsrc
SYNOPSIS:
The MFSD interface will only allow us to write a few SDS's
in a single file and will only allow us to open a few files
at once.
MACHINE / OPERATING SYSTEM:
Presumably all. SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
CRAY/UNICOS 8.0.4
COMPILER:
native ANSI cc, gcc
DESCRIPTION:
We have recently run into problems using the HDF MFSD interface.
Through testing and examination of the source code, we have
identified at least two problems.
1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
file. This limitation seems to be enforced by SDcreate in mfsd.c.
We don't properly understand this limitation. If we create a
series of SDS's of size 1 X 1 X 1, we can only write 1666 of
them. Likewise, we can only write 2500 SDS's of size 1 X 1 and
5000 SDS's of size 1. However, if we write SDS's of non-unit
dimension, we are limited even more severely in the number we can
output. Therefore, there may be another limitation at work
which we haven't discovered.
2) Only 32 (MAX_NC_OPEN) files may be open at one time. This
limitation seems to be enforced by NC_open in file.c. We have
programs which will involve more than 32 grid functions, all of
which may need to be output at the same time. The work-around of
closing the files between writes is unacceptably slow.
REPEAT BUG BY:
Here is a test program which will illustrate 1) above.
When you run this with: sdtest 1 1000000
you should see:
test0:
test1:
.
.
.
test2499:
test2500:
Can't create data set test2500
/************************************************************************/
/* sdtest.c
cc -I/usr/local/include -L/usr/local/lib sdtest.c -lnetcdf -ldf -o sdtest
*/
#include <stdio.h>
#include <stdlib.h>
#include <hdf.h>
#include <mfhdf.h>
/* Illustrates use of SD HDF/NETCDF interface.
Opens 'test.hdf' and appends 'nsteps' (arg 2) 'n' x 'n' (arg 1)
scientific data sets, if possible. Try
sdtest 1 1000000
to observe limits on how many data sets may be appended.
Authors: Robert Marsa, Matt Choptuik, August 1995
*/
main(int argc,char **argv)
{
int32 rank;
int32 shape[3],start[3];
int32 ret,i,j,sf_id,sds_id;
double time;
double *data,*d;
char nm[16]; /* "test" + digits + NUL; 8 bytes overflows from test1000 on */
int n;
int nsteps;
if(argc<3 || !sscanf(argv[1],"%d",&n)|| !sscanf(argv[2],"%d",&nsteps)){
fprintf(stderr,"Usage: sdtest <n> <nsteps>\n");
exit(1);
}
shape[0]=shape[1]=shape[2]=n;
start[0]=start[1]=start[2]=0;
rank=2;
time=1.0;
data=(double *) malloc(shape[0]*shape[1]*sizeof(double));
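for(d=data,i=0;i<shape[0];i++)
for(j=0;j<shape[1];j++)
*(d++)=i*j;
sf_id=SDstart("test.hdf",DFACC_CREATE);
for(i=0;i<nsteps;i++){
sprintf(nm,"test%3d",i);
printf("%s: \n",nm);
sds_id=SDcreate(sf_id,nm,DFNT_FLOAT64,rank,shape);
if(sds_id==-1){
fprintf(stderr,"Can't create data set %s\n",nm);
exit(1);
}
if(SDwritedata(sds_id,start,NULL,shape,(VOIDP)data)==-1){
fprintf(stderr,"Can't write data: %d\n",i);
exit(1);
}
SDendaccess(sds_id);
}
SDend(sf_id);
}
/**********************************************************************/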
From [email protected] Mon Aug 21 20:36:31 1995
Date: Mon, 21 Aug 1995 16:41:48 -0500
Message-Id: <[email protected]>
To: [email protected]
Subject: Re: [netcdf/libsrc]: [The MFSD interface will only allow us to
From: [email protected]
Status: R
>From [email protected] Thu Aug 17 14:01:18 1995
>To: [email protected]
>Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
>write a few SDS's in a single file and will only allow us to open a few
>files at once.]
> VERSION:
> HDF3.3 Release 4
> USER:
> Robert Marsa
> (512) 471-4700
> [email protected]
> MACHINE / OPERATING SYSTEM:
> Presumably all. SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
> CRAY/UNICOS 8.0.4
> COMPILER:
> native ANSI cc, gcc
>
> DESCRIPTION:
> We have recently run into problems using the HDF MFSD interface.
> Through testing and examination of the source code, we have
> identified at least two problems.
>
> 1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
> file. This limitation seems to be enforced by SDcreate in mfsd.c.
> We don't properly understand this limitation. If we create a
> series of SDS's of size 1 X 1 X 1, we can only write 1666 of
> them. Likewise, we can only write 2500 SDS's of size 1 X 1 and
> 5000 SDS's of size 1.
You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
in netcdf.h. Or, you can use SDsetdimname() to let variables share the
dimension records. If all variables have the same 2 dimensions, only two
dimension records will be created in the file. It will not only solve the
limitation problem but also speed up the write process when closing the file.
> However, if we write SDS's of non-unit
> dimension, we are limited even more severely in the number we can
> output. Therefore, there may be another limitation at work
> which we haven't discovered.
>
Thank you for bringing this problem to our attention. I need to trace down the
code and find out why the dimension sizes limit the total number of dimensions.
> 2) Only 32 (MAX_NC_OPEN) files may be open at one time. This
> limitation seems to be enforced by NC_open in file.c. We have
> programs which will involve more than 32 grid functions, all of
> which may need to be output at the same time. The work-around of
> closing the files between writes is unacceptably slow.
MAX_NC_OPEN is also defined in netcdf.h. You may redefine it and recompile the
mfhdf side.
Please let me know if redefining doesn't help.
Thanks.
Shiming Xu
SDG, NCSA
>
> REPEAT BUG BY:
> Here is a test program which will illustrate 1) above.
> When you run this with: sdtest 1 1000000
> you should see:
> test0:
> test1:
> .
> .
> .
> test2499:
> test2500:
> Can't create data set test2500
>
>/************************************************************************/
>/* sdtest.c
>
>cc -I/usr/local/include -L/usr/local/lib sdtest.c -lnetcdf -ldf -o sdtest
>
>*/
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <hdf.h>
>#include <mfhdf.h>
>
>/* Illustrates use of SD HDF/NETCDF interface.
>
> Opens 'test.hdf' and appends 'nsteps' (arg 2) 'n' x 'n' (arg 1)
> scientific data sets, if possible. Try
>
> sdtest 1 1000000
>
> to observe limits on how many data sets may be appended.
>
> Authors: Robert Marsa, Matt Choptuik, August 1995
>
>*/
>
>main(int argc,char **argv)
>{
> int32 rank;
> int32 shape[3],start[3];
> int32 ret,i,j,sf_id,sds_id;
> double time;
> double *data,*d;
> char nm[16]; /* "test" + digits + NUL; 8 bytes overflows from test1000 on */
>
> int n;
> int nsteps;
>
> if(argc<3 || !sscanf(argv[1],"%d",&n)|| !sscanf(argv[2],"%d",&nsteps)){
> fprintf(stderr,"Usage: sdtest <n> <nsteps>\n");
> exit(1);
> }
>
> shape[0]=shape[1]=shape[2]=n;
> start[0]=start[1]=start[2]=0;
> rank=2;
> time=1.0;
> data=(double *) malloc(shape[0]*shape[1]*sizeof(double));
> for(d=data,i=0;i<shape[0];i++)
>   for(j=0;j<shape[1];j++)
>     *(d++)=i*j;
> sf_id=SDstart("test.hdf",DFACC_CREATE);
> for(i=0;i<nsteps;i++){
> sprintf(nm,"test%3d",i);
> printf("%s: \n",nm);
> sds_id=SDcreate(sf_id,nm,DFNT_FLOAT64,rank,shape);
> if(sds_id==-1){
> fprintf(stderr,"Can't create data set %s\n",nm);
> exit(1);
> }
> if(SDwritedata(sds_id,start,NULL,shape,(VOIDP)data)==-1){
> fprintf(stderr,"Can't write data: %d\n",i);
> exit(1);
> }
> SDendaccess(sds_id);
> }
> SDend(sf_id);
>}
>/**********************************************************************/
From marsa Wed Aug 23 18:32:46 1995
To: [email protected]
Subject: [netcdf/libsrc] dimension counting problem
>>From [email protected] Thu Aug 17 14:01:18 1995
>>To: [email protected]
>>Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
>>write a few SDS's in a single file and will only allow us to open a few
>>files at once.]
>> VERSION:
>> HDF3.3 Release 4
>> USER:
>> Robert Marsa
>> (512) 471-4700
>> [email protected]
>> MACHINE / OPERATING SYSTEM:
>> Presumably all. SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
>> CRAY/UNICOS 8.0.4
>> COMPILER:
>> native ANSI cc, gcc
>>
>> DESCRIPTION:
>> We have recently run into problems using the HDF MFSD interface.
>> Through testing and examination of the source code, we have
>> identified at least two problems.
>>
>> 1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>> file. This limitation seems to be enforced by SDcreate in mfsd.c.
>> We don't properly understand this limitation. If we create a
>> series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>> them. Likewise, we can only write 2500 SDS's of size 1 X 1 and
>> 5000 SDS's of size 1.
>
>You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
>in netcdf.h. Or, you can use SDsetdimname() to let variables share the
>dimension records. If all variables have the same 2 dimensions, only two
>dimension records will be created in the file. It will not only solve the
>limitation problem but also speed up the write process when closing the file.
Although the test program I sent you doesn't use SDsetdimname(), our "real"
programs do. When we examine the file, we do see that there is only one
set of dimensions. However, we still have the same limitations as if we
were writing many sets of dimensions. I think you may be counting dimensions
for every SDS even though you are only creating one set. This looks like a
legitimate bug.
We would prefer not to redefine MAX_NC_DIMS for two reasons:
1) this is not really a fix. We have no idea what the maximum number of
SDS's we'll want is.
2) this would require everyone who uses our software to edit their HDF
headers and rebuild their HDF libraries.
>> However, if we write SDS's of non-unit
>> dimension, we are limited even more severely in the number we can
>> output. Therefore, there may be another limitation at work
>> which we haven't discovered.
>>
>
>Thank you for bringing this problem to our attention. I need to trace down the
>code and find out why the dimension sizes limit the total number of dimensions.
>
>> 2) Only 32 (MAX_NC_OPEN) files may be open at one time. This
>> limitation seems to be enforced by NC_open in file.c. We have
>> programs which will involve more than 32 grid functions, all of
>> which may need to be output at the same time. The work-around of
>> closing the files between writes is unacceptably slow.
>
>MAX_NC_OPEN is also defined in netcdf.h. You may redefine it and recompile the
>mfhdf side.
We don't want to redefine this for the same two reasons given above. It should
be easy for the HDF routines to use some sort of dynamic data structure to
allow an unlimited number of files to be opened.
>Please let me know if redefining doesn't help.
>
>Thanks.
>
>Shiming Xu
>SDG, NCSA
>
Robert Marsa
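
The request above for "some sort of dynamic data structure" for open files can be illustrated with a minimal sketch. This is not HDF code: `struct file_table` and its functions are hypothetical names, and the table only tracks opaque handles. It shows a table that doubles its capacity with realloc instead of refusing the 33rd file. The operating system's own descriptor limit would still apply.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library's table of open files.
   slots[i] == NULL means that id i is free. */
struct file_table {
    void **slots;
    size_t capacity;
};

static void file_table_init(struct file_table *t)
{
    t->slots = NULL;
    t->capacity = 0;
}

/* Store `handle` and return its id. Reuses a free slot if there is one,
   otherwise doubles the table. Returns -1 only when memory runs out. */
static long file_table_add(struct file_table *t, void *handle)
{
    size_t i, new_cap;
    void **grown;

    assert(handle != NULL);
    for (i = 0; i < t->capacity; i++) {
        if (t->slots[i] == NULL) {
            t->slots[i] = handle;
            return (long)i;
        }
    }
    new_cap = t->capacity ? t->capacity * 2 : 8;
    grown = realloc(t->slots, new_cap * sizeof *grown);
    if (grown == NULL)
        return -1;
    for (i = t->capacity; i < new_cap; i++)
        grown[i] = NULL;
    i = t->capacity;            /* first slot of the newly added region */
    grown[i] = handle;
    t->slots = grown;
    t->capacity = new_cap;
    return (long)i;
}

/* Mark an id as free again (the "close" side of the table). */
static void file_table_remove(struct file_table *t, long id)
{
    assert(id >= 0 && (size_t)id < t->capacity);
    t->slots[id] = NULL;
}

static void file_table_free(struct file_table *t)
{
    free(t->slots);
    t->slots = NULL;
    t->capacity = 0;
}
```

Growing by doubling keeps the amortized cost of an add constant, and ids stay stable because existing slots never move relative to their index.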
-------------------------------------------------------------------------
Wed Sep 6
From Ed Seidel at NCSA
------------------------------------------------------------------------
Mike,
Thanks a lot for the thoughtful and detailed response. I'll pass
this on to the folks at Austin.
Ed
>>Dear Mike, Shiming, et al,
>>
>> I talked recently with Matt Choptuik, Robert Marsa, and others at
>>UT-Austin, who work with us in the black hole grand challenge. They say
>>they have discovered some bugs in hdf routines, and have contacted hdfhelp
>>without much response. I am sure hdfhelp is overwhelmed with
>>correspondence, so I just wanted to check this out with you directly. Matt
>>and Robert seemed to think there were some serious bugs, although I did not
>>get the details yet (they seem to be documented in
>>http://godel.ph.utexas.edu/Members/marsa/hdf.html). Have you all had a
>>chance to check out their reports yet?
>>
>> Thanks a lot,
>> Ed
>
>Ed,
>
>I just talked to Shiming and George Velamparampil about these problems.
>Here's the status. (Feel free to pass this info on to the folks at UT.)
>
>The problems
>------------
>The problems are a result of the interplay between a number of things in
>the library, including netCDF, variations among operating systems that HDF
>runs under, and the way we allocate storage for Vdatas (which are used to
>store dimension information).
>
>Whenever you create a dimension, memory is allocated to manage information
>about that dimension. Unless the dimension is "1", the amount of memory is
>about 35K. (It was set this high to make some of the code a little simpler,
>and at the time we didn't have any users who needed more than a few data
>sets in a file. We're not yet sure what happens when a dimension is 1.)
>So when people open 1,000 dimensions, 35 MEGABYTES of memory are allocated,
>which plays havoc on some workstations. By putting a limit on the number
>of dimensions that the library can anticipate, this problem is avoided.
>
>The limitation on the number of files has to be there, but the way our
>different modules handle it is inconsistent and needs to be improved.
>Ideally, the limit should be whatever the host machine's OS limit is, and
>this should be set dynamically, if possible. I'm not sure we'll be able to
>do this for all OSs. George has been working on this problem.
>
>Fixing the problems
>-------------------
>We now understand that there are users, like the UT folks, for whom these
>artificial limits are intolerable. We discussed both problems at an HDF
>meeting in late August, after Shiming had communicated with the UT folks
>about it, and decided then to try to fix them as best we can.
>
>We set a deadline of Sept 30 to have them fixed, hoping to release the
>revised code with the next beta release of HDF 4.0, planned for Nov 1. If
>that is too long for the UT people we can try to accelerate it, maybe by
>adding a patch to HDF 3.3r4, but our plate is pretty full now and we'd
>rather stick to the current schedule if possible.
>
>Mike
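
The idea in Mike's message of taking the open-file limit from the host OS at run time can be sketched as follows. This assumes a POSIX system that provides getrlimit(); it is not how HDF does it, and `max_open_files` is a hypothetical helper name. On systems without getrlimit a fixed fallback would still be needed, which matches the caveat that this may not be possible for all OSs.

```c
#include <assert.h>
#include <sys/resource.h>

/* Return the current (soft) per-process limit on open file descriptors.
   If the limit cannot be queried, or the OS reports "unlimited",
   return the caller's fallback instead. */
static long max_open_files(long fallback)
{
    struct rlimit rl;

    assert(fallback > 0);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return fallback;
    return (long)rl.rlim_cur;
}
```

A library could call something like max_open_files(32) once at start-up and size its tables from the result, leaving room for descriptors the process already holds, such as stdin, stdout and stderr.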
-------------------------------------------------------------------------
Thurs Sep 7
From Mike Folk at NCSA, including Matt Choptuik's reply to previous
message.
-------------------------------------------------------------------------
At 5:33 PM 9/6/95, Matt Choptuik wrote:
>Ed, Mike ... thanks for the info. However, it's not clear that all
>the things we complained about are being covered. We realize
>that there will be a limit to the number of files which can
>be open at any one time and we will have to work around this.
Good. George has made some improvements to this situation. Shiming will
soon be sending out a note to Robert Marsa explaining this.
>The
>dimensioning business is different though; as far as we can
>discern (and as is documented at
>http://godel.ph.utexas.edu/Members/marsa/hdf.html
>), if we append CONSTANT RANK, CONSTANT SHAPE scientific
>data sets to an .hdf file, then we *always* face an (unpredictable) limit to
>how many data sets can be written, and according to Shiming,
>this is not supposed to be the case (i.e. we DO use SDsetdimname()
>but still encounter the problem). Either we aren't understanding
>something or this is a genuine BUG, not a design feature.
It's a bug, not a design feature. Shiming will say a bit more about how
she plans to work on it in the note to Robert.
>Also, on Cray vector machines, our codes which use the mfhdf interface
>don't just encounter .hdf error returns, at some point, they actually
>dump core so we suspect there's something else awry with the
>Cray support.
We didn't know about this Cray core dumping problem. (Or maybe we
misunderstood an earlier message from you.) After we solve the previous
problem on our local workstations, we will check it on a Cray.
>We don't mind waiting until Sep 30 for a fix but
>we urge the support group to get back in touch with US if possible
>so that we can be reasonably certain that the problem does get fixed.
Will do.
Mike
-------------------------------------------------------------------------
Fri, Sep 9 (approx)
From Shiming Xu at NCSA (I believe) via Robert Marsa
-------------------------------------------------------------------------
>From [email protected] Wed Aug 23 18:32:29 1995
>To: [email protected]
>Subject: [netcdf/libsrc] dimension counting problem
>
>>>From [email protected] Thu Aug 17 14:01:18 1995
>>>To: [email protected]
>>>Subject: [netcdf/libsrc]: [The MFSD interface will only allow us to
>>>write a few SDS's in a single file and will only allow us to open a few
>>>files at once.]
>>> VERSION:
>>> HDF3.3 Release 4
>>> USER:
>>> Robert Marsa
>>> (512) 471-4700
>>> [email protected]
>>> MACHINE / OPERATING SYSTEM:
>>> Presumably all. SGI/IRIX 5.2, Macintosh/A/UX 3.0.2,
>>> CRAY/UNICOS 8.0.4
>>> COMPILER:
>>> native ANSI cc, gcc
>>>
>>> DESCRIPTION:
>>> We have recently run into problems using the HDF MFSD interface.
>>> Through testing and examination of the source code, we have
>>> identified at least two problems.
>>>
>>> 1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>>> file. This limitation seems to be enforced by SDcreate in mfsd.c.
>>> We don't properly understand this limitation. If we create a
>>> series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>>> them. Likewise, we can only write 2500 SDS's of size 1 X 1 and
>>> 5000 SDS's of size 1.
>>
>>You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
>>in netcdf.h. Or, you can use SDsetdimname() to let variables share the
>>dimension records. If all variables have the same 2 dimensions, only two
>>dimension records will be created in the file. It will not only solve the
>>limitation problem but also speed up the write process when closing the file.
>
>Although the test program I sent you doesn't use SDsetdimname(), our "real"
>programs do. When we examine the file, we do see that there is only one
>set of dimensions. However, we still have the same limitations as if we
>were writing many sets of dimensions. I think you may be counting dimensions
>for every SDS even though you are only creating one set. This looks like a
>legitimate bug.
I did a quick check on some segments of the HDF code which I thought might cause
the problem. Unfortunately none of them seemed to be wrong.
I think I need more time to trace down the code to find out:
1. Why can't we create MAX_NC_DIMS of dims?
2. Why is there a difference between unit and non-unit dim sizes?
3. Why doesn't SDsetdimname solve the limitation problem?
4. Do any factors other than the max dim number restrict the number of SDSs?
I reported this problem at our Aug. 23 HDF meeting. At the Aug. 30 HDF meeting
we set a target date of Sept 30 for fixing this problem (along with other tasks).
I will keep you posted if we find out anything sooner.
>We would prefer not to redefine MAX_NC_DIMS for two reasons:
>
>1) this is not really a fix. We have no idea what the maximum number of
>SDS's we'll want is.
>
>2) this would require everyone who uses our software to edit their HDF
>headers and rebuild their HDF libraries.
>
>>> 2) Only 32 (MAX_NC_OPEN) files may be open at one time. This
>>> limitation seems to be enforced by NC_open in file.c. We have
>>> programs which will involve more than 32 grid functions, all of
>>> which may need to be output at the same time. The work-around of
>>> closing the files between writes is unacceptably slow.
>>
>>MAX_NC_OPEN is also defined in netcdf.h. You may redefine it and recompile the
>>mfhdf side.
>
>We don't want to redefine this for the same two reasons given above. It should
>be easy for the HDF routines to use some sort of dynamic data structure to
>allow an unlimited number of files to be opened.
George has been working on the max file number with another user and
he got a solution yesterday. Below is a copy of his e-mail to that user.
And, we just heard back from that user saying this fix worked.
-------------------------------------------------------------
I've found the problem. It looks like there are 3 places in
the source code where the max # of open files is specified.
This is inconsistent and will be fixed in the next beta release.
The quick fix is to change the following variables
when you want to increase the number of open files.

file                     variable       current value
----                     --------       -------------
hdf/src/hdf.h            MAX_VFILE      16
hdf/src/hfile.h          MAX_FILE       16
mfhdf/libsrc/netcdf.h    MAX_NC_OPEN    32

I changed the above to 200 (specific to SGI) on an
SGI Indy (IRIX 5.3) and opened/created 197 (the other 3 are stdin, stdout, stderr)
HDF files using the SDxx interface.
I believe changing the above should fix it. Remember to do
a full *clean* rebuild.
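
As a rough illustration of George's quick fix, the three limits can be kept in step by deriving them from one value. The three macro names and the value 200 come from his note. `HDF_MAX_OPEN` and `check_open_limits` are hypothetical, and in the real source tree each definition lives in its own header, so this is a sketch of the intent rather than a patch.

```c
#include <assert.h>

/* Hypothetical single source of truth; 200 is the value George tried
   on an SGI Indy and is not claimed to suit other systems. */
#define HDF_MAX_OPEN 200

#define MAX_VFILE   HDF_MAX_OPEN   /* defined in hdf/src/hdf.h */
#define MAX_FILE    HDF_MAX_OPEN   /* defined in hdf/src/hfile.h */
#define MAX_NC_OPEN HDF_MAX_OPEN   /* defined in mfhdf/libsrc/netcdf.h */

/* Abort early if the three limits ever drift apart again. */
static void check_open_limits(void)
{
    assert(MAX_VFILE == MAX_FILE);
    assert(MAX_FILE == MAX_NC_OPEN);
}
```

With the original values (16, 16 and 32) the second assertion would fail, which is the inconsistency described in the note.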
-------------------------------------------------------------------------
Tue Sep 19 10:21:56 CDT 1995
From Shiming Xu at NCSA, via Robert Marsa
-------------------------------------------------------------------------
>From [email protected] Mon Sep 18 17:41:11 1995
To: [email protected]
Subject: Re: limits on dims and vars
From: [email protected]
Status: R
>>>> DESCRIPTION:
>>>> We have recently run into problems using the HDF MFSD interface.
>>>> Through testing and examination of the source code, we have
>>>> identified at least two problems.
>>>>
>>>> 1) Only 5000 (MAX_NC_DIMS) dimensions are allowed in a single
>>>> file. This limitation seems to be enforced by SDcreate in mfsd.c.
>>>> We don't properly understand this limitation. If we create a
>>>> series of SDS's of size 1 X 1 X 1, we can only write 1666 of
>>>> them. Likewise, we can only write 2500 SDS's of size 1 X 1 and
>>>> 5000 SDS's of size 1.
>>>
>>>You can try to change the limitation by redefining MAX_NC_DIMS and MAX_NC_VARS
>>>in netcdf.h. Or, you can use SDsetdimname() to let variables share the
>>>dimension records. If all variables have the same 2 dimensions, only two
>>>dimension records will be created in the file. It will not only solve the
>>>limitation problem but also speed up the write process when closing the file.
>>
>>Although the test program I sent you doesn't use SDsetdimname(), our "real"
>>programs do. When we examine the file, we do see that there is only one
>>set of dimensions. However, we still have the same limitations as if we
>>were writing many sets of dimensions. I think you may be counting dimensions
>>for every SDS even though you are only creating one set. This looks like a
>>legitimate bug.
>
>I did a quick check on some segments of the HDF code which I thought might cause
>the problem. Unfortunately none of them seemed to be wrong.
>I think I need more time to trace down the code to find out:
>
Here are some results:
>1. Why can't we create MAX_NC_DIMS of dims?
>2. Why is there a difference between unit and non-unit dim sizes?
I wrote a short program to create 5000 3D INT8 SDS' on SGI. For dimension sizes
1x1x1, 2x2x2, 20x20x20 and 30x30x30 the program created 1666 SDS' and
died at the 1667th SDS, when it reached the limit of 5000 dimensions.
In other words, the dimension sizes didn't seem to make a difference
in terms of the maximum number of dimensions.
I ran your program on SGI with rank=2 and rank=3. With different dim
sizes the program always created 5000 dimensions. For rank=2 the
program died at variable 2500 and for rank=3 it died at variable 1666.
The only time my program died before 5000 dims was when dims were 50x50x50.
The file would have been bigger than 200MB while there were only 140MB
available on my disk.
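
The counts reported above follow from integer division of the dimension limit by the rank. A minimal sketch, assuming only that every SDS consumes one entry per dimension (`max_sds_for_rank` is a hypothetical helper, not an HDF function):

```c
#include <assert.h>

/* Limit quoted in this thread for HDF 3.3r4. */
#define MAX_NC_DIMS 5000

/* Number of SDSs of the given rank that fit before the per-file
   dimension table is full. Integer division drops the SDS that
   would only partly fit. */
static long max_sds_for_rank(long rank)
{
    assert(rank > 0);
    return MAX_NC_DIMS / rank;
}
```

For rank 1, 2 and 3 this gives 5000, 2500 and 1666, matching the numbers in the original report. For rank 3 the last complete SDS uses entry 4998, and the next one would need entries up to 5001.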
>3. Why doesn't SDsetdimname solve the limitation problem?
You were right about this, SDsetdimname did not help with the dim limits,
even though it cut the final number of dimensions in HDF files.
The reason is described below.
HDF creates dimension structures and an array of pointers to the dimension
structures dynamically. Every time a new SDS is created, one dimension structure
and one pointer are also created for each of its dimensions (3 per SDS in my
sample program).
When SDsetdimname is called to share dimensions, the duplicated dimension
structures are freed. However, the pointers are not. They point to the
corresponding dimension structures which represent each unique dimension.
In short, the number of dimension pointers is not decreased and therefore
the limit is still MAX_NC_DIMS whether or not the dimensions are shared.
I wasn't aware of the details and gave you a wrong suggestion. Sorry about
it.
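
The behaviour described above can be mimicked with a toy model. It is not HDF's real data structure: `struct dim_model` and the `model_*` functions are hypothetical, and the model only counts pointer slots and live dimension structures. It shows why sharing reduces the number of structures but leaves the number of SDSs that can be created unchanged.

```c
#include <assert.h>

#define MAX_NC_DIMS 5000   /* limit quoted in this thread */

/* Toy bookkeeping: how many pointer slots are in use and how many
   dimension structures are still alive. */
struct dim_model {
    long slots_used;
    long unique_dims;
};

/* Creating an SDS takes `rank` slots and `rank` structures.
   Returns 0 on success, -1 when the slot limit would be exceeded. */
static int model_create_sds(struct dim_model *m, int rank)
{
    assert(rank > 0);
    if (m->slots_used + rank > MAX_NC_DIMS)
        return -1;
    m->slots_used += rank;
    m->unique_dims += rank;
    return 0;
}

/* Sharing frees the duplicate structures but keeps their slots. */
static void model_share_dims(struct dim_model *m, int duplicates)
{
    assert(duplicates >= 0 && duplicates <= m->unique_dims);
    m->unique_dims -= duplicates;
    /* slots_used is deliberately left unchanged */
}

/* Create SDSs until the limit is hit. With `share` set, every SDS
   after the first shares all its dimensions with the first one.
   Returns the number created and reports the surviving structures. */
static long model_count_sds(int rank, int share, long *unique_dims_out)
{
    struct dim_model m = {0, 0};
    long created = 0;

    while (model_create_sds(&m, rank) == 0) {
        if (share && created > 0)
            model_share_dims(&m, rank);
        created++;
    }
    *unique_dims_out = m.unique_dims;
    return created;
}
```

For rank 3 the model stops after 1666 SDSs with or without sharing. Only the count of surviving structures differs: 3 when shared, 4998 when not.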
A quick fix to this problem is to not check MAX_NC_DIMS. Each pointer
takes 4 bytes on 32-bit machines, so 10,000 dimension pointers need 40k bytes.
This will not be a problem for most machines.
A better solution is to not create the duplicated dimension structures and
the pointers in the first place. This can be done by defining each dimension
and using those dimensions to create SDSs, similar to what the netCDF
interface does. However, this requires some current functions to be changed.
We need more discussion about this change.
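
The "define each dimension first" approach might look roughly like the sketch below. It is a simplified model under stated assumptions, not the netCDF or HDF interface: `struct dim_registry`, `dim_define`, `REGISTRY_CAP` and `DIM_NAME_LEN` are all hypothetical. A dimension is looked up by name and created only on first use, so many data sets that share dimensions consume only as many entries as there are distinct dimensions.

```c
#include <assert.h>
#include <string.h>

#define REGISTRY_CAP 64     /* hypothetical capacity for the sketch */
#define DIM_NAME_LEN 32     /* hypothetical maximum name length */

struct dim_def {
    char name[DIM_NAME_LEN];
    long size;
};

struct dim_registry {
    struct dim_def dims[REGISTRY_CAP];
    int count;
};

/* Return the id of dimension `name`, defining it on first use.
   Returns -1 if the registry is full, the name is too long, or the
   name already exists with a different size. */
static int dim_define(struct dim_registry *r, const char *name, long size)
{
    int i;

    assert(name != NULL && size > 0);
    for (i = 0; i < r->count; i++) {
        if (strcmp(r->dims[i].name, name) == 0)
            return r->dims[i].size == size ? i : -1;
    }
    if (r->count == REGISTRY_CAP || strlen(name) >= DIM_NAME_LEN)
        return -1;
    strcpy(r->dims[r->count].name, name);
    r->dims[r->count].size = size;
    return r->count++;
}
```

A data set would then store only the ids returned by dim_define, so creating a thousand 20 x 20 data sets on dimensions "x" and "y" leaves the registry with two entries.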
>4. Do any factors other than the max dim number restrict the number of SDSs?
There is no other hard-coded limit on the number of dims. However, there
are several factors which should be considered.
In hdf files each dimension is stored as a vdata in a vgroup. All attributes
(of the HDF file, of the SDS' in the file and of the dimensions) are also
stored as vdatas. The maximum number of vdatas that can exist in an HDF file
is 64k, which is the maximum of reference numbers that any one type of object
can have. (A UINT16 is used for reference numbers, hence the 64K limit.)
Each SDS is stored as a vgroup. The maximum number of vgroups that can exist
in a file is also 64k, for the same reason as that for vdata.
If you don't call VSxxxx to create any vdatas on your own, and if you don't
write any attributes, the max number of dimensions will be 64k minus the
total number of SDSs.
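
The 64K figure and the resulting budget can be written out as a small calculation. This only restates the rule given above. Whether HDF reserves any reference numbers (for example 0) is not stated in this thread, so the result should be read as an upper bound. `REF_SPACE` and `max_dims_for` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values a 16-bit reference number can take. */
#define REF_SPACE ((long)UINT16_MAX + 1)

/* Upper bound on the number of dimensions when no other vdatas or
   attributes are written: 64K minus the number of SDSs. */
static long max_dims_for(long num_sds)
{
    assert(num_sds >= 0 && num_sds <= REF_SPACE);
    return REF_SPACE - num_sds;
}
```

With 5000 SDSs in a file this leaves room for at most 60536 dimensions, well above MAX_NC_DIMS, so the 5000 limit is the one that is reached first.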
If a dimension is assigned attributes, such as label/unit/format/scale/...,
the dimension will be promoted to an SDS, called a coordinate SDS. The total
number of SDS', including the dataset SDS' and coordinate SDS', is limited
by MAX_NC_VARS.
Another factor is performance. The more vdatas there are to be written, the longer
SDend and SDstart take to finish writing/reading the file. Once I created
1300 3D SDS', and SDend was 4 times faster when the 3 dims were shared by all SDS'
than it was when the dimensions were not shared. We are working on this problem
and hopefully it will be improved in the next release.
I don't have any quantitative analysis of how the number of dims affects
performance through the memory they take. If performance is an issue, you might
want to experiment with it.
In addition to above factors one more reason for setting limits is to
facilitate writing applications and utilities, for example Fortran-77
utilities must allocate arrays statically by using a maximum array size.
We may change the limit to 20,000. However, the existing utilities and
tools will have problems reading new HDF files which contain more than
5000 SDSs.
We feel at this time, for the reasons given, that the 5K limit for a single
file is reasonable for most applications. If this creates an impossible
situation for your group we still can increase it to 20,000.
Now that we've given our reasons for the 5000 limit, we'd like to
hear your opinions.
Thanks.
Shiming Xu
-------------------------------------------------------------------------
Tue Sep 19 13:40:04 CDT 1995
From Matt Choptuik and Robert Marsa
-------------------------------------------------------------------------
From [email protected] Tue Sep 19 13:38:58 1995
To: [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected]
Subject: Re problems with MFSD interface in HDF
Dear Shiming and Mike:
We are disappointed, to say the least, with your prognosis.
We feel it is intuitively obvious that having *any*
limitation on the number of data objects one can write to a file
is a pretty bad design decision. At a minimum it certainly isn't
very forward looking. For example, we are in an era where simulations
can last many tens of thousands or even millions of time-steps; a canonical
way of viewing results is via video footage. At 30 frames a second
you're saying that no one will ever want to make a video that lasts
more than 3 minutes. The argument about utilities needing to have
limits is very weak, since any properly designed utility (even
in FORTRAN!!) can and should detect when its internal data structures
are full.
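
The point about utilities detecting a full table can be made concrete with a minimal sketch. It assumes a fixed-size table like the statically allocated arrays mentioned for Fortran-77 tools. `struct sds_table`, `sds_table_add` and `TABLE_CAP` are hypothetical names, not part of any HDF utility.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_CAP 5000   /* illustrative fixed size */

struct sds_table {
    int refs[TABLE_CAP];
    size_t used;
};

/* Append one entry. Returns 0 on success and -1 when the table is
   full, so the caller can report the condition instead of writing
   past the end of the array. */
static int sds_table_add(struct sds_table *t, int ref)
{
    assert(t != NULL);
    if (t->used == TABLE_CAP)
        return -1;
    t->refs[t->used++] = ref;
    return 0;
}
```

A reader built this way can still process the first TABLE_CAP data sets of a larger file and tell the user that the rest were skipped.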
Also there are at least two separate issues here
(I apologize for turning one gripe into two but I have raised the
point in previous correspondence with Mike). The first is the
limitations we've been discussing. The second, which is really
more urgent, is that our programs on CRAYs are *core dumping* in the
HDF routines and we can't correlate the crashes with, say, the number
of data sets written, the way we can with the 5000 dimension limit.
This is a serious problem since we've had many instances now where
large two- and three-dimensional calculations have crapped out
after a significant run time (but after having only appended a few
10's to few 100's of data sets) and our only work-around has been
to write a single file per time step. We understand the need
to design and debug on workstations but one of the principal "features"
of HDF is supposed to be its machine-independence and CRAYs
still provide most of our high-performance cycles. A couple of
weeks ago Mike said
>We didn't know about this Cray core dumping problem. (Or maybe we
>misunderstood an earlier message from you.) After we solve the previous
>problem on our local workstations, we will check it on a Cray.
so could we have someone there check this out on a Cray to see
whether they can reproduce the problem and isolate what we've been
running into? Perhaps it is just a case of the files being too large
but some experiments we've done suggest that that's *not* the case.
Thanks for keeping in touch, we'll do the same ...
Regards,
Matt Choptuik
Robert Marsa
-------------------------------------------------------------------------
Tue Sep 19 17:33:26 1995
From Mike Folk
-------------------------------------------------------------------------
Right. Now that we understand the other limitations, we will turn our
attention to the Cray problem. We'll keep you posted on our progress.
Hopefully we'll have something positive to report by the end of next week.
Mike
-------------------------------------------------------------------------
Wed Sep 20 13:51:16 1995
From Shiming Xu at NCSA
------------------------------------------------------------------------
From [email protected] Wed Sep 20 13:51:16 1995
Subject: Re: limits on dims and vars
Hi,
I ran your test program with HDF3.3r4patch04 libraries on SDSC's
(San Diego Supercomputer Center) C90, running UNICOS 8.0.3.2,
8 CPU 256 MW. It created 1666 SDS' with dimension sizes of 1,2,3,4,
and 20. No core dump or segmentation fault occurred. The display
of my screen is attached below. (maxd is my test program. The Makefile
includes both maxd and sdtest as targets, so 'make' builds
both maxd and sdtest.)
Which Cray are you using? We don't have any Crays at NCSA any more.
I am trying to get an account on a Cray Y-MP at an HDF user's site.
If you are using a Y-MP, I may try sdtest on that machine once the
account is available.
Another question is which version of HDF you are using?
Precompiled HDF3.3r4p4 for C90 is available on the NCSA ftp server,
in directory:
/HDF/HDF3.3r4p4/HDF3.3r4p4.c90.tar.Z
Thanks.
Shiming Xu
--------------------- display on my screen ----------------------
(output of sdtest 20 10000)
test1650:
test1651:
test1652:
test1653:
test1654:
test1655:
test1656:
test1657:
test1658:
test1659:
test1660:
test1661:
test1662:
test1663:
test1664:
test1665:
test1666:
Can't create data set test1666
c90-73% ls -l
total 211552
drwx------ 2 u785 use310 4096 Sep 20 18:14 .
drwx------ 4 u785 use310 4096 Sep 20 16:01 ..
-rw------- 1 u785 use310 366 Sep 20 18:14 Makefile
-rw------- 1 u785 use310 2267 Sep 20 16:00 max_dim.c
-rw------- 1 u785 use310 3012 Sep 20 18:14 max_dim.hdf
-rwx------ 1 u785 use310 741680 Sep 20 18:13 maxd
-rwx------ 1 u785 use310 772848 Sep 20 18:14 sdtest
-rw------- 1 u785 use310 1656 Sep 20 16:00 sdtest.c
-rw------- 1 u785 use310 106645750 Sep 20 18:16 test.hdf
c90-74% history
51 mkdir max_dim
52 mv *.c max_dim
53 mv M* max_dim
54 pwd
55 vi max_dim/Makefile
56 cd max_dim
57 vi max_dim.c
58 vi sdtest.c
59 make
60 pwd
61 vi Makefile
62 make
63 cc -g -o maxd max_dim.c -I/usr/tmp/u785/HDF3.3r4p4.c90/include -L/usr/tmp/u785/HDF3.3r4p4.c90/lib -lnetcdf -ldf -ljpeg
64 ls /usr/tmp/u785/HDF3.3r4p4.c90/lib
65 vi Makefile
66 make
67 maxd
68 sdtest 2 10000
69 sdtest 3 10000
70 sdtest 4 10000
71 sdtest 1 10000
72 sdtest 20 10000
73 ls -l
74 history
c90-75%
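
The cutoff at test1666 in the display above is consistent with the
dimension budget described at the top of this report: if every SDS
consumes one new, unshared dimension per axis and the file-wide cap is
MAX_NC_DIMS = 5000, a rank-3 SDS uses three slots and the budget runs
out after 1666 data sets, whatever the dimension size. The following is
a minimal sketch of that arithmetic only. The helper names are
hypothetical and are not HDF calls, and the one-dimension-per-axis
assumption is inferred from the observed counts (5000, 2500, 1666), not
taken from the library source.

```c
#include <assert.h>
#include <stdio.h>

/* File-wide dimension cap reported for HDF3.3r4 (MAX_NC_DIMS). */
#define MAX_NC_DIMS 5000

/* Hypothetical helper, not an HDF call: number of SDSs of a given rank
 * that fit in one file if each SDS consumes `rank` unshared dimensions. */
int max_sds_for_rank(int rank)
{
    assert(rank > 0);            /* a data set needs at least one axis */
    return MAX_NC_DIMS / rank;   /* integer division rounds down */
}

/* Print the budget for ranks 1 to 3. */
void print_dim_budget(void)
{
    int rank;

    for (rank = 1; rank <= 3; rank++)
        printf("rank %d: at most %d SDSs\n", rank, max_sds_for_rank(rank));
}
```

Under this model the dimension size (1, 2, 3, 4 or 20 in the runs above)
does not enter the count at all, which matches the identical cutoff
reported for every size.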
-------------------------------------------------------------------------
Wed Sep 20 14:09:53 CDT 1995
From Matt Choptuik at UT Austin
-------------------------------------------------------------------------
We are running our programs on a J90 (CRAY J916/5-1024 ) here at
Texas and on the C90 at Pittsburgh. In both cases we're using
HDF3.3r4, which as far as I can tell from the ftp site is the most
current non-alpha version. (The directory you mentioned
HDF/HDF3.3r4p4
does not seem to exist on your ftp server) I'll be glad to set you up access to
the J90 here if you want to check it out, it's on this machine that
we're most concerned about getting things straightened out in the
short term. Let me know how you wish to proceed.
Regards,
Matt Choptuik
-------------------------------------------------------------------------
Wed Sep 20 15:00:00 CDT 1995
From Shiming Xu at NCSA
-------------------------------------------------------------------------
>We are running our programs on a J90 (CRAY J916/5-1024 ) here at
>Texas and on the C90 at Pittsburgh. In both cases we're using
>HDF3.3r4, which as far as I can tell from the ftp site is the most
>current non-alpha version. (The directory you mentioned
>
> HDF/HDF3.3r4p4
>
>does not seem to exist on your ftp server)
Sorry, that should be /HDF/HDF3.3r4/bin/HDF3.3r4p4.c90.tar.Z.
Could you give it a try on C90 at Pittsburgh and let me know if it
works?
> I'll be glad to set you up access to
>the J90 here if you want to check it out, it's on this machine that
>we're most concerned about getting things straightened out in the
>short term. Let me know how you wish to proceed.
I haven't worked on a J90 yet and have no idea how much difference there is
between the C90 and the J90. If binaries for the C90 also work on the J90, could you
try the precompiled code first? If not, either I give you the source code
for HDF3.3r4p4 and you install the library and run sdtest, or you give me
an account on your J90 and I install the library and run the test. In either
case, if there is still a core dump or segmentation fault, I would like to
find out what causes the problem and fix the bug if there is one in HDF.
Thanks.
Shiming Xu
-------------------------------------------------------------------------
Sun Sep 24 16:00:00 CDT 1995
From Shiming Xu at NCSA
-------------------------------------------------------------------------
> I tried to use the C90-compiled distribution on our J90 but it's
> a no go. If you can supply me with the source, I will install
> it and test out the test program.
You may get the source code from either of two places:
The first place is our HDF anonymous ftp server, hdf.ncsa.uiuc.edu,
in subdirectory:
/pub/outgoing/sxu/3.3r4p4/33r4p4.src.tar.Z
This file contains compressed archived source code for HDF3.3r4 patch04.
All you need to do is to uncompress and un-tar the .tar.Z file and then
compile hdf/ and mfhdf/.
(FYI, I just checked out HDF3.3r4patch04 from our CVS, compiled and
tested on my SGI IRIX5.3. Everything worked fine, and the test passed.)
Another place is the NCSA ftp server, ftp.ncsa.uiuc.edu.
The patch files are in /HDF/HDF3.3r4/patches/, and the patched
source programs are in /HDF/HDF3.3r4/patches/patchedsrc/.
The /HDF/HDF3.3r4/patches/README and the first paragraph of
the patch files explain how to apply the patch files to
HDF3.3r4, or how to replace the current HDF3.3r4 files with the
patched source files in /HDF/HDF3.3r4/patches/patchedsrc/.
Let me know if you have problems downloading or compiling it.
> As far as I can tell, there is
> *no core-dumping problem* on the C90 at Pittsburgh (even without
> using the patched version of 3.3r4). I'm theorizing that the
> problems we had on that machine were possibly due to file-quota
> violations and we assumed that they were related to the difficulty
> we were having with the J90.
Thank you for letting me know that HDF3.3r4 does not core dump on
the C90 and that file-quota violations were possibly the cause of
the problems you were having on that machine.
Now, we need to work out the limitation problem. I sent an e-mail to
Mr. Robert Marsa last Friday and haven't heard back from him
yet. It seems I should have cc'd the e-mail to you as well. I am
attaching two relevant paragraphs below. Please let me know whether
or not you can use the unlimited dimension instead of an unlimited number
of variables for your application.
Thanks.
Shiming Xu
------------- e-mail sent to [email protected] ----------
I understand the number of variables is a problem for your project.
If you give me more details (such as: what kind of data you are
collecting; what is defined as a variable; how you will view the
data and with what tools; how big the datasets are; etc.), maybe we
can figure out a solution to reach your goal.
In sdtest you use nsteps to control the number of variables. Does that
mean that in your project you need to create a new variable for each
time step? If that is the case, do you think we can use the unlimited
dimension to represent the time steps?
-----------------------------------------------------------------------
-------------------------------------------------------------------------
Mon Sep 25 16:00:00 CDT 1995
From Matt Choptuik at UT
-------------------------------------------------------------------------
Dear Shiming:
I've installed 3.3p4 on our J90 and my test program still core dumps
(although it gets a little further than it did previously). I don't
think it's a file-size problem since I can write a much larger file
using unformatted FORTRAN output, for example, without any problem.
I would appreciate it if you would take a look at things at this point.
If you will send me a short list of machine names (full Internet addresses)
and your login name on those machines, I will update the .rhosts
entry on the 'guest' account so that you will be able to login via
rlogin charon.cc.utexas.edu -l richard
Then
cd test_SD
where you will find a test program 'tsd.c'. After you
make tsd
you should be able to see the core dump by running
tsd foo 3 65 1024
which will attempt to output 1024 65x65x65 double sds's to foo.hdf,
but will actually only write 40.
As the makefile points out, the software was installed with prefix
/hpcf/u0/ph/az/phaz337 from /hpcf/u0/ph/az/phaz337/install/33r4p4
just in case you want to check that I did things properly.
Thanks ...
Matt Choptuik
-------------------------------------------------------------------------
Mon Sep 25 11:00:00 CDT 1995
From Shiming Xu at NCSA
-------------------------------------------------------------------------
Matt,
> I've installed 3.3p4 on our J90 and my test program still core dumps
> (although it gets a little further than it did previously). I don't
> think it's a file-size problem since I can write a much larger file
> using unformatted FORTRAN output, for example, without any problem.
> I would appreciate if you take a look at things at this point.
> If you will send me a short list of machine names (full Internet addresses)
> and your login name on those machines, I will update the .rhosts
> entry on the 'guest' account so that you will be able to login via
>
> rlogin charon.cc.utexas.edu -l richard
>
> Then
>
> cd test_SD
>
> where you will find a test program 'tsd.c'. After you
>
> make tsd
>
> you should be able to see the core dump by running
>
> tsd foo 3 65 1024
>
> which will attempt to output 1024 65x65x65 double sds's to foo.hdf,
> but will actually only write 40.
>
Thank you for setting up the account for me. I'll log in to the J90 from:
[email protected]
[email protected]
or
[email protected]
> As the makefile points out, the software was installed with prefix
> /hpcf/u0/ph/az/phaz337 from /hpcf/u0/ph/az/phaz337/install/33r4p4
> just in case you want to check that I did things properly.
By the way, was the library compiled with the '-g' option?
Whom should I contact if I need help getting things to work on your
machine: you, Marsa, or Richard? Could you please give me
the contact person's phone number, just in case?
Thanks.
Shiming
-------------------------------------------------------------------------
Mon Sep 25 12:00:00 CDT 1995
From Matt Choptuik at UT
-------------------------------------------------------------------------
Shiming:
I've updated the .rhosts entry so you should be able to log into
the account now.
Please contact me, Matt Choptuik, if there are problems. Our phones
are a little screwed up here right now since three of us have swapped
offices, so it's best to phone Virginia at the Center for Relativity
office
(512) 471-1103
and have her transfer the call. Also, both libraries were compiled with the
'-g' option and here's part of the traceback I get when running in the
debugger
------------------------------------------------------------------------------
error exit instruction [signal SIGERR] in array$c.NC_arrayfill at line 205 in file array.c
File "array.c" does not exist or does not contain text. (dbg173)
(cdbx) where
Currently stopped in NC_arrayfill at line 205 in file array.c
-> NC_arrayfill was called by hdf_get_data at line 688 in putget.c (0p26430d) with:
[ 1] lo 24695 --> "\026\001\201\304*\202*\366"
[ 2] len = 1000000
[ 3] type = NC_DOUBLE
hdf_get_data was called by hdf_get_vp_aid at line 812 in putget.c (0p27035c) with:
[ 1] handle 417816 --> struct
[ 2] vp 3988966 --> struct
hdf_get_vp_aid was called by hdf_xdr_NCvdata at line 868 in putget.c (0p27151a) with:
[ 1] handle 417816 --> struct
[ 2] vp 3988966 --> struct
hdf_xdr_NCvdata was called by NCvario at line 1497 in putget.c (0p31503c) with:
[ 1] handle 417816 --> struct
[ 2] vp 3988966 --> struct
[ 3] where = 0
[ 4] type = NC_DOUBLE
[ 5] count = 274625
[ 6] values 143188 --> 00
------------------------------------------------------------------------------
Thanks,
Matt Choptuik
PS: Any info re HDF/netCDF documentation?
-------------------------------------------------------------------------
Mon Sep 25 16:00:00 CDT 1995
From Shiming Xu at NCSA
-------------------------------------------------------------------------
Matt,
====== Matt Choptuik's note of Sep 25: 'hdf documentation'======
> Shiming, one other thing. From time to time I peruse NCSA's ftp
> site to see if there is more updated/complete documentation
> on HDF/netCDF. I would appreciate if you could point me
> to the most recent/complete manuals that there are on the
> subject; it will certainly help us a lot as we continue to
> build stuff on top of your libraries ... thanks ... Matt Choptuik
>
HDF implements the netCDF model in mfhdf/. NetCDF documentation is
included in mfhdf/doc/ in our releases. However, the best place
to look for the netCDF docs is:
http://www.unidata.ucar.edu/packages/netcdf/guide.txn_toc.html
which contains a newer version.
There are some differences between the netCDF API and the
mfhdf API. For example, netCDF requires dimensions to be defined
before ncvardef uses them to define variables, whereas in HDF
SDcreate creates the dimensions itself from the dimension sizes it
is given. We are going to use the netCDF approach in the next
generation of HDF. Another small difference between netCDF and HDF
is that MAX_NC_VARS is defined as 256 in netCDF, while it is defined
as 5000 in HDF.
You need to refer to the HDF documentation to use the SD interface.
The most recent HDF documentation is the HDF Reference Manual and
the HDF User's Guide, which is still a draft version. The final
version will be ready very soon. The draft of the HDF User's Guide
is available on the NCSA ftp server, ftp.ncsa.uiuc.edu,
in directory:
/HDF/Documentation/HDF3.3/Users_Guide/HDF3.3_draft/
and the Reference Manual is in:
/HDF/Documentation/HDF3.3/Ref_Manual/.
I would be very glad to discuss with you how to organize your
application data to make better use of HDF. I have done this with many users,
and most of the discussions were very fruitful for both parties.
Thanks.
Shiming Xu
-------------------------------------------------------------------------
Tue Sep 26
From Shiming Xu at NCSA
-------------------------------------------------------------------------
Matt,
Could you please give me write permission so that I can change the
33r4p4 source code and recompile the library, i.e. write permission
for /hpcf/u0/ph/az/phaz337/install/33r4p4/ and its subdirectories?
Please also set up the right $(PATH) for me in the ~/.cshrc or ~/.login
file if any special path is required for compiling 33r4p4.
Thanks.
Shiming
-------------------------------------------------------------------------
Tue Sep 26
From Shiming Xu at NCSA
-------------------------------------------------------------------------
====== Matt Choptuik's note of Sep 26: 'Re: HDF on J90's'======
> Shiming ... I'd rather not change the permissions on any of *my* (phaz337)
> directories. Instead, I suggest you install in richard's directory
> (i.e. use install prefixes of $HOME while logged in as richard).
That's fine with me.
It is extremely slow to work across the network
in the daytime. I am now testing memory allocation on my local
machines. I will log on to the J90 after the network peak is over.
Thanks.
Shiming
> I just created install, lib, bin and include directories as richard
> then did a
>
> cd install
> zcat ~phaz337/install/33r4p4.src.tar.Z | tar xf -
>
> so if you log in as richard, then
>
> cd install/33r4p4
>
> you can take the installation from there. Then you'll have to
>
> cd ~/test_SD
>
> edit the makefile and change
>
> INCLUDEDIR = /hpcf/u0/ph/az/phaz337/include
> LIBDIR = /hpcf/u0/ph/az/phaz337/lib
>
> to
>
> INCLUDEDIR = /hpcf/u0/ph/az/richard/include
> LIBDIR = /hpcf/u0/ph/az/richard/lib
>
> and remake 'tsd' if you want to run the test program.
>
> Let me know if you have any problems, in particular, your path
> should be OK ...
>
> Thanks
>
> Matt Choptuik
>
======= end of Matt Choptuik's forwarded note ======
-------------------------------------------------------------------------
Wed Sep 27
From Shiming Xu at NCSA
-------------------------------------------------------------------------
Matt,
Here are some preliminary results of my experiments on J90:
1. The actual failure happens at HDgetspace, a macro that expands to malloc,
at line 684 of mfhdf/libsrc/putget.c, where len=1000000:
---------------------------------------------------------------
    len = to_do * vp->szof;               /* size of buffer for fill values */
==> values = (Void *) HDgetspace(len);    /* buffer to hold unconv fill vals */
----------------------------------------------------------------
I added 4 lines after line 684 to check for a memory allocation error
and stop the program if values is 0. Without the check, the program
core dumps with a segmentation fault as soon as values is accessed.
-------------------------------------------------------------------
    if (values == NULL) {
        printf("Failed in malloc %d bytes in hdf_get_data\n", len);
        return FALSE;
    }
--------------------------------------------------------------------
I will report this problem to the HDF group so that error checks are
added after HDgetspace calls.
2. It seems to me that we are running out of memory on the J90.
I don't know what maximum memory consumption is set for each process
and the processes it spawns. (My guess is somewhere around 35 -- 40 MB.)
'limit -v' said unlimited:
------------------------------------------
richard@charon(test_SD){26}% limit -v
unlimited CPU seconds
unlimited words of memory
Session sockbuf limit 0 clicks
------------------------------------------
Could your system administrator help us to find out and
to increase (if possible) the maximum memory?
Experiments with different memory limits, starting from 4M words,
showed that the smaller the memory limit was, the fewer variables
tsd could create.
I will continue studying this memory allocation problem to see
if there is anything we can do from the HDF side. I will report
the results at this week's HDF meeting and will let you know what
we think.
3. I created 1000 65^3 double-precision SDSs successfully on the J90
using the DFSD interface; see tdfsd.c in ~richard/. DFSD doesn't use
vdata/vgroup to implement SDSs. Each new SDS requires only 1
reference number and two tags (DFTAG_SD and DFTAG_SDD)
if the number type and dim sizes are the same for all SDSs.
Therefore the total number of SDSs in a file can be ~64k.
That's all for today.
Thanks.
Shiming
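
The allocation check described in item 1 above can be sketched in
isolation as follows. This is an illustration only, not the patch that
went into putget.c: alloc_fill_buffer is a hypothetical stand-in for the
HDgetspace call site, plain malloc is used in place of the HDgetspace
macro, and the overflow test on to_do * szof is an extra precaution that
the message does not mention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fill-buffer allocation in hdf_get_data.
 * Returns 1 and stores the buffer in *out on success; returns 0 and
 * leaves *out NULL if the size overflows or malloc fails. */
int alloc_fill_buffer(size_t to_do, size_t szof, void **out)
{
    size_t len;

    assert(out != NULL);
    *out = NULL;

    /* Reject element counts whose byte size would overflow size_t. */
    if (szof != 0 && to_do > SIZE_MAX / szof) {
        fprintf(stderr, "fill buffer size overflows size_t\n");
        return 0;
    }
    len = to_do * szof;
    if (len == 0)
        len = 1;   /* malloc(0) may return NULL, so ask for one byte */

    *out = malloc(len);
    if (*out == NULL) {
        /* Report the failure instead of dereferencing a null pointer. */
        fprintf(stderr, "Failed in malloc %lu bytes\n", (unsigned long)len);
        return 0;
    }
    return 1;
}
```

The caller is expected to test the return value and give up on the
write, which is what the four added lines do with "return FALSE".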
-------------------------------------------------------------------------
Thu Sep 28
From Shiming Xu at NCSA
-------------------------------------------------------------------------
Matt:
More on core dump:
1. The current implementation of HDF3.3r4p4 allocates a buffer when it is
needed and frees the buffer when the job is done. This isn't a problem
on some systems, but it is a problem on the J90. After writing a certain
number of variables, the system can no longer allocate big buffers and
tsd dies.
2. A quick fix is to hold on to the big buffer until all variables have been
written out and the file is closed. I have made the changes in
richard/install/mfhdf/libsrc and installed the modified library
in richard/lib and richard/include.
3. This fix works only if you open the file, write out all
variables, and then close the file. Each time the file is opened
the big buffer is allocated again. If the file is opened too many
times, you will still run out of memory and the program will fail.
In HDF4.0 we will use an ANSI call to free all buffers when the
process exits. That takes care of the file-open problem.
4. As mentioned in Mike's previous e-mail, each Vdata needs 35k bytes.
If the dimensions are not shared, 3000 dimensions will need 105MB!
The system can't allocate that many buffers and we will run out of
memory again. To solve this problem I added SDsetdimname in richard/tsd.c
to let all variables share the three dimensions.
5. With the above changes, tsd now writes 1000 SDSs to foo.hdf on the J90.
6. The above changes will be included in the HDF4.0 release.
7. HDF uses a 32-bit integer to represent the length and offset (in the file)
of each object. This implies that the largest size of a file is the maximum
value of a 32-bit integer, which is ~2GB. The size of foo.hdf
almost hits the limit:
-rw-r----- 1 richard phaz 2197169915 Sep 28 18:12 foo.hdf
The same limit applies to unlimited dimension variables.
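
The idea behind items 1 to 3 above (keep one large buffer for the life of
the open file instead of allocating and freeing it for every variable)
can be sketched as follows. This is a minimal illustration under stated
assumptions, not the change made in mfhdf/libsrc: scratch_t, scratch_get
and scratch_release are hypothetical names, and the real library may
attach its buffer to the file handle differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-file scratch buffer; not an HDF structure. */
typedef struct {
    void  *buf;   /* current allocation, or NULL */
    size_t cap;   /* size of the allocation in bytes */
} scratch_t;

/* Return a buffer of at least `len` bytes, growing the cached one only
 * when it is too small. Returns NULL if the allocation fails; the old
 * buffer is kept in that case. */
void *scratch_get(scratch_t *s, size_t len)
{
    void *p;

    assert(s != NULL);
    assert(len > 0);

    if (s->buf != NULL && len <= s->cap)
        return s->buf;            /* reuse: no malloc/free per variable */

    p = realloc(s->buf, len);
    if (p == NULL)
        return NULL;
    s->buf = p;
    s->cap = len;
    return s->buf;
}

/* Release the buffer once, for example when the file is closed. */
void scratch_release(scratch_t *s)
{
    assert(s != NULL);
    free(s->buf);
    s->buf = NULL;
    s->cap = 0;
}
```

A writer would call scratch_get once per variable and scratch_release
once at close, so repeated writes of equal-sized variables trigger a
single allocation.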
Please try the new version of the library and let me know if there is
any problem.
Thanks.
Shiming Xu
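
The figures quoted in items 4 and 7 of the message above can be checked
with a few lines of arithmetic. The sketch below assumes that "35k bytes"
means 35,000 bytes and that each SDS holds 65^3 eight-byte doubles, as in
the tsd run; the function names are hypothetical. It shows that the raw
data alone (2,197,000,000 bytes, close to the 2,197,169,915 bytes listed
for foo.hdf) is already larger than the signed 32-bit maximum but still
below the unsigned one. Which of the two applies depends on how the
library treats these fields, which the message does not say.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes of raw data in `count` cubic SDSs of side `n` with `elem` bytes
 * per value. 64-bit arithmetic avoids overflow in the check itself. */
int64_t sds_payload_bytes(int64_t count, int64_t n, int64_t elem)
{
    assert(count >= 0 && n >= 0 && elem >= 0);
    return count * n * n * n * elem;
}

/* Memory for `ndims` unshared dimensions at `per_dim` bytes each. */
int64_t dim_overhead_bytes(int64_t ndims, int64_t per_dim)
{
    assert(ndims >= 0 && per_dim >= 0);
    return ndims * per_dim;
}

/* Print the quantities discussed in items 4 and 7. */
void print_limits(void)
{
    printf("payload, 1000 x 65^3 doubles: %lld bytes\n",
           (long long)sds_payload_bytes(1000, 65, 8));
    printf("signed 32-bit maximum       : %lld bytes\n",
           (long long)INT32_MAX);
    printf("unsigned 32-bit maximum     : %lld bytes\n",
           (long long)UINT32_MAX);
    printf("3000 unshared dimensions    : %lld bytes\n",
           (long long)dim_overhead_bytes(3000, 35000));
    printf("3 shared dimensions         : %lld bytes\n",
           (long long)dim_overhead_bytes(3, 35000));
}
```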