* Filestreams @ 2008-06-07 23:11 Richard Scobie 2008-06-09 3:31 ` Filestreams Eric Sandeen 0 siblings, 1 reply; 19+ messages in thread From: Richard Scobie @ 2008-06-07 23:11 UTC (permalink / raw) To: xfs Is my understanding of filestreams correct, in that files written to a particular filestreams enabled directory are all allocated to 1 ag as long as the filestreams timeout is not exceeded - which I believe is 30s? In other words, if I copy 100 files into a directory, (cp -a dir_containing_100_files dest_dir), wait 60 seconds and do (cp -a dir_containing_100_different_files dest_dir), each directory and contents in dest_dir will be stored in a single, different ag? Also, would it be possible for the filestreams parameter to be added to man pages/kernel docs? Thanks, Richard ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams 2008-06-07 23:11 Filestreams Richard Scobie @ 2008-06-09 3:31 ` Eric Sandeen 2008-06-10 1:49 ` Filestreams Timothy Shimmin 0 siblings, 1 reply; 19+ messages in thread From: Eric Sandeen @ 2008-06-09 3:31 UTC (permalink / raw) To: Richard Scobie; +Cc: xfs Richard Scobie wrote: > Is my understanding of filestreams correct, in that files written to a > particular filestreams enabled directory are all allocated to 1 ag as > long as the filestreams timeout is not exceeded - which I believe is 30s? > > In other words, if I copy 100 files into a directory, (cp -a > dir_containing_100_files dest_dir), wait 60 seconds and do (cp -a > dir_containing_100_different_files dest_dir), each directory and > contents in dest_dir will be stored in a single, different ag? > > Also, would it be possible for the filestreams parameter to be added to > man pages/kernel docs? Yes please :) I was just noticing last week that there is no documentation anywhere on this feature .... -Eric > Thanks, > > Richard > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams 2008-06-09 3:31 ` Filestreams Eric Sandeen @ 2008-06-10 1:49 ` Timothy Shimmin 2008-06-10 2:14 ` Filestreams Richard Scobie 2008-06-10 23:09 ` Filestreams (and 64bit inodes) Richard Scobie 0 siblings, 2 replies; 19+ messages in thread From: Timothy Shimmin @ 2008-06-10 1:49 UTC (permalink / raw) To: Eric Sandeen, Richard Scobie; +Cc: xfs

Eric Sandeen wrote:
> Richard Scobie wrote:
>> Is my understanding of filestreams correct, in that files written to a
>> particular filestreams enabled directory are all allocated to 1 ag as
>> long as the filestreams timeout is not exceeded - which I believe is 30s?
>>
>> In other words, if I copy 100 files into a directory, (cp -a
>> dir_containing_100_files dest_dir), wait 60 seconds and do (cp -a
>> dir_containing_100_different_files dest_dir), each directory and
>> contents in dest_dir will be stored in a single, different ag?
>>
>> Also, would it be possible for the filestreams parameter to be added to
>> man pages/kernel docs?
>
> Yes please :) I was just noticing last week that there is no
> documentation anywhere on this feature ....
>
> -Eric
>
>> Thanks,
>>
>> Richard
>>

Sounds a reasonable idea :)

BTW, Sam Vaughan wrote some tutorial notes on the allocator and in
particular filestreams which I've pasted below:
(I thought it might be here:
http://oss.sgi.com/projects/xfs/training/ but I can't see it
in the allocator lab and slides).

-------------------------------------------------------------------------

Filestreams Allocator

A certain class of applications such as those doing film scanner video
ingest will write many large files to a directory in sequence. It's
important for playback performance that these files end up allocated
next to each other on disk, since consecutive data is retrieved
optimally by hardware RAID read-ahead.

XFS's standard allocator starts out doing the right thing as far as
file allocation is concerned.
Even if multiple streams are being written simultaneously, their files
will be placed separately and contiguously on disk. The problem is that
once an allocation group fills up, a new one must be chosen and there's
no longer a parent directory in a unique AG to use as an AG "owner".
Without a way to reserve the new AG for the original directory's use,
all the files being allocated by all the streams will start getting
placed in the same AGs as each other. The result is that consecutive
frames in one directory are placed on disk with frames from other
directories interleaved between them, which is a worst-case layout for
playback performance. When reading back the frames in directory A,
hardware RAID read-ahead will cache data from frames in directory B,
which is counterproductive.

Create a file system with a small AG size to demonstrate:

sles10:~ sjv: sudo mkfs.xfs -d agsize=64m /dev/sdb7 > /dev/null
sles10:~ sjv: sudo mount /dev/sdb7 /test
sles10:~ sjv: sudo chmod 777 /test
sles10:~ sjv: cd /test
sles10:/test sjv:

Create ten 10MB files concurrently in two directories:

sles10:/test sjv: mkdir a b
sles10:/test sjv: for dir in a b; do
> for file in `seq 0 9`; do
> xfs_mkfile 10m $dir/$file
> done &
> done; wait 2>/dev/null
[1] 30904
[2] 30905
sles10:/test sjv: ls -lid * */*
   131 drwxr-xr-x 2 sjv users       86 2006-10-20 13:48 a
   132 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/0
   133 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/1
   134 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/2
   135 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/3
   136 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/4
   137 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/5
   138 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/6
   139 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/7
   140 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/8
   141 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/9
262272 drwxr-xr-x 2 sjv users       86 2006-10-20 13:48 b
262273 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/0
262274 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/1
262275 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/2
262276 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/3
262277 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/4
262278 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/5
262279 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/6
262280 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/7
262281 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/8
262282 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/9
sles10:/test sjv:

Note that all the inodes are in the same AGs as each other. What about
the file data? Use xfs_bmap -v to examine the extents:

sles10:/test sjv: for file in `seq 0 9`; do
> bmap_a=`xfs_bmap -v a/$file | tail -1`
> bmap_b=`xfs_bmap -v b/$file | tail -1`
> ag_a=`echo $bmap_a | awk '{print $4}'`
> ag_b=`echo $bmap_b | awk '{print $4}'`
> br_a=`echo $bmap_a | awk '{printf "%-18s", $3}'`
> br_b=`echo $bmap_b | awk '{printf "%-18s", $3}'`
> echo a/$file: $ag_a "$br_a" b/$file: $ag_b "$br_b"
> done
a/0: 0 96..20575         b/0: 1 131168..151647
a/1: 0 20576..41055      b/1: 1 151648..172127
a/2: 0 41056..61535      b/2: 1 172128..192607
a/3: 0 61536..82015      b/3: 1 192608..213087
a/4: 0 82016..102495     b/4: 1 213088..233567
a/5: 0 102496..122975    b/5: 1 233568..254047
a/6: 2 299600..300111    b/6: 2 262208..275007
a/7: 2 338016..338527    b/7: 2 312400..312911
a/8: 2 344672..361567    b/8: 3 393280..401983
a/9: 2 361568..382047    b/9: 3 401984..421951
sles10:/test sjv:

The middle column is the AG number and the right column is the block
range. Note how the extents for files in both directories get placed on
top of each other in AG 2.

Something to note in the results is that even though the file extents
have worked their way up into AGs 2 and 3, the inode numbers show that
the file inodes are all in the same AGs as their parent directory, i.e.
AGs 0 and 1. Why is this?
To understand, it's important to consider the order in which events are
occurring. The two bash processes writing files are calling xfs_mkfile,
which starts by opening a file with the O_CREAT flag. At this point, XFS
has no idea how large the file's data is going to be, so it dutifully
creates a new inode for the file in the same AG as the parent directory.
The call returns successfully and the system continues with its tasks.
When XFS is asked to write the file data a short time later, a new AG
must be found for it because the AG the inode is in is full. The result
is a violation of the original goal of keeping file data close to its
inode on disk.

In practice, because inodes are allocated in clusters on disk, a process
that's reading back a stream is likely to cache all the inodes it needs
with just one or two reads, so the disk seeking involved won't be as bad
as it first seems.

On the other hand, the extent data placement seen in the xfs_bmap -v
output is a problem. Once the data extents spilled into AG 2, both
processes were given allocations there on a first-come-first-served
basis. This destroyed the neatly contiguous allocation pattern for the
files and will certainly degrade read performance later on.

To address this issue, a new allocation algorithm was added to XFS that
associates a parent directory with an AG until a preset inactivity
timeout elapses. The new algorithm is called the Filestreams allocator
and it is enabled in one of two ways. Either the filesystem is mounted
with the -o filestreams option, or the filestreams chattr flag is
applied to a directory to indicate that all allocations beneath that
point in the directory hierarchy should use the filestreams allocator.
With the filestreams allocator enabled, the above test produces results
that look like this:

a/0: 0 96..20575         b/0: 1 131168..151647
a/1: 0 20576..41055      b/1: 1 151648..172127
a/2: 0 41056..61535      b/2: 1 172128..192607
a/3: 0 61536..82015      b/3: 1 192608..213087
a/4: 0 82016..102495     b/4: 1 213088..233567
a/5: 0 102496..122975    b/5: 1 233568..254047
a/6: 2 272456..273479    b/6: 3 393280..410271
a/7: 2 290904..300119    b/7: 3 410272..426655
a/8: 2 300632..321111    b/8: 3 426656..441503
a/9: 2 329304..343639    b/9: 3 441504..459935

Once the process writing files to the first directory starts using AG 2,
that AG is no longer considered available so the other process skips it
and moves to AG 3.

---------------------------------

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams 2008-06-10 1:49 ` Filestreams Timothy Shimmin @ 2008-06-10 2:14 ` Richard Scobie 2008-06-10 23:09 ` Filestreams (and 64bit inodes) Richard Scobie 1 sibling, 0 replies; 19+ messages in thread From: Richard Scobie @ 2008-06-10 2:14 UTC (permalink / raw) To: Timothy Shimmin; +Cc: Eric Sandeen, xfs Timothy Shimmin wrote: > BTW, Sam Vaughan wrote some tutorial notes on the allocator and in particular > filestreams which I've pasted below: Thanks Timothy, much appreciated. Regards, Richard ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-10 1:49 ` Filestreams Timothy Shimmin 2008-06-10 2:14 ` Filestreams Richard Scobie @ 2008-06-10 23:09 ` Richard Scobie 2008-06-11 1:39 ` Timothy Shimmin 1 sibling, 1 reply; 19+ messages in thread From: Richard Scobie @ 2008-06-10 23:09 UTC (permalink / raw) To: Timothy Shimmin; +Cc: xfs

Timothy Shimmin wrote:
> BTW, Sam Vaughan wrote some tutorial notes on the allocator and in particular
> filestreams which I've pasted below:
> (I thought it might be here:
> http://oss.sgi.com/projects/xfs/training/ but I can't see it
> in the allocator lab and slides).

I did actually find an entry for filestreams in slide 6.

While there I also found the information on 64bit inodes.

My filesystem is 9.6TB and could well end up with a large quantity of
1-15MB files stored, and the statement:

"Operating system interfaces and legacy software products often mandate
the use of 32 bit inode numbers even on systems that support 64 bit
inode numbers."

makes me wonder how common this still is in practice (the slide was
written in 2006).

My initial preference would be to go with 64 bit inodes for performance
reasons, but as one cannot revert the fs back to 32 bit inodes once
committed, I am somewhat hesitant.

Or am I worrying unnecessarily about the negative impact of 32 bit
inodes, given 9.6TB full of 1 to 15MB files?

Regards,

Richard

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-10 23:09 ` Filestreams (and 64bit inodes) Richard Scobie @ 2008-06-11 1:39 ` Timothy Shimmin 2008-06-11 2:47 ` Richard Scobie 2008-06-11 3:23 ` Eric Sandeen 0 siblings, 2 replies; 19+ messages in thread From: Timothy Shimmin @ 2008-06-11 1:39 UTC (permalink / raw) To: Richard Scobie; +Cc: xfs Hi Richard, Richard Scobie wrote: > Timothy Shimmin wrote: > >> BTW, Sam Vaughan wrote some tutorial notes on the allocator and in >> particular >> filestreams which I've pasted below: >> (I thought it might be here: >> http://oss.sgi.com/projects/xfs/training/ but I can't see it >> in the allocator lab and slides). > > I did actually find an entry for filestreams in slide 6. > Yes, so did I but on a quick scan it didn't have the same detail. > While there I also found the information on 64bit inodes. > > My filesystem is 9.6TB and could well end up with a large quantity of > 1-15MB files stored and the statement: > > "Operating system interfaces and legacy software products often mandate > the use of 32 bit inode numbers even on systems that support 64 bit > inode numbers." > > makes me wonder how common this still is in practice - the slide was > written in 2006)? > > My initial preference would be to go with 64 bit inodes for performance > reasons, but as one cannot revert the fs back to 32 bit inodes once > committed, I am somewhat hesitant. > > Or am I worrying unecessarily about the negative impact of 32 bit > inodes, given 9.6TB full of 1 to 15MB files? > Ah, the 32 bit inode versus 64 bit inode question :) I don't have any definitive answers and I'm sure there will be people on the list with their opinions and experiences. So just some thoughts... Firstly, XFS' current support for 32 bit inode numbers was added as an afterthought I think primarily at the time on IRIX for 32 bit backup clients such as Legato Networker. It is only a compatibility thing with performance downsides. 
So the question then becomes (1) what exactly is the compatibility
matrix and (2) under what conditions are there performance problems and
by how much. The other thing (3) is then a conversion tool for moving
back from 64 bit inodes to 32 bit inodes if you have a compat problem.

(3) There is a conversion tool called xfs_reno on IRIX.
Barry has ported and modified it for Linux but I believe has not
checked it in and has some outstanding Agami review points to address.
Ideally, it would be nicer if we had more kernel support (as suggested
by Dave (dgc)) for swapping all the inode's metadata (instead of just
extents as we currently do). So in other words, it is not there yet and
there is the question of whether we should update the kernel first (or
maybe the tool should go in anyway for use on older kernels).

(1) It would be nice to know what the state of the apps really are.
There is also the question of interaction with CXFS and NFS.
Greg Banks has a compat matrix for NFS. It looks like the main
thing is to get something half recent - linux 2.6, nfs v3,
apps which use 64 bit sys calls (eg. stat64) etc...
Would need to do some investigating.

There is also the possibility of other 32bit/64bit mapping schemes for
xfs but I won't go there.

--Tim

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-11 1:39 ` Timothy Shimmin @ 2008-06-11 2:47 ` Richard Scobie 2008-06-11 3:23 ` Eric Sandeen 1 sibling, 0 replies; 19+ messages in thread From: Richard Scobie @ 2008-06-11 2:47 UTC (permalink / raw) To: Timothy Shimmin; +Cc: xfs Hi Timothy, Timothy Shimmin wrote: > Ah, the 32 bit inode versus 64 bit inode question :) > I don't have any definitive answers and I'm sure there will be people > on the list with their opinions and experiences. > So just some thoughts... On balance, I'm thinking the best compromise might be to stay 32 bit and bump the inode size to 2048 bytes. Regards, Richard ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-11 1:39 ` Timothy Shimmin 2008-06-11 2:47 ` Richard Scobie @ 2008-06-11 3:23 ` Eric Sandeen 2008-06-12 13:52 ` Eric Sandeen 1 sibling, 1 reply; 19+ messages in thread From: Eric Sandeen @ 2008-06-11 3:23 UTC (permalink / raw) To: Timothy Shimmin; +Cc: Richard Scobie, xfs Timothy Shimmin wrote: > (1) It would be nice to know what the state of the apps really are. > There is also the question of interaction with CXFS and NFS. > Greg Banks has a compat matrix for NFS. It looks like the main > things is to get something half recent - linux 2.6, nfs v3, > apps which use 64 bit sys calls (eg. stat64) etc... > Would need to do investigating. Greg has a tool to scan binaries... some day I'm going to run it over the fedora universe, I'll get back to you... someday. -Eric ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-11 3:23 ` Eric Sandeen @ 2008-06-12 13:52 ` Eric Sandeen 2008-06-13 1:28 ` Greg Banks 0 siblings, 1 reply; 19+ messages in thread From: Eric Sandeen @ 2008-06-12 13:52 UTC (permalink / raw) To: Timothy Shimmin; +Cc: Richard Scobie, xfs, Greg Banks Eric Sandeen wrote: > Timothy Shimmin wrote: > >> (1) It would be nice to know what the state of the apps really are. >> There is also the question of interaction with CXFS and NFS. >> Greg Banks has a compat matrix for NFS. It looks like the main >> things is to get something half recent - linux 2.6, nfs v3, >> apps which use 64 bit sys calls (eg. stat64) etc... >> Would need to do investigating. > > Greg has a tool to scan binaries... some day I'm going to run it over > the fedora universe, I'll get back to you... someday. someday didn't take too long :) but it ain't pretty. I installed all fedora packages under a directory and ran greg's tool over: /sbin /usr/sbin /bin /usr/bin /usr/kerberos/bin/ /usr/kerberos/sbin/ Aggregate results: 4070 29.1% are scripts (shell, perl, whatever) 6598 47.2% don't use any stat() family calls at all 1829 13.1% use 32-bit stat() family interfaces only 1312 9.4% use 64-bit stat64() family interfaces only 180 1.3% use both 32-bit and 64-bit stat() family interfaces list of packages, sorted by the semi-lame "number of files in package which call a 32-bit stat variant" metric: http://sandeen.fedorapeople.org/stat32-ers I'm going to see if I can't leverage Fedora to clean some of this up. -Eric ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-12 13:52 ` Eric Sandeen @ 2008-06-13 1:28 ` Greg Banks 2008-06-13 3:06 ` Eric Sandeen 2008-06-13 3:20 ` Mark Goodwin 0 siblings, 2 replies; 19+ messages in thread From: Greg Banks @ 2008-06-13 1:28 UTC (permalink / raw) To: Eric Sandeen; +Cc: Timothy Shimmin, Richard Scobie, xfs Eric Sandeen wrote: > Eric Sandeen wrote: > >> Timothy Shimmin wrote: >> >> >>> (1) It would be nice to know what the state of the apps really are. >>> There is also the question of interaction with CXFS and NFS. >>> Greg Banks has a compat matrix for NFS. It looks like the main >>> things is to get something half recent - linux 2.6, nfs v3, >>> apps which use 64 bit sys calls (eg. stat64) etc... >>> Would need to do investigating. >>> >> Greg has a tool to scan binaries... some day I'm going to run it over >> the fedora universe, I'll get back to you... someday. >> > > someday didn't take too long :) Cool, thanks for the data Eric. > but it ain't pretty. > > I installed all fedora packages under a directory and ran greg's tool over: > > /sbin /usr/sbin /bin /usr/bin /usr/kerberos/bin/ /usr/kerberos/sbin/ > > Aggregate results: > > 4070 29.1% are scripts (shell, perl, whatever) > 6598 47.2% don't use any stat() family calls at all > 1829 13.1% use 32-bit stat() family interfaces only > 1312 9.4% use 64-bit stat64() family interfaces only > 180 1.3% use both 32-bit and 64-bit stat() family interfaces > Ouch. That's over two thousand executables to patch, rebuild, and ship. > list of packages, sorted by the semi-lame "number of files in package > which call a 32-bit stat variant" metric: > > http://sandeen.fedorapeople.org/stat32-ers > > I'm going to see if I can't leverage Fedora to clean some of this up. > > -Eric > Good luck with that. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 1:28 ` Greg Banks @ 2008-06-13 3:06 ` Eric Sandeen 2008-06-13 3:24 ` Greg Banks 2008-06-13 3:20 ` Mark Goodwin 1 sibling, 1 reply; 19+ messages in thread From: Eric Sandeen @ 2008-06-13 3:06 UTC (permalink / raw) To: Greg Banks; +Cc: Timothy Shimmin, Richard Scobie, xfs Greg Banks wrote: > Eric Sandeen wrote: > Cool, thanks for the data Eric. > >> but it ain't pretty. >> >> I installed all fedora packages under a directory and ran greg's tool over: >> >> /sbin /usr/sbin /bin /usr/bin /usr/kerberos/bin/ /usr/kerberos/sbin/ >> >> Aggregate results: >> >> 4070 29.1% are scripts (shell, perl, whatever) >> 6598 47.2% don't use any stat() family calls at all >> 1829 13.1% use 32-bit stat() family interfaces only >> 1312 9.4% use 64-bit stat64() family interfaces only >> 180 1.3% use both 32-bit and 64-bit stat() family interfaces >> > Ouch. That's over two thousand executables to patch, rebuild, and ship. >> list of packages, sorted by the semi-lame "number of files in package >> which call a 32-bit stat variant" metric: >> >> http://sandeen.fedorapeople.org/stat32-ers And about 900 packages... >> I'm going to see if I can't leverage Fedora to clean some of this up. >> >> -Eric >> > Good luck with that. Heh :) At first I was just going to correlate with st_ino users to cut it down, but then I learned that glibc will actually give you an EOVERFLOW if, say st_ino overflows, even if you were only going to check st_mode. :( So pretty much everything needs fixing. (FWIW I gathered statfs/statvfs calls, too...) -Eric ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 3:06 ` Eric Sandeen @ 2008-06-13 3:24 ` Greg Banks 0 siblings, 0 replies; 19+ messages in thread From: Greg Banks @ 2008-06-13 3:24 UTC (permalink / raw) To: Eric Sandeen; +Cc: Timothy Shimmin, Richard Scobie, xfs Eric Sandeen wrote: > > Heh :) At first I was just going to correlate with st_ino users to cut > it down, but then I learned that glibc will actually give you an > EOVERFLOW if, say st_ino overflows, even if you were only going to check > st_mode. :( So pretty much everything needs fixing. > Of course. There's no way for the app to tell glibc that the app doesn't care about st_ino, so glibc must assume that glibc needs to return an accurate st_ino. The alternative is to return the lower 32 bits of st_ino, thus causing silent subtle failures in the very small number of applications which actually do something with st_ino. This is what glibc used to do back when I first started tracking the issue. > (FWIW I gathered statfs/statvfs calls, too...) > > Yes. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 1:28 ` Greg Banks 2008-06-13 3:06 ` Eric Sandeen @ 2008-06-13 3:20 ` Mark Goodwin 2008-06-13 3:40 ` Greg Banks 2008-06-13 3:45 ` Eric Sandeen 1 sibling, 2 replies; 19+ messages in thread From: Mark Goodwin @ 2008-06-13 3:20 UTC (permalink / raw) To: Greg Banks; +Cc: Eric Sandeen, Timothy Shimmin, Richard Scobie, xfs Greg Banks wrote: > Eric Sandeen wrote: >> 4070 29.1% are scripts (shell, perl, whatever) >> 6598 47.2% don't use any stat() family calls at all >> 1829 13.1% use 32-bit stat() family interfaces only >> 1312 9.4% use 64-bit stat64() family interfaces only >> 180 1.3% use both 32-bit and 64-bit stat() family interfaces >> > Ouch. That's over two thousand executables to patch, rebuild, and ship. >> list of packages, sorted by the semi-lame "number of files in package >> which call a 32-bit stat variant" metric: >> >> http://sandeen.fedorapeople.org/stat32-ers struct dirent has an embedded ino_t too, so for completeness we should also be looking for readdir(), readdir64(), getdirentries(), getdirentries64(), etc. >> I'm going to see if I can't leverage Fedora to clean some of this up. >> >> -Eric >> > Good luck with that. Yes good luck :) (and the plan for statically linked apps? ...) Cheers -- Mark Goodwin markgw@sgi.com Engineering Manager for XFS and PCP Phone: +61-3-99631937 SGI Australian Software Group Cell: +61-4-18969583 ------------------------------------------------------------- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 3:20 ` Mark Goodwin @ 2008-06-13 3:40 ` Greg Banks 2008-06-13 3:46 ` Eric Sandeen 2008-06-13 5:35 ` Greg Banks 2008-06-13 3:45 ` Eric Sandeen 1 sibling, 2 replies; 19+ messages in thread From: Greg Banks @ 2008-06-13 3:40 UTC (permalink / raw) To: markgw; +Cc: Eric Sandeen, Timothy Shimmin, Richard Scobie, xfs

Mark Goodwin wrote:
>
> Greg Banks wrote:
>> Eric Sandeen wrote:
>>> 4070 29.1% are scripts (shell, perl, whatever)
>>> 6598 47.2% don't use any stat() family calls at all
>>> 1829 13.1% use 32-bit stat() family interfaces only
>>> 1312 9.4% use 64-bit stat64() family interfaces only
>>> 180 1.3% use both 32-bit and 64-bit stat() family interfaces
>>>
>> Ouch. That's over two thousand executables to patch, rebuild, and ship.
>>> list of packages, sorted by the semi-lame "number of files in package
>>> which call a 32-bit stat variant" metric:
>>>
>>> http://sandeen.fedorapeople.org/stat32-ers
>
> struct dirent has an embedded ino_t too, so for completeness we should
> also be looking for readdir(), readdir64(), getdirentries(),
> getdirentries64(), etc.

Good point. Looking in the code, it seems the getdents common code in
glibc will fail with EOVERFLOW if the inode number gets truncated during
64b-32b translation, just like the stat() family. I'll need to improve
the scanning tool :-)

>>> I'm going to see if I can't leverage Fedora to clean some of this up.
>>>
>>> -Eric
>>>
>> Good luck with that.
>
> Yes good luck :)
> (and the plan for statically linked apps? ...)

Perhaps Fedora could enable the glibc magic for issuing warnings at link
time when those symbols are used, like what happens today if you use the
old unsafe gets() function:

gnb@inara 1058> gcc -o fmeh x.c
/tmp/ccQhxIIo.o: In function `main':
x.c:(.text+0x18): warning: the `gets' function is dangerous and should not be used.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 3:40 ` Greg Banks @ 2008-06-13 3:46 ` Eric Sandeen 2008-06-13 3:57 ` Greg Banks 2008-06-13 5:35 ` Greg Banks 1 sibling, 1 reply; 19+ messages in thread From: Eric Sandeen @ 2008-06-13 3:46 UTC (permalink / raw) To: Greg Banks; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs Greg Banks wrote: >> (and the plan for statically linked apps? ...) > > Perhaps Fedora could enable the glibc magic for issuing warnings at link > time when those symbols are used, like what happens today if you use the > old unsafe gets() function: > > gnb@inara 1058> gcc -o fmeh x.c > /tmp/ccQhxIIo.o: In function `main': > x.c:(.text+0x18): warning: the `gets' function is dangerous and should not be used. It wouldn't help the automated builds (nobody'd see it) but for normal user compilations that might be an option... -Eric ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 3:46 ` Eric Sandeen @ 2008-06-13 3:57 ` Greg Banks 0 siblings, 0 replies; 19+ messages in thread From: Greg Banks @ 2008-06-13 3:57 UTC (permalink / raw) To: Eric Sandeen; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs Eric Sandeen wrote: > Greg Banks wrote: > > >>> (and the plan for statically linked apps? ...) >>> >> Perhaps Fedora could enable the glibc magic for issuing warnings at link >> time when those symbols are used, like what happens today if you use the >> old unsafe gets() function: >> >> gnb@inara 1058> gcc -o fmeh x.c >> /tmp/ccQhxIIo.o: In function `main': >> x.c:(.text+0x18): warning: the `gets' function is dangerous and should not be used. >> > > > It wouldn't help the automated builds (nobody'd see it) Hmm. I don't see a way in glibc to upgrade that warning to an error to make those builds fail, at least in glibc 2.4. > but for normal > user compilations that might be an option... > > Indeed. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes) 2008-06-13 3:40 ` Greg Banks 2008-06-13 3:46 ` Eric Sandeen @ 2008-06-13 5:35 ` Greg Banks 2008-06-13 13:28 ` Eric Sandeen 1 sibling, 1 reply; 19+ messages in thread From: Greg Banks @ 2008-06-13 5:35 UTC (permalink / raw) To: Eric Sandeen; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs [-- Attachment #1: Type: text/plain, Size: 5896 bytes --] Greg Banks wrote: > Mark Goodwin wrote: > >> Greg Banks wrote: >> >>> Eric Sandeen wrote: >>> >>>> 4070 29.1% are scripts (shell, perl, whatever) >>>> 6598 47.2% don't use any stat() family calls at all >>>> 1829 13.1% use 32-bit stat() family interfaces only >>>> 1312 9.4% use 64-bit stat64() family interfaces only >>>> 180 1.3% use both 32-bit and 64-bit stat() family interfaces >>>> >>>> >>> Ouch. That's over two thousand executables to patch, rebuild, and ship. >>> >>>> list of packages, sorted by the semi-lame "number of files in package >>>> which call a 32-bit stat variant" metric: >>>> >>>> http://sandeen.fedorapeople.org/stat32-ers >>>> >> struct dirent has an embedded ino_t too, so for completeness we should >> also >> be looking for readdir(), readdir64(), getdirentries(), >> getdirentries64(), etc. >> > Good point. Looking in the code, it seems the getdents common code in > glibc will fail with EOVERFLOW if the inode number gets truncated during > 64b-32b translation, just like the stat() family. Experiment confirms this behaviour, using a small wrapper program around opendir/readdir/closedir: heave:~/stat64/mnt # ../myls64 idx d_ino d_off d_type d_name --- ---------------- ----- ------ ------ 0 0000000000000080 4 0 . 1 0000000000000080 6 0 .. 
2 0000000000000083 8 0 d0 3 0000000080000080 10 0 d1 4 0000000100000080 12 0 d2 5 0000000180000080 14 0 d3 6 0000000200000080 16 0 d4 7 0000000280000080 18 0 d5 8 0000000300000080 20 0 d6 9 0000000380000080 22 0 d7 10 0000000400000080 24 0 d8 11 0000000480000080 26 0 d9 12 0000000500000080 28 0 d10 13 0000000580000080 30 0 d11 14 0000000600000080 32 0 d12 15 0000000680000080 34 0 d13 16 0000000700000080 36 0 d14 17 0000000780000080 38 0 d15 18 0000000800000080 40 0 d16 19 0000000880000080 42 0 d17 20 0000000900000080 44 0 d18 21 0000000980000080 46 0 d19 22 0000000a00000080 48 0 d20 23 0000000a80000080 50 0 d21 24 0000000b00000080 52 0 d22 25 0000000b80000080 54 0 d23 26 0000000c00000080 56 0 d24 27 0000000c80000080 58 0 d25 28 0000000d00000080 60 0 d26 29 0000000d80000080 62 0 d27 30 0000000e00000080 64 0 d28 31 0000000e80000080 66 0 d29 32 0000000f00000080 68 0 d30 33 0000000f80000080 70 0 d31 34 0000001000080080 72 0 d32 35 0000001080000080 74 0 d33 36 0000001100000080 76 0 d34 37 0000001180000080 78 0 d35 38 0000001200000080 80 0 d36 39 0000001280000080 82 0 d37 40 0000001300000080 84 0 d38 41 0000001380000080 86 0 d39 42 0000001400000080 88 0 d40 43 0000001480000080 90 0 d41 44 0000001500000080 92 0 d42 45 0000001580000080 94 0 d43 46 0000001600000080 96 0 d44 47 0000001680000080 98 0 d45 48 0000001700000080 100 0 d46 49 0000001780000080 102 0 d47 50 0000001800000080 104 0 d48 51 0000001880000080 106 0 d49 52 0000001900000080 108 0 d50 53 0000001980000080 110 0 d51 54 0000001a00000080 112 0 d52 55 0000001a80000080 114 0 d53 56 0000001b00000080 116 0 d54 57 0000001b80000080 118 0 d55 58 0000001c00000080 120 0 d56 59 0000001c80000080 122 0 d57 60 0000001d00000080 124 0 d58 61 0000001d80000080 126 0 d59 62 0000001e00000080 128 0 d60 63 0000001e80000080 130 0 d61 64 0000001f00000080 132 0 d62 65 0000001f80000080 512 0 d63 heave:~/stat64/mnt # ../myls32 idx d_ino d_off d_type d_name --- ---------------- ----- ------ ------ 0 0000000000000080 4 0 . 
  1 0000000000000080     6      0 ..
  2 0000000000000083     8      0 d0
  3 0000000080000080    10      0 d1
.: Value too large for defined data type

heave:~/stat64/mnt # strace -s1024 ../myls32
execve("../myls32", ["../myls32"], [/* 60 vars */]) = 0
[ Process PID=12640 runs in 32 bit mode. ]
...
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
...
write(1, "idx d_ino            d_off d_type d_name\n", 41) = 41
write(1, "--- ---------------- ----- ------ ------\n", 41) = 41
getdents64(3, /* 66 entries */, 4096) = 1584
_llseek(3, 10, [10], SEEK_SET) = 0
write(1, "  0 0000000000000080     4      0 .\n", 36) = 36
write(1, "  1 0000000000000080     6      0 ..\n", 37) = 37
write(1, "  2 0000000000000083     8      0 d0\n", 37) = 37
write(1, "  3 0000000080000080    10      0 d1\n", 37) = 37
getdents64(3, /* 62 entries */, 4096) = 1488
...
write(4, ".: Value too large for defined data type\n", 41) = 41
...
close(3) = 0
exit_group(0) = ?
Process 12640 detached

I also confirmed that using readdir64() allows the 32-bit app to work.

Of course, the glibc readdir() interface makes it extra hard for the
caller to tell the difference between an error and a normal EOF. In
both cases, NULL is returned. In the error case, errno is set. In the
EOF case, errno is unchanged. In the success case, errno is also
unchanged. So to detect an error from readdir() the application writer
needs to do something like:

    DIR *dir;
    struct dirent *de;
    ...
    errno = 0;
    while ((de = readdir(dir)) != NULL) {
        // handle entry
        errno = 0;
    }
    if (errno) {
        // handle error
    }

Otherwise the directory traversal just finishes early with no error
reported. I'll bet that's what all the apps do :-)

> I'll need to improve
> the scanning tool :-)
Attached.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

[-- Attachment #2: summarise-stat64-2.pl --]
[-- Type: application/x-perl, Size: 4148 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Filestreams (and 64bit inodes)
  2008-06-13  5:35 ` Greg Banks
@ 2008-06-13 13:28 ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2008-06-13 13:28 UTC (permalink / raw)
  To: Greg Banks; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs

Greg Banks wrote:

>> I'll need to improve
>> the scanning tool :-)
> Attached.

I had done something similar, except:

--- summarise-stat64-2.pl.orig	2008-06-13 08:26:07.000000000 -0500
+++ summarise-stat64-2.pl	2008-06-13 08:26:24.000000000 -0500
@@ -93,11 +93,11 @@
 	{
 	    $res{used64}++;
 	}
-	elsif (m/^\s+U readdir$/)
+	elsif (m/^\s+U readdir(|_r)$/)
 	{
 	    $res{used32}++;
 	}
-	elsif (m/^\s+U readdir64$/)
+	elsif (m/^\s+U readdir64(|_r)$/)
 	{
 	    $res{used64}++;
 	}

(lazily whitespace-mangled, sorry)

A new scan with my slightly different version of the tool yielded:

    447 readdir32
    908 stat32
     38 statfs32

where the number is the number of packages making that particular
32-bit call.

-Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread
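[Editor's note: the effect of Eric's regex change can be demonstrated on sample `nm -D` output. The symbol lines below are illustrative, not taken from a real scan, and this shell one-liner only sketches the classification done inside the Perl tool.]

```shell
# Sample undefined-symbol lines, as nm -D would print them.
sample='         U readdir
         U readdir_r
         U readdir64
         U stat'

# 32-bit readdir family: readdir and readdir_r match, readdir64* must not.
count32=$(printf '%s\n' "$sample" | grep -cE '^[[:space:]]+U readdir(_r)?$')
echo "$count32"
```

The anchored `$` is what keeps `readdir64` out of the 32-bit bucket; the optional `(_r)?` is exactly what the original pattern was missing.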
* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:20 ` Mark Goodwin
  2008-06-13  3:40 ` Greg Banks
@ 2008-06-13  3:45 ` Eric Sandeen
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2008-06-13  3:45 UTC (permalink / raw)
  To: markgw; +Cc: Greg Banks, Timothy Shimmin, Richard Scobie, xfs

Mark Goodwin wrote:

> struct dirent has an embedded ino_t too, so for completeness we should
> also be looking for readdir(), readdir64(), getdirentries(),
> getdirentries64(), etc.

*sigh*

>>> I'm going to see if I can't leverage Fedora to clean some of this up.
>>>
>>> -Eric
>>>
>> Good luck with that.
>
> Yes good luck :)
> (and the plan for statically linked apps? ...)

I plan to encourage fedora to keep discouraging them ;)

-Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-06-13 13:27 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-07 23:11 Filestreams Richard Scobie
2008-06-09  3:31 ` Filestreams Eric Sandeen
2008-06-10  1:49 ` Filestreams Timothy Shimmin
2008-06-10  2:14 ` Filestreams Richard Scobie
2008-06-10 23:09 ` Filestreams (and 64bit inodes) Richard Scobie
2008-06-11  1:39 ` Timothy Shimmin
2008-06-11  2:47 ` Richard Scobie
2008-06-11  3:23 ` Eric Sandeen
2008-06-12 13:52 ` Eric Sandeen
2008-06-13  1:28 ` Greg Banks
2008-06-13  3:06 ` Eric Sandeen
2008-06-13  3:24 ` Greg Banks
2008-06-13  3:20 ` Mark Goodwin
2008-06-13  3:40 ` Greg Banks
2008-06-13  3:46 ` Eric Sandeen
2008-06-13  3:57 ` Greg Banks
2008-06-13  5:35 ` Greg Banks
2008-06-13 13:28 ` Eric Sandeen
2008-06-13  3:45 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox