public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* Filestreams
@ 2008-06-07 23:11 Richard Scobie
  2008-06-09  3:31 ` Filestreams Eric Sandeen
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Scobie @ 2008-06-07 23:11 UTC (permalink / raw)
  To: xfs

Is my understanding of filestreams correct, in that files written to a 
particular filestreams enabled directory are all allocated to 1 ag as 
long as the filestreams timeout is not exceeded - which I believe is 30s?

In other words, if I copy 100 files into a directory, (cp -a 
dir_containing_100_files dest_dir), wait 60 seconds and do (cp -a 
dir_containing_100_different_files dest_dir), each directory and 
contents in dest_dir will be stored in a single, different ag?

Also, would it be possible for the filestreams parameter to be added to 
man pages/kernel docs?

Thanks,

Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Filestreams
  2008-06-07 23:11 Filestreams Richard Scobie
@ 2008-06-09  3:31 ` Eric Sandeen
  2008-06-10  1:49   ` Filestreams Timothy Shimmin
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2008-06-09  3:31 UTC (permalink / raw)
  To: Richard Scobie; +Cc: xfs

Richard Scobie wrote:
> Is my understanding of filestreams correct, in that files written to a 
> particular filestreams enabled directory are all allocated to 1 ag as 
> long as the filestreams timeout is not exceeded - which I believe is 30s?
> 
> In other words, if I copy 100 files into a directory, (cp -a 
> dir_containing_100_files dest_dir), wait 60 seconds and do (cp -a 
> dir_containing_100_different_files dest_dir), each directory and 
> contents in dest_dir will be stored in a single, different ag?
> 
> Also, would it be possible for the filestreams parameter to be added to 
> man pages/kernel docs?

Yes please :)  I was just noticing last week that there is no
documentation anywhere on this feature ....

-Eric

> Thanks,
> 
> Richard
> 
> 


* Re: Filestreams
  2008-06-09  3:31 ` Filestreams Eric Sandeen
@ 2008-06-10  1:49   ` Timothy Shimmin
  2008-06-10  2:14     ` Filestreams Richard Scobie
  2008-06-10 23:09     ` Filestreams (and 64bit inodes) Richard Scobie
  0 siblings, 2 replies; 19+ messages in thread
From: Timothy Shimmin @ 2008-06-10  1:49 UTC (permalink / raw)
  To: Eric Sandeen, Richard Scobie; +Cc: xfs

Eric Sandeen wrote:
> Richard Scobie wrote:
>> Is my understanding of filestreams correct, in that files written to a 
>> particular filestreams enabled directory are all allocated to 1 ag as 
>> long as the filestreams timeout is not exceeded - which I believe is 30s?
>>
>> In other words, if I copy 100 files into a directory, (cp -a 
>> dir_containing_100_files dest_dir), wait 60 seconds and do (cp -a 
>> dir_containing_100_different_files dest_dir), each directory and 
>> contents in dest_dir will be stored in a single, different ag?
>>
>> Also, would it be possible for the filestreams parameter to be added to 
>> man pages/kernel docs?
> 
> Yes please :)  I was just noticing last week that there is no
> documentation anywhere on this feature ....
> 
> -Eric
> 
>> Thanks,
>>
>> Richard
>>
>>
> 

Sounds like a reasonable idea :)


BTW, Sam Vaughan wrote some tutorial notes on the allocator and in particular
filestreams which I've pasted below:
(I thought it might be here:
 http://oss.sgi.com/projects/xfs/training/ but I can't see it
 in the allocator lab and slides).

-------------------------------------------------------------------------
Filestreams Allocator

A certain class of applications such as those doing film scanner
video ingest will write many large files to a directory in sequence.
It's important for playback performance that these files end up
allocated next to each other on disk, since consecutive data is
retrieved optimally by hardware RAID read-ahead.

XFS's standard allocator starts out doing the right thing as far
as file allocation is concerned.  Even if multiple streams are being
written simultaneously, their files will be placed separately and
contiguously on disk.  The problem is that once an allocation group
fills up, a new one must be chosen and there's no longer a parent
directory in a unique AG to use as an AG "owner".  Without a way
to reserve the new AG for the original directory's use, all the
files being allocated by all the streams will start getting placed
in the same AGs as each other.  The result is that consecutive
frames in one directory are placed on disk with frames from other
directories interleaved between them, which is a worst-case layout
for playback performance.  When reading back the frames in directory
A, hardware RAID read-ahead will cache data from frames in directory
B which is counterproductive.

Create a file system with a small AG size to demonstrate:

sles10:~ sjv: sudo mkfs.xfs -d agsize=64m /dev/sdb7 > /dev/null
sles10:~ sjv: sudo mount /dev/sdb7 /test
sles10:~ sjv: sudo chmod 777 /test
sles10:~ sjv: cd /test
sles10:/test sjv:

Create ten 10MB files concurrently in two directories:

sles10:/test sjv: mkdir a b
sles10:/test sjv: for dir in a b; do
> for file in `seq 0 9`; do
> xfs_mkfile 10m $dir/$file
> done &
> done; wait 2>/dev/null
[1] 30904
[2] 30905
sles10:/test sjv: ls -lid * */*
   131 drwxr-xr-x 2 sjv users       86 2006-10-20 13:48 a
   132 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/0
   133 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/1
   134 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/2
   135 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/3
   136 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/4
   137 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/5
   138 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/6
   139 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/7
   140 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/8
   141 -rw------- 1 sjv users 10485760 2006-10-20 13:48 a/9
262272 drwxr-xr-x 2 sjv users       86 2006-10-20 13:48 b
262273 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/0
262274 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/1
262275 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/2
262276 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/3
262277 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/4
262278 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/5
262279 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/6
262280 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/7
262281 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/8
262282 -rw------- 1 sjv users 10485760 2006-10-20 13:48 b/9
sles10:/test sjv:

Note that within each directory, all the file inodes are in the same
AG as their parent.  What about the file data?  Use xfs_bmap -v to
examine the extents:

sles10:/test sjv: for file in `seq 0 9`; do
> bmap_a=`xfs_bmap -v a/$file | tail -1`
> bmap_b=`xfs_bmap -v b/$file | tail -1`
> ag_a=`echo $bmap_a | awk '{print $4}'`
> ag_b=`echo $bmap_b | awk '{print $4}'`
> br_a=`echo $bmap_a | awk '{printf "%-18s", $3}'`
> br_b=`echo $bmap_b | awk '{printf "%-18s", $3}'`
> echo a/$file: $ag_a "$br_a" b/$file: $ag_b "$br_b"
> done
a/0: 0 96..20575          b/0: 1 131168..151647
a/1: 0 20576..41055       b/1: 1 151648..172127
a/2: 0 41056..61535       b/2: 1 172128..192607
a/3: 0 61536..82015       b/3: 1 192608..213087
a/4: 0 82016..102495      b/4: 1 213088..233567
a/5: 0 102496..122975     b/5: 1 233568..254047
a/6: 2 299600..300111     b/6: 2 262208..275007
a/7: 2 338016..338527     b/7: 2 312400..312911
a/8: 2 344672..361567     b/8: 3 393280..401983
a/9: 2 361568..382047     b/9: 3 401984..421951
sles10:/test sjv:

The middle column is the AG number and the right column is the block
range.  Note how the extents for files in both directories get
placed on top of each other in AG 2.

Something to note in the results is that even though the file extents
have worked their way up into AGs 2 and 3, the inode numbers show
that the file inodes are all in the same AGs as their parent
directory, i.e. AGs 0 and 1.  Why is this?  To understand, it's
important to consider the order in which events are occurring.  The
two bash processes writing files are calling xfs_mkfile, which
starts by opening a file with the O_CREAT flag.  At this point, XFS
has no idea how large the file's data is going to be, so it dutifully
creates a new inode for the file in the same AG as the parent
directory.  The call returns successfully and the system continues
with its tasks.  When XFS is asked to write the file data a short time
later, a new AG must be found for it because the AG the inode is
in is full.  The result is a violation of the original goal to keep
file data close to its inode on disk.  In practice, because inodes
are allocated in clusters on disk, a process that's reading back a
stream is likely to cache all the inodes it needs with just one or
two reads, so the disk seeking involved won't be as bad as it first
seems.
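As an aside (an editorial illustration, not part of Sam's original notes), the inode-to-AG relationship can be checked directly: an XFS inode number encodes its AG in the high bits. For this particular filesystem, agsize=64m with the default 4 KB block size gives 16384 blocks per AG (agblklog=14), and the default 256-byte inodes give 16 inodes per block (inopblog=4), so shifting the inode number right by 18 bits yields the AG number:

```shell
# AG number = inode number >> (agblklog + inopblog); here 14 + 4 = 18.
# 131 is the inode of directory "a", 262272 that of "b" in the listing above.
shiftbits=18
for ino in 131 262272; do
    echo "inode $ino is in AG $(( ino >> shiftbits ))"
done
```

This prints AG 0 for inode 131 and AG 1 for inode 262272, matching the ls -lid output above.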

On the other hand, the extent data placement seen in the xfs_bmap
-v output is a problem.  Once the data extents spilled into AG 2,
both processes were given allocations there on a first-come-first-served
basis.  This destroyed the neatly contiguous allocation pattern for
the files and will certainly degrade read performance later on.

To address this issue, a new allocation algorithm was added to XFS
that associates a parent directory with an AG until a preset
inactivity timeout elapses.  The new algorithm is called the
Filestreams allocator and it is enabled in one of two ways.  Either
the filesystem is mounted with the -o filestreams option, or the
filestreams chattr flag is applied to a directory to indicate that
all allocations beneath that point in the directory hierarchy should
use the filestreams allocator.
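In concrete terms, the two ways of enabling the allocator look something like the sketch below. The device and mountpoint names are placeholders, and the 't' flag letter is the one current xfs_io uses for the filestream hint; check your xfs_io version. (This is configuration run against a real XFS mount, so it is illustrative rather than something to execute verbatim.)

```shell
# Option 1: enable filestreams for the whole filesystem at mount time
# (device and mountpoint are placeholders)
sudo mount -o filestreams /dev/sdb7 /test

# Option 2: mark one directory subtree instead; this is xfs_io's
# chattr (flag letter 't' for filestream), not e2fsprogs' chattr
mkdir -p /test/ingest
sudo xfs_io -c 'chattr +t' /test/ingest
sudo xfs_io -c 'lsattr' /test/ingest
```

With option 2, only allocations beneath /test/ingest use the filestreams allocator; the rest of the filesystem behaves normally.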

With the filestreams allocator enabled, the above test produces
results that look like this:

a/0: 0 96..20575          b/0: 1 131168..151647
a/1: 0 20576..41055       b/1: 1 151648..172127
a/2: 0 41056..61535       b/2: 1 172128..192607
a/3: 0 61536..82015       b/3: 1 192608..213087
a/4: 0 82016..102495      b/4: 1 213088..233567
a/5: 0 102496..122975     b/5: 1 233568..254047
a/6: 2 272456..273479     b/6: 3 393280..410271
a/7: 2 290904..300119     b/7: 3 410272..426655
a/8: 2 300632..321111     b/8: 3 426656..441503
a/9: 2 329304..343639     b/9: 3 441504..459935

Once the process writing files to the first directory starts using
AG 2, that AG is no longer considered available so the other process
skips it and moves to AG 3.

---------------------------------


* Re: Filestreams
  2008-06-10  1:49   ` Filestreams Timothy Shimmin
@ 2008-06-10  2:14     ` Richard Scobie
  2008-06-10 23:09     ` Filestreams (and 64bit inodes) Richard Scobie
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Scobie @ 2008-06-10  2:14 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Eric Sandeen, xfs

Timothy Shimmin wrote:

> BTW, Sam Vaughan wrote some tutorial notes on the allocator and in particular
> filestreams which I've pasted below:

Thanks Timothy, much appreciated.

Regards,

Richard


* Re: Filestreams (and 64bit inodes)
  2008-06-10  1:49   ` Filestreams Timothy Shimmin
  2008-06-10  2:14     ` Filestreams Richard Scobie
@ 2008-06-10 23:09     ` Richard Scobie
  2008-06-11  1:39       ` Timothy Shimmin
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Scobie @ 2008-06-10 23:09 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: xfs

Timothy Shimmin wrote:

> BTW, Sam Vaughan wrote some tutorial notes on the allocator and in particular
> filestreams which I've pasted below:
> (I thought it might be here:
>  http://oss.sgi.com/projects/xfs/training/ but I can't see it
>  in the allocator lab and slides).

I did actually find an entry for filestreams in slide 6.

While there I also found the information on 64bit inodes.

My filesystem is 9.6TB and could well end up with a large quantity of 
1-15MB files stored and the statement:

  "Operating system interfaces and legacy software products often 
mandate the use of 32 bit inode numbers even on systems that support 64 
bit inode numbers."

makes me wonder how common this still is in practice (the slide was 
written in 2006)?

My initial preference would be to go with 64 bit inodes for performance 
reasons, but as one cannot revert the fs back to 32 bit inodes once 
committed, I am somewhat hesitant.

Or am I worrying unnecessarily about the negative impact of 32 bit 
inodes, given 9.6TB full of 1 to 15MB files?

Regards,

Richard


* Re: Filestreams (and 64bit inodes)
  2008-06-10 23:09     ` Filestreams (and 64bit inodes) Richard Scobie
@ 2008-06-11  1:39       ` Timothy Shimmin
  2008-06-11  2:47         ` Richard Scobie
  2008-06-11  3:23         ` Eric Sandeen
  0 siblings, 2 replies; 19+ messages in thread
From: Timothy Shimmin @ 2008-06-11  1:39 UTC (permalink / raw)
  To: Richard Scobie; +Cc: xfs

Hi Richard,

Richard Scobie wrote:
> Timothy Shimmin wrote:
> 
>> BTW, Sam Vaughan wrote some tutorial notes on the allocator and in
>> particular
>> filestreams which I've pasted below:
>> (I thought it might be here:
>>  http://oss.sgi.com/projects/xfs/training/ but I can't see it
>>  in the allocator lab and slides).
> 
> I did actually find an entry for filestreams in slide 6.
> 
Yes, so did I but on a quick scan it didn't have the same detail.

> While there I also found the information on 64bit inodes.
> 
> My filesystem is 9.6TB and could well end up with a large quantity of
> 1-15MB files stored and the statement:
> 
>  "Operating system interfaces and legacy software products often mandate
> the use of 32 bit inode numbers even on systems that support 64 bit
> inode numbers."
> 
> makes me wonder how common this still is in practice - the slide was
> written in 2006)?
> 
> My initial preference would be to go with 64 bit inodes for performance
> reasons, but as one cannot revert the fs back to 32 bit inodes once
> committed, I am somewhat hesitant.
> 
> Or am I worrying unecessarily about the negative impact of 32 bit
> inodes, given 9.6TB full of 1 to 15MB files?
> 
Ah, the 32 bit inode versus 64 bit inode question :)
I don't have any definitive answers and I'm sure there will be people
on the list with their opinions and experiences.
So just some thoughts...

Firstly, XFS's current support for 32 bit inode numbers was, I think, added
as an afterthought, primarily at the time for 32 bit backup clients on IRIX
such as Legato NetWorker.
It is purely a compatibility feature, with performance downsides.
So the questions then become (1) what exactly is the compatibility matrix,
and (2) under what conditions are there performance problems, and by how much.
The other thing, (3), is a conversion tool for moving back from 64 bit inodes
to 32 bit inodes if you hit a compat problem.
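In practice, the choice is made at mount time on Linux: by default inode allocation is restricted so that inode numbers fit in 32 bits, and the inode64 mount option lifts that restriction. A minimal sketch (device and mountpoint names are placeholders; configuration only, not meant to run as-is):

```shell
# Default behaviour: inodes are placed so their numbers fit in 32 bits
sudo mount /dev/sdb7 /big

# inode64: inodes may be allocated anywhere in the filesystem, so on a
# large filesystem some inode numbers will exceed 32 bits
sudo umount /big
sudo mount -o inode64 /dev/sdb7 /big
```

Note that once 64 bit inode numbers have been handed out, simply remounting without inode64 does not renumber the existing inodes, which is why the conversion tool in (3) below matters.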

(3) There is a conversion tool called xfs_reno on IRIX. Barry has ported and modified
it for Linux but I believe has not checked it in and has some outstanding
Agami review points to address. Ideally, we would have more
kernel support (as suggested by Dave (dgc)) for swapping all of an inode's metadata
(instead of just the extents, as we currently do).
So in other words, it is not there yet and there is the question of whether
we should update the kernel first (or maybe the tool should go in anyway for
use on older kernels).

(1) It would be nice to know what the state of the apps really are.
There is also the question of interaction with CXFS and NFS.
Greg Banks has a compat matrix for NFS. It looks like the main
thing is to get something reasonably recent - linux 2.6, nfs v3,
apps which use 64 bit sys calls (eg. stat64), etc.
We would need to do some investigating.

There is also the possibility of other 32bit/64bit mapping schemes for xfs
but I won't go there.
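A back-of-envelope sketch of why filesystem size drives this question (an editorial illustration, assuming the default 4 KB block size and 256-byte inodes): an XFS inode number is essentially the filesystem block number holding the inode, shifted left by inopblog, so 32 bit inode numbers can only refer to inodes in roughly the first 1 TiB - about a tenth of Richard's 9.6 TB.

```shell
# XFS inode number ~ (fs block number << inopblog) | offset-in-block.
# With 4 KB blocks and 256-byte inodes, inopblog = log2(4096/256) = 4,
# so 32-bit inode numbers reach only blocks below 2^(32-4) = 2^28,
# i.e. the first 2^28 * 4 KB = 1 TiB of the filesystem.
blocksize=4096
inopblog=4
limit=$(( (1 << (32 - inopblog)) * blocksize ))
echo "$(( limit >> 40 )) TiB"   # prints: 1 TiB
```

So with 32 bit inode numbers forced, all inodes (and hence inode allocation pressure) are confined to the low part of the filesystem, which is the performance downside mentioned above.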

--Tim


* Re: Filestreams (and 64bit inodes)
  2008-06-11  1:39       ` Timothy Shimmin
@ 2008-06-11  2:47         ` Richard Scobie
  2008-06-11  3:23         ` Eric Sandeen
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Scobie @ 2008-06-11  2:47 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: xfs

Hi Timothy,

Timothy Shimmin wrote:

> Ah, the 32 bit inode versus 64 bit inode question :)
> I don't have any definitive answers and I'm sure there will be people
> on the list with their opinions and experiences.
> So just some thoughts...

On balance, I'm thinking the best compromise might be to stay 32 bit and 
bump the inode size to 2048 bytes.

Regards,

Richard


* Re: Filestreams (and 64bit inodes)
  2008-06-11  1:39       ` Timothy Shimmin
  2008-06-11  2:47         ` Richard Scobie
@ 2008-06-11  3:23         ` Eric Sandeen
  2008-06-12 13:52           ` Eric Sandeen
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2008-06-11  3:23 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Richard Scobie, xfs

Timothy Shimmin wrote:

> (1) It would be nice to know what the state of the apps really are.
> There is also the question of interaction with CXFS and NFS.
> Greg Banks has a compat matrix for NFS. It looks like the main
> things is to get something half recent - linux 2.6, nfs v3,
> apps which use 64 bit sys calls (eg. stat64) etc...
> Would need to do investigating.

Greg has a tool to scan binaries... some day I'm going to run it over
the fedora universe, I'll get back to you...  someday.

-Eric


* Re: Filestreams (and 64bit inodes)
  2008-06-11  3:23         ` Eric Sandeen
@ 2008-06-12 13:52           ` Eric Sandeen
  2008-06-13  1:28             ` Greg Banks
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2008-06-12 13:52 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Richard Scobie, xfs, Greg Banks

Eric Sandeen wrote:
> Timothy Shimmin wrote:
> 
>> (1) It would be nice to know what the state of the apps really are.
>> There is also the question of interaction with CXFS and NFS.
>> Greg Banks has a compat matrix for NFS. It looks like the main
>> things is to get something half recent - linux 2.6, nfs v3,
>> apps which use 64 bit sys calls (eg. stat64) etc...
>> Would need to do investigating.
> 
> Greg has a tool to scan binaries... some day I'm going to run it over
> the fedora universe, I'll get back to you...  someday.

someday didn't take too long :)  but it ain't pretty.

I installed all fedora packages under a directory and ran greg's tool over:

/sbin /usr/sbin /bin /usr/bin /usr/kerberos/bin/ /usr/kerberos/sbin/

Aggregate results:

   4070 29.1% are scripts (shell, perl, whatever)
   6598 47.2% don't use any stat() family calls at all
   1829 13.1% use 32-bit stat() family interfaces only
   1312  9.4% use 64-bit stat64() family interfaces only
    180  1.3% use both 32-bit and 64-bit stat() family interfaces

list of packages, sorted by the semi-lame "number of files in package
which call a 32-bit stat variant" metric:

http://sandeen.fedorapeople.org/stat32-ers

I'm going to see if I can't leverage Fedora to clean some of this up.
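For the curious, a hypothetical sketch of how such a scan might classify a binary's symbol list (this is not Greg's actual tool; the __xstat-style names are glibc's internal stat entry points, and the input would typically come from something like objdump -T):

```shell
# Classify a newline-separated symbol list into the buckets above.
classify() {
    n32=$(printf '%s\n' "$1" | grep -cxE 'stat|lstat|fstat|__xstat|__lxstat|__fxstat')
    n64=$(printf '%s\n' "$1" | grep -cxE 'stat64|lstat64|fstat64|__xstat64|__lxstat64|__fxstat64')
    if [ "$n32" -gt 0 ] && [ "$n64" -gt 0 ]; then echo "both"
    elif [ "$n32" -gt 0 ]; then echo "32-bit only"
    elif [ "$n64" -gt 0 ]; then echo "64-bit only"
    else echo "no stat calls"
    fi
}

# In real use the symbol list might come from:
#   objdump -T "$binary" | awk '{print $NF}'
classify "$(printf 'open\n__xstat\nclose\n')"   # prints: 32-bit only
classify "$(printf 'open\n__xstat64\n')"        # prints: 64-bit only
```

Aggregating that classification over every ELF file in the scanned directories would yield counts of the kind shown above.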

-Eric


* Re: Filestreams (and 64bit inodes)
  2008-06-12 13:52           ` Eric Sandeen
@ 2008-06-13  1:28             ` Greg Banks
  2008-06-13  3:06               ` Eric Sandeen
  2008-06-13  3:20               ` Mark Goodwin
  0 siblings, 2 replies; 19+ messages in thread
From: Greg Banks @ 2008-06-13  1:28 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Timothy Shimmin, Richard Scobie, xfs

Eric Sandeen wrote:
> Eric Sandeen wrote:
>   
>> Timothy Shimmin wrote:
>>
>>     
>>> (1) It would be nice to know what the state of the apps really are.
>>> There is also the question of interaction with CXFS and NFS.
>>> Greg Banks has a compat matrix for NFS. It looks like the main
>>> things is to get something half recent - linux 2.6, nfs v3,
>>> apps which use 64 bit sys calls (eg. stat64) etc...
>>> Would need to do investigating.
>>>       
>> Greg has a tool to scan binaries... some day I'm going to run it over
>> the fedora universe, I'll get back to you...  someday.
>>     
>
> someday didn't take too long :) 
Cool, thanks for the data Eric.

>  but it ain't pretty.
>
> I installed all fedora packages under a directory and ran greg's tool over:
>
> /sbin /usr/sbin /bin /usr/bin /usr/kerberos/bin/ /usr/kerberos/sbin/
>
> Aggregate results:
>
>    4070 29.1% are scripts (shell, perl, whatever)
>    6598 47.2% don't use any stat() family calls at all
>    1829 13.1% use 32-bit stat() family interfaces only
>    1312  9.4% use 64-bit stat64() family interfaces only
>     180  1.3% use both 32-bit and 64-bit stat() family interfaces
>   
Ouch.  That's over two thousand executables to patch, rebuild, and ship.
> list of packages, sorted by the semi-lame "number of files in package
> which call a 32-bit stat variant" metric:
>
> http://sandeen.fedorapeople.org/stat32-ers
>
> I'm going to see if I can't leverage Fedora to clean some of this up.
>
> -Eric
>   
Good luck with that.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


* Re: Filestreams (and 64bit inodes)
  2008-06-13  1:28             ` Greg Banks
@ 2008-06-13  3:06               ` Eric Sandeen
  2008-06-13  3:24                 ` Greg Banks
  2008-06-13  3:20               ` Mark Goodwin
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2008-06-13  3:06 UTC (permalink / raw)
  To: Greg Banks; +Cc: Timothy Shimmin, Richard Scobie, xfs

Greg Banks wrote:
> Eric Sandeen wrote:

> Cool, thanks for the data Eric.
> 
>>  but it ain't pretty.
>>
>> I installed all fedora packages under a directory and ran greg's tool over:
>>
>> /sbin /usr/sbin /bin /usr/bin /usr/kerberos/bin/ /usr/kerberos/sbin/
>>
>> Aggregate results:
>>
>>    4070 29.1% are scripts (shell, perl, whatever)
>>    6598 47.2% don't use any stat() family calls at all
>>    1829 13.1% use 32-bit stat() family interfaces only
>>    1312  9.4% use 64-bit stat64() family interfaces only
>>     180  1.3% use both 32-bit and 64-bit stat() family interfaces
>>   
> Ouch.  That's over two thousand executables to patch, rebuild, and ship.
>> list of packages, sorted by the semi-lame "number of files in package
>> which call a 32-bit stat variant" metric:
>>
>> http://sandeen.fedorapeople.org/stat32-ers

And about 900 packages...

>> I'm going to see if I can't leverage Fedora to clean some of this up.
>>
>> -Eric
>>   
> Good luck with that.

Heh :)  At first I was just going to correlate with st_ino users to cut
it down, but then I learned that glibc will actually give you an
EOVERFLOW if, say, st_ino overflows, even if you were only going to check
st_mode.  :(  So pretty much everything needs fixing.

(FWIW I gathered statfs/statvfs calls, too...)

-Eric


* Re: Filestreams (and 64bit inodes)
  2008-06-13  1:28             ` Greg Banks
  2008-06-13  3:06               ` Eric Sandeen
@ 2008-06-13  3:20               ` Mark Goodwin
  2008-06-13  3:40                 ` Greg Banks
  2008-06-13  3:45                 ` Eric Sandeen
  1 sibling, 2 replies; 19+ messages in thread
From: Mark Goodwin @ 2008-06-13  3:20 UTC (permalink / raw)
  To: Greg Banks; +Cc: Eric Sandeen, Timothy Shimmin, Richard Scobie, xfs



Greg Banks wrote:
> Eric Sandeen wrote:
>>    4070 29.1% are scripts (shell, perl, whatever)
>>    6598 47.2% don't use any stat() family calls at all
>>    1829 13.1% use 32-bit stat() family interfaces only
>>    1312  9.4% use 64-bit stat64() family interfaces only
>>     180  1.3% use both 32-bit and 64-bit stat() family interfaces
>>
> Ouch.  That's over two thousand executables to patch, rebuild, and ship.
>> list of packages, sorted by the semi-lame "number of files in package
>> which call a 32-bit stat variant" metric:
>>
>> http://sandeen.fedorapeople.org/stat32-ers

struct dirent has an embedded ino_t too, so for completeness we should also
be looking for readdir(), readdir64(), getdirentries(), 
getdirentries64(), etc.

>> I'm going to see if I can't leverage Fedora to clean some of this up.
>>
>> -Eric
>>
> Good luck with that.

Yes good luck :)
(and the plan for statically linked apps? ...)

Cheers

-- 

  Mark Goodwin                                  markgw@sgi.com
  Engineering Manager for XFS and PCP    Phone: +61-3-99631937
  SGI Australian Software Group           Cell: +61-4-18969583
-------------------------------------------------------------


* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:06               ` Eric Sandeen
@ 2008-06-13  3:24                 ` Greg Banks
  0 siblings, 0 replies; 19+ messages in thread
From: Greg Banks @ 2008-06-13  3:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Timothy Shimmin, Richard Scobie, xfs

Eric Sandeen wrote:
>
> Heh :)  At first I was just going to correlate with st_ino users to cut
> it down, but then I learned that glibc will actually give you an
> EOVERFLOW if, say st_ino overflows, even if you were only going to check
> st_mode.  :(  So pretty much everything needs fixing.
>   
Of course.  There's no way for the app to tell glibc that it doesn't
care about st_ino, so glibc must assume it needs to return an
accurate st_ino.  The alternative is to return the lower 32 bits of
st_ino, thus causing silent subtle failures in the very small number of
applications which actually do something with st_ino.  This is what
glibc used to do back when I first started tracking the issue.
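The truncation hazard is concrete: two of the inode numbers in the myls64 listing later in this thread, 0x80 (".") and 0x100000080 ("d2"), differ only above bit 32, so returning the low 32 bits would make distinct files appear identical:

```shell
a=$(( 0x0000000000000080 ))   # inode of "." in that listing
b=$(( 0x0000000100000080 ))   # inode of "d2"
echo $(( a & 0xffffffff ))    # prints: 128
echo $(( b & 0xffffffff ))    # prints: 128
[ "$a" -ne "$b" ] && echo "distinct inodes, identical truncated st_ino"
```

An application using truncated st_ino values to detect hard links or duplicate files would silently conflate the two.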
> (FWIW I gathered statfs/statvfs calls, too...)
>
>   
Yes.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:20               ` Mark Goodwin
@ 2008-06-13  3:40                 ` Greg Banks
  2008-06-13  3:46                   ` Eric Sandeen
  2008-06-13  5:35                   ` Greg Banks
  2008-06-13  3:45                 ` Eric Sandeen
  1 sibling, 2 replies; 19+ messages in thread
From: Greg Banks @ 2008-06-13  3:40 UTC (permalink / raw)
  To: markgw; +Cc: Eric Sandeen, Timothy Shimmin, Richard Scobie, xfs

Mark Goodwin wrote:
>
>
> Greg Banks wrote:
>> Eric Sandeen wrote:
>>>    4070 29.1% are scripts (shell, perl, whatever)
>>>    6598 47.2% don't use any stat() family calls at all
>>>    1829 13.1% use 32-bit stat() family interfaces only
>>>    1312  9.4% use 64-bit stat64() family interfaces only
>>>     180  1.3% use both 32-bit and 64-bit stat() family interfaces
>>>
>> Ouch.  That's over two thousand executables to patch, rebuild, and ship.
>>> list of packages, sorted by the semi-lame "number of files in package
>>> which call a 32-bit stat variant" metric:
>>>
>>> http://sandeen.fedorapeople.org/stat32-ers
>
> struct dirent has an embedded ino_t too, so for completeness we should
> also
> be looking for readdir(), readdir64(), getdirentries(),
> getdirentries64(), etc.
Good point.  Looking in the code, it seems the getdents common code in
glibc will fail with EOVERFLOW if the inode number gets truncated during
64b-32b translation, just like the stat() family.  I'll need to improve
the scanning tool :-)
>
>>> I'm going to see if I can't leverage Fedora to clean some of this up.
>>>
>>> -Eric
>>>
>> Good luck with that.
>
> Yes good luck :)
> (and the plan for statically linked apps? ...)

Perhaps Fedora could enable the glibc magic for issuing warnings at link
time when those symbols are used, like what happens today if you use the
old unsafe gets() function:

gnb@inara 1058> gcc -o fmeh x.c
/tmp/ccQhxIIo.o: In function `main':
x.c:(.text+0x18): warning: the `gets' function is dangerous and should not be used.




-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:20               ` Mark Goodwin
  2008-06-13  3:40                 ` Greg Banks
@ 2008-06-13  3:45                 ` Eric Sandeen
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2008-06-13  3:45 UTC (permalink / raw)
  To: markgw; +Cc: Greg Banks, Timothy Shimmin, Richard Scobie, xfs

Mark Goodwin wrote:

> struct dirent has an embedded ino_t too, so for completeness we should also
> be looking for readdir(), readdir64(), getdirentries(), 
> getdirentries64(), etc.

*sigh*

>>> I'm going to see if I can't leverage Fedora to clean some of this up.
>>>
>>> -Eric
>>>
>> Good luck with that.
> 
> Yes good luck :)
> (and the plan for statically linked apps? ...)

I plan to encourage fedora to keep discouraging them ;)

-Eric


* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:40                 ` Greg Banks
@ 2008-06-13  3:46                   ` Eric Sandeen
  2008-06-13  3:57                     ` Greg Banks
  2008-06-13  5:35                   ` Greg Banks
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2008-06-13  3:46 UTC (permalink / raw)
  To: Greg Banks; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs

Greg Banks wrote:

>> (and the plan for statically linked apps? ...)
> 
> Perhaps Fedora could enable the glibc magic for issuing warnings at link
> time when those symbols are used, like what happens today if you use the
> old unsafe gets() function:
> 
> gnb@inara 1058> gcc -o fmeh x.c
> /tmp/ccQhxIIo.o: In function `main':
> x.c:(.text+0x18): warning: the `gets' function is dangerous and should not be used.


It wouldn't help the automated builds (nobody'd see it) but for normal
user compilations that might be an option...

-Eric


* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:46                   ` Eric Sandeen
@ 2008-06-13  3:57                     ` Greg Banks
  0 siblings, 0 replies; 19+ messages in thread
From: Greg Banks @ 2008-06-13  3:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs

Eric Sandeen wrote:
> Greg Banks wrote:
>
>   
>>> (and the plan for statically linked apps? ...)
>>>       
>> Perhaps Fedora could enable the glibc magic for issuing warnings at link
>> time when those symbols are used, like what happens today if you use the
>> old unsafe gets() function:
>>
>> gnb@inara 1058> gcc -o fmeh x.c
>> /tmp/ccQhxIIo.o: In function `main':
>> x.c:(.text+0x18): warning: the `gets' function is dangerous and should not be used.
>>     
>
>
> It wouldn't help the automated builds (nobody'd see it)
Hmm.  I don't see a way in glibc to upgrade that warning to an error to
make those builds fail, at least in glibc 2.4.
>  but for normal
> user compilations that might be an option...
>
>   

Indeed.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


* Re: Filestreams (and 64bit inodes)
  2008-06-13  3:40                 ` Greg Banks
  2008-06-13  3:46                   ` Eric Sandeen
@ 2008-06-13  5:35                   ` Greg Banks
  2008-06-13 13:28                     ` Eric Sandeen
  1 sibling, 1 reply; 19+ messages in thread
From: Greg Banks @ 2008-06-13  5:35 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs

[-- Attachment #1: Type: text/plain, Size: 5896 bytes --]

Greg Banks wrote:
> Mark Goodwin wrote:
>   
>> Greg Banks wrote:
>>     
>>> Eric Sandeen wrote:
>>>       
>>>>    4070 29.1% are scripts (shell, perl, whatever)
>>>>    6598 47.2% don't use any stat() family calls at all
>>>>    1829 13.1% use 32-bit stat() family interfaces only
>>>>    1312  9.4% use 64-bit stat64() family interfaces only
>>>>     180  1.3% use both 32-bit and 64-bit stat() family interfaces
>>>>
>>>>         
>>> Ouch.  That's over two thousand executables to patch, rebuild, and ship.
>>>       
>>>> list of packages, sorted by the semi-lame "number of files in package
>>>> which call a 32-bit stat variant" metric:
>>>>
>>>> http://sandeen.fedorapeople.org/stat32-ers
>>>>         
>> struct dirent has an embedded ino_t too, so for completeness we should
>> also
>> be looking for readdir(), readdir64(), getdirentries(),
>> getdirentries64(), etc.
>>     
> Good point.  Looking in the code, it seems the getdents common code in
> glibc will fail with EOVERFLOW if the inode number gets truncated during
> 64b-32b translation, just like the stat() family.  
Experiment confirms this behaviour, using a small wrapper program around
opendir/readdir/closedir:

heave:~/stat64/mnt # ../myls64
idx       d_ino      d_off d_type d_name
--- ---------------- ----- ------ ------
  0 0000000000000080     4      0 .
  1 0000000000000080     6      0 ..
  2 0000000000000083     8      0 d0
  3 0000000080000080    10      0 d1
  4 0000000100000080    12      0 d2
  5 0000000180000080    14      0 d3
  6 0000000200000080    16      0 d4
  7 0000000280000080    18      0 d5
  8 0000000300000080    20      0 d6
  9 0000000380000080    22      0 d7
 10 0000000400000080    24      0 d8
 11 0000000480000080    26      0 d9
 12 0000000500000080    28      0 d10
 13 0000000580000080    30      0 d11
 14 0000000600000080    32      0 d12
 15 0000000680000080    34      0 d13
 16 0000000700000080    36      0 d14
 17 0000000780000080    38      0 d15
 18 0000000800000080    40      0 d16
 19 0000000880000080    42      0 d17
 20 0000000900000080    44      0 d18
 21 0000000980000080    46      0 d19
 22 0000000a00000080    48      0 d20
 23 0000000a80000080    50      0 d21
 24 0000000b00000080    52      0 d22
 25 0000000b80000080    54      0 d23
 26 0000000c00000080    56      0 d24
 27 0000000c80000080    58      0 d25
 28 0000000d00000080    60      0 d26
 29 0000000d80000080    62      0 d27
 30 0000000e00000080    64      0 d28
 31 0000000e80000080    66      0 d29
 32 0000000f00000080    68      0 d30
 33 0000000f80000080    70      0 d31
 34 0000001000000080    72      0 d32
 35 0000001080000080    74      0 d33
 36 0000001100000080    76      0 d34
 37 0000001180000080    78      0 d35
 38 0000001200000080    80      0 d36
 39 0000001280000080    82      0 d37
 40 0000001300000080    84      0 d38
 41 0000001380000080    86      0 d39
 42 0000001400000080    88      0 d40
 43 0000001480000080    90      0 d41
 44 0000001500000080    92      0 d42
 45 0000001580000080    94      0 d43
 46 0000001600000080    96      0 d44
 47 0000001680000080    98      0 d45
 48 0000001700000080   100      0 d46
 49 0000001780000080   102      0 d47
 50 0000001800000080   104      0 d48
 51 0000001880000080   106      0 d49
 52 0000001900000080   108      0 d50
 53 0000001980000080   110      0 d51
 54 0000001a00000080   112      0 d52
 55 0000001a80000080   114      0 d53
 56 0000001b00000080   116      0 d54
 57 0000001b80000080   118      0 d55
 58 0000001c00000080   120      0 d56
 59 0000001c80000080   122      0 d57
 60 0000001d00000080   124      0 d58
 61 0000001d80000080   126      0 d59
 62 0000001e00000080   128      0 d60
 63 0000001e80000080   130      0 d61
 64 0000001f00000080   132      0 d62
 65 0000001f80000080   512      0 d63


heave:~/stat64/mnt # ../myls32
idx       d_ino      d_off d_type d_name
--- ---------------- ----- ------ ------
  0 0000000000000080     4      0 .
  1 0000000000000080     6      0 ..
  2 0000000000000083     8      0 d0
  3 0000000080000080    10      0 d1
.: Value too large for defined data type


heave:~/stat64/mnt # strace -s1024 ../myls32
execve("../myls32", ["../myls32"], [/* 60 vars */]) = 0
[ Process PID=12640 runs in 32 bit mode. ]
...
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
...
write(1, "idx       d_ino      d_off d_type d_name\n", 41) = 41
write(1, "--- ---------------- ----- ------ ------\n", 41) = 41
getdents64(3, /* 66 entries */, 4096)   = 1584
_llseek(3, 10, [10], SEEK_SET)          = 0
write(1, "  0 0000000000000080     4      0 .\n", 36) = 36
write(1, "  1 0000000000000080     6      0 ..\n", 37) = 37
write(1, "  2 0000000000000083     8      0 d0\n", 37) = 37
write(1, "  3 0000000080000080    10      0 d1\n", 37) = 37
getdents64(3, /* 62 entries */, 4096)   = 1488
...
write(4, ".: Value too large for defined data type\n", 41) = 41
...
close(3)                                = 0
exit_group(0)                           = ?
Process 12640 detached



I also confirmed that using readdir64() allows the 32b app to work.

Of course, the glibc readdir() interface makes it extra hard for the
caller to tell the difference between an error and a normal EOF.  In
both cases, NULL is returned.  In the error case, errno is set.  In the
EOF case, errno is unchanged.  In the success case, errno is also
unchanged.  So to detect an error from readdir() the application writer
needs to do something like:

DIR *dir;
struct dirent *de;
...
errno = 0;
while ((de = readdir(dir)) != NULL)
{
    // handle entry
    errno = 0;
}
if (errno)
{
     // handle error
}

Otherwise the directory traversal just finishes early with no error
reported.  I'll bet that's what all the apps do :-)

> I'll need to improve
> the scanning tool :-)
Attached.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


[-- Attachment #2: summarise-stat64-2.pl --]
[-- Type: application/x-perl, Size: 4148 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Filestreams (and 64bit inodes)
  2008-06-13  5:35                   ` Greg Banks
@ 2008-06-13 13:28                     ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2008-06-13 13:28 UTC (permalink / raw)
  To: Greg Banks; +Cc: markgw, Timothy Shimmin, Richard Scobie, xfs

Greg Banks wrote:

>> I'll need to improve
>> the scanning tool :-)
> Attached.
> 

I had done something similar, except:

--- summarise-stat64-2.pl.orig	2008-06-13 08:26:07.000000000 -0500
+++ summarise-stat64-2.pl	2008-06-13 08:26:24.000000000 -0500
@@ -93,11 +93,11 @@
 		{
 			$res{used64}++;
 		}
-		elsif (m/^\s+U readdir$/)
+		elsif (m/^\s+U readdir(|_r)$/)
 		{
 			$res{used32}++;
 		}
-		elsif (m/^\s+U readdir64$/)
+		elsif (m/^\s+U readdir64(|_r)$/)
 		{
 			$res{used64}++;
 		}

(lazily whitespace-mangled, sorry)

A new scan with my slightly different version of the tool yielded:

  447 readdir32
  908 stat32
   38 statfs32

where the number is the count of packages making that particular 32-bit call.

-Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread


Thread overview: 19+ messages
2008-06-07 23:11 Filestreams Richard Scobie
2008-06-09  3:31 ` Filestreams Eric Sandeen
2008-06-10  1:49   ` Filestreams Timothy Shimmin
2008-06-10  2:14     ` Filestreams Richard Scobie
2008-06-10 23:09     ` Filestreams (and 64bit inodes) Richard Scobie
2008-06-11  1:39       ` Timothy Shimmin
2008-06-11  2:47         ` Richard Scobie
2008-06-11  3:23         ` Eric Sandeen
2008-06-12 13:52           ` Eric Sandeen
2008-06-13  1:28             ` Greg Banks
2008-06-13  3:06               ` Eric Sandeen
2008-06-13  3:24                 ` Greg Banks
2008-06-13  3:20               ` Mark Goodwin
2008-06-13  3:40                 ` Greg Banks
2008-06-13  3:46                   ` Eric Sandeen
2008-06-13  3:57                     ` Greg Banks
2008-06-13  5:35                   ` Greg Banks
2008-06-13 13:28                     ` Eric Sandeen
2008-06-13  3:45                 ` Eric Sandeen
