public inbox for linux-xfs@vger.kernel.org
* Questions about testing the Filestream feature
@ 2007-09-21  3:10 Hxsrmeng
  2007-09-21  7:54 ` David Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Hxsrmeng @ 2007-09-21  3:10 UTC (permalink / raw)
  To: linux-xfs


Hi all,

I need to use the "Filestreams" feature. I wrote a script that writes files to
two directories concurrently.  When I check the file bitmaps, I find that
files written to different directories sometimes still interleave
extents on disk. I don't know whether there is something wrong with my
script or whether I have misunderstood something.

I am using openSUSE 10.2 with a linux-2.6.23-rc4 kernel (source checked out
from the CVS repository at oss.sgi.com). The filestreams feature is enabled
with the "-o filestreams" mount option.
Here is my script:
"
#try filestreams
a=$1
filenumber=`expr $a - 1`
filesize=$2

umount /xfs_disk
/sbin/mkfs -t xfs -f /dev/hda5 #>> logfile
#enable filestreams
mount -t xfs -o filestreams /dev/hda5 /xfs_disk

cd /xfs_disk
for dirname in dira dirb
do
    mkdir $dirname
    for filename in `seq 0 $filenumber`
    do
        dd if=/dev/zero of=$dirname/$filename bs=$filesize count=1 > /dev/null 2>&1 &
    done
done

wait
for dirname in dira dirb
do
    for filename in `seq 0 $filenumber`
    do
        /usr/sbin/xfs_bmap -v $dirname/$filename > bmapresult
        cat bmapresult >> bitmap
        # extent lines = total lines minus the two header lines
        b=$(expr $(wc -l < bmapresult) - 2)
        c=`tail -$b bmapresult | awk '{ print $4 }'`
        echo $dirname/$filename is in AG $c >> agmap
    done
done
"
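
(A simpler way to pull out the AG numbers: one awk pass instead of the
wc/tail dance. A minimal sketch -- `ag_list` is just an illustrative
name, and the field position assumes the `xfs_bmap -v` column layout
shown further below:)

```shell
# Hypothetical helper: print the distinct AGs a file occupies, one per
# line, given `xfs_bmap -v` output on stdin. NR > 2 skips the filename
# line and the column-header line; $4 is the AG column.
ag_list() {
    awk 'NR > 2 { print $4 }' | sort -nu
}

# Exercise it on canned xfs_bmap output (no real device needed):
ag_list <<'EOF'
dira/498:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..63]:         570848..570911    0 (570848..570911)    64
   1: [64..2047]:      1666600..1668583  1 (387440..389423)  1984
EOF
```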

Then I got the information of my xfs device first:
meta-data=/dev/hda5              isize=256    agcount=8, agsize=159895 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=1279160, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0


First run: I wrote 3 "big" files of 768MB each to each directory. The
files in directory dira share AGs 0, 2, 5, 7 and the files in directory dirb
share AGs 1, 3, 4, 6, which I assume is correct. But the files' extents
do not use contiguous blocks, and all files in the same directory put some
of their extents in AG 0. I am not sure whether this is correct.  Here is
part of the file bitmap:
"
dira/0:
 EXT: FILE-OFFSET         BLOCK-RANGE      AG AG-OFFSET          TOTAL
   0: [0..7615]:          96..7711          0 (96..7711)          7616
   1: [7616..7679]:       33312..33375      0 (33312..33375)        64
   2: [7680..24063]:      33448..49831      0 (33448..49831)     16384
   3: [24064..52999]:     60608..89543      0 (60608..89543)     28936
   4: [53000..61191]:     95496..103687     0 (95496..103687)     8192
   5: [61192..90791]:     119088..148687    0 (119088..148687)   29600
   6: [90792..131751]:    170264..211223    0 (170264..211223)   40960
   7: [131752..144223]:   219480..231951    0 (219480..231951)   12472
   8: [144224..168799]:   240144..264719    0 (240144..264719)   24576
   ...
dira/1:
 EXT: FILE-OFFSET         BLOCK-RANGE      AG AG-OFFSET           TOTAL
   0: [0..12791]:         7712..20503       0 (7712..20503)       12792
   1: [12792..12863]:     33376..33447      0 (33376..33447)         72
   2: [12864..13391]:     49832..50359      0 (49832..50359)        528
   3: [13392..19575]:     112904..119087    0 (112904..119087)     6184
   4: [19576..27767]:     148688..156879    0 (148688..156879)     8192
   5: [27768..35959]:     211224..219415    0 (211224..219415)     8192
   6: [35960..44151]:     231952..240143    0 (231952..240143)     8192
   7: [44152..68727]:     264784..289359    0 (264784..289359)    24576
   8: [68728..79047]:     309400..319719    0 (309400..319719)    10320
"

Second run: I wrote 1024 "small" files of 1MB each to each directory.
Files in directory dira use AGs 0, 1, 3 and files in directory dirb use AGs
2, 1, 5, 6, 7, 4. So files written in directory dirb use allocation group 1,
which should be reserved for directory dira. And sometimes even a single
file is written to two AGs. The following is part of the file bitmap:
"
 ...
dira/498:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..63]:         570848..570911    0 (570848..570911)    64
   1: [64..2047]:      1666600..1668583  1 (387440..389423)  1984
dira/499:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..23]:         571672..571695    0 (571672..571695)    24
   1: [24..79]:        571776..571831    0 (571776..571831)    56
   2: [80..1839]:      1650616..1652375  1 (371456..373215)  1760
   3: [1840..1903]:    1662240..1662303  1 (383080..383143)    64
   4: [1904..2047]:    1676984..1677127  1 (397824..397967)   144
 ...
dirb/4:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       1279264..1281311  1 (104..2151)       2048
dirb/5:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..63]:         1352136..1352199  1 (72976..73039)      64
   1: [64..415]:       1451896..1452247  1 (172736..173087)   352
   2: [416..1279]:     1633616..1634479  1 (354456..355319)   864
   3: [1280..1343]:    1647288..1647351  1 (368128..368191)    64
   4: [1344..2047]:    1677128..1677831  1 (397968..398671)   704
dirb/6:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       1285408..1287455  1 (6248..8295)      2048
....
"
I don't know whether the filestreams feature is supposed to work this
way, or whether I made some mistakes.

Thank you so much!
-- 
View this message in context: http://www.nabble.com/Questions-about-testing-the-Filestream-feature-tf4491605.html#a12809900
Sent from the linux-xfs mailing list archive at Nabble.com.


* Re: Questions about testing the Filestream feature
  2007-09-21  3:10 Questions about testing the Filestream feature Hxsrmeng
@ 2007-09-21  7:54 ` David Chinner
  2007-09-21 20:45   ` hxsrmeng
  0 siblings, 1 reply; 3+ messages in thread
From: David Chinner @ 2007-09-21  7:54 UTC (permalink / raw)
  To: Hxsrmeng; +Cc: linux-xfs

On Thu, Sep 20, 2007 at 08:10:31PM -0700, Hxsrmeng wrote:
> 
> Hi all,
> 
> I need to use the "Filestreams" feature. I wrote a script that writes files to
> two directories concurrently.  When I check the file bitmaps, I find that
> files written to different directories sometimes still interleave
> extents on disk. I don't know whether there is something wrong with my
> script or whether I have misunderstood something.
> 
> I am using openSUSE 10.2 with a linux-2.6.23-rc4 kernel (source checked out
> from the CVS repository at oss.sgi.com). The filestreams feature is enabled
> with the "-o filestreams" mount option.
> Here is my script: 

<snip>

Very similar to xfsqa tests 170-174.

> Then I got the information of my xfs device first :  
> meta-data=/dev/hda5   isize=256      agcount=8, agsize=159895 blks
>          =            sectsz=512   attr=0
> data     =            bsize=4096   blocks=1279160,imaxpct=25

Ok, so an AG ~600MB in size, and your filesystem is about 5GB.

> First run: I wrote 3 "big" files of 768MB each to each directory. The
> files in directory dira share AGs 0, 2, 5, 7 and the files in directory dirb
> share AGs 1, 3, 4, 6, which I assume is correct.

Yes, and 2 dirs * 3 files * 768MB = 4.5GB ~= 90% full.

> But the files' extents
> do not use contiguous blocks,

filestreams doesn't guarantee contiguous extents - it guarantees that sets of
files separated by directories don't intertwine. Within each set you can see
non-contiguous allocation, but the sets should not interleave in the same
AGs...

> and all files in the same directory put some
> of their extents in AG 0.

AG 0 is the "filestreams failure" allocation group. What you are seeing is
that at some point you filled your AGs up and a stream write couldn't find
an unused AG that matched the stream association criteria, so it gave up.

> I am not sure whether this is correct.  Here is
> part of file bitmap:
> "
> dira/0:
>  EXT: FILE-OFFSET         BLOCK-RANGE      AG AG-OFFSET          TOTAL
>    0: [0..7615]:          96..7711          0 (96..7711)          7616
>    1: [7616..7679]:       33312..33375      0 (33312..33375)        64
>    2: [7680..24063]:      33448..49831      0 (33448..49831)     16384
>    3: [24064..52999]:     60608..89543      0 (60608..89543)     28936
>    4: [53000..61191]:     95496..103687     0 (95496..103687)     8192
>    5: [61192..90791]:     119088..148687    0 (119088..148687)   29600
>    6: [90792..131751]:    170264..211223    0 (170264..211223)   40960
>    7: [131752..144223]:   219480..231951    0 (219480..231951)   12472
>    8: [144224..168799]:   240144..264719    0 (240144..264719)   24576

Ummm - that's a file that started in AG 0....

>    ...
> dira/1:
>  EXT: FILE-OFFSET         BLOCK-RANGE      AG AG-OFFSET           TOTAL
>    0: [0..12791]:         7712..20503       0 (7712..20503)       12792
>    1: [12792..12863]:     33376..33447      0 (33376..33447)         72
>    2: [12864..13391]:     49832..50359      0 (49832..50359)        528
>    3: [13392..19575]:     112904..119087    0 (112904..119087)     6184
>    4: [19576..27767]:     148688..156879    0 (148688..156879)     8192
>    5: [27768..35959]:     211224..219415    0 (211224..219415)     8192
>    6: [35960..44151]:     231952..240143    0 (231952..240143)     8192
>    7: [44152..68727]:     264784..289359    0 (264784..289359)    24576
>    8: [68728..79047]:     309400..319719    0 (309400..319719)    10320

And so is that. Given that they are in the same directory, this is correct
behaviour.

How much memory is in your test box? I suspect that you're getting writeback
from kswapd, not pdflush: you are doing buffered I/O, so you're getting
LRU-order writeback rather than nice sequential writeback. It's up to the
user/application to prevent intra-stream allocation/fragmentation
problems (e.g. preallocation, extent size hints, large direct I/O, etc.),
and that is what your test application is lacking. filestreams only
prevents inter-stream interleaving.
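
Roughly, as a sketch (paths and sizes are illustrative, reusing the
/xfs_disk layout from your script; resvsp is xfs_io's preallocation
command of this vintage -- these need a real XFS mount to run):

```shell
# 1. Preallocate the file's full size up front, so blocks are reserved
#    contiguously before any data is written:
xfs_io -f -c "resvsp 0 768m" /xfs_disk/dira/bigfile

# 2. Set an extent size hint on the directory; files created in it
#    inherit the hint, so allocations come in large aligned chunks:
xfs_io -c "extsize 16m" /xfs_disk/dira

# 3. Write with large direct I/O so allocation happens synchronously at
#    write() time, in file order, rather than via delayed writeback:
dd if=/dev/zero of=/xfs_disk/dira/bigfile bs=16M count=48 oflag=direct
```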

Also, you are running close to filesystem-full. That is known to be
a no-no for deterministic performance, will cause filesystem
fragmentation, and is not the case that filestreams is designed to
optimise for.

However, I agree that the code is not working optimally. In test 171,
there is this comment:

# test large numbers of files, single I/O per file, 120s timeout
# Get close to filesystem full.
# 128 = ENOSPC
# 120 = 93.75% full, gets repeatable failures
# 112 = 87.5% full, should reliably succeed but doesn't *FIXME*
# 100 = 78.1% full, should reliably succeed

The test uses a 1GB filesystem to intentionally stress the allocator,
and at 78.1% full we are getting intermittent failures. On some
machines (like my test boxes) it passes >95% of the time; on other
machines it passes maybe 5% of the time. So the low-space behaviour is
known to be less than optimal, but production sites already know they
can't use the last 10-15% of the filesystem because of fragmentation
issues associated with stripe alignment. Hence the allocator's
low-space behaviour is not considered critical, because there are
other, worse problems near full that filestreams can't do anything to
prevent.

> Second run: I wrote 1024 "small" files of 1MB each to each directory.
> Files in directory dira use AGs 0, 1, 3 and files in directory dirb use AGs
> 2, 1, 5, 6, 7, 4. So files written in directory dirb use allocation group 1,
> which should be reserved for directory dira. And sometimes even a single
> file is written to two AGs. The following is part of the file bitmap:

That's true only as long as a stream does not time out. An AG is
reserved only for the timeout period since the last file in the
stream was created or had blocks allocated to it.

IOWs, if you use buffered I/O, the 30s writeback delay can time your
stream out between file creation plus the write() syscall and the
moment pdflush writes the data back. Then you have no stream
association and you will get interleaving. Test 172 tests this
behaviour, and we get intermittent failures on it because the buffered
I/O case occasionally succeeds rather than failing as it is supposed
to....
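
For reference, the timeout is a sysctl; a quick way to inspect and
lengthen it (assumes an XFS-capable kernel with the module loaded --
the value is in centiseconds, and 3000, i.e. 30s, is the usual
default):

```shell
# Current stream timeout, in centiseconds:
cat /proc/sys/fs/xfs/filestream_centisecs

# Lengthen it to 60s so slow buffered writeback doesn't expire streams:
echo 6000 > /proc/sys/fs/xfs/filestream_centisecs
```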

What's your stream timeout (/proc/sys/fs/xfs/filestream_centisecs) set to?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Questions about testing the Filestream feature
  2007-09-21  7:54 ` David Chinner
@ 2007-09-21 20:45   ` hxsrmeng
  0 siblings, 0 replies; 3+ messages in thread
From: hxsrmeng @ 2007-09-21 20:45 UTC (permalink / raw)
  To: David Chinner; +Cc: XFS

Thank you so much. You really helped me a lot.

Sorry, I had to learn a few things from the net and the manuals first
to understand what you said. :)
My test box has only 512MB of RAM and my stream timeout is 3s... that
might be the problem. I will try this on a test box with more RAM and
set the stream timeout to 30s.

> It's up to the
> user/application to prevent intra-stream allocation/fragmentation
> problems (e.g. preallocation, extent size hints, large direct I/O, etc)
> and that is what your test application is lacking. filestreams only
> prevents inter-stream interleaving.

I'll try to find out how to modify my script according to your
suggestions. Thank you again!

Have a nice weekend!
Hxsrmeng




end of thread, other threads:[~2007-09-22  0:48 UTC | newest]

2007-09-21  3:10 Questions about testing the Filestream feature Hxsrmeng
2007-09-21  7:54 ` David Chinner
2007-09-21 20:45   ` hxsrmeng
