* Questions about testing the Filestream feature
@ 2007-09-21 3:10 Hxsrmeng
2007-09-21 7:54 ` David Chinner
From: Hxsrmeng @ 2007-09-21 3:10 UTC (permalink / raw)
To: linux-xfs
Hi all,
I need to use the "Filestreams" feature. I wrote a script that writes files to
two directories concurrently. When I check the file bitmaps, I find that
sometimes files written to different directories still interleave
extents on disk. I don't know whether there is something wrong with my
script, or whether I have misunderstood something.
I am using OpenSUSE 10.2 with a linux-2.6.23-rc4 kernel (source code was
checked out from the CVS repository at oss.sgi.com). The filestreams feature is
enabled with the "-o filestreams" mount option.
Here is my script:
"
#!/bin/sh
# try filestreams
a=$1
filenumber=`expr $a - 1`
filesize=$2

umount /xfs_disk
/sbin/mkfs -t xfs -f /dev/hda5 #>> logfile
mount -t xfs -o filestreams /dev/hda5 /xfs_disk
# enable filestreams

cd /xfs_disk
for dirname in dira dirb
do
    mkdir $dirname
    for filename in `seq 0 $filenumber`
    do
        dd if=/dev/zero of=$dirname/$filename bs=$filesize count=1 > /dev/null 2>&1 &
    done
done

wait
for dirname in dira dirb
do
    for filename in `seq 0 $filenumber`
    do
        /usr/sbin/xfs_bmap -v $dirname/$filename > bmapresult
        cat bmapresult >> bitmap
        # number of extent lines = total lines minus the two header lines
        b=`expr \`wc -l < bmapresult\` - 2`
        c=`tail -$b bmapresult | awk '{ print $4 }'`
        echo $dirname/$filename is in AG $c: >> agmap
    done
done
"
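The AG-extraction pipeline at the end of the script can be exercised standalone on a canned `xfs_bmap -v` listing (the sample lines below are taken from the results further down; `bmapresult` is the temporary file name the script already uses):

```shell
#!/bin/sh
# Run the script's AG-extraction pipeline against a canned
# "xfs_bmap -v" listing: one header line for the file name, one for
# the column titles, then one line per extent with the AG in column 4.
cat > bmapresult <<'EOF'
dira/0:
 EXT: FILE-OFFSET    BLOCK-RANGE    AG AG-OFFSET      TOTAL
   0: [0..7615]:     96..7711        0 (96..7711)      7616
   1: [7616..7679]:  33312..33375    0 (33312..33375)    64
EOF
b=`expr \`wc -l < bmapresult\` - 2`           # extent lines only
c=`tail -$b bmapresult | awk '{ print $4 }'`  # AG of each extent
echo dira/0 is in AG `echo $c`
rm -f bmapresult
```

Run on the sample above this prints "dira/0 is in AG 0 0", i.e. both extents sit in AG 0.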
First, here is the geometry of my XFS device:
meta-data=/dev/hda5              isize=256    agcount=8, agsize=159895 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=1279160, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
First run, I wrote 3 "big" files of 768M each to each directory. The
files in directory dira share AGs 0, 2, 5, 7 and files in directory dirb share
AGs 1, 3, 4, 6, which I assume is correct. But the files' extents
are not contiguous, and all files in the same directory put some
of their extents in AG 0. I am not sure whether this is correct. Here is
part of the file bitmap:
"
dira/0:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7615]: 96..7711 0 (96..7711) 7616
1: [7616..7679]: 33312..33375 0 (33312..33375) 64
2: [7680..24063]: 33448..49831 0 (33448..49831) 16384
3: [24064..52999]: 60608..89543 0 (60608..89543) 28936
4: [53000..61191]: 95496..103687 0 (95496..103687) 8192
5: [61192..90791]: 119088..148687 0 (119088..148687) 29600
6: [90792..131751]: 170264..211223 0 (170264..211223) 40960
7: [131752..144223]: 219480..231951 0 (219480..231951) 12472
8: [144224..168799]: 240144..264719 0 (240144..264719) 24576
...
dira/1:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..12791]: 7712..20503 0 (7712..20503) 12792
1: [12792..12863]: 33376..33447 0 (33376..33447) 72
2: [12864..13391]: 49832..50359 0 (49832..50359) 528
3: [13392..19575]: 112904..119087 0 (112904..119087) 6184
4: [19576..27767]: 148688..156879 0 (148688..156879) 8192
5: [27768..35959]: 211224..219415 0 (211224..219415) 8192
6: [35960..44151]: 231952..240143 0 (231952..240143) 8192
7: [44152..68727]: 264784..289359 0 (264784..289359) 24576
8: [68728..79047]: 309400..319719 0 (309400..319719) 10320
"
Second run, I wrote 1024 "small" files of 1M each to each directory.
Files in directory dira use AGs 0, 1, 3 and files in directory dirb use AGs
2, 1, 5, 6, 7, 4. So files written to directory dirb use allocation group 1,
which should be reserved for directory dira. And sometimes even a single file
is split across two AGs. The following is part of the file bitmap:
"
...
dira/498:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..63]: 570848..570911 0 (570848..570911) 64
1: [64..2047]: 1666600..1668583 1 (387440..389423) 1984
dira/499:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..23]: 571672..571695 0 (571672..571695) 24
1: [24..79]: 571776..571831 0 (571776..571831) 56
2: [80..1839]: 1650616..1652375 1 (371456..373215) 1760
3: [1840..1903]: 1662240..1662303 1 (383080..383143) 64
4: [1904..2047]: 1676984..1677127 1 (397824..397967) 144
...
dirb/4:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..2047]: 1279264..1281311 1 (104..2151) 2048
dirb/5:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..63]: 1352136..1352199 1 (72976..73039) 64
1: [64..415]: 1451896..1452247 1 (172736..173087) 352
2: [416..1279]: 1633616..1634479 1 (354456..355319) 864
3: [1280..1343]: 1647288..1647351 1 (368128..368191) 64
4: [1344..2047]: 1677128..1677831 1 (397968..398671) 704
dirb/6:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..2047]: 1285408..1287455 1 (6248..8295) 2048
....
"
I don't know whether the filestreams feature is supposed to work this way, or
whether I made a mistake somewhere.
Thank you so much!
--
View this message in context: http://www.nabble.com/Questions-about-testing-the-Filestream-feature-tf4491605.html#a12809900
Sent from the linux-xfs mailing list archive at Nabble.com.
* Re: Questions about testing the Filestream feature
2007-09-21 3:10 Questions about testing the Filestream feature Hxsrmeng
@ 2007-09-21 7:54 ` David Chinner
2007-09-21 20:45 ` hxsrmeng
From: David Chinner @ 2007-09-21 7:54 UTC (permalink / raw)
To: Hxsrmeng; +Cc: linux-xfs
On Thu, Sep 20, 2007 at 08:10:31PM -0700, Hxsrmeng wrote:
>
> Hi all,
>
> I need to use the "Filestreams" feature. I wrote a script that writes files to
> two directories concurrently. When I check the file bitmaps, I find that
> sometimes files written to different directories still interleave
> extents on disk. I don't know whether there is something wrong with my
> script, or whether I have misunderstood something.
>
> I am using OpenSUSE 10.2 with a linux-2.6.23-rc4 kernel (source code was
> checked out from the CVS repository at oss.sgi.com). The filestreams feature is
> enabled with the "-o filestreams" mount option.
> Here is my script:
<snip>
Very similar to xfsqa tests 170-174.
> Then I got the information of my xfs device first :
> meta-data=/dev/hda5 isize=256 agcount=8, agsize=159895 blks
> = sectsz=512 attr=0
> data = bsize=4096 blocks=1279160,imaxpct=25
Ok, so an AG ~600MB in size, and your filesystem is about 5GB.
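Those figures follow directly from the mkfs geometry quoted above:

```shell
#!/bin/sh
# Derive the AG and filesystem sizes from the mkfs output above
# (bsize=4096 bytes, agsize=159895 blocks, blocks=1279160).
bsize=4096
agsize=159895
blocks=1279160
# Work in KB first so the products stay well inside 32-bit range.
ag_mb=$(( agsize * (bsize / 1024) / 1024 ))
fs_mb=$(( blocks * (bsize / 1024) / 1024 ))
echo "AG size: ${ag_mb}MB, filesystem size: ${fs_mb}MB"
```

This prints "AG size: 624MB, filesystem size: 4996MB", i.e. roughly 600MB per AG and a ~5GB filesystem.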
> First run, I wrote 3 "big" files of 768M each to each directory. The
> files in directory dira share AGs 0, 2, 5, 7 and files in directory dirb share
> AGs 1, 3, 4, 6, which I assume is correct.
Yes, and 2 * 3 * 768MB ~= 4.5GB, so the filesystem is ~90% full.
> But the files' extents
> are not contiguous,
filestreams doesn't guarantee contiguous extents - it guarantees that sets of files
separated by directories don't intertwine. Within each set you can see
non-contiguous allocation, but the sets should not interleave in the same
AGs...
> and all files in the same directory put some
> of their extents in AG 0.
AG 0 is the "filestreams failure" allocation group. What you are seeing is
that at some point you've filled your AGs up and a stream write couldn't find
an unused AG that matched the stream association criteria, so it gave up.
> I am not sure whether this is correct. Here is
> part of the file bitmap:
> "
> dira/0:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
> 0: [0..7615]: 96..7711 0 (96..7711) 7616
> 1: [7616..7679]: 33312..33375 0 (33312..33375) 64
> 2: [7680..24063]: 33448..49831 0 (33448..49831) 16384
> 3: [24064..52999]: 60608..89543 0 (60608..89543) 28936
> 4: [53000..61191]: 95496..103687 0 (95496..103687) 8192
> 5: [61192..90791]: 119088..148687 0 (119088..148687) 29600
> 6: [90792..131751]: 170264..211223 0 (170264..211223) 40960
> 7: [131752..144223]: 219480..231951 0 (219480..231951) 12472
> 8: [144224..168799]: 240144..264719 0 (240144..264719) 24576
Ummm - that's a file that started in AG 0....
> ...
> dira/1:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
> 0: [0..12791]: 7712..20503 0 (7712..20503) 12792
> 1: [12792..12863]: 33376..33447 0 (33376..33447) 72
> 2: [12864..13391]: 49832..50359 0 (49832..50359) 528
> 3: [13392..19575]: 112904..119087 0 (112904..119087) 6184
> 4: [19576..27767]: 148688..156879 0 (148688..156879) 8192
> 5: [27768..35959]: 211224..219415 0 (211224..219415) 8192
> 6: [35960..44151]: 231952..240143 0 (231952..240143) 8192
> 7: [44152..68727]: 264784..289359 0 (264784..289359) 24576
> 8: [68728..79047]: 309400..319719 0 (309400..319719) 10320
And so is that. Given that they are in the same directory, this is correct
behaviour.
How much memory is in your test box? I suspect that you're getting writeback
from kswapd rather than pdflush, since you are doing buffered I/O, so you're getting
LRU-order writeback rather than nice sequential writeback. It's up to the
user/application to prevent intra-stream allocation/fragmentation
problems (e.g. preallocation, extent size hints, large direct I/O, etc)
and that is what your test application is lacking. filestreams only
prevents inter-stream interleaving.
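As a sketch of those three measures (the path, the sizes, and the use of xfs_io here are illustrative assumptions, not part of the original test), the commands are only printed unless RUN=1, since executing them needs root and a mounted XFS filesystem:

```shell
#!/bin/sh
# Dry-run sketch of the anti-fragmentation measures suggested above;
# set RUN=1 to execute for real against a mounted XFS filesystem.
f=/xfs_disk/dira/0   # illustrative path
run() { if [ "$RUN" = "1" ]; then "$@"; else echo "would run: $*"; fi; }

# 1. Preallocate the file's final size up front, so writeback order no
#    longer dictates the on-disk layout.
run xfs_io -f -c "resvsp 0 768m" $f

# 2. Or set an extent size hint so each allocation grabs a big chunk.
run xfs_io -f -c "extsize 16m" $f

# 3. Or bypass the page cache entirely with direct I/O.
run dd if=/dev/zero of=$f bs=16M count=48 oflag=direct
```

The dry-run wrapper keeps the sketch safe to paste; with RUN unset it just echoes the three commands it would execute.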
Also, you are running close to filesystem full state. That is known to be
a no-no for deterministic performance from the filesystem, will cause
filesystem fragmentation, and is not the case that filestreams is
designed to optimise for.
However, I agree that the code is not working optimally. In test 171,
there is this comment:
# test large numbers of files, single I/O per file, 120s timeout
# Get close to filesystem full.
# 128 = ENOSPC
# 120 = 93.75% full, gets repeatable failures
# 112 = 87.5% full, should reliably succeed but doesn't *FIXME*
# 100 = 78.1% full, should reliably succeed
The test uses a 1GB filesystem to intentionally stress the allocator,
and at 78.1% full, we are getting intermittent failures. On
some machines (like my test boxes) it passes >95% of the time.
On other machines, it passes maybe 5% of the time. So the low
space behaviour is known to be less than optimal, but at production
sites it is known that they can't use the last 10-15% of the filesystem
because of fragmentation issues associated with stripe alignment.
Hence low space behaviour of the allocator is not considered something
critical because there are other, worse problems at low space
that filestreams can't do anything to prevent.
> Second run, I wrote 1024 "small" files of 1M each to each directory.
> Files in directory dira use AGs 0, 1, 3 and files in directory dirb use AGs
> 2, 1, 5, 6, 7, 4. So files written to directory dirb use allocation group 1,
> which should be reserved for directory dira. And sometimes even a single file
> is split across two AGs. The following is part of the file bitmap:
That's true only as long as a stream does not time out. An AG is
reserved only until the stream timeout has elapsed since the last file in the
stream was created or allocated to.
IOWs, if you use buffered I/O, the 30s writeback delay could time
your stream out between file creation and write() syscall and
when pdflush writes it back. Then you have no stream association
and you will get interleaving. Test 172 tests this behaviour,
and we get intermittent failures on that test because the buffered
I/O case occasionally succeeds rather than fails like it is supposed
to....
What's your stream timeout (/proc/sys/fs/xfs/filestream_centisecs) set to?
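For reference, the timeout can be read and converted like this (a sketch; the 3000-centisecond default, i.e. 30s, is assumed as a fallback when XFS isn't loaded on the machine running the snippet):

```shell
#!/bin/sh
# Read the filestreams association timeout; the proc value is in
# centiseconds, so divide by 100 to get seconds.
cs2s() { expr "$1" / 100; }

p=/proc/sys/fs/xfs/filestream_centisecs
cs=3000                      # assumed default (30s) if XFS isn't loaded
[ -r "$p" ] && cs=`cat "$p"`
echo "stream timeout: ${cs}cs = `cs2s $cs`s"
```

On a stock kernel with XFS loaded this reports the 3000cs (30s) default; a value of 300 would be the 3s timeout discussed below.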
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Questions about testing the Filestream feature
2007-09-21 7:54 ` David Chinner
@ 2007-09-21 20:45 ` hxsrmeng
From: hxsrmeng @ 2007-09-21 20:45 UTC (permalink / raw)
To: David Chinner; +Cc: XFS
Thank you so much. You really helped me a lot.
Sorry that I had to learn a few things from the net and the manuals first to
understand what you said. :)
My RAM is only 512M and my stream timeout is 3s, so that might be the
problem. I will try this on a test box with more RAM and set the stream
timeout to 30s.
> It's up to the
> user/application to prevent intra-stream allocation/fragmentation
> problems (e.g. preallocation, extent size hints, large direct I/O, etc)
> and that is what your test application is lacking. filestreams only
> prevents inter-stream interleaving.
I'll try to find out how to modify my script along the lines of your
suggestions. Thank you again!
Have a nice weekend!
Hxsrmeng
On Fri, 2007-09-21 at 17:54 +1000, David Chinner wrote:
> On Thu, Sep 20, 2007 at 08:10:31PM -0700, Hxsrmeng wrote:
> >
> > Hi all,
> >
> > I need to use the "Filestreams" feature. I wrote a script that writes files to
> > two directories concurrently. When I check the file bitmaps, I find that
> > sometimes files written to different directories still interleave
> > extents on disk. I don't know whether there is something wrong with my
> > script, or whether I have misunderstood something.
> >
> > I am using OpenSUSE 10.2 with a linux-2.6.23-rc4 kernel (source code was
> > checked out from the CVS repository at oss.sgi.com). The filestreams feature is
> > enabled with the "-o filestreams" mount option.
> > Here is my script:
>
> <snip>
>
> Very similar to xfsqa tests 170-174.
>
> > Then I got the information of my xfs device first :
> > meta-data=/dev/hda5 isize=256 agcount=8, agsize=159895 blks
> > = sectsz=512 attr=0
> > data = bsize=4096 blocks=1279160,imaxpct=25
>
> Ok, so an AG ~600MB in size, and your filesystem is about 5GB.
>
> > First run, I wrote 3 "big" files of 768M each to each directory. The
> > files in directory dira share AGs 0, 2, 5, 7 and files in directory dirb share
> > AGs 1, 3, 4, 6, which I assume is correct.
>
> Yes, and 2 * 3 * 768MB ~= 4.5GB, so the filesystem is ~90% full.
>
> > But the files' extents
> > are not contiguous,
>
> filestreams doesn't guarantee contiguous extents - it guarantees that sets of files
> separated by directories don't intertwine. Within each set you can see
> non-contiguous allocation, but the sets should not interleave in the same
> AGs...
>
> > and all files in the same directory put some
> > of their extents in AG 0.
>
> AG 0 is the "filestreams failure" allocation group. What you are seeing is
> that at some point you've filled your AGs up and a stream write couldn't find
> an unused AG that matched the stream association criteria, so it gave up.
>
> > I am not sure whether this is correct. Here is
> > part of the file bitmap:
> > "
> > dira/0:
> > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
> > 0: [0..7615]: 96..7711 0 (96..7711) 7616
> > 1: [7616..7679]: 33312..33375 0 (33312..33375) 64
> > 2: [7680..24063]: 33448..49831 0 (33448..49831) 16384
> > 3: [24064..52999]: 60608..89543 0 (60608..89543) 28936
> > 4: [53000..61191]: 95496..103687 0 (95496..103687) 8192
> > 5: [61192..90791]: 119088..148687 0 (119088..148687) 29600
> > 6: [90792..131751]: 170264..211223 0 (170264..211223) 40960
> > 7: [131752..144223]: 219480..231951 0 (219480..231951) 12472
> > 8: [144224..168799]: 240144..264719 0 (240144..264719) 24576
>
> Ummm - that's a file that started in AG 0....
>
> > ...
> > dira/1:
> > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
> > 0: [0..12791]: 7712..20503 0 (7712..20503) 12792
> > 1: [12792..12863]: 33376..33447 0 (33376..33447) 72
> > 2: [12864..13391]: 49832..50359 0 (49832..50359) 528
> > 3: [13392..19575]: 112904..119087 0 (112904..119087) 6184
> > 4: [19576..27767]: 148688..156879 0 (148688..156879) 8192
> > 5: [27768..35959]: 211224..219415 0 (211224..219415) 8192
> > 6: [35960..44151]: 231952..240143 0 (231952..240143) 8192
> > 7: [44152..68727]: 264784..289359 0 (264784..289359) 24576
> > 8: [68728..79047]: 309400..319719 0 (309400..319719) 10320
>
> And so is that. Given that they are in the same directory, this is correct
> behaviour.
>
> How much memory is in your test box? I suspect that you're getting writeback
> from kswapd rather than pdflush, since you are doing buffered I/O, so you're getting
> LRU-order writeback rather than nice sequential writeback. It's up to the
> user/application to prevent intra-stream allocation/fragmentation
> problems (e.g. preallocation, extent size hints, large direct I/O, etc)
> and that is what your test application is lacking. filestreams only
> prevents inter-stream interleaving.
>
> Also, you are running close to filesystem full state. That is known to be
> a no-no for deterministic performance from the filesystem, will cause
> filesystem fragmentation, and is not the case that filestreams is
> designed to optimise for.
>
> However, I agree that the code is not working optimally. In test 171,
> there is this comment:
>
> # test large numbers of files, single I/O per file, 120s timeout
> # Get close to filesystem full.
> # 128 = ENOSPC
> # 120 = 93.75% full, gets repeatable failures
> # 112 = 87.5% full, should reliably succeed but doesn't *FIXME*
> # 100 = 78.1% full, should reliably succeed
>
> The test uses a 1GB filesystem to intentionally stress the allocator,
> and at 78.1% full, we are getting intermittent failures. On
> some machines (like my test boxes) it passes >95% of the time.
> On other machines, it passes maybe 5% of the time. So the low
> space behaviour is known to be less than optimal, but at production
> sites it is known that they can't use the last 10-15% of the filesystem
> because of fragmentation issues associated with stripe alignment.
> Hence low space behaviour of the allocator is not considered something
> critical because there are other, worse problems at low space
> that filestreams can't do anything to prevent.
>
> > Second run, I wrote 1024 "small" files of 1M each to each directory.
> > Files in directory dira use AGs 0, 1, 3 and files in directory dirb use AGs
> > 2, 1, 5, 6, 7, 4. So files written to directory dirb use allocation group 1,
> > which should be reserved for directory dira. And sometimes even a single file
> > is split across two AGs. The following is part of the file bitmap:
>
> That's true only as long as a stream does not time out. An AG is
> reserved only until the stream timeout has elapsed since the last file in the
> stream was created or allocated to.
>
> IOWs, if you use buffered I/O, the 30s writeback delay could time
> your stream out between file creation and write() syscall and
> when pdflush writes it back. Then you have no stream association
> and you will get interleaving. Test 172 tests this behaviour,
> and we get intermittent failures on that test because the buffered
> I/O case occasionally succeeds rather than fails like it is supposed
> to....
>
> What's your stream timeout (/proc/sys/fs/xfs/filestream_centisecs) set to?
>
> Cheers,
>
> Dave.