Subject: Re: Questions about testing the Filestream feature
From: hxsrmeng <hxsrmeng@gmail.com>
To: David Chinner
Cc: XFS
Date: Fri, 21 Sep 2007 16:45:55 -0400

Thank you so much. You really helped me a lot. Sorry that I had to learn a
few things from the net and the manuals first to understand what you said. :)

My RAM is only 512MB and the stream timeout is 3s... that might be the
problem. I will try this on a test box with more RAM and set the stream
timeout to 30s.

> It's up to the
> user/application to prevent intra-stream allocation/fragmentation
> problems (e.g. preallocation, extent size hints, large direct I/O, etc)
> and that is what your test application is lacking. filestreams only
> prevents inter-stream interleaving.

I'll try to find out how to modify my script according to your suggestion.
Thank you again! Have a nice weekend!

Hxsrmeng

On Fri, 2007-09-21 at 17:54 +1000, David Chinner wrote:
> On Thu, Sep 20, 2007 at 08:10:31PM -0700, Hxsrmeng wrote:
> >
> > Hi all,
> >
> > I need to use the "Filestreams" feature. I wrote a script that writes
> > files to two directories concurrently. When I check the file bitmaps, I
> > find that the files written to the different directories sometimes still
> > interleave extents on disk. I don't know whether there is something
> > wrong with my script or whether I have misunderstood something.
> >
> > I am using OpenSUSE 10.2; the kernel is linux-2.6.23-rc4 (source checked
> > out from the CVS repository at oss.sgi.com). The filestreams feature is
> > enabled with the "-o filestreams" mount option.
> > Here is my script:
>
> Very similar to xfsqa tests 170-174.
>
> > Then I got the information of my xfs device first:
> >
> > meta-data=/dev/hda5    isize=256    agcount=8, agsize=159895 blks
> >          =             sectsz=512   attr=0
> > data     =             bsize=4096   blocks=1279160, imaxpct=25
>
> Ok, so each AG is ~600MB in size, and your filesystem is about 5GB.
>
> > First run, I wrote 3 "big" files, each 768MB, to each directory. The
> > files in directory dira share AGs 0, 2, 5, 7 and the files in directory
> > dirb share AGs 1, 3, 4, 6, which I assume is correct.
>
> Yes, and 2 * 3 * 768MB = 4.5GB ~= 90% full.
>
> > But the file extents
> > don't use contiguous blocks,
>
> filestreams doesn't guarantee contiguous extents - it guarantees that sets
> of files separated by directories don't intertwine. Within each set you
> can see non-contiguous allocation, but the sets should not interleave in
> the same AGs...
>
> > and all files in the same directory put some
> > of their extents in AG 0.
>
> AG 0 is the "filestreams failure" allocation group. What you are seeing
> is that at some point you filled your AGs up, a stream write couldn't
> find an unused AG that matched the stream association criteria, and it
> gave up.
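For reference, the listings quoted below are in the verbose output format
of xfs_bmap. A minimal sketch of how such listings can be generated (the
dira/dirb names are the poster's; the loop itself is illustrative):

    # Print each file's extents, including the AG each extent landed in.
    for f in dira/* dirb/*; do
        echo "$f:"
        xfs_bmap -v "$f"
    done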
> > I am not sure whether this is correct. Here is
> > part of the file bitmap:
> >
> > dira/0:
> >  EXT: FILE-OFFSET        BLOCK-RANGE     AG AG-OFFSET          TOTAL
> >    0: [0..7615]:         96..7711         0 (96..7711)          7616
> >    1: [7616..7679]:      33312..33375     0 (33312..33375)        64
> >    2: [7680..24063]:     33448..49831     0 (33448..49831)     16384
> >    3: [24064..52999]:    60608..89543     0 (60608..89543)     28936
> >    4: [53000..61191]:    95496..103687    0 (95496..103687)     8192
> >    5: [61192..90791]:    119088..148687   0 (119088..148687)   29600
> >    6: [90792..131751]:   170264..211223   0 (170264..211223)   40960
> >    7: [131752..144223]:  219480..231951   0 (219480..231951)   12472
> >    8: [144224..168799]:  240144..264719   0 (240144..264719)   24576
>
> Ummm - that's a file that started in AG 0....
>
> > ...
> > dira/1:
> >  EXT: FILE-OFFSET        BLOCK-RANGE     AG AG-OFFSET          TOTAL
> >    0: [0..12791]:        7712..20503      0 (7712..20503)      12792
> >    1: [12792..12863]:    33376..33447     0 (33376..33447)        72
> >    2: [12864..13391]:    49832..50359     0 (49832..50359)       528
> >    3: [13392..19575]:    112904..119087   0 (112904..119087)    6184
> >    4: [19576..27767]:    148688..156879   0 (148688..156879)    8192
> >    5: [27768..35959]:    211224..219415   0 (211224..219415)    8192
> >    6: [35960..44151]:    231952..240143   0 (231952..240143)    8192
> >    7: [44152..68727]:    264784..289359   0 (264784..289359)   24576
> >    8: [68728..79047]:    309400..319719   0 (309400..319719)   10320
>
> And so is that. Given that they are in the same directory, this is correct
> behaviour.
>
> How much memory is in your test box? I suspect that you're getting
> writeback from kswapd, not pdflush, because you are doing buffered I/O,
> so you're getting LRU-order writeback rather than nice sequential
> writeback. It's up to the user/application to prevent intra-stream
> allocation/fragmentation problems (e.g. preallocation, extent size hints,
> large direct I/O, etc.), and that is what your test application is
> lacking. filestreams only prevents inter-stream interleaving.
>
> Also, you are running close to filesystem-full state. That is known to be
> a no-no for deterministic performance, it will cause filesystem
> fragmentation, and it is not the case that filestreams is designed to
> optimise for.
>
> However, I agree that the code is not working optimally. In test 171,
> there is this comment:
>
> # test large numbers of files, single I/O per file, 120s timeout
> # Get close to filesystem full.
> # 128 = ENOSPC
> # 120 = 93.75% full, gets repeatable failures
> # 112 = 87.5% full, should reliably succeed but doesn't *FIXME*
> # 100 = 78.1% full, should reliably succeed
>
> The test uses a 1GB filesystem to intentionally stress the allocator, and
> at 78.1% full we are getting intermittent failures. On some machines
> (like my test boxes) it passes >95% of the time. On other machines it
> passes maybe 5% of the time. So the low-space behaviour is known to be
> less than optimal, but production sites already know they can't use the
> last 10-15% of the filesystem because of fragmentation issues associated
> with stripe alignment. Hence the low-space behaviour of the allocator is
> not considered critical, because there are other, worse problems at low
> space that filestreams can't do anything to prevent.
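A minimal sketch of the mitigations Dave lists above (preallocation, extent
size hints, large direct I/O), using xfs_io; the file names and sizes here
are illustrative, not taken from the thread:

    # Reserve the full 768MB up front so later buffered writes land in
    # already-allocated, contiguous space.
    xfs_io -f -c "resvsp 0 768m" dira/0

    # Alternatively, set an extent size hint (before any blocks are
    # allocated) so each allocation is a large contiguous chunk.
    xfs_io -f -c "extsize 64m" dira/1

    # Or write with large direct I/O, avoiding delayed writeback entirely.
    xfs_io -f -d -c "pwrite -b 16m 0 768m" dira/2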
> > Second run, I wrote 1024 "small" files, each 1MB, to each directory.
> > Files in directory dira use AGs 0, 1, 3 and files in directory dirb use
> > AGs 2, 1, 5, 6, 7, 4. So files written to directory dirb use allocation
> > group 1, which should be reserved for directory dira. And sometimes
> > even one file is written to two AGs. The following is part of the file
> > bitmap:
>
> That's true only as long as a stream does not time out. An AG is reserved
> only until the timeout has expired since the last file in the stream was
> created or had blocks allocated to it.
>
> IOWs, if you use buffered I/O, the 30s writeback delay could time your
> stream out between the file creation and write() syscalls and the point
> when pdflush writes the data back. Then you have no stream association
> and you will get interleaving. Test 172 tests this behaviour, and we get
> intermittent failures on that test because the buffered I/O case
> occasionally succeeds rather than failing like it is supposed to....
>
> What's your stream timeout (/proc/sys/fs/xfs/filestream_centisecs) set to?
>
> Cheers,
>
> Dave.
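Dave's final question refers to the filestreams association timeout
tunable. A minimal sketch of checking and raising it via procfs (3000
centiseconds is the 30s the poster intends to try; the current 3s would
read as 300):

    # Show the current filestreams stream timeout, in centiseconds.
    cat /proc/sys/fs/xfs/filestream_centisecs

    # Raise it to 30 seconds so a stream association survives the delay
    # between file creation and buffered writeback.
    echo 3000 > /proc/sys/fs/xfs/filestream_centisecs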