* Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
  [not found] ` <4D64E2BB.7010000@draigBrady.com>
@ 2011-02-23 18:04 ` Linda A. Walsh
  2011-02-24  1:18   ` Pádraig Brady
  2011-02-24  9:26   ` Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Linda A. Walsh @ 2011-02-23 18:04 UTC (permalink / raw)
To: Padraig Brady <PadraigBrady.com>; +Cc: LKML, xfs-oss

FWIW -- xfs-oss included, as the 'last line' was of minor interest; is this a
known bug on this kernel?

  Linux Ishtar 2.6.35.7-T610-Vanilla-1 #2 SMP PREEMPT Mon Oct 11 17:19:41
  PDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Pádraig Brady wrote:
> On 23/02/11 04:30, Linda Walsh wrote:
>
>> I understand, somewhat, what is happening.  I have two different utils,
>> 'dd' and 'mbuffer', both of which have a 'direct' option to write to
>> disk.  mbuffer was from my distro, with a direct option added.
>>
>> I'm not sure if it's truncating the write to the lower bound of the
>> sector size or the file-allocation-unit size, but from a {dump|cat}
>> piped into {cat, dd, mbuffer}, the output sizes are:
>>
>>   file             size        delta
>>   --------------   ----------  ---------
>>   dumptest.cat     5776419696
>>   dumptest.dd      5776343040      76656
>>   dumptest.mbuff   5368709120  407710576
>>
>> - params:
>>   dd of=dumptest.dd bs=512M oflag=direct
>>   mbuffer -b 5 -s 512m --direct -f -o dumptest.mbuff
>> ----
>> I'm not aware of what either did, but no doubt neither expected an
>> error in the final write and didn't handle the results properly.
>> Vanilla kernel 2.6.35-7 x86_64 (SMP PREEMPT)
>>
> Note dd will turn off O_DIRECT for the last write
> if it's less than the block size.
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=5929322c
------
FWIW, 'dd' is from 'coreutils-7.1-3.2.x86_64' (from the SuSE 11.2 release).

While I used dump (xfsdump, to be precise) to produce my initial output to
mbuffer, it was the error message at the end which caught my attention.
Previously I had tried a series of filters after the initial mem-to-mem
buffer performed by 'dd', then later 'mbuffer'.  The filters were
successively lower-I/O compression options over the years, as disk and
network speeds rose and CPU compression became the choke-point:

  (xfsdump -b 512m ) | (initially 'dd', later 'mbuffer') | \
    (su -f -m backup -c "$umask $um;${Compress:-} ${Compress_ops:-} \
      >${Dmpfile}${Compress_ext}")
---
Eventually I wanted to get rid of the final filter step altogether and have
that 'buffer' statement after the 'dump' go direct to disk, then later
"--direct" to disk...  It was when adding the 'DIRECT' flag that I noticed
mbuffer's error.

My first debug step was to go for a shorter dump file (the one that failed
was over 3TB and took over 3h to reproduce).  Then I substituted 'cat' as
that final filter and ended up with the 'testfile' I used for the later
'mbuffer' and 'dd' tests.

NOTE: I tried using 'iflag=fullblock' as you recommend, and it made the
problem 'consistent' with the output of 'mbuffer', i.e. it transferred less
data, and the truncation was consistent with a 512M divisor, indicating it
was cat's default record output size that was causing the difference.
If I use 'dd' to read the base file (no direct I/O) I get consistent
results with 'mbuffer' and 'dd':

Input: DumpTest.out: 5776419696

Output file sizes are as reported by 'dd', with 'test1' giving the closest
answer:

test1> cat DumpTest.out | dd of=dumptest.dd-fb oflag=direct bs=512M
  dd: writing `dumptest.dd-fb': Invalid argument
  0+7346 records in
  0+7345 records out
  5776343040 bytes (5.8 GB) copied, 12.4361 s, 464 MB/s

test2> cat DumpTest.out | dd of=dumptest.dd+fb oflag=direct bs=512M iflag=fullblock
  dd: writing `dumptest.dd+fb': Invalid argument
  10+1 records in
  10+0 records out
  5368709120 bytes (5.4 GB) copied, 12.581 s, 427 MB/s

test3> dd if=DumpTest.out bs=512M | dd of=dumptest2.dd+fb oflag=direct bs=512M iflag=fullblock
  10+1 records in
  10+1 records out
  5776419696 bytes (5.8 GB) copied, 11.6493 s, 496 MB/s
  dd: writing `dumptest2.dd+fb': Invalid argument
  10+1 records in
  10+0 records out
  5368709120 bytes (5.4 GB) copied, 11.6513 s, 461 MB/s

test4> dd if=DumpTest.out bs=512M | dd of=dumptest2.dd-fb oflag=direct bs=512M
  10+1 records in
  10+1 records out
  5776419696 bytes (5.8 GB) copied, 11.4503 s, 504 MB/s
  dd: writing `dumptest2.dd-fb': Invalid argument
  10+1 records in
  10+0 records out
  5368709120 bytes (5.4 GB) copied, 11.4503 s, 469 MB/s
---
I've tried significantly shorter files and NOT had this problem (record
size=64k, and 2 files, one @ 57k and one at 64+57k).  Both copied fine.
Something to do with large file buffers.

Of *SIGNIFICANT* note: in trying to create an empty file of the size used,
from scratch, using 'xfs_mkfile', I got an error:

> xfs_mkfile 5776419696 testfile
pwrite64: Invalid argument
---
I'm having problems generating new kernels (will ask in a separate message)
so will have to fix those before moving ahead...

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
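The constraint behind all four failures can be sketched in a few lines: an
O_DIRECT write must be a whole number of logical sectors (512 bytes on this
device), so any file with an odd-sized tail trips EINVAL on the final
write.  The helper name below is illustrative, not from any tool in the
thread:

```python
SECTOR = 512  # assumed logical sector size of the target device

def direct_io_ok(nbytes, sector=SECTOR):
    """True if a write of nbytes satisfies the O_DIRECT length rule."""
    return nbytes % sector == 0

total = 5776419696               # size of DumpTest.out from the tests above
full = (total // SECTOR) * SECTOR  # largest prefix writable with O_DIRECT
tail = total - full                # the remainder that triggers EINVAL

print(direct_io_ok(total))       # False: the file has an odd 368-byte tail
print(full, tail)
```

Note that mbuffer's 5368709120-byte output (10 x 512M) and dd's
5776343040-byte output are both exact sector multiples; only the amount of
data successfully written before the failing final write differs.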
* Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
  2011-02-23 18:04 ` write 'O_DIRECT' file w/odd amount of data: desirable result? Linda A. Walsh
@ 2011-02-24  1:18 ` Pádraig Brady
  2011-02-24  9:26 ` Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Pádraig Brady @ 2011-02-24  1:18 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: LKML, xfs-oss

On 23/02/11 18:04, Linda A. Walsh wrote:
> I tried using 'iflag=fullblock' as you recommend, and it made the problem
> 'consistent' with the output of 'mbuffer', i.e. it transferred less data,
> and the truncation was consistent with a 512M divisor, indicating it was
> cat's default record output size that was causing the difference.

Right.  That's expected, as with 'fullblock' both mbuffer and dd will
read/write 512M at a time.  Both will fail in the same way when they try to
write the odd-sized chunk at the end.  This was only changed for dd in
coreutils 7.5 (where it reverts to a standard write for the last chunk).

> I've tried significantly shorter files and NOT had this problem
> (record size=64k, and 2 files, one @ 57k and one at 64+57k).  Both
> copied fine.
> Something to do with large file buffers.

Small blocks cause an issue on ext[34] at least.  I modified dd here to
behave like yours and got:

  $ truncate -s513 small
  $ dd oflag=direct if=small of=small.out
  ./dd: writing `small.out': Invalid argument

> Of *SIGNIFICANT* note: in trying to create an empty file of the size
> used, from scratch, using 'xfs_mkfile', I got an error:
>
>> xfs_mkfile 5776419696 testfile
> pwrite64: Invalid argument

Looks like that uses the same O_DIRECT write method, with the same issues?
You could try fallocate(1), which is newly available in util-linux and
might be supported by your xfs.

cheers,
Pádraig.

p.s. dd, if written today, would default to using fullblock.  For
backwards and POSIX compatibility, though, we must keep the current
default behavior.

p.p.s.
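The coreutils 7.5 behavior described above ("reverts to a standard write
for the last chunk") can be sketched as follows.  This is assumed logic,
not the actual dd source; O_DIRECT itself is left off the demonstration
file descriptor, since Python's byte buffers aren't page-aligned as
O_DIRECT also requires:

```python
import fcntl
import os
import tempfile

O_DIRECT = getattr(os, "O_DIRECT", 0)   # not defined on every platform

def write_with_direct_fallback(fd, data, blocksize):
    """Write data in blocksize chunks; drop O_DIRECT for a short final chunk."""
    off = 0
    while off < len(data):
        chunk = data[off:off + blocksize]
        if len(chunk) < blocksize and O_DIRECT:
            # final odd-sized chunk: fall back to a buffered write
            flags = fcntl.fcntl(fd, fcntl.F_GETFL)
            fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~O_DIRECT)
        off += os.write(fd, chunk)

# Demonstrate the chunking on a plain file: 1300 bytes with a 512-byte
# "sector" is two full chunks plus a 276-byte tail.
path = tempfile.mktemp()
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
write_with_direct_fallback(fd, b"x" * 1300, 512)
os.close(fd)
print(os.path.getsize(path))   # 1300
```

Clearing the flag mid-stream is why pre-7.5 dd (as in the 7.1 build tested
above) and mbuffer both stop at the last full block instead.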
There are situations where fullblock is required, and I'll patch dd soon to
auto-apply that option when appropriate.  [io]flag=direct is one of those
cases, I think.

p.p.p.s. coreutils 8.11 should have the oflag=nocache option, which will
write to disk without using up your page cache, while also avoiding the
O_DIRECT constraints.
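The oflag=nocache idea can be sketched like this: write normally (so no
alignment rules apply), then advise the kernel to drop the cached pages.
The posix_fadvise mechanism here is an assumption about how such an option
can work, not a claim about the exact coreutils implementation:

```python
import os
import tempfile

def write_nocache(path, data):
    """Write data without leaving it in the page cache afterwards."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)   # pages must be clean before they can be dropped
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

path = tempfile.mktemp()
write_nocache(path, b"y" * 1337)   # odd size: no problem without O_DIRECT
print(os.path.getsize(path))       # 1337
```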
* Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
  2011-02-23 18:04 ` write 'O_DIRECT' file w/odd amount of data: desirable result? Linda A. Walsh
  2011-02-24  1:18 ` Pádraig Brady
@ 2011-02-24  9:26 ` Dave Chinner
  2011-03-02  2:27   ` RFE kernel option to do the desirable thing, w/regards to 'O_DIRECT' and mis-aligned data Linda Walsh
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2011-02-24  9:26 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: Pádraig Brady, LKML, xfs-oss

On Wed, Feb 23, 2011 at 10:04:30AM -0800, Linda A. Walsh wrote:
> FWIW -- xfs-oss included, as the 'last line' was of minor interest; is
> this a known bug on this kernel?
>   Linux Ishtar 2.6.35.7-T610-Vanilla-1 #2 SMP PREEMPT Mon Oct 11
>   17:19:41 PDT 2010 x86_64 x86_64 x86_64 GNU/Linux
....
> Of *SIGNIFICANT* note: in trying to create an empty file of the size
> used, from scratch, using 'xfs_mkfile', I got an error:
>
>> xfs_mkfile 5776419696 testfile
> pwrite64: Invalid argument

xfs_mkfile does not create an "empty" file.  It creates a file that is
full of zeros.  And you're getting that error because:

  5776419696 / 512 = 11,282,069.7188

The last write is not a multiple of the sector size, and xfs_mkfile uses
direct IO.  It has always failed when you try to do this.

If you want to create allocated, zeroed files of arbitrary size, then use:

  xfs_io -f -c "truncate $size" -c "resvsp 0 $size" $filename

to preallocate it.  It'll be much, much faster than xfs_mkfile.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
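The arithmetic in Dave's explanation, spelled out: the file size is not a
whole number of 512-byte sectors, so xfs_mkfile's final direct pwrite must
fail with EINVAL.

```python
# 5776419696 bytes split into 512-byte sectors leaves a 368-byte remainder,
# which is exactly the fractional .7188 of a sector quoted above.
size, sector = 5776419696, 512
full_sectors, leftover = divmod(size, sector)
print(full_sectors, leftover)   # 11282069 full sectors, 368 bytes left over
```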
* Re: RFE kernel option to do the desirable thing, w/regards to 'O_DIRECT' and mis-aligned data
  2011-02-24  9:26 ` Dave Chinner
@ 2011-03-02  2:27 ` Linda Walsh
  0 siblings, 0 replies; 4+ messages in thread
From: Linda Walsh @ 2011-03-02  2:27 UTC (permalink / raw)
To: LKML; +Cc: Pádraig Brady, xfs-oss

Thanks for the shorthand, Dave, but I wasn't really trying to use
xfs_mkfile to make the file that was failing -- I was using it more as an
example supporting the idea that both should succeed: if a write to an
O_DIRECT file is a partial write, it should be allowed to succeed, and the
kernel, knowing the device's minimum write size from the driver, could
buffer the last sector.

To deal with back-compat issues, it could be based off a proc var like
/proc/kernel/fs/direct_IO_handling, using bitfields (or multiple vars if
you don't like bitfields), with the bits defined as:

  Bit 0  controls allowing partial writes that start at an aligned position
  Bit 1  controls allowing non-aligned writes
  Bit 2  controls allowing partial reads that start at an aligned position
  Bit 3  controls allowing non-aligned reads
  Bit 4  controls whether to use the general FS cache for affected sectors

It's a bit of 'overkill' for what I wanted (just the case controlled by
Bit 0), but for the sake of completeness I thought all of these
combinations should be specified.  A default of 0 = the current behavior of
mis-aligned data accesses failing, while specifying various combinations
would allow for variations with the kernel handling mis-aligned accesses
automatically, much like the x86 processor handles mis-aligned integer
additions or stacks automatically (perhaps at a performance penalty, but
with a tendency toward 'working' rather than failing, if possible).

It seems better to put that logic in the kernel than to saddle multiple
applications using DIRECT I/O with handling the non-aligned cases.
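A user-space sketch of the proposed bit semantics (the proc file and bit
layout are hypothetical -- they come from the RFE above, not from any
existing kernel interface):

```python
# Bits of the proposed /proc/kernel/fs/direct_IO_handling var.
ALLOW_ALIGNED_PARTIAL_WRITE = 1 << 0   # Bit 0
ALLOW_UNALIGNED_WRITE       = 1 << 1   # Bit 1
ALLOW_ALIGNED_PARTIAL_READ  = 1 << 2   # Bit 2
ALLOW_UNALIGNED_READ        = 1 << 3   # Bit 3
USE_FS_CACHE_FOR_TAIL       = 1 << 4   # Bit 4

def direct_write_permitted(mode, offset, length, sector=512):
    """Would a direct write at (offset, length) be accepted under 'mode'?"""
    aligned_start = offset % sector == 0
    whole_sectors = length % sector == 0
    if aligned_start and whole_sectors:
        return True                    # always legal; mode 0 keeps only this
    if aligned_start:                  # aligned start, odd-sized tail
        return bool(mode & ALLOW_ALIGNED_PARTIAL_WRITE)
    return bool(mode & ALLOW_UNALIGNED_WRITE)

# Mode 0 reproduces today's failure on the 5776419696-byte file; setting
# Bit 0 is the case the RFE actually wants.
print(direct_write_permitted(0, 0, 5776419696))                            # False
print(direct_write_permitted(ALLOW_ALIGNED_PARTIAL_WRITE, 0, 5776419696))  # True
```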
This seems especially useful given the long-term trend toward increasing
use of static-memory devices, which will likely support arbitrary direct
I/O sizes.

Linda Walsh
end of thread, other threads:[~2011-03-02 2:25 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <4D648D7D.7040500@tlinx.org>
[not found] ` <4D64E2BB.7010000@draigBrady.com>
2011-02-23 18:04 ` write 'O_DIRECT' file w/odd amount of data: desirable result? Linda A. Walsh
2011-02-24 1:18 ` Pádraig Brady
2011-02-24 9:26 ` Dave Chinner
2011-03-02 2:27 ` RFE kernel option to do the desirable thing, w/regards to 'O_DIRECT' and mis-aligned data Linda Walsh