* Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
  [not found] ` <4D64E2BB.7010000@draigBrady.com>
@ 2011-02-23 18:04 ` Linda A. Walsh
  2011-02-24  1:18   ` Pádraig Brady
  2011-02-24  9:26   ` Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Linda A. Walsh @ 2011-02-23 18:04 UTC (permalink / raw)
To: Padraig Brady <PadraigBrady.com>; +Cc: LKML, xfs-oss

FWIW -- xfs-oss included, as the 'last line' was of minor interest; is this a
known bug on this kernel?

  Linux Ishtar 2.6.35.7-T610-Vanilla-1 #2 SMP PREEMPT Mon Oct 11 17:19:41
  PDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Pádraig Brady wrote:
> On 23/02/11 04:30, Linda Walsh wrote:
>
>> I understand, somewhat, what is happening.  I have two different utils,
>> 'dd' and 'mbuffer', both of which have a 'direct' option to write to
>> disk.  mbuffer was from my distro, with a direct option added.
>>
>> I'm not sure if it's truncating the write to the lower bound of the
>> sector size or the file-allocation-unit size, but from a {dump|cat}
>> piped into {cat, dd, mbuffer}, the output sizes are:
>>
>>   file             size        delta
>>   --------------   ----------  ---------
>>   dumptest.cat     5776419696
>>   dumptest.dd      5776343040      76656
>>   dumptest.mbuff   5368709120  407710576
>>
>> - params:
>>   dd of=dumptest.dd bs=512M oflag=direct
>>   mbuffer -b 5 -s 512m --direct -f -o dumptest.mbuff
>> ----
>> I'm not aware of what either did, but no doubt neither expected an
>> error in the final write and didn't handle the results properly.
>> Vanilla kernel 2.6.35-7 x86_64 (SMP PREEMPT)
>>
> Note dd will turn off O_DIRECT for the last write
> if it's less than the block size.
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=5929322c
------
FWIW, 'dd' is from 'coreutils-7.1-3.2.x86_64' (from the SuSE 11.2 release).

While I used dump (xfsdump, to be precise) to produce my initial output to
mbuffer, it was the error message at the end which caught my attention.
Previously I had tried a series of filters after the initial mem-to-mem
buffer performed by 'dd', then later 'mbuffer'.  The filters were
successively lower-I/O compression options over the years, as disk and
network speeds rose and CPU compression became the choke-point:

  (xfsdump -b 512m ) | (initially 'dd', later 'mbuffer') | \
    (su -f -m backup -c "$umask $um;${Compress:-} ${Compress_ops:-} \
      >${Dmpfile}${Compress_ext}")
---
Eventually I wanted to get rid of the final filter step altogether and have
that 'buffer' statement after the 'dump' go direct to disk, then later
"--direct" to disk...  It was when adding the 'DIRECT' flag that I noticed
mbuffer's error.

My first debug step was to go for a shorter dump file (the one that failed
was over 3TB and took over 3h to reproduce).  Then I substituted 'cat' as
that final filter and ended up with the 'testfile' I used for the later
'mbuffer' and 'dd' tests.

NOTE: I tried using 'iflag=fullblock' as you recommend, and it made the
problem 'consistent' with the output of 'mbuffer', i.e. it transferred less
data, and the truncation was consistent with a 512M divisor, indicating it
was cat's default record output size that was causing the difference.
If I use 'dd' to read the base file (no direct I/O) I get consistent
results with 'mbuffer' and 'dd':

Input: DumpTest.out: 5776419696

Output file sizes are as reported by 'dd', with 'test1' giving the closest
answer:

test1> cat DumpTest.out | dd of=dumptest.dd-fb oflag=direct bs=512M
  dd: writing `dumptest.dd-fb': Invalid argument
  0+7346 records in
  0+7345 records out
  5776343040 bytes (5.8 GB) copied, 12.4361 s, 464 MB/s

test2> cat DumpTest.out | dd of=dumptest.dd+fb oflag=direct bs=512M iflag=fullblock
  dd: writing `dumptest.dd+fb': Invalid argument
  10+1 records in
  10+0 records out
  5368709120 bytes (5.4 GB) copied, 12.581 s, 427 MB/s

test3> dd if=DumpTest.out bs=512M | dd of=dumptest2.dd+fb oflag=direct bs=512M iflag=fullblock
  10+1 records in
  10+1 records out
  5776419696 bytes (5.8 GB) copied, 11.6493 s, 496 MB/s
  dd: writing `dumptest2.dd+fb': Invalid argument
  10+1 records in
  10+0 records out
  5368709120 bytes (5.4 GB) copied, 11.6513 s, 461 MB/s

test4> dd if=DumpTest.out bs=512M | dd of=dumptest2.dd-fb oflag=direct bs=512M
  10+1 records in
  10+1 records out
  5776419696 bytes (5.8 GB) copied, 11.4503 s, 504 MB/s
  dd: writing `dumptest2.dd-fb': Invalid argument
  10+1 records in
  10+0 records out
  5368709120 bytes (5.4 GB) copied, 11.4503 s, 469 MB/s
---
I've tried significantly shorter files and NOT had this problem (record
size=64k, and 2 files, one @ 57k and one at 64+57k).  Both copied fine.
Something to do with large file buffers.

Of *SIGNIFICANT* note: in trying to create an empty file of the size used,
from scratch, using 'xfs_mkfile', I got an error:

> xfs_mkfile 5776419696 testfile
pwrite64: Invalid argument
---
I'm having problems generating new kernels (will ask in a separate message)
so will have to fix those before moving ahead...

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
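The constraint behind all four failures can be sketched in a few lines: an
O_DIRECT write must be a whole number of logical sectors (512 bytes on this
device), so any file with an odd-sized tail trips EINVAL on the final
write.  The helper name below is illustrative, not from any tool in the
thread:

```python
SECTOR = 512  # assumed logical sector size of the target device

def direct_io_ok(nbytes, sector=SECTOR):
    """True if a write of nbytes satisfies the O_DIRECT length rule."""
    return nbytes % sector == 0

total = 5776419696               # size of DumpTest.out from the tests above
full = (total // SECTOR) * SECTOR  # largest prefix writable with O_DIRECT
tail = total - full                # the remainder that triggers EINVAL

print(direct_io_ok(total))       # False: the file has an odd 368-byte tail
print(full, tail)
```

Note that mbuffer's 5368709120-byte output (10 x 512M) and dd's
5776343040-byte output are both exact sector multiples; only the amount of
data successfully written before the failing final write differs.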
* Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
  2011-02-23 18:04 ` write 'O_DIRECT' file w/odd amount of data: desirable result? Linda A. Walsh
@ 2011-02-24  1:18 ` Pádraig Brady
  2011-02-24  9:26 ` Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Pádraig Brady @ 2011-02-24  1:18 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: LKML, xfs-oss

On 23/02/11 18:04, Linda A. Walsh wrote:
> I tried using 'iflag=fullblock' as you recommend, and it made the problem
> 'consistent' with the output of 'mbuffer', i.e. it transferred less data,
> and the truncation was consistent with a 512M divisor, indicating it was
> cat's default record output size that was causing the difference.

Right.  That's expected, as with 'fullblock' both mbuffer and dd will
read/write 512M at a time.  Both will fail in the same way when they try to
write the odd-sized chunk at the end.  This was only changed for dd in
coreutils 7.5 (where it reverts to a standard write for the last chunk).

> I've tried significantly shorter files and NOT had this problem
> (record size=64k, and 2 files, one @ 57k and one at 64+57k).  Both
> copied fine.
> Something to do with large file buffers.

Small blocks cause an issue on ext[34] at least.  I modified dd here to
behave like yours and got:

  $ truncate -s513 small
  $ dd oflag=direct if=small of=small.out
  ./dd: writing `small.out': Invalid argument

> Of *SIGNIFICANT* note: in trying to create an empty file of the size
> used, from scratch, using 'xfs_mkfile', I got an error:
>
>> xfs_mkfile 5776419696 testfile
> pwrite64: Invalid argument

Looks like that uses the same O_DIRECT write method, with the same issues?
You could try fallocate(1), which is newly available in util-linux and
might be supported by your xfs.

cheers,
Pádraig.

p.s. dd, if written today, would default to using fullblock.  For
backwards and POSIX compatibility, though, we must keep the current
default behavior.

p.p.s.
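The coreutils 7.5 behavior described above ("reverts to a standard write
for the last chunk") can be sketched as follows.  This is assumed logic,
not the actual dd source; O_DIRECT itself is left off the demonstration
file descriptor, since Python's byte buffers aren't page-aligned as
O_DIRECT also requires:

```python
import fcntl
import os
import tempfile

O_DIRECT = getattr(os, "O_DIRECT", 0)   # not defined on every platform

def write_with_direct_fallback(fd, data, blocksize):
    """Write data in blocksize chunks; drop O_DIRECT for a short final chunk."""
    off = 0
    while off < len(data):
        chunk = data[off:off + blocksize]
        if len(chunk) < blocksize and O_DIRECT:
            # final odd-sized chunk: fall back to a buffered write
            flags = fcntl.fcntl(fd, fcntl.F_GETFL)
            fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~O_DIRECT)
        off += os.write(fd, chunk)

# Demonstrate the chunking on a plain file: 1300 bytes with a 512-byte
# "sector" is two full chunks plus a 276-byte tail.
path = tempfile.mktemp()
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
write_with_direct_fallback(fd, b"x" * 1300, 512)
os.close(fd)
print(os.path.getsize(path))   # 1300
```

Clearing the flag mid-stream is why pre-7.5 dd (as in the 7.1 build tested
above) and mbuffer both stop at the last full block instead.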
There are situations where fullblock is required, and I'll patch dd soon to
auto-apply that option when appropriate.  [io]flag=direct is one of those
cases, I think.

p.p.p.s. coreutils 8.11 should have the oflag=nocache option, which will
write to disk without using up your page cache, while also avoiding the
O_DIRECT constraints.
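The oflag=nocache idea can be sketched like this: write normally (so no
alignment rules apply), then advise the kernel to drop the cached pages.
The posix_fadvise mechanism here is an assumption about how such an option
can work, not a claim about the exact coreutils implementation:

```python
import os
import tempfile

def write_nocache(path, data):
    """Write data without leaving it in the page cache afterwards."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)   # pages must be clean before they can be dropped
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

path = tempfile.mktemp()
write_nocache(path, b"y" * 1337)   # odd size: no problem without O_DIRECT
print(os.path.getsize(path))       # 1337
```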
* Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
  2011-02-23 18:04 ` write 'O_DIRECT' file w/odd amount of data: desirable result? Linda A. Walsh
  2011-02-24  1:18 ` Pádraig Brady
@ 2011-02-24  9:26 ` Dave Chinner
  2011-03-02  2:27   ` RFE kernel option to do the desirable thing, w/regards to 'O_DIRECT' and mis-aligned data Linda Walsh
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2011-02-24  9:26 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: Pádraig Brady, LKML, xfs-oss

On Wed, Feb 23, 2011 at 10:04:30AM -0800, Linda A. Walsh wrote:
> FWIW -- xfs-oss included, as the 'last line' was of minor interest; is
> this a known bug on this kernel?
>   Linux Ishtar 2.6.35.7-T610-Vanilla-1 #2 SMP PREEMPT Mon Oct 11
>   17:19:41 PDT 2010 x86_64 x86_64 x86_64 GNU/Linux
....
> Of *SIGNIFICANT* note: in trying to create an empty file of the size
> used, from scratch, using 'xfs_mkfile', I got an error:
>
>> xfs_mkfile 5776419696 testfile
> pwrite64: Invalid argument

xfs_mkfile does not create an "empty" file.  It creates a file that is
full of zeros.  And you're getting that error because:

  5776419696 / 512 = 11,282,069.7188

The last write is not a multiple of the sector size, and xfs_mkfile uses
direct IO.  It has always failed when you try to do this.

If you want to create allocated, zeroed files of arbitrary size, then use:

  xfs_io -f -c "truncate $size" -c "resvsp 0 $size" $filename

to preallocate it.  It'll be much, much faster than xfs_mkfile.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
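The arithmetic in Dave's explanation, spelled out: the file size is not a
whole number of 512-byte sectors, so xfs_mkfile's final direct pwrite must
fail with EINVAL.

```python
# 5776419696 bytes split into 512-byte sectors leaves a 368-byte remainder,
# which is exactly the fractional .7188 of a sector quoted above.
size, sector = 5776419696, 512
full_sectors, leftover = divmod(size, sector)
print(full_sectors, leftover)   # 11282069 full sectors, 368 bytes left over
```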
* Re: RFE kernel option to do the desirable thing, w/regards to 'O_DIRECT' and mis-aligned data
  2011-02-24  9:26 ` Dave Chinner
@ 2011-03-02  2:27 ` Linda Walsh
  0 siblings, 0 replies; 4+ messages in thread
From: Linda Walsh @ 2011-03-02  2:27 UTC (permalink / raw)
To: LKML; +Cc: Pádraig Brady, xfs-oss

Thanks for the shorthand, Dave, but I wasn't really trying to use
xfs_mkfile to make the file that was failing -- I was using it more as an
example supporting the idea that both should succeed: if a write to an
O_DIRECT file is a partial write, it should be allowed to succeed, and the
kernel, knowing the device's minimum write size from the driver, could
buffer the last sector.

To deal with back-compat issues, it could be based off a proc var like
/proc/kernel/fs/direct_IO_handling, using bitfields (or multiple vars if
you don't like bitfields), with the bits defined as:

  Bit 0  controls allowing partial writes that start at an aligned position
  Bit 1  controls allowing non-aligned writes
  Bit 2  controls allowing partial reads that start at an aligned position
  Bit 3  controls allowing non-aligned reads
  Bit 4  controls whether to use the general FS cache for affected sectors

It's a bit of 'overkill' for what I wanted (just the case controlled by
Bit 0), but for the sake of completeness I thought all of these
combinations should be specified.  A default of 0 = the current behavior of
mis-aligned data accesses failing, while specifying various combinations
would allow for variations with the kernel handling mis-aligned accesses
automatically, much like the x86 processor handles mis-aligned integer
additions or stacks automatically (perhaps at a performance penalty, but
with a tendency toward 'working' rather than failing, if possible).

It seems better to put that logic in the kernel than to saddle multiple
applications using DIRECT I/O with handling the non-aligned cases.
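A user-space sketch of the proposed bit semantics (the proc file and bit
layout are hypothetical -- they come from the RFE above, not from any
existing kernel interface):

```python
# Bits of the proposed /proc/kernel/fs/direct_IO_handling var.
ALLOW_ALIGNED_PARTIAL_WRITE = 1 << 0   # Bit 0
ALLOW_UNALIGNED_WRITE       = 1 << 1   # Bit 1
ALLOW_ALIGNED_PARTIAL_READ  = 1 << 2   # Bit 2
ALLOW_UNALIGNED_READ        = 1 << 3   # Bit 3
USE_FS_CACHE_FOR_TAIL       = 1 << 4   # Bit 4

def direct_write_permitted(mode, offset, length, sector=512):
    """Would a direct write at (offset, length) be accepted under 'mode'?"""
    aligned_start = offset % sector == 0
    whole_sectors = length % sector == 0
    if aligned_start and whole_sectors:
        return True                    # always legal; mode 0 keeps only this
    if aligned_start:                  # aligned start, odd-sized tail
        return bool(mode & ALLOW_ALIGNED_PARTIAL_WRITE)
    return bool(mode & ALLOW_UNALIGNED_WRITE)

# Mode 0 reproduces today's failure on the 5776419696-byte file; setting
# Bit 0 is the case the RFE actually wants.
print(direct_write_permitted(0, 0, 5776419696))                            # False
print(direct_write_permitted(ALLOW_ALIGNED_PARTIAL_WRITE, 0, 5776419696))  # True
```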
This seems especially useful given the long-term trend toward increasing
use of static-memory devices, which will likely support arbitrary direct
I/O sizes.

Linda Walsh
end of thread, other threads:[~2011-03-02 2:25 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <4D648D7D.7040500@tlinx.org>
[not found] ` <4D64E2BB.7010000@draigBrady.com>
2011-02-23 18:04 ` write 'O_DIRECT' file w/odd amount of data: desirable result? Linda A. Walsh
2011-02-24 1:18 ` Pádraig Brady
2011-02-24 9:26 ` Dave Chinner
2011-03-02 2:27 ` RFE kernel option to do the desirable thing, w/regards to 'O_DIRECT' and mis-aligned data Linda Walsh