linux-xfs.vger.kernel.org archive mirror
* Strange hole creation behavior
From: Pádraig Brady @ 2014-04-11 17:13 UTC
  To: xfs-oss; +Cc: Ondřej Vašík

So this coreutils test is failing on XFS:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/dd/sparse.sh;h=06efc7017
Specifically the last hole check on line 66.

In summary what's happening is that a write(1MiB), lseek(1MiB), write(1MiB)
creates only a 64KiB hole. Is that expected?

Now a 1MiB hole is supported using truncate:
  dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock
  truncate -s+1M file.in
  dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock conv=notrunc oflag=append
  $ du -k file.in
  2048  file.in

But when trying to create the 1MiB hole with dd (lseek) it fails?

  # Create 3MiB input file
  $ dd if=/dev/urandom of=file.in bs=1M count=3 iflag=fullblock
  $ dd if=/dev/zero    of=file.in bs=1M count=1 seek=1 conv=notrunc
  $ du -k file.in
  3072  file.in

  # Converting to a 1MiB hole doesn't work :(
  $ dd if=file.in of=file.out bs=1M conv=sparse
  $ du -k file.out
  3008  file.out

  # Again with syscall details:
  $ strace -e write,lseek dd if=file.in of=file.out bs=1M conv=sparse
  write(1, "...", 1048576) = 1048576
  lseek(1, 1048576, SEEK_CUR)             = 2097152
  write(1, "...", 1048576) = 1048576
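For anyone wanting to rerun this, here is a minimal self-contained sketch of the same reproduction, assuming GNU coreutils (`dd` with `status=none` and `iflag=fullblock`, `du`, `stat`, `cmp`) and a scratch directory on the filesystem under test. The function name `repro_sparse_copy` is hypothetical. Only the apparent size and the contents are deterministic; the allocated figure depends on the filesystem (it was 3008KiB on the XFS setup above).

```sh
# Hypothetical helper: build data/zeros/data input, copy it with conv=sparse
repro_sparse_copy() {
  dir=$1
  # 3MiB of random data, then overwrite the middle 1MiB with NULs in place
  dd if=/dev/urandom of="$dir/file.in" bs=1M count=3 iflag=fullblock status=none
  dd if=/dev/zero of="$dir/file.in" bs=1M count=1 seek=1 conv=notrunc status=none
  # conv=sparse makes dd seek over all-NUL output blocks instead of writing them
  dd if="$dir/file.in" of="$dir/file.out" bs=1M conv=sparse status=none
  echo "apparent=$(stat -c %s "$dir/file.out") allocated_kib=$(du -k "$dir/file.out" | cut -f1)"
}
```

Usage: `repro_sparse_copy "$(mktemp -d)"`, then compare the `allocated_kib` value with the 2048KiB an ideal 1MiB hole would give.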

So it seems that the lseeks are treated differently to the truncate
that was done in the first example, which is surprising.
If we look at the file layout, we can see the hole covers
only the last 64KiB of the middle 1MiB of zeros,
rather than the whole middle 1MiB as in the first example.

  $ filefrag -v file.out
  Filesystem type is: 58465342
  File size of file.out is 3145728 (768 blocks of 4096 bytes)
   ext:     logical_offset:        physical_offset: length:   expected: flags:
     0:        0..     495:      31271..     31766:    496:
     1:      512..     767:      31783..     32038:    256:      31767: eof
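The `filefrag` figures are in 4096-byte blocks, so the numbers in this message can be cross-checked with plain shell arithmetic: logical blocks 496..511 are unmapped (16 blocks, i.e. 64KiB), and the two extent lengths together account exactly for the 3008KiB reported by `du`. The helper name `blocks_to_kib` is hypothetical.

```sh
# Hypothetical helper: convert a count of blocks of a given size to KiB
blocks_to_kib() { echo $(( $1 * $2 / 1024 )); }

bs=4096                              # "blocks of 4096 bytes" per filefrag
hole_blocks=$(( 512 - (495 + 1) ))   # gap between extent 0 end and extent 1 start
alloc_blocks=$(( 496 + 256 ))        # lengths of extents 0 and 1

echo "hole=$(blocks_to_kib $hole_blocks $bs)KiB alloc=$(blocks_to_kib $alloc_blocks $bs)KiB"
# prints: hole=64KiB alloc=3008KiB
```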

thanks,
Pádraig.

Versions etc. in case useful

$ uname -a
Linux tp2 3.12.6-300.fc20.x86_64 #1 SMP Mon Dec 23 16:44:31 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ xfs_info .
meta-data=/dev/loop2             isize=256    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Strange hole creation behavior
From: Brian Foster @ 2014-04-11 20:43 UTC
  To: Pádraig Brady; +Cc: Ondřej Vašík, xfs-oss

On Fri, Apr 11, 2014 at 06:13:59PM +0100, Pádraig Brady wrote:
> So this coreutils test is failing on XFS:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/dd/sparse.sh;h=06efc7017
> Specifically the last hole check on line 66.
> 
> In summary what's happening is that a write(1MiB), lseek(1MiB), write(1MiB)
> creates only a 64KiB hole. Is that expected?
> 

This is expected behavior due to speculative preallocation. An FAQ with
regard to this behavior is pending, but see here for reference:

http://oss.sgi.com/archives/xfs/2014-04/msg00083.html

In that particular write(1MB), lseek(+1MB), write(1MB) workload, each
write is preallocating some extra space beyond the current EOF. The seek
then moves past that space, but the space doesn't go away. The
subsequent writes will extend EOF. The previously preallocated space now
resides in the middle of the file and can't be trimmed away when the
file is closed.

> Now a 1MiB hole is supported using truncate:
>   dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock
>   truncate -s+1M file.in
>   dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock conv=notrunc oflag=append
>   $ du -k file.in
>   2048  file.in
> 

This works simply because it is broken into multiple commands. When the
first dd exits, the excess space is trimmed off (the file descriptor is
closed). The subsequent truncate extends the file size without any
extra space getting caught between the old and new EOF.

You can confirm this by using the 'allocsize=4k' mount option to the XFS
mount. If you wanted something more generic for the purpose of testing
the coreutils functionality, you could also set the size of file.out in
advance. E.g., with preallocation in effect:

# dd if=file.in of=file.out bs=1M conv=sparse
# xfs_bmap -v file.out 
file.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..3967]:       9773944..9777911  1 (9080..13047)     3968
   1: [3968..4095]:    hole                                   128
   2: [4096..6143]:    9778040..9780087  1 (13176..15223)    2048

... and then prevent preallocation by ensuring writes do not extend the
file:

# rm -f file.out 
# truncate --size=3M file.out
# dd if=file.in of=file.out bs=1M conv=sparse,notrunc
# xfs_bmap -v file.out 
file.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       9773944..9775991  1 (9080..11127)     2048
   1: [2048..4095]:    hole                                  2048
   2: [4096..6143]:    9778040..9780087  1 (13176..15223)    2048
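The pre-sizing workaround above can be wrapped in a small shell function. This is a sketch assuming GNU `truncate` (for `--reference`) and GNU `dd`; `sparse_copy_presized` is a hypothetical name, not a coreutils command. Since the destination already has its final size, no write extends EOF, so per the explanation above there is no speculative preallocation to get stranded in the middle of the file.

```sh
# Hypothetical helper: sparse copy into a destination pre-sized to match src
sparse_copy_presized() {
  src=$1 dst=$2
  rm -f "$dst"
  # Give dst its final size up front; it starts out as one big hole
  truncate --reference="$src" "$dst"
  # notrunc keeps that size; conv=sparse seeks over all-NUL blocks
  dd if="$src" of="$dst" bs=1M conv=sparse,notrunc status=none
}
```

Usage: `sparse_copy_presized file.in file.out`, then inspect the result with `xfs_bmap -v file.out`.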

Hope that helps.

Brian

* Re: Strange hole creation behavior
From: Pádraig Brady @ 2014-04-11 22:58 UTC
  To: Brian Foster; +Cc: Coreutils, Ondřej Vašík, xfs-oss

[-- Attachment #1: Type: text/plain, Size: 5311 bytes --]

Excellent info, thanks.
With that I can adjust the test so it passes (patch attached).

So for reference this means that cp can no longer recreate holes
<= 1MiB from source to dest (with the default XFS allocation size):

$ cp --sparse=always file.in cp.out
$ xfs_bmap -v !$
xfs_bmap -v cp.out
cp.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..3967]:       219104..223071    0 (219104..223071)  3968
   1: [3968..4095]:    hole                                   128
   2: [4096..6143]:    225720..227767    0 (225720..227767)  2048

$ xfs_bmap -v file.out
file.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       229816..231863    0 (229816..231863)  2048
   1: [2048..4095]:    hole                                  2048
   2: [4096..6143]:    233912..235959    0 (233912..235959)  2048

$ cp file.out cp.out
$ xfs_bmap -v cp.out
cp.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..3967]:       250296..254263    0 (250296..254263)  3968
   1: [3968..4095]:    hole                                   128
   2: [4096..6143]:    254392..256439    0 (254392..256439)  2048
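Unlike `filefrag`, `xfs_bmap -v` reports FILE-OFFSET ranges and TOTAL in 512-byte basic blocks. Converting the cp.out rows above (which match Brian's dd layout) shows the first extent is 1984KiB: the 1MiB actually written plus 960KiB that was allocated although nothing was written there, leaving only 64KiB of the intended 1MiB hole. The helper name `bb_to_kib` is hypothetical.

```sh
# Hypothetical helper: xfs_bmap basic blocks (512 bytes) to KiB
bb_to_kib() { echo $(( $1 * 512 / 1024 )); }

first_extent=3968   # [0..3967]: data written plus space seeked over
hole=128            # [3968..4095]: what is left of the intended hole
written=2048        # 1MiB of real data, in 512-byte units

echo "extent0=$(bb_to_kib $first_extent)KiB hole=$(bb_to_kib $hole)KiB" \
     "allocated_not_written=$(bb_to_kib $(( first_extent - written )))KiB"
# prints: extent0=1984KiB hole=64KiB allocated_not_written=960KiB
```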

Though if we bump up the hole size the representation is better:

$ dd if=/dev/urandom of=bigfile.in bs=1M count=1 iflag=fullblock
$ truncate -s+10M bigfile.in
$ dd if=/dev/urandom of=bigfile.in bs=1M count=1 iflag=fullblock conv=notrunc oflag=append

$ xfs_bmap -v bigfile.in
bigfile.in:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       231864..233911    0 (231864..233911)  2048
   1: [2048..22527]:   hole                                 20480
   2: [22528..24575]:  256440..258487    0 (256440..258487)  2048

$ cp bigfile.in bigfile.out
$ xfs_bmap -v bigfile.out
bigfile.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..3967]:       260408..264375    0 (260408..264375)  3968
   1: [3968..22527]:   hole                                 18560
   2: [22528..24575]:  264376..266423    0 (264376..266423)  2048

We could, I suppose, use FALLOC_FL_PUNCH_HOLE where available
to cater for this case. I'll see whether this is worth adding.
Holes can be punched after the fact anyway:

$ fallocate --dig-holes bigfile.out
$ xfs_bmap -v bigfile.out
bigfile.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       260408..262455    0 (260408..262455)  2048
   1: [2048..22527]:   hole                                 20480
   2: [22528..24575]:  264376..266423    0 (264376..266423)  2048
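A sketch of that "after the fact" approach as a hypothetical wrapper, `cp_dig_holes`, assuming util-linux `fallocate` with `--dig-holes` (which scans the file for zero-filled ranges and punches them out in place) and a filesystem that supports hole punching. If punching is unavailable, the plain copy is kept, since the contents are identical either way.

```sh
# Hypothetical helper: sparse cp, then punch out any zero ranges left allocated
cp_dig_holes() {
  src=$1 dst=$2
  cp --sparse=always "$src" "$dst" || return
  # Reclaims zeroed space that speculative preallocation left mid-file
  if ! fallocate --dig-holes "$dst" 2>/dev/null; then
    echo "cp_dig_holes: hole punching unavailable; keeping plain copy" >&2
  fi
}
```

Usage: `cp_dig_holes bigfile.in bigfile.out`, then check with `xfs_bmap -v bigfile.out`.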

thanks,
Pádraig.

[-- Attachment #2: dd-sparse-xfs.patch --]
[-- Type: text/x-patch, Size: 2042 bytes --]

From 7c03fe2c9f498bad7e40d29f2eb4573d23e102d0 Mon Sep 17 00:00:00 2001
From: Pádraig Brady <P@draigBrady.com>
Date: Fri, 11 Apr 2014 23:44:13 +0100
Subject: [PATCH] tests: fix false dd conv=sparse failure on newer XFS

* tests/dd/sparse.sh: When testing that a hole is created,
use an existing sparse destination file, so that we're
not write extending the file size, and thus avoiding
speculative preallocation which can result in smaller
holes than requested.
Workaround suggested by Brian Foster
---
 THANKS.in          |    1 +
 tests/dd/sparse.sh |    9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/THANKS.in b/THANKS.in
index e7298ef..a92540a 100644
--- a/THANKS.in
+++ b/THANKS.in
@@ -95,6 +95,7 @@ Bjorn Helgaas                       helgaas@rsn.hp.com
 Bob McCracken                       kerouac@ravenet.com
 Branden Robinson                    branden@necrotic.deadbeast.net
 Brendan O'Dea                       bod@compusol.com.au
+Brian Foster                        bfoster@redhat.com
 Brian Kimball                       bfk@footbag.org
 Brian M. Carlson                    sandals@crustytoothpaste.ath.cx
 Brian Silverman                     bsilverman@conceptxdesign.com
diff --git a/tests/dd/sparse.sh b/tests/dd/sparse.sh
index 06efc70..a7e90d2 100755
--- a/tests/dd/sparse.sh
+++ b/tests/dd/sparse.sh
@@ -61,8 +61,15 @@ if test $(kb_alloc file.in) -gt 3000; then
   dd if=file.in of=file.out bs=2M conv=sparse
   test 2500 -lt $(kb_alloc file.out) || fail=1
 
+  # Note we recreate a sparse file first to avoid
+  # speculative preallocation seen in XFS, where a write() that
+  # extends a file can preallocate some extra space that
+  # a subsequent seek will not convert to a hole.
+  rm -f file.out
+  truncate --size=3M file.out
+
   # Ensure that this 1MiB string of NULs *is* converted to a hole.
-  dd if=file.in of=file.out bs=1M conv=sparse
+  dd if=file.in of=file.out bs=1M conv=sparse,notrunc
   test $(kb_alloc file.out) -lt 2500 || fail=1
 
 fi
-- 
1.7.7.6

* Re: Strange hole creation behavior
From: Eric Sandeen @ 2014-04-11 23:05 UTC
  To: Pádraig Brady, Brian Foster
  Cc: xfs-oss, Coreutils, Ondřej Vašík

On 4/11/14, 5:58 PM, Pádraig Brady wrote:
> Excellent info thanks.
> With that I can adjust the test so it passes (patch attached).
> 
> So for reference this means that cp can no longer recreate holes
> <= 1MiB from source to dest (with the default XFS allocation size):

Well, the allocation size changes based on the filesize; there's a
heuristic involved.  So I fear that if you hard-code it into your
tests, you risk failing again in the future...

> We could I suppose use FALLOC_FL_PUNCH_HOLE where available
> to cater for this case. I'll see whether this is worth adding.

That might make sense.

But filesystems get to pick their layout; even ext4 may opportunistically
fill in holes, etc - so I think you need to be pretty careful with these
sorts of tests...

-Eric
