[BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()

linux-btrfs.vger.kernel.org archive mirror

* [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
@ 2013-03-04 17:24 Stefan Behrens
  2013-03-04 19:31 ` Chris Mason
  2013-03-05 14:36 ` David Sterba
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Behrens @ 2013-03-04 17:24 UTC (permalink / raw)
  To: Linux Btrfs List

Just ran the following command sequence and got lots of WARNINGs.
The issue is reproducible.
The box was running the cmason/for-linus that made it into Linux 3.9 RC1.

#!/bin/sh
mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
mount /dev/sdl /mnt
dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
(cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
wait

((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
sleep 5
btrfs fi balance start /mnt



btrfs: disk space caching is enabled
btrfs flagging fs with big metadata feature
btrfs: relocating block group 6471811072 flags 17
btrfs: found 1 extents
------------[ cut here ]------------
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
Hardware name: X8SIL
Modules linked in: btrfs raid6_pq xor raid1 mpt2sas scsi_transport_sas raid_class
Pid: 3390, comm: btrfs Not tainted 3.8.0+ #56
Call Trace:
 [<ffffffff810871fa>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81087245>] warn_slowpath_null+0x15/0x20
 [<ffffffffa00eddeb>] replace_file_extents+0x74b/0x7e0 [btrfs]
 [<ffffffffa00c0b63>] ? __set_extent_bit+0x223/0x460 [btrfs]
 [<ffffffffa00f32c6>] btrfs_reloc_cow_block+0x186/0x230 [btrfs]
 [<ffffffffa007ff11>] __btrfs_cow_block+0x391/0x4c0 [btrfs]
 [<ffffffff810e098d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810bf98b>] ? local_clock+0x4b/0x60
 [<ffffffffa008023c>] btrfs_cow_block+0x11c/0x1d0 [btrfs]
 [<ffffffffa00ee4bf>] do_relocation+0x46f/0x530 [btrfs]
 [<ffffffff810e12df>] ? lock_release_holdtime.part.24+0xf/0x180
 [<ffffffffa0088a10>] ? block_rsv_add_bytes+0x50/0x70 [btrfs]
 [<ffffffffa00f178b>] relocate_tree_blocks+0x66b/0x6a0 [btrfs]
 [<ffffffffa00eff88>] ? add_data_references+0x288/0x2c0 [btrfs]
 [<ffffffffa00f26f0>] relocate_block_group+0x430/0x690 [btrfs]
 [<ffffffffa00f2af2>] btrfs_relocate_block_group+0x1a2/0x2e0 [btrfs]
 [<ffffffffa00c8c6c>] btrfs_relocate_chunk.isra.60+0x4c/0x460 [btrfs]
 [<ffffffffa00da6bd>] ? btrfs_tree_read_unlock_blocking+0x5d/0xe0 [btrfs]
 [<ffffffffa00cdfcf>] btrfs_balance+0x8bf/0xe40 [btrfs]
 [<ffffffff81197b57>] ? create_object+0x247/0x300
 [<ffffffffa00d477f>] btrfs_ioctl_balance+0x23f/0x550 [btrfs]
 [<ffffffffa00d8e7f>] btrfs_ioctl+0xc0f/0x1ca0 [btrfs]
 [<ffffffff810b0dee>] ? up_read+0x1e/0x40
 [<ffffffff81982a94>] ? __do_page_fault+0x2d4/0x510
 [<ffffffff811ac8cb>] do_vfs_ioctl+0x8b/0x570
 [<ffffffff810e12df>] ? lock_release_holdtime.part.24+0xf/0x180
 [<ffffffff8197f409>] ? retint_swapgs+0xe/0x13
 [<ffffffff811ace41>] sys_ioctl+0x91/0xb0
 [<ffffffff8144b86e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81986d52>] system_call_fastpath+0x16/0x1b
---[ end trace 6973e65f64077371 ]---


  ret = get_new_location(rc->data_inode, &new_bytenr,
                         bytenr, num_bytes);
  if (ret > 0) {
          WARN_ON(1); <-- line 1624
          continue;
  }
  BUG_ON(ret < 0);



* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-04 17:24 [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]() Stefan Behrens
@ 2013-03-04 19:31 ` Chris Mason
  2013-03-05 11:59   ` Stefan Behrens
  2013-03-05 14:36 ` David Sterba
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Mason @ 2013-03-04 19:31 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: Linux Btrfs List

On Mon, Mar 04, 2013 at 10:24:39AM -0700, Stefan Behrens wrote:
> Just ran the following command sequence and got lots of WARNINGs.
> The issue is reproducible.
> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
> 
> #!/bin/sh
> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
> mount /dev/sdl /mnt
> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
> wait
> 
> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
> sleep 5
> btrfs fi balance start /mnt

This doesn't look new, are you able to trigger it with an older kernel?

-chris


* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-04 19:31 ` Chris Mason
@ 2013-03-05 11:59   ` Stefan Behrens
  2013-03-05 15:11     ` Chris Mason
  2013-03-06  0:19     ` Liu Bo
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Behrens @ 2013-03-05 11:59 UTC (permalink / raw)
  To: Chris Mason, Linux Btrfs List, zab

On Mon, 4 Mar 2013 14:31:37 -0500, Chris Mason wrote:
> On Mon, Mar 04, 2013 at 10:24:39AM -0700, Stefan Behrens wrote:
>> Just ran the following command sequence and got lots of WARNINGs.
>> The issue is reproducible.
>> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
>>
>> #!/bin/sh
>> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
>> mount /dev/sdl /mnt
>> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
>> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
>> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
>> wait
>>
>> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
>> sleep 5
>> btrfs fi balance start /mnt
> 
> This doesn't look new, are you able to trigger it with an older kernel?
> 

git bisect identifies the following post v3.8 commit to be the one:

commit 24542bf7ea5e4fdfdb5157ff544c093fa4dcb536
Author: Zach Brown <zab@redhat.com>
Date:   Fri Nov 16 00:04:43 2012 +0000

    btrfs: limit fallocate extent reservation to 256MB

    Very large fallocate requests are cpu bound and result in extents with a
    repeating pattern of ever decreasing size:
    
    $ time fallocate -l 1T file
    real	0m13.039s
    
    ( an excerpt of the extents from btrfs-debug-tree: )
      prealloc data disk byte 1536292564992 nr 397312
      prealloc data disk byte 1536292962304 nr 196608
      prealloc data disk byte 1536293158912 nr 98304
      prealloc data disk byte 1536293257216 nr 49152
      prealloc data disk byte 1536293306368 nr 24576
      prealloc data disk byte 1536293330944 nr 12288
      prealloc data disk byte 1536293343232 nr 8192
      prealloc data disk byte 1536293351424 nr 4096
      prealloc data disk byte 1536293355520 nr 4096
      prealloc data disk byte 1536293359616 nr 4096
    
    The excessive cpu use comes from __btrfs_prealloc_file_range() trying to
    allocate the entire remaining size after each extent is allocated.
    btrfs_reserve_extent() repeatedly cuts this requested size in half until
    it gets down to the size that the allocators can return.  We limit the
    problem for now by capping each reservation at 256 meg.
    
    The small extents come from a masking bug when decreasing the requested
    reservation size.  The high 32bits are cleared and the remaining low
    bits might happen to reserve a small size.   Fix this by using
    round_down() which properly casts the mask.
    
    After these fixes huge fallocate requests are fast and result in nice
    large extents:
    
    $ time fallocate -l 1T file
    real	0m0.082s
    
      prealloc data disk byte 1112425889792 nr 268435456
      prealloc data disk byte 1112694325248 nr 268435456
      prealloc data disk byte 1112962760704 nr 268435456
    
    Reported-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Zach Brown <zab@redhat.com>
    Signed-off-by: Chris Mason <chris.mason@fusionio.com>

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b3ecca4..d2b3a5e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6143,7 +6143,7 @@ again:
 	if (ret == -ENOSPC) {
 		if (!final_tried) {
 			num_bytes = num_bytes >> 1;
-			num_bytes = num_bytes & ~(root->sectorsize - 1);
+			num_bytes = round_down(num_bytes, root->sectorsize);
 			num_bytes = max(num_bytes, min_alloc_size);
 			if (num_bytes == min_alloc_size)
 				final_tried = true;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4e6a11c..3bc62b1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7894,8 +7894,9 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			}
 		}
 
-		ret = btrfs_reserve_extent(trans, root, num_bytes, min_size,
-					   0, *alloc_hint, &ins, 1);
+		ret = btrfs_reserve_extent(trans, root,
+					   min(num_bytes, 256ULL * 1024 * 1024),
+					   min_size, 0, *alloc_hint, &ins, 1);
 		if (ret) {
 			if (own_trans)
 				btrfs_end_transaction(trans, root);
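
The masking bug described in the commit message above can be reproduced in user space. The following is only a sketch under stated assumptions: the function names are hypothetical, `round_down_u64()` is a stand-in for the kernel's `round_down()` macro rather than its actual definition, the sector size is assumed to be a 32-bit power of two, and `int` is assumed to be 32 bits wide.

```c
#include <stdint.h>

/*
 * Hypothetical model of the removed line. The mask ~(sectorsize - 1) is
 * computed in 32 bits, so it is zero-extended when combined with the
 * 64-bit num_bytes and clears bits 32..63.
 */
static uint64_t mask_down_32bit(uint64_t num_bytes, uint32_t sectorsize)
{
	return num_bytes & ~(sectorsize - 1);
}

/*
 * Hypothetical stand-in for round_down(): widen the alignment to 64 bits
 * before building the mask, so the high bits of num_bytes survive.
 */
static uint64_t round_down_u64(uint64_t num_bytes, uint32_t sectorsize)
{
	return num_bytes & ~((uint64_t)sectorsize - 1);
}
```

With a 4096-byte sector size, `mask_down_32bit(0x100003123, 4096)` keeps only the low bits and yields 0x3000, while `round_down_u64()` yields 0x100003000. That matches the commit's description of large requests collapsing into small extents.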




* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-04 17:24 [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]() Stefan Behrens
  2013-03-04 19:31 ` Chris Mason
@ 2013-03-05 14:36 ` David Sterba
  2013-03-05 14:53   ` Stefan Behrens
  1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2013-03-05 14:36 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: Linux Btrfs List

On Mon, Mar 04, 2013 at 06:24:39PM +0100, Stefan Behrens wrote:
> Just ran the following command sequence and got lots of WARNINGs.
> The issue is reproducible.
> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
> 
> #!/bin/sh
> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
> mount /dev/sdl /mnt
> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
> wait
> 
> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
> sleep 5
> btrfs fi balance start /mnt

Does the balance finish, report progress, or is it cancellable?

What I see here is 0% after many minutes. I've filled the fs with a few
4GB files as above; otherwise it holds compiled kernel sources (i.e.
files < 256M), and balance used to run fine there.

thanks,
david


* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-05 14:36 ` David Sterba
@ 2013-03-05 14:53   ` Stefan Behrens
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Behrens @ 2013-03-05 14:53 UTC (permalink / raw)
  To: dsterba, Linux Btrfs List

On Tue, 5 Mar 2013 15:36:33 +0100, David Sterba wrote:
> On Mon, Mar 04, 2013 at 06:24:39PM +0100, Stefan Behrens wrote:
>> Just ran the following command sequence and got lots of WARNINGs.
>> The issue is reproducible.
>> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
>>
>> #!/bin/sh
>> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
>> mount /dev/sdl /mnt
>> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
>> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
>> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
>> wait
>>
>> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
>> sleep 5
>> btrfs fi balance start /mnt
> 
> Does the balance finish, report progress, or is it cancellable?
> 
> What I see here is 0% after many minutes. I've filled the fs with a few
> 4GB files as above; otherwise it holds compiled kernel sources (i.e.
> files < 256M), and balance used to run fine there.

It is not cancellable and did not make any progress.

In the good case (without that commit), it looks like this:
btrfs: relocating block group 6471811072 flags 17
btrfs: found 2 extents
btrfs: found 2 extents
btrfs: relocating block group 5398069248 flags 17
btrfs: found 1024 extents
btrfs: found 1024 extents
btrfs: relocating block group 4324327424 flags 17
btrfs: found 115 extents
btrfs: found 115 extents
btrfs: relocating block group 3250585600 flags 17
btrfs: found 829 extents
btrfs: found 829 extents
btrfs: relocating block group 2176843776 flags 17
btrfs: found 186 extents
btrfs: found 186 extents
btrfs: relocating block group 1103101952 flags 17
btrfs: found 880 extents
btrfs: found 880 extents
btrfs: relocating block group 29360128 flags 20
btrfs: found 1389 extents
btrfs: relocating block group 20971520 flags 18
btrfs: found 1 extents
btrfs: relocating block group 12582912 flags 1
btrfs: relocating block group 4194304 flags 4

With the commit included, it looks like this:
btrfs: relocating block group 6471811072 flags 17
btrfs: found 2 extents
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x753/0x7f0 [btrfs]()
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x753/0x7f0 [btrfs]()
btrfs: found 2 extents
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x753/0x7f0 [btrfs]()
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x753/0x7f0 [btrfs]()
btrfs: found 2 extents
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x753/0x7f0 [btrfs]()
WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x753/0x7f0 [btrfs]()
btrfs: found 2 extents...
do {"btrfs: found 2 extents", "WARNING", "WARNING"} until the log overflows and someone reboots the box.




* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-05 11:59   ` Stefan Behrens
@ 2013-03-05 15:11     ` Chris Mason
  2013-03-05 16:40       ` Chris Mason
  2013-03-06  0:19     ` Liu Bo
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Mason @ 2013-03-05 15:11 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: Chris Mason, Linux Btrfs List, zab@redhat.com

On Tue, Mar 05, 2013 at 04:59:05AM -0700, Stefan Behrens wrote:
> On Mon, 4 Mar 2013 14:31:37 -0500, Chris Mason wrote:
> > On Mon, Mar 04, 2013 at 10:24:39AM -0700, Stefan Behrens wrote:
> >> Just ran the following command sequence and got lots of WARNINGs.
> >> The issue is reproducible.
> >> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
> >>
> >> #!/bin/sh
> >> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
> >> mount /dev/sdl /mnt
> >> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
> >> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
> >> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
> >> wait
> >>
> >> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
> >> sleep 5
> >> btrfs fi balance start /mnt
> > 
> > This doesn't look new, are you able to trigger it with an older kernel?
> > 
> 
> git bisect identifies the following post v3.8 commit to be the one:

Is your dd running fallocate?  Trying to figure out how this is related.

-chris


* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-05 15:11     ` Chris Mason
@ 2013-03-05 16:40       ` Chris Mason
  2013-03-05 18:43         ` Zach Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2013-03-05 16:40 UTC (permalink / raw)
  To: Chris Mason; +Cc: Stefan Behrens, Linux Btrfs List, zab@redhat.com

On Tue, Mar 05, 2013 at 08:11:21AM -0700, Chris Mason wrote:
> > >> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
> > >>
> > >> #!/bin/sh
> > >> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
> > >> mount /dev/sdl /mnt
> > >> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
> > >> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
> > >> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
> > >> wait
> > >>
> > >> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
> > >> sleep 5
> > >> btrfs fi balance start /mnt
> > >
> > > This doesn't look new, are you able to trigger it with an older kernel?
> > >
> >
> > git bisect identifies the following post v3.8 commit to be the one:
> 
> Is your dd running fallocate?  Trying to figure out how this is related.

So the preallocation came from balance, which is preallocating because
it requires us to make an extent exactly the same size as the one we are
replacing.

Zach's commit broke that rule, which means I finally get to send him a
tshirt to celebrate his first btrfs bug.

Looking through all other callers, min_bytes is always either the sector
size or the total allocation requested, so I've done this and pushed it
to for-linus.

Stefan, many thanks for bisecting and testing the patch.

commit 154ea2893002618bc3f9a1e2d8186c65490968b1
Author: Chris Mason <chris.mason@fusionio.com>
Date:   Tue Mar 5 11:11:26 2013 -0500

    Btrfs: enforce min_bytes parameter during extent allocation
    
    Commit 24542bf7ea5e4fdfdb5157ff544c093fa4dcb536 changed preallocation of
    extents to cap the max size we try to allocate.  It's a valid change,
    but the extent reservation code is also used by balance, and that
    can't tolerate a smaller extent being allocated.
    
    __btrfs_prealloc_file_range already has a min_size parameter, which is
    used by relocation to request a specific extent size.  This commit
    adds an extra check to enforce that minimum extent size.
    
    Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ecd9c4c..13ab4de 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8502,6 +8502,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 	struct btrfs_key ins;
 	u64 cur_offset = start;
 	u64 i_size;
+	u64 cur_bytes;
 	int ret = 0;
 	bool own_trans = true;
 
@@ -8516,8 +8517,9 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			}
 		}
 
-		ret = btrfs_reserve_extent(trans, root,
-					   min(num_bytes, 256ULL * 1024 * 1024),
+		cur_bytes = min(num_bytes, 256ULL * 1024 * 1024);
+		cur_bytes = max(cur_bytes, min_size);
+		ret = btrfs_reserve_extent(trans, root, cur_bytes,
 					   min_size, 0, *alloc_hint, &ins, 1);
 		if (ret) {
 			if (own_trans)


* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-05 16:40       ` Chris Mason
@ 2013-03-05 18:43         ` Zach Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Zach Brown @ 2013-03-05 18:43 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, Stefan Behrens, Linux Btrfs List

> Zach's commit broke that rule, which means I finally get to send him a
> tshirt to celebrate his first btrfs bug.

Hooray!  XL.

- z


* Re: [BUG] during balance operation, WARNING: at fs/btrfs/relocation.c:1624 replace_file_extents+0x74b/0x7e0 [btrfs]()
  2013-03-05 11:59   ` Stefan Behrens
  2013-03-05 15:11     ` Chris Mason
@ 2013-03-06  0:19     ` Liu Bo
  1 sibling, 0 replies; 9+ messages in thread
From: Liu Bo @ 2013-03-06  0:19 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: Chris Mason, Linux Btrfs List, zab

On Tue, Mar 05, 2013 at 12:59:05PM +0100, Stefan Behrens wrote:
> On Mon, 4 Mar 2013 14:31:37 -0500, Chris Mason wrote:
> > On Mon, Mar 04, 2013 at 10:24:39AM -0700, Stefan Behrens wrote:
> >> Just ran the following command sequence and got lots of WARNINGs.
> >> The issue is reproducible.
> >> The box was running the cmason/for-linus that made it into Linux 3.9 RC1.
> >>
> >> #!/bin/sh
> >> mkfs.btrfs -f /dev/sdl /dev/sdk -m raid1 -d raid1 -l 16384
> >> mount /dev/sdl /mnt
> >> dd if=/dev/urandom of=/mnt/urandom.1GB bs=10M count=100 &
> >> dd if=/dev/zero of=/mnt/zero.4GB bs=10M count=400 &
> >> (cd ~/kernel-src; tar cf - fs) | (cd /mnt && tar xf -)
> >> wait
> >>
> >> ((cd ~/kernel-src; tar cf - drivers) | (cd /mnt && tar xf -)) &
> >> sleep 5
> >> btrfs fi balance start /mnt
> > 
> > This doesn't look new, are you able to trigger it with an older kernel?
> > 
> 
> git bisect identifies the following post v3.8 commit to be the one:

This bisect can explain a lot :)

Relocating file extents writes their data into the reloc inode and then
later updates the file extent pointers from the reloc inode back into the
original block pointers.

Now prealloc produces a file extent smaller than the original one, so here
come those warnings.

thanks,
liubo

> 
> commit 24542bf7ea5e4fdfdb5157ff544c093fa4dcb536
> Author: Zach Brown <zab@redhat.com>
> Date:   Fri Nov 16 00:04:43 2012 +0000
> 
>     btrfs: limit fallocate extent reservation to 256MB
> 
>     Very large fallocate requests are cpu bound and result in extents with a
>     repeating pattern of ever decreasing size:
>     
>     $ time fallocate -l 1T file
>     real	0m13.039s
>     
>     ( an excerpt of the extents from btrfs-debug-tree: )
>       prealloc data disk byte 1536292564992 nr 397312
>       prealloc data disk byte 1536292962304 nr 196608
>       prealloc data disk byte 1536293158912 nr 98304
>       prealloc data disk byte 1536293257216 nr 49152
>       prealloc data disk byte 1536293306368 nr 24576
>       prealloc data disk byte 1536293330944 nr 12288
>       prealloc data disk byte 1536293343232 nr 8192
>       prealloc data disk byte 1536293351424 nr 4096
>       prealloc data disk byte 1536293355520 nr 4096
>       prealloc data disk byte 1536293359616 nr 4096
>     
>     The excessive cpu use comes from __btrfs_prealloc_file_range() trying to
>     allocate the entire remaining size after each extent is allocated.
>     btrfs_reserve_extent() repeatedly cuts this requested size in half until
>     it gets down to the size that the allocators can return.  We limit the
>     problem for now by capping each reservation at 256 meg.
>     
>     The small extents come from a masking bug when decreasing the requested
>     reservation size.  The high 32bits are cleared and the remaining low
>     bits might happen to reserve a small size.   Fix this by using
>     round_down() which properly casts the mask.
>     
>     After these fixes huge fallocate requests are fast and result in nice
>     large extents:
>     
>     $ time fallocate -l 1T file
>     real	0m0.082s
>     
>       prealloc data disk byte 1112425889792 nr 268435456
>       prealloc data disk byte 1112694325248 nr 268435456
>       prealloc data disk byte 1112962760704 nr 268435456
>     
>     Reported-by: Eric Sandeen <sandeen@redhat.com>
>     Signed-off-by: Zach Brown <zab@redhat.com>
>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b3ecca4..d2b3a5e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6143,7 +6143,7 @@ again:
>  	if (ret == -ENOSPC) {
>  		if (!final_tried) {
>  			num_bytes = num_bytes >> 1;
> -			num_bytes = num_bytes & ~(root->sectorsize - 1);
> +			num_bytes = round_down(num_bytes, root->sectorsize);
>  			num_bytes = max(num_bytes, min_alloc_size);
>  			if (num_bytes == min_alloc_size)
>  				final_tried = true;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4e6a11c..3bc62b1 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7894,8 +7894,9 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>  			}
>  		}
>  
> -		ret = btrfs_reserve_extent(trans, root, num_bytes, min_size,
> -					   0, *alloc_hint, &ins, 1);
> +		ret = btrfs_reserve_extent(trans, root,
> +					   min(num_bytes, 256ULL * 1024 * 1024),
> +					   min_size, 0, *alloc_hint, &ins, 1);
>  		if (ret) {
>  			if (own_trans)
>  				btrfs_end_transaction(trans, root);
> 
> 


