linux-btrfs.vger.kernel.org archive mirror
* [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
@ 2011-09-26 21:22 Josef Bacik
  2011-10-11 17:33 ` Mitch Harder
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2011-09-26 21:22 UTC
  To: linux-btrfs

One of the things that kills us is the fact that our ENOSPC reservations are
horribly over the top in most normal cases.  There isn't much that can be done
about this, because when we are completely full we really need them to work
like this so we don't under-reserve.  However, if there is plenty of
unallocated space on the disk (space not yet allocated to chunks), we can use
that to gauge how much we can overcommit.  So this patch adds chunk free space
accounting so we always know how much unallocated space we have.  Then, if we
fail to make a reservation within our allocated space, we check whether we can
overcommit.  In the normal flushing case (as with delalloc metadata
reservations), we take the free space, divide it by 2 if our metadata profile
is DUP, RAID1 or RAID10 (only half of the raw space is usable there), and then
divide it by 8 to make sure we don't overcommit too much.  In the non-flushing
case (we really need this reservation now!), we apply the same profile
adjustment but only divide by 2, allowing up to half of the free space.  This
makes the following fio test

[torrent]
filename=torrent-test
rw=randwrite
size=4g
ioengine=sync
directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
file system.  This doesn't seem to break my other enospc tests, but could really
use some more testing as this is a super scary change.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/btrfs/ctree.h       |    4 +++
 fs/btrfs/disk-io.c     |    2 +
 fs/btrfs/extent-tree.c |   61 +++++++++++++++++++++++++++++++++++++-----------
 fs/btrfs/volumes.c     |   39 +++++++++++++++++++++++++++---
 4 files changed, 88 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 47dea71..1eafccb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -893,6 +893,10 @@ struct btrfs_fs_info {
 	spinlock_t block_group_cache_lock;
 	struct rb_root block_group_cache_tree;
 
+	/* keep track of unallocated space */
+	spinlock_t free_chunk_lock;
+	u64 free_chunk_space;
+
 	struct extent_io_tree freed_extents[2];
 	struct extent_io_tree *pinned_extents;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4965a01..51372a5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1648,6 +1648,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->fs_roots_radix_lock);
 	spin_lock_init(&fs_info->delayed_iput_lock);
 	spin_lock_init(&fs_info->defrag_inodes_lock);
+	spin_lock_init(&fs_info->free_chunk_lock);
 	mutex_init(&fs_info->reloc_mutex);
 
 	init_completion(&fs_info->kobj_unregister);
@@ -1675,6 +1676,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	fs_info->metadata_ratio = 0;
 	fs_info->defrag_inodes = RB_ROOT;
 	fs_info->trans_no_join = 0;
+	fs_info->free_chunk_space = 0;
 
 	fs_info->thread_pool_size = min_t(unsigned long,
 					  num_online_cpus() + 2, 8);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fd65f6b..25b69d0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3410,6 +3410,7 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
  * @block_rsv - the block_rsv we're allocating for
  * @orig_bytes - the number of bytes we want
  * @flush - wether or not we can flush to make our reservation
+ * @check - whether this is just to check if we have enough space or not
  *
  * This will reserve orgi_bytes number of bytes from the space info associated
  * with the block_rsv.  If there is not enough space it will make an attempt to
@@ -3420,11 +3421,11 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
  */
 static int reserve_metadata_bytes(struct btrfs_root *root,
 				  struct btrfs_block_rsv *block_rsv,
-				  u64 orig_bytes, int flush)
+				  u64 orig_bytes, int flush, int check)
 {
 	struct btrfs_space_info *space_info = block_rsv->space_info;
 	struct btrfs_trans_handle *trans;
-	u64 unused;
+	u64 used;
 	u64 num_bytes = orig_bytes;
 	int retries = 0;
 	int ret = 0;
@@ -3459,9 +3460,9 @@ again:
 	}
 
 	ret = -ENOSPC;
-	unused = space_info->bytes_used + space_info->bytes_reserved +
-		 space_info->bytes_pinned + space_info->bytes_readonly +
-		 space_info->bytes_may_use;
+	used = space_info->bytes_used + space_info->bytes_reserved +
+		space_info->bytes_pinned + space_info->bytes_readonly +
+		space_info->bytes_may_use;
 
 	/*
 	 * The idea here is that we've not already over-reserved the block group
@@ -3470,9 +3471,8 @@ again:
 	 * lets start flushing stuff first and then come back and try to make
 	 * our reservation.
 	 */
-	if (unused <= space_info->total_bytes) {
-		unused = space_info->total_bytes - unused;
-		if (unused >= orig_bytes) {
+	if (used <= space_info->total_bytes) {
+		if (used + orig_bytes <= space_info->total_bytes) {
 			space_info->bytes_may_use += orig_bytes;
 			ret = 0;
 		} else {
@@ -3489,10 +3489,43 @@ again:
 		 * amount plus the amount of bytes that we need for this
 		 * reservation.
 		 */
-		num_bytes = unused - space_info->total_bytes +
+		num_bytes = used - space_info->total_bytes +
 			(orig_bytes * (retries + 1));
 	}
 
+	if (ret && !check) {
+		u64 profile = btrfs_get_alloc_profile(root, 0);
+		u64 avail;
+
+		spin_lock(&root->fs_info->free_chunk_lock);
+		avail = root->fs_info->free_chunk_space;
+
+		/*
+		 * If we have dup, raid1 or raid10 then only half of the free
+		 * space is actually usable.
+		 */
+		if (profile & (BTRFS_BLOCK_GROUP_DUP |
+			       BTRFS_BLOCK_GROUP_RAID1 |
+			       BTRFS_BLOCK_GROUP_RAID10))
+			avail >>= 1;
+
+		/*
+		 * If we can flush, don't let us overcommit too much, say 1/8th
+		 * of the space.  If we can't flush, let it overcommit up to 1/2
+		 * of the space.
+		 */
+		if (flush)
+			avail >>= 3;
+		else
+			avail >>= 1;
+		spin_unlock(&root->fs_info->free_chunk_lock);
+
+		if (used + orig_bytes < space_info->total_bytes + avail) {
+			space_info->bytes_may_use += orig_bytes;
+			ret = 0;
+		}
+	}
+
 	/*
 	 * Couldn't make our reservation, save our place so while we're trying
 	 * to reclaim space we can actually use it instead of somebody else
@@ -3703,7 +3736,7 @@ int btrfs_block_rsv_add(struct btrfs_root *root,
 	if (num_bytes == 0)
 		return 0;
 
-	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, 1);
+	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, 1, 0);
 	if (!ret) {
 		block_rsv_add_bytes(block_rsv, num_bytes, 1);
 		return 0;
@@ -3737,7 +3770,7 @@ int btrfs_block_rsv_check(struct btrfs_root *root,
 	if (!ret)
 		return 0;
 
-	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
+	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush, !flush);
 	if (!ret) {
 		block_rsv_add_bytes(block_rsv, num_bytes, 0);
 		return 0;
@@ -4037,7 +4070,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	to_reserve += calc_csum_metadata_size(inode, num_bytes, 1);
 	spin_unlock(&BTRFS_I(inode)->lock);
 
-	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
+	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush, 0);
 	if (ret) {
 		u64 to_free = 0;
 		unsigned dropped;
@@ -5692,7 +5725,7 @@ use_block_rsv(struct btrfs_trans_handle *trans,
 	block_rsv = get_block_rsv(trans, root);
 
 	if (block_rsv->size == 0) {
-		ret = reserve_metadata_bytes(root, block_rsv, blocksize, 0);
+		ret = reserve_metadata_bytes(root, block_rsv, blocksize, 0, 0);
 		/*
 		 * If we couldn't reserve metadata bytes try and use some from
 		 * the global reserve.
@@ -5713,7 +5746,7 @@ use_block_rsv(struct btrfs_trans_handle *trans,
 		return block_rsv;
 	if (ret) {
 		WARN_ON(1);
-		ret = reserve_metadata_bytes(root, block_rsv, blocksize, 0);
+		ret = reserve_metadata_bytes(root, block_rsv, blocksize, 0, 0);
 		if (!ret) {
 			return block_rsv;
 		} else if (ret && block_rsv != global_rsv) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2a4cc7..e138af7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1013,8 +1013,13 @@ static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
 	}
 	BUG_ON(ret);
 
-	if (device->bytes_used > 0)
-		device->bytes_used -= btrfs_dev_extent_length(leaf, extent);
+	if (device->bytes_used > 0) {
+		u64 len = btrfs_dev_extent_length(leaf, extent);
+		device->bytes_used -= len;
+		spin_lock(&root->fs_info->free_chunk_lock);
+		root->fs_info->free_chunk_space += len;
+		spin_unlock(&root->fs_info->free_chunk_lock);
+	}
 	ret = btrfs_del_item(trans, root, path);
 
 out:
@@ -1356,6 +1361,11 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	if (ret)
 		goto error_undo;
 
+	spin_lock(&root->fs_info->free_chunk_lock);
+	root->fs_info->free_chunk_space = device->total_bytes -
+		device->bytes_used;
+	spin_unlock(&root->fs_info->free_chunk_lock);
+
 	device->in_fs_metadata = 0;
 	btrfs_scrub_cancel_dev(root, device);
 
@@ -1691,6 +1701,10 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 		root->fs_info->fs_devices->num_can_discard++;
 	root->fs_info->fs_devices->total_rw_bytes += device->total_bytes;
 
+	spin_lock(&root->fs_info->free_chunk_lock);
+	root->fs_info->free_chunk_space += device->total_bytes;
+	spin_unlock(&root->fs_info->free_chunk_lock);
+
 	if (!blk_queue_nonrot(bdev_get_queue(bdev)))
 		root->fs_info->fs_devices->rotating = 1;
 
@@ -2192,8 +2206,12 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	lock_chunks(root);
 
 	device->total_bytes = new_size;
-	if (device->writeable)
+	if (device->writeable) {
 		device->fs_devices->total_rw_bytes -= diff;
+		spin_lock(&root->fs_info->free_chunk_lock);
+		root->fs_info->free_chunk_space -= diff;
+		spin_unlock(&root->fs_info->free_chunk_lock);
+	}
 	unlock_chunks(root);
 
 again:
@@ -2257,6 +2275,9 @@ again:
 		device->total_bytes = old_size;
 		if (device->writeable)
 			device->fs_devices->total_rw_bytes += diff;
+		spin_lock(&root->fs_info->free_chunk_lock);
+		root->fs_info->free_chunk_space += diff;
+		spin_unlock(&root->fs_info->free_chunk_lock);
 		unlock_chunks(root);
 		goto done;
 	}
@@ -2615,6 +2636,11 @@ static int __finish_chunk_alloc(struct btrfs_trans_handle *trans,
 		index++;
 	}
 
+	spin_lock(&extent_root->fs_info->free_chunk_lock);
+	extent_root->fs_info->free_chunk_space -= (stripe_size *
+						   map->num_stripes);
+	spin_unlock(&extent_root->fs_info->free_chunk_lock);
+
 	index = 0;
 	stripe = &chunk->stripe;
 	while (index < map->num_stripes) {
@@ -3616,8 +3642,13 @@ static int read_one_dev(struct btrfs_root *root,
 	fill_device_from_item(leaf, dev_item, device);
 	device->dev_root = root->fs_info->dev_root;
 	device->in_fs_metadata = 1;
-	if (device->writeable)
+	if (device->writeable) {
 		device->fs_devices->total_rw_bytes += device->total_bytes;
+		spin_lock(&root->fs_info->free_chunk_lock);
+		root->fs_info->free_chunk_space += device->total_bytes -
+			device->bytes_used;
+		spin_unlock(&root->fs_info->free_chunk_lock);
+	}
 	ret = 0;
 	return ret;
 }
-- 
1.7.5.2

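The reservation decision this patch adds to reserve_metadata_bytes() can be modelled in user space roughly as follows. This is a minimal sketch, not kernel code: overcommit_avail(), can_reserve() and the PROFILE_* bits are hypothetical stand-ins (the real BTRFS_BLOCK_GROUP_* values are not reproduced), -1 stands in for -ENOSPC, and the free_chunk_lock/space_info locking and the flush-and-retry loop are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the BTRFS_BLOCK_GROUP_* profile bits. */
#define PROFILE_DUP	(1u << 0)
#define PROFILE_RAID1	(1u << 1)
#define PROFILE_RAID10	(1u << 2)

/* Unallocated bytes a reservation may borrow beyond total_bytes. */
static uint64_t overcommit_avail(uint64_t free_chunk_space,
				 unsigned int profile, int flush)
{
	uint64_t avail = free_chunk_space;

	/* DUP, RAID1 and RAID10 write two copies: only half is usable. */
	if (profile & (PROFILE_DUP | PROFILE_RAID1 | PROFILE_RAID10))
		avail >>= 1;

	/* Flushing callers get 1/8 of that, non-flushing callers 1/2. */
	if (flush)
		avail >>= 3;
	else
		avail >>= 1;
	return avail;
}

/*
 * Returns 0 if the reservation may go ahead, -1 (standing in for -ENOSPC)
 * otherwise.  used is bytes_used + bytes_reserved + bytes_pinned +
 * bytes_readonly + bytes_may_use, as in the patch.
 */
static int can_reserve(uint64_t used, uint64_t total, uint64_t orig_bytes,
		       uint64_t free_chunk_space, unsigned int profile,
		       int flush, int check)
{
	/* Fits in already-allocated space (the rewritten used/unused test). */
	if (used <= total && used + orig_bytes <= total)
		return 0;

	/* Callers that only check must not overcommit. */
	if (check)
		return -1;

	if (used + orig_bytes <
	    total + overcommit_avail(free_chunk_space, profile, flush))
		return 0;
	return -1;
}
```

For example, with 100 GiB unallocated and DUP metadata, a flushing caller may go up to 100 / 2 / 8 = 6.25 GiB past the allocated metadata space, and a non-flushing caller up to 25 GiB. Note that the new fit test, used + orig_bytes <= total, is the same as the old total - unused >= orig_bytes as long as the sum does not overflow.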

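The volumes.c hunks keep free_chunk_space in step with the devices. A minimal single-threaded model of those updates, assuming one counter and no locking (the kernel takes free_chunk_lock around each update); struct chunk_accounting and the acct_*() helpers are hypothetical names, each commented with the kernel function whose update it mirrors. The assert is an invariant of this model only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of fs_info->free_chunk_space. */
struct chunk_accounting {
	uint64_t free_chunk_space;
};

/* read_one_dev(): a writeable device adds its still-unallocated bytes. */
static void acct_read_dev(struct chunk_accounting *a,
			  uint64_t total_bytes, uint64_t bytes_used)
{
	assert(bytes_used <= total_bytes);
	a->free_chunk_space += total_bytes - bytes_used;
}

/* btrfs_init_new_device(): a newly added device is entirely unallocated. */
static void acct_add_device(struct chunk_accounting *a, uint64_t total_bytes)
{
	a->free_chunk_space += total_bytes;
}

/* __finish_chunk_alloc(): each stripe of a new chunk consumes device space. */
static void acct_alloc_chunk(struct chunk_accounting *a,
			     uint64_t stripe_size, uint64_t num_stripes)
{
	a->free_chunk_space -= stripe_size * num_stripes;
}

/* btrfs_free_dev_extent(): a freed device extent is unallocated again. */
static void acct_free_dev_extent(struct chunk_accounting *a, uint64_t len)
{
	a->free_chunk_space += len;
}

/* btrfs_shrink_device(): a writeable device loses diff bytes at its end. */
static void acct_shrink_device(struct chunk_accounting *a, uint64_t diff)
{
	a->free_chunk_space -= diff;
}
```

In this model, allocating a chunk with two 1 GiB stripes on a device with 90 GiB unallocated leaves 88 GiB, and freeing one of those device extents brings it back to 89 GiB.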

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-09-26 21:22 [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!! Josef Bacik
@ 2011-10-11 17:33 ` Mitch Harder
  2011-10-11 17:43   ` Josef Bacik
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Mitch Harder @ 2011-10-11 17:33 UTC
  To: Josef Bacik; +Cc: linux-btrfs

On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
>
> go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> file system.  This doesn't seem to break my other enospc tests, but could really
> use some more testing as this is a super scary change.  Thanks,
>

I've been testing Josef's git.kernel.org testing tree, and I've
bisected an error down to this commit.

I'm triggering the error using a removedirs benchmark in filebench
with the following profile:
load removedirs
set $dir=/mnt/benchmark/filebench
set $ndirs=400000
run

Here's the dmesg dump:
[   89.972715] device fsid 48b0ed3c-0b57-44ee-9554-37c707dc03c7 devid
1 transid 7 /dev/sdb6
[   89.975565] btrfs: disk space caching is enabled
[  389.240070] btrfs failed to delete reference to 00000076, inode
2175208 parent 2024324
[  389.240095] btrfs failed to delete reference to 00000077, inode
2215544 parent 2024324
[  389.240485] btrfs failed to delete reference to 00000080, inode
2158464 parent 2024324
[  389.240521] btrfs failed to delete reference to 00000081, inode
2187285 parent 2024324
[  389.240693] btrfs failed to delete reference to 00000085, inode
2349157 parent 2024324
[  389.240802] btrfs failed to delete reference to 00000087, inode
2139156 parent 2024324
[  389.241006] btrfs failed to delete reference to 00000090, inode
2353094 parent 2024324
[  389.241041] btrfs failed to delete reference to 00000091, inode
2355786 parent 2024324
[  389.241085] btrfs failed to delete reference to 00000092, inode
2357463 parent 2024324
[  389.241119] btrfs failed to delete reference to 00000093, inode
2361163 parent 2024324
[  389.241300] btrfs failed to delete reference to 00000095, inode
2366103 parent 2024324
[  389.241637] btrfs failed to delete reference to 00000096, inode
2229779 parent 2024324
[  389.241661] btrfs failed to delete reference to 00000097, inode
2349423 parent 2024324
[  389.241741] btrfs failed to delete reference to 00000099, inode
2240025 parent 2024324
[  389.241870] btrfs failed to delete reference to 00000101, inode
2347096 parent 2024324
[  389.241969] btrfs failed to delete reference to 00000103, inode
2198337 parent 2024324
[  389.242239] btrfs failed to delete reference to 00000104, inode
2206224 parent 2024324
[  389.242332] btrfs failed to delete reference to 00000106, inode
2364824 parent 2024324
[  389.242353] btrfs failed to delete reference to 00000107, inode
2276826 parent 2024324
[  389.242374] btrfs failed to delete reference to 00000108, inode
2368177 parent 2024324
[  389.242552] btrfs failed to delete reference to 00000111, inode
2375233 parent 2024324
[  389.243183] btrfs failed to delete reference to 00000118, inode
2165951 parent 2024324
[  389.243221] btrfs failed to delete reference to 00000119, inode
2387229 parent 2024324
[  389.243351] btrfs failed to delete reference to 00000121, inode
2236403 parent 2024324
[  389.243385] btrfs failed to delete reference to 00000120, inode
2392838 parent 2024324
[  389.243456] btrfs failed to delete reference to 00000123, inode
2396706 parent 2024324
[  389.243478] btrfs failed to delete reference to 00000124, inode
2400988 parent 2024324
[  389.243496] btrfs failed to delete reference to 00000125, inode
2363919 parent 2024324
[  389.243889] btrfs failed to delete reference to 00000128, inode
2136496 parent 2024324
[  389.243951] btrfs failed to delete reference to 00000129, inode
2149209 parent 2024324
[  389.244045] btrfs failed to delete reference to 00000132, inode
2151500 parent 2024324
[  389.244114] btrfs failed to delete reference to 00000133, inode
2180704 parent 2024324
[  389.244179] btrfs failed to delete reference to 00000134, inode
2197300 parent 2024324
[  389.244413] btrfs failed to delete reference to 00000131, inode
2126799 parent 2024324
[  389.244434] btrfs failed to delete reference to 00000137, inode
2208205 parent 2024324
[  389.244470] btrfs failed to delete reference to 00000138, inode
2220635 parent 2024324
[  389.244700] btrfs failed to delete reference to 00000143, inode
2182109 parent 2024324
[  389.244914] btrfs failed to delete reference to 00000144, inode
2342857 parent 2024324
[  389.244935] btrfs failed to delete reference to 00000145, inode
2350382 parent 2024324
[  389.245219] btrfs failed to delete reference to 00000146, inode
2357237 parent 2024324
[  389.245437] btrfs failed to delete reference to 00000149, inode
2193875 parent 2024324
[  389.245476] btrfs failed to delete reference to 00000150, inode
2371468 parent 2024324
[  389.245523] btrfs failed to delete reference to 00000151, inode
2379733 parent 2024324
[  389.245684] btrfs failed to delete reference to 00000153, inode
2245651 parent 2024324
[  389.246028] btrfs failed to delete reference to 00000158, inode
2157477 parent 2024324
[  389.246223] btrfs failed to delete reference to 00000159, inode
2165824 parent 2024324
[  389.246970] ------------[ cut here ]------------
[  389.246972] kernel BUG at fs/btrfs/inode.c:2176!
[  389.246974] invalid opcode: 0000 [#1] SMP
[  389.246976] CPU 1
[  389.246978] Modules linked in: ipv6 snd_seq_midi snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss
lgdt330x cx88_dvb cx88_vp3054_i2c videobuf_dvb dvb_core rc_hauppauge
tuner_simple tuner_types tda9887 tda8290 ir_lirc_codec tuner lirc_dev
nvidia(P) ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder
ir_rc6_decoder ir_rc5_decoder ir_nec_decoder cx8800 cx8802 cx88_alsa
snd_ens1371 cx88xx gameport rc_core i2c_algo_bit tveeprom v4l2_common
videodev media videobuf_dma_sg v4l2_compat_ioctl32 tpm_tis tpm ppdev
btcx_risc parport_pc parport videobuf_core tpm_bios sr_mod iTCO_wdt
i2c_i801 r8169 pcspkr i2c_core iTCO_vendor_support snd_rawmidi
intel_agp snd_seq_device intel_gtt snd_ac97_codec ac97_bus snd_pcm
snd_timer snd snd_page_alloc iscsi_tcp libiscsi_tcp libiscsi fuse nfs
nfs_acl auth_rpcgss lockd sunrpc sl811_hcd ohci_hcd uhci_hcd ehci_hcd
[  389.247024]
[  389.247026] Pid: 3629, comm: go_filebench Tainted: P
3.1.0-rc9+ #14 Gigabyte Technology Co., Ltd. P35-DS3L/P35-DS3L
[  389.247030] RIP: 0010:[<ffffffff812b5857>]  [<ffffffff812b5857>]
btrfs_orphan_add+0x11b/0x133
[  389.247037] RSP: 0018:ffff880024f47dd8  EFLAGS: 00010286
[  389.247039] RAX: 0000000000000000 RBX: ffff880073f47800 RCX: 000000000162525c
[  389.247041] RDX: 00000000ffffffe4 RSI: ffff880024f47d98 RDI: ffffea0001e938c0
[  389.247043] RBP: ffff880024f47e08 R08: ffffffff81295e05 R09: 0000000009550000
[  389.247045] R10: 0000000000000001 R11: ffff880024f47d98 R12: ffff88007a7e14c8
[  389.247047] R13: ffff8800556a9870 R14: 0000000000000000 R15: 0000000000000000
[  389.247049] FS:  00007fffe824f700(0000) GS:ffff88007fd00000(0000)
knlGS:0000000000000000
[  389.247051] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  389.247053] CR2: 00007ffff7ff4000 CR3: 0000000031009000 CR4: 00000000000006e0
[  389.247055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  389.247057] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  389.247060] Process go_filebench (pid: 3629, threadinfo
ffff880024f46000, task ffff88007bd8a170)
[  389.247061] Stack:
[  389.247063]  0000000000000003 ffff8800556a9870 ffff88007a7e14c8
ffff88006729cd00
[  389.247066]  ffff880073f47800 ffff8800525fa900 ffff880024f47e58
ffffffff812b844a
[  389.247069]  ffff8800751e4c00 ffff8800525fa9a0 ffff880024f47e58
ffff8800525fa900
[  389.247073] Call Trace:
[  389.247076]  [<ffffffff812b844a>] btrfs_rmdir+0xc2/0x124
[  389.247079]  [<ffffffff81104cbd>] vfs_rmdir+0x93/0xeb
[  389.247082]  [<ffffffff81106a63>] do_rmdir+0xdb/0x12f
[  389.247086]  [<ffffffff813335b3>] ? rwsem_wake+0x39/0x42
[  389.247089]  [<ffffffff81337557>] ? call_rwsem_wake+0x17/0x30
[  389.247092]  [<ffffffff810cfc35>] ? remove_vma+0x77/0x7f
[  389.247095]  [<ffffffff81107a8b>] sys_rmdir+0x16/0x18
[  389.247099]  [<ffffffff8162316b>] system_call_fastpath+0x16/0x1b
[  389.247101] Code: 95 78 fe ff ff 48 85 d2 74 0a 41 80 bd 80 fe ff
ff 84 75 04 49 8b 55 40 48 89 de 4c 89 e7 e8 59 19 02 00 89 c2 31 c0
85 d2 74 0b <0f> 0b b8 f4 ff ff ff eb 02 31 c0 41 5a 5b 41 5c 41 5d 41
5e 41
[  389.247124] RIP  [<ffffffff812b5857>] btrfs_orphan_add+0x11b/0x133
[  389.247127]  RSP <ffff880024f47dd8>
[  389.247141] ---[ end trace 5f145d9895a8631c ]---
[  402.382211] ------------[ cut here ]------------
[  402.382214] kernel BUG at fs/btrfs/extent-tree.c:5565!
[  402.382216] invalid opcode: 0000 [#2] SMP
[  402.382219] CPU 0
[  402.382220] Modules linked in: ipv6 snd_seq_midi snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss
lgdt330x cx88_dvb cx88_vp3054_i2c videobuf_dvb dvb_core rc_hauppauge
tuner_simple tuner_types tda9887 tda8290 ir_lirc_codec tuner lirc_dev
nvidia(P) ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder
ir_rc6_decoder ir_rc5_decoder ir_nec_decoder cx8800 cx8802 cx88_alsa
snd_ens1371 cx88xx gameport rc_core i2c_algo_bit tveeprom v4l2_common
videodev media videobuf_dma_sg v4l2_compat_ioctl32 tpm_tis tpm ppdev
btcx_risc parport_pc parport videobuf_core tpm_bios sr_mod iTCO_wdt
i2c_i801 r8169 pcspkr i2c_core iTCO_vendor_support snd_rawmidi
intel_agp snd_seq_device intel_gtt snd_ac97_codec ac97_bus snd_pcm
snd_timer snd snd_page_alloc iscsi_tcp libiscsi_tcp libiscsi fuse nfs
nfs_acl auth_rpcgss lockd sunrpc sl811_hcd ohci_hcd uhci_hcd ehci_hcd
[  402.382266]
[  402.382269] Pid: 3491, comm: btrfs-transacti Tainted: P      D
3.1.0-rc9+ #14 Gigabyte Technology Co., Ltd. P35-DS3L/P35-DS3L
[  402.382273] RIP: 0010:[<ffffffff812a2c05>]  [<ffffffff812a2c05>]
run_clustered_refs+0x38f/0x6e0
[  402.382281] RSP: 0018:ffff880075a67c50  EFLAGS: 00010286
[  402.382283] RAX: 00000000ffffffe4 RBX: ffff88007603b0c0 RCX: 0000000000000000
[  402.382285] RDX: ffff880059f99b18 RSI: 0000000000000282 RDI: 0000000000000000
[  402.382287] RBP: ffff880075a67d30 R08: 0000000000000000 R09: 0000000000000000
[  402.382289] R10: 0000000000000001 R11: ffff880075a67ce0 R12: ffff88007524a480
[  402.382291] R13: ffff8800336b4280 R14: ffff8800751e4800 R15: ffff880074e974b0
[  402.382293] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000)
knlGS:0000000000000000
[  402.382295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  402.382297] CR2: 00007ffff7ff5000 CR3: 0000000039ba9000 CR4: 00000000000006f0
[  402.382299] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  402.382301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  402.382304] Process btrfs-transacti (pid: 3491, threadinfo
ffff880075a66000, task ffff8800760abc30)
[  402.382305] Stack:
[  402.382307]  0000000000000000 0000000000000000 ffff880000000001
0000000000000000
[  402.382310]  ffff88007599c000 ffff880074e97568 000000006d84f440
ffff880075a67d60
[  402.382314]  0000000000000000 ffff8800751e4920 0000000000000cd1
0000000000000005
[  402.382317] Call Trace:
[  402.382322]  [<ffffffff812e2401>] ? btrfs_find_ref_cluster+0x7a/0x12d
[  402.382325]  [<ffffffff812a3027>] btrfs_run_delayed_refs+0xd1/0x1c5
[  402.382329]  [<ffffffff812b0668>] btrfs_commit_transaction+0x8a/0x6fd
[  402.382332]  [<ffffffff812aff2f>] ? join_transaction.clone.24+0x20/0x1f0
[  402.382336]  [<ffffffff810543cf>] ? wake_up_bit+0x2a/0x2a
[  402.382339]  [<ffffffff812ab299>] transaction_kthread+0x172/0x227
[  402.382342]  [<ffffffff812ab127>] ? btrfs_congested_fn+0x86/0x86
[  402.382345]  [<ffffffff812ab127>] ? btrfs_congested_fn+0x86/0x86
[  402.382347]  [<ffffffff81053f12>] kthread+0x82/0x8a
[  402.382351]  [<ffffffff81624294>] kernel_thread_helper+0x4/0x10
[  402.382354]  [<ffffffff81053e90>] ? kthread_worker_fn+0x13a/0x13a
[  402.382356]  [<ffffffff81624290>] ? gs_change+0xb/0xb
[  402.382358] Code: 00 08 41 b9 01 00 00 00 48 8b 72 20 c7 45 cc 33
00 00 00 4c 8d 45 cc 48 8d 4d b0 48 89 c2 48 8b 7d a0 e8 ad 82 ff ff
85 c0 74 02 <0f> 0b 48 8b 55 98 48 8b 45 98 48 8b 12 48 63 70 40 48 89
d7 48
[  402.382381] RIP  [<ffffffff812a2c05>] run_clustered_refs+0x38f/0x6e0
[  402.382384]  RSP <ffff880075a67c50>
[  402.382386] ---[ end trace 5f145d9895a8631d ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 17:33 ` Mitch Harder
@ 2011-10-11 17:43   ` Josef Bacik
  2011-10-11 18:27   ` Josef Bacik
  2011-10-11 19:00   ` Josef Bacik
  2 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2011-10-11 17:43 UTC
  To: Mitch Harder; +Cc: Josef Bacik, linux-btrfs

On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
> >
> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> > file system.  This doesn't seem to break my other enospc tests, but could really
> > use some more testing as this is a super scary change.  Thanks,
> >
>
> I've been testing Josef's git.kernel.org testing tree, and I've
> bisected an error down to this commit.
>
> I'm triggering the error using a removedirs benchmark in filebench
> with the following profile:
> load removedirs
> set $dir=/mnt/benchmark/filebench
> set $ndirs=400000
> run
>

Ouch.  Does your pull have this patch:

Btrfs: wait for ordered extents if we didn't reclaim enough

If not, can you re-pull and retry your test?  Thanks,

Josef


* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 17:33 ` Mitch Harder
  2011-10-11 17:43   ` Josef Bacik
@ 2011-10-11 18:27   ` Josef Bacik
  2011-10-11 19:00   ` Josef Bacik
  2 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2011-10-11 18:27 UTC
  To: Mitch Harder; +Cc: Josef Bacik, linux-btrfs

On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
> >
> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> > file system.  This doesn't seem to break my other enospc tests, but could really
> > use some more testing as this is a super scary change.  Thanks,
> >
>
> I've been testing Josef's git.kernel.org testing tree, and I've
> bisected an error down to this commit.
>
> I'm triggering the error using a removedirs benchmark in filebench
> with the following profile:
> load removedirs
> set $dir=/mnt/benchmark/filebench
> set $ndirs=400000
> run
>

Hmm I can't get it to reproduce, can you give this a whirl and see if it helps?
Thanks

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc0de68..609989b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3334,7 +3334,7 @@ out:
  * shrink metadata reservation for delalloc
  */
 static int shrink_delalloc(struct btrfs_trans_handle *trans,
-			   struct btrfs_root *root, u64 to_reclaim, int sync)
+			   struct btrfs_root *root, u64 to_reclaim, int retries)
 {
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_space_info *space_info;
@@ -3384,14 +3384,22 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		if (reserved == 0 || reclaimed >= max_reclaim)
 			break;
 
-		if (trans && trans->transaction->blocked)
+		if (trans)
 			return -EAGAIN;
 
-		time_left = schedule_timeout_interruptible(1);
+		if (!retries) {
+			time_left = schedule_timeout_interruptible(1);
 
-		/* We were interrupted, exit */
-		if (time_left)
-			break;
+			/* We were interrupted, exit */
+			if (time_left)
+				break;
+		} else {
+			/*
+			 * We've already done this song and dance once, let's
+			 * really wait for some work to get done.
+			 */
+			btrfs_wait_ordered_extents(root, 0, 0);
+		}
 
 		/* we've kicked the IO a few times, if anything has been freed,
 		 * exit.  There is no sense in looping here for a long time
@@ -3552,7 +3560,7 @@ again:
 	 * We do synchronous shrinking since we don't actually unreserve
 	 * metadata until after the IO is completed.
 	 */
-	ret = shrink_delalloc(trans, root, num_bytes, 1);
+	ret = shrink_delalloc(trans, root, num_bytes, retries);
 	if (ret < 0)
 		goto out;
 

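In this follow-up, the retries count passed down from reserve_metadata_bytes() decides how shrink_delalloc() waits on each pass of its loop. The per-iteration choice can be sketched as a pure function; next_step() and enum delalloc_step are hypothetical names, and the real loop also checks the reclaimed byte count and the reservation_progress counter, which are omitted here.

```c
#include <stdbool.h>

/* Hypothetical labels for what one pass of the shrink_delalloc() loop does
 * after kicking writeback without yet reclaiming enough. */
enum delalloc_step {
	STEP_RETURN_EAGAIN,	/* caller holds a transaction: return -EAGAIN */
	STEP_SHORT_SLEEP,	/* first attempt: sleep one tick, stop if interrupted */
	STEP_WAIT_ORDERED,	/* retry: btrfs_wait_ordered_extents(root, 0, 0) */
};

static enum delalloc_step next_step(bool have_trans, int retries)
{
	/* Any transaction handle now bails out, not just a blocked one. */
	if (have_trans)
		return STEP_RETURN_EAGAIN;
	if (!retries)
		return STEP_SHORT_SLEEP;
	return STEP_WAIT_ORDERED;
}
```

Previously the sync argument was always 1, so every caller got the same one-tick sleep; with retries threaded through, only a second or later attempt blocks on ordered extents.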

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 17:33 ` Mitch Harder
  2011-10-11 17:43   ` Josef Bacik
  2011-10-11 18:27   ` Josef Bacik
@ 2011-10-11 19:00   ` Josef Bacik
  2011-10-11 19:44     ` Mitch Harder
  2 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2011-10-11 19:00 UTC
  To: Mitch Harder; +Cc: Josef Bacik, linux-btrfs

On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
> >
> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> > file system.  This doesn't seem to break my other enospc tests, but could really
> > use some more testing as this is a super scary change.  Thanks,
> >
>
> I've been testing Josef's git.kernel.org testing tree, and I've
> bisected an error down to this commit.
>
> I'm triggering the error using a removedirs benchmark in filebench
> with the following profile:
> load removedirs
> set $dir=/mnt/benchmark/filebench
> set $ndirs=400000
> run
>

Ok try this one, it will write out more and harder, see if that helps.  Thanks,

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc0de68..c81ca44 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3334,7 +3334,7 @@ out:
  * shrink metadata reservation for delalloc
  */
 static int shrink_delalloc(struct btrfs_trans_handle *trans,
-			   struct btrfs_root *root, u64 to_reclaim, int sync)
+			   struct btrfs_root *root, u64 to_reclaim, int retries)
 {
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_space_info *space_info;
@@ -3365,12 +3365,10 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 	}
 
 	max_reclaim = min(reserved, to_reclaim);
+	if (max_reclaim > (2 * 1024 * 1024))
+		nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
 
 	while (loops < 1024) {
-		/* have the flusher threads jump in and do some IO */
-		smp_mb();
-		nr_pages = min_t(unsigned long, nr_pages,
-		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
 		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages);
 
 		spin_lock(&space_info->lock);
@@ -3384,14 +3382,22 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		if (reserved == 0 || reclaimed >= max_reclaim)
 			break;
 
-		if (trans && trans->transaction->blocked)
+		if (trans)
 			return -EAGAIN;
 
-		time_left = schedule_timeout_interruptible(1);
+		if (!retries) {
+			time_left = schedule_timeout_interruptible(1);
 
-		/* We were interrupted, exit */
-		if (time_left)
-			break;
+			/* We were interrupted, exit */
+			if (time_left)
+				break;
+		} else {
+			/*
+			 * We've already done this song and dance once, let's
+			 * really wait for some work to get done.
+			 */
+			btrfs_wait_ordered_extents(root, 0, 0);
+		}
 
 		/* we've kicked the IO a few times, if anything has been freed,
 		 * exit.  There is no sense in looping here for a long time
@@ -3399,15 +3405,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		 * just too many writers without enough free space
 		 */
 
-		if (loops > 3) {
+		if (!retries && loops > 3) {
 			smp_mb();
 			if (progress != space_info->reservation_progress)
 				break;
 		}
 
 	}
-	if (reclaimed < to_reclaim && !trans)
-		btrfs_wait_ordered_extents(root, 0, 0);
 	return reclaimed >= to_reclaim;
 }
 
@@ -3552,7 +3556,7 @@ again:
 	 * We do synchronous shrinking since we don't actually unreserve
 	 * metadata until after the IO is completed.
 	 */
-	ret = shrink_delalloc(trans, root, num_bytes, 1);
+	ret = shrink_delalloc(trans, root, num_bytes, retries);
 	if (ret < 0)
 		goto out;
 
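A rough user-space model of the new writeback sizing in shrink_delalloc() may help when reading the hunk above. It assumes 4 KiB pages (a PAGE_CACHE_SHIFT of 12) and that nr_pages starts out covering 2 MiB before the loop; neither initial value is visible in this excerpt, and writeback_pages() is a hypothetical helper, not a btrfs function. It shows that the per-pass batch only grows past the default once the reclaim target exceeds 2 MiB, and that, with the min_t() clamp removed, the batch is no longer capped by delalloc_bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a PAGE_CACHE_SHIFT of 12. */
#define SKETCH_PAGE_SHIFT 12
/* Assumption: nr_pages starts out covering 2 MiB before the loop. */
#define SKETCH_DEFAULT_BATCH (2ULL * 1024 * 1024)

/*
 * Hypothetical helper: how many pages the patched code would ask
 * writeback_inodes_sb_nr_if_idle() to flush on each pass.
 */
static unsigned long writeback_pages(uint64_t reserved, uint64_t to_reclaim)
{
	unsigned long nr_pages = SKETCH_DEFAULT_BATCH >> SKETCH_PAGE_SHIFT;
	uint64_t max_reclaim = reserved < to_reclaim ? reserved : to_reclaim;

	/* Only scale the batch up for reclaim targets above 2 MiB. */
	if (max_reclaim > SKETCH_DEFAULT_BATCH)
		nr_pages = max_reclaim >> SKETCH_PAGE_SHIFT;

	assert(nr_pages > 0);
	return nr_pages;
}
```

Under these assumptions, a reserved amount of 1 MiB keeps the default of 512 pages, while an 8 MiB target asks for 2048 pages per pass.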
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 19:00   ` Josef Bacik
@ 2011-10-11 19:44     ` Mitch Harder
  2011-10-11 20:01       ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Mitch Harder @ 2011-10-11 19:44 UTC
  To: Josef Bacik; +Cc: linux-btrfs

On Tue, Oct 11, 2011 at 2:00 PM, Josef Bacik <josef@redhat.com> wrote:
> On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
>> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
>> >
>> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
>> > file system.  This doesn't seem to break my other enospc tests, but could really
>> > use some more testing as this is a super scary change.  Thanks,
>> >
>>
>> I've been testing Josef's git.kernel.org testing tree, and I've
>> bisected an error down to this commit.
>>
>> I'm triggering the error using a removedirs benchmark in filebench
>> with the following profile:
>> load removedirs
>> set $dir=/mnt/benchmark/filebench
>> set $ndirs=400000
>> run
>>
>
> Ok try this one, it will write out more and harder, see if that helps.  Thanks,
>

Still running into BUG at fs/btrfs/inode.c:2176!

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 19:44     ` Mitch Harder
@ 2011-10-11 20:01       ` Josef Bacik
  2011-10-11 20:45         ` Mitch Harder
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2011-10-11 20:01 UTC
  To: Mitch Harder; +Cc: Josef Bacik, linux-btrfs

On Tue, Oct 11, 2011 at 02:44:09PM -0500, Mitch Harder wrote:
> On Tue, Oct 11, 2011 at 2:00 PM, Josef Bacik <josef@redhat.com> wrote:
> > On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
> >> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
> >> >
> >> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> >> > file system.  This doesn't seem to break my other enospc tests, but could really
> >> > use some more testing as this is a super scary change.  Thanks,
> >> >
> >>
> >> I've been testing Josef's git.kernel.org testing tree, and I've
> >> bisected an error down to this commit.
> >>
> >> I'm triggering the error using a removedirs benchmark in filebench
> >> with the following profile:
> >> load removedirs
> >> set $dir=/mnt/benchmark/filebench
> >> set $ndirs=400000
> >> run
> >>
> >
> > Ok try this one, it will write out more and harder, see if that helps.  Thanks,
> >
>
> Still running into BUG at fs/btrfs/inode.c:2176!

How about this one?

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc0de68..c81ca44 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3334,7 +3334,7 @@ out:
  * shrink metadata reservation for delalloc
  */
 static int shrink_delalloc(struct btrfs_trans_handle *trans,
-			   struct btrfs_root *root, u64 to_reclaim, int sync)
+			   struct btrfs_root *root, u64 to_reclaim, int retries)
 {
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_space_info *space_info;
@@ -3365,12 +3365,10 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 	}
 
 	max_reclaim = min(reserved, to_reclaim);
+	if (max_reclaim > (2 * 1024 * 1024))
+		nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
 
 	while (loops < 1024) {
-		/* have the flusher threads jump in and do some IO */
-		smp_mb();
-		nr_pages = min_t(unsigned long, nr_pages,
-		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
 		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages);
 
 		spin_lock(&space_info->lock);
@@ -3384,14 +3382,22 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		if (reserved == 0 || reclaimed >= max_reclaim)
 			break;
 
-		if (trans && trans->transaction->blocked)
+		if (trans)
 			return -EAGAIN;
 
-		time_left = schedule_timeout_interruptible(1);
+		if (!retries) {
+			time_left = schedule_timeout_interruptible(1);
 
-		/* We were interrupted, exit */
-		if (time_left)
-			break;
+			/* We were interrupted, exit */
+			if (time_left)
+				break;
+		} else {
+			/*
+			 * We've already done this song and dance once, let's
+			 * really wait for some work to get done.
+			 */
+			btrfs_wait_ordered_extents(root, 0, 0);
+		}
 
 		/* we've kicked the IO a few times, if anything has been freed,
 		 * exit.  There is no sense in looping here for a long time
@@ -3399,15 +3405,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		 * just too many writers without enough free space
 		 */
 
-		if (loops > 3) {
+		if (!retries && loops > 3) {
 			smp_mb();
 			if (progress != space_info->reservation_progress)
 				break;
 		}
 
 	}
-	if (reclaimed < to_reclaim && !trans)
-		btrfs_wait_ordered_extents(root, 0, 0);
 	return reclaimed >= to_reclaim;
 }
 
@@ -3552,7 +3556,7 @@ again:
 	 * We do synchronous shrinking since we don't actually unreserve
 	 * metadata until after the IO is completed.
 	 */
-	ret = shrink_delalloc(trans, root, num_bytes, 1);
+	ret = shrink_delalloc(trans, root, num_bytes, retries);
 	if (ret < 0)
 		goto out;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1153731..1785307 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2758,7 +2758,16 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 	u64 ino = btrfs_ino(inode);
 	u64 dir_ino = btrfs_ino(dir);
 
-	trans = btrfs_start_transaction(root, 10);
+	/*
+	 * 1 for the possible orphan item
+	 * 1 for the dir item
+	 * 1 for the dir index
+	 * 1 for the inode ref
+	 * 1 for the inode ref in the tree log
+	 * 2 for the dir entries in the log
+	 * 1 for the inode
+	 */
+	trans = btrfs_start_transaction(root, 8);
 	if (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC)
 		return trans;
 
@@ -2781,7 +2790,8 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	trans = btrfs_start_transaction(root, 0);
+	/* 1 for the orphan item */
+	trans = btrfs_start_transaction(root, 1);
 	if (IS_ERR(trans)) {
 		btrfs_free_path(path);
 		root->fs_info->enospc_unlink = 0;
@@ -2892,6 +2902,11 @@ out:
 		return ERR_PTR(err);
 	}
 
+	ret = btrfs_block_rsv_migrate(trans->block_rsv,
+				      &root->fs_info->global_block_rsv,
+				      btrfs_calc_trans_metadata_size(root, 1));
+	BUG_ON(ret);
+
 	trans->block_rsv = &root->fs_info->global_block_rsv;
 	return trans;
 }
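The new comment in __unlink_start_trans() budgets 8 items instead of the old flat 10, and the ENOSPC fallback path now reserves 1 item and migrates one item's worth into the global reserve. The sketch below only checks that arithmetic. The byte formula in calc_trans_metadata_size() is an assumption based on our reading of btrfs_calc_trans_metadata_size() in ctree.h around this release, namely (leafsize + nodesize * (BTRFS_MAX_LEVEL - 1)) * 3 per item with a BTRFS_MAX_LEVEL of 8; the function and array names are hypothetical, so verify the formula against the tree under test before trusting the byte counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: BTRFS_MAX_LEVEL is 8 in the tree being tested. */
#define SKETCH_MAX_LEVEL 8

/* Item budget taken from the comment added to __unlink_start_trans(). */
static const unsigned unlink_items[] = {
	1, /* possible orphan item */
	1, /* dir item */
	1, /* dir index */
	1, /* inode ref */
	1, /* inode ref in the tree log */
	2, /* dir entries in the log */
	1, /* inode */
};

static unsigned count_unlink_items(void)
{
	unsigned total = 0;

	for (size_t i = 0; i < sizeof(unlink_items) / sizeof(unlink_items[0]); i++)
		total += unlink_items[i];
	return total;
}

/*
 * Hypothetical stand-in for btrfs_calc_trans_metadata_size(): one leaf plus
 * (MAX_LEVEL - 1) nodes per item, times the factor of 3 assumed above.
 */
static uint64_t calc_trans_metadata_size(uint32_t leafsize, uint32_t nodesize,
					 unsigned num_items)
{
	assert(leafsize > 0 && nodesize > 0);
	return ((uint64_t)leafsize +
		(uint64_t)nodesize * (SKETCH_MAX_LEVEL - 1)) * 3 * num_items;
}
```

Under those assumptions, with 4 KiB leaves and nodes one item costs 98304 bytes, so the up-front reservation drops from 983040 bytes (10 items) to 786432 bytes (8 items).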

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 20:01       ` Josef Bacik
@ 2011-10-11 20:45         ` Mitch Harder
  2011-10-12 17:50           ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Mitch Harder @ 2011-10-11 20:45 UTC
  To: Josef Bacik; +Cc: linux-btrfs

On Tue, Oct 11, 2011 at 3:01 PM, Josef Bacik <josef@redhat.com> wrote:
> On Tue, Oct 11, 2011 at 02:44:09PM -0500, Mitch Harder wrote:
>> On Tue, Oct 11, 2011 at 2:00 PM, Josef Bacik <josef@redhat.com> wrote:
>> > On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
>> >> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
>> >> >
>> >> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
>> >> > file system.  This doesn't seem to break my other enospc tests, but could really
>> >> > use some more testing as this is a super scary change.  Thanks,
>> >> >
>> >>
>> >> I've been testing Josef's git.kernel.org testing tree, and I've
>> >> bisected an error down to this commit.
>> >>
>> >> I'm triggering the error using a removedirs benchmark in filebench
>> >> with the following profile:
>> >> load removedirs
>> >> set $dir=/mnt/benchmark/filebench
>> >> set $ndirs=400000
>> >> run
>> >>
>> >
>> > Ok try this one, it will write out more and harder, see if that helps.  Thanks,
>> >
>>
>> Still running into BUG at fs/btrfs/inode.c:2176!
>
> How about this one?
>

Sorry, still getting the same bug.

[  175.956273] kernel BUG at fs/btrfs/inode.c:2176!

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-11 20:45         ` Mitch Harder
@ 2011-10-12 17:50           ` Josef Bacik
  2011-10-12 20:45             ` Mitch Harder
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2011-10-12 17:50 UTC
  To: Mitch Harder; +Cc: Josef Bacik, linux-btrfs

On Tue, Oct 11, 2011 at 03:45:45PM -0500, Mitch Harder wrote:
> On Tue, Oct 11, 2011 at 3:01 PM, Josef Bacik <josef@redhat.com> wrote:
> > On Tue, Oct 11, 2011 at 02:44:09PM -0500, Mitch Harder wrote:
> >> On Tue, Oct 11, 2011 at 2:00 PM, Josef Bacik <josef@redhat.com> wrote:
> >> > On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
> >> >> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
> >> >> >
> >> >> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> >> >> > file system.  This doesn't seem to break my other enospc tests, but could really
> >> >> > use some more testing as this is a super scary change.  Thanks,
> >> >> >
> >> >>
> >> >> I've been testing Josef's git.kernel.org testing tree, and I've
> >> >> bisected an error down to this commit.
> >> >>
> >> >> I'm triggering the error using a removedirs benchmark in filebench
> >> >> with the following profile:
> >> >> load removedirs
> >> >> set $dir=/mnt/benchmark/filebench
> >> >> set $ndirs=400000
> >> >> run
> >> >>
> >> >
> >> > Ok try this one, it will write out more and harder, see if that helps.  Thanks,
> >> >
> >>
> >> Still running into BUG at fs/btrfs/inode.c:2176!
> >
> > How about this one?
> >
>
> Sorry, still getting the same bug.
>
> [  175.956273] kernel BUG at fs/btrfs/inode.c:2176!

Ok I think I see what's happening, this patch replaces the previous one, let me
know how it goes.  Thanks,

Josef

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc0de68..e595372 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3334,7 +3334,7 @@ out:
  * shrink metadata reservation for delalloc
  */
 static int shrink_delalloc(struct btrfs_trans_handle *trans,
-			   struct btrfs_root *root, u64 to_reclaim, int sync)
+			   struct btrfs_root *root, u64 to_reclaim, int retries)
 {
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_space_info *space_info;
@@ -3365,12 +3365,10 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 	}
 
 	max_reclaim = min(reserved, to_reclaim);
+	if (max_reclaim > (2 * 1024 * 1024))
+		nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
 
 	while (loops < 1024) {
-		/* have the flusher threads jump in and do some IO */
-		smp_mb();
-		nr_pages = min_t(unsigned long, nr_pages,
-		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
 		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages);
 
 		spin_lock(&space_info->lock);
@@ -3384,14 +3382,22 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		if (reserved == 0 || reclaimed >= max_reclaim)
 			break;
 
-		if (trans && trans->transaction->blocked)
+		if (trans)
 			return -EAGAIN;
 
-		time_left = schedule_timeout_interruptible(1);
+		if (!retries) {
+			time_left = schedule_timeout_interruptible(1);
 
-		/* We were interrupted, exit */
-		if (time_left)
-			break;
+			/* We were interrupted, exit */
+			if (time_left)
+				break;
+		} else {
+			/*
+			 * We've already done this song and dance once, let's
+			 * really wait for some work to get done.
+			 */
+			btrfs_wait_ordered_extents(root, 0, 0);
+		}
 
 		/* we've kicked the IO a few times, if anything has been freed,
 		 * exit.  There is no sense in looping here for a long time
@@ -3399,15 +3405,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		 * just too many writers without enough free space
 		 */
 
-		if (loops > 3) {
+		if (!retries && loops > 3) {
 			smp_mb();
 			if (progress != space_info->reservation_progress)
 				break;
 		}
 
 	}
-	if (reclaimed < to_reclaim && !trans)
-		btrfs_wait_ordered_extents(root, 0, 0);
 	return reclaimed >= to_reclaim;
 }
 
@@ -3552,7 +3556,7 @@ again:
 	 * We do synchronous shrinking since we don't actually unreserve
 	 * metadata until after the IO is completed.
 	 */
-	ret = shrink_delalloc(trans, root, num_bytes, 1);
+	ret = shrink_delalloc(trans, root, num_bytes, retries);
 	if (ret < 0)
 		goto out;
 
@@ -3568,17 +3572,6 @@ again:
 		goto again;
 	}
 
-	/*
-	 * Not enough space to be reclaimed, don't bother committing the
-	 * transaction.
-	 */
-	spin_lock(&space_info->lock);
-	if (space_info->bytes_pinned < orig_bytes)
-		ret = -ENOSPC;
-	spin_unlock(&space_info->lock);
-	if (ret)
-		goto out;
-
 	ret = -EAGAIN;
 	if (trans)
 		goto out;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1153731..1785307 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2758,7 +2758,16 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 	u64 ino = btrfs_ino(inode);
 	u64 dir_ino = btrfs_ino(dir);
 
-	trans = btrfs_start_transaction(root, 10);
+	/*
+	 * 1 for the possible orphan item
+	 * 1 for the dir item
+	 * 1 for the dir index
+	 * 1 for the inode ref
+	 * 1 for the inode ref in the tree log
+	 * 2 for the dir entries in the log
+	 * 1 for the inode
+	 */
+	trans = btrfs_start_transaction(root, 8);
 	if (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC)
 		return trans;
 
@@ -2781,7 +2790,8 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	trans = btrfs_start_transaction(root, 0);
+	/* 1 for the orphan item */
+	trans = btrfs_start_transaction(root, 1);
 	if (IS_ERR(trans)) {
 		btrfs_free_path(path);
 		root->fs_info->enospc_unlink = 0;
@@ -2892,6 +2902,11 @@ out:
 		return ERR_PTR(err);
 	}
 
+	ret = btrfs_block_rsv_migrate(trans->block_rsv,
+				      &root->fs_info->global_block_rsv,
+				      btrfs_calc_trans_metadata_size(root, 1));
+	BUG_ON(ret);
+
 	trans->block_rsv = &root->fs_info->global_block_rsv;
 	return trans;
 }

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-12 17:50           ` Josef Bacik
@ 2011-10-12 20:45             ` Mitch Harder
  2011-10-13 12:57               ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Mitch Harder @ 2011-10-12 20:45 UTC
  To: Josef Bacik; +Cc: linux-btrfs

On Wed, Oct 12, 2011 at 12:50 PM, Josef Bacik <josef@redhat.com> wrote:
> On Tue, Oct 11, 2011 at 03:45:45PM -0500, Mitch Harder wrote:
>> On Tue, Oct 11, 2011 at 3:01 PM, Josef Bacik <josef@redhat.com> wrote:
>> > On Tue, Oct 11, 2011 at 02:44:09PM -0500, Mitch Harder wrote:
>> >> On Tue, Oct 11, 2011 at 2:00 PM, Josef Bacik <josef@redhat.com> wrote:
>> >> > On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
>> >> >> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
>> >> >> >
>> >> >> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
>> >> >> > file system.  This doesn't seem to break my other enospc tests, but could really
>> >> >> > use some more testing as this is a super scary change.  Thanks,
>> >> >> >
>> >> >>
>> >> >> I've been testing Josef's git.kernel.org testing tree, and I've
>> >> >> bisected an error down to this commit.
>> >> >>
>> >> >> I'm triggering the error using a removedirs benchmark in filebench
>> >> >> with the following profile:
>> >> >> load removedirs
>> >> >> set $dir=/mnt/benchmark/filebench
>> >> >> set $ndirs=400000
>> >> >> run
>> >> >>
>> >> >
>> >> > Ok try this one, it will write out more and harder, see if that helps.  Thanks,
>> >> >
>> >>
>> >> Still running into BUG at fs/btrfs/inode.c:2176!
>> >
>> > How about this one?
>> >
>>
>> Sorry, still getting the same bug.
>>
>> [  175.956273] kernel BUG at fs/btrfs/inode.c:2176!
>
> Ok I think I see what's happening, this patch replaces the previous one, let me
> know how it goes.  Thanks,
>

Getting a slightly different BUG this time:

[  172.889179] ------------[ cut here ]------------
[  172.889182] kernel BUG at fs/btrfs/inode.c:785!
[  172.889184] invalid opcode: 0000 [#1] SMP
[  172.889186] CPU 1
[  172.889187] Modules linked in: ipv6 snd_seq_midi snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss
lgdt330x cx88_dvb cx88_vp3054_i2c videobuf_dvb dvb_core rc_hauppauge
tuner_simple tuner_types tda9887 tda8290 tuner ir_lirc_codec lirc_dev
ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder cx8800 cx8802
cx88_alsa cx88xx ir_rc6_decoder ir_rc5_decoder ir_nec_decoder rc_core
i2c_algo_bit tveeprom v4l2_common videodev snd_ens1371 gameport
videobuf_dma_sg media v4l2_compat_ioctl32 videobuf_core snd_rawmidi
btcx_risc snd_seq_device sr_mod snd_ac97_codec ppdev parport_pc
parport ac97_bus tpm_tis intel_agp snd_pcm tpm i2c_i801 snd_timer
i2c_core intel_gtt tpm_bios snd iTCO_wdt iTCO_vendor_support pcspkr
r8169 snd_page_alloc iscsi_tcp libiscsi_tcp libiscsi fuse nfs nfs_acl
auth_rpcgss lockd sunrpc sl811_hcd ohci_hcd uhci_hcd ehci_hcd
[  172.889232]
[  172.889235] Pid: 1812, comm: btrfs-transacti Not tainted 3.1.0-rc9-josef+ #18 Gigabyte Technology Co., Ltd. P35-DS3L/P35-DS3L
[  172.889239] RIP: 0010:[<ffffffff812b6974>]  [<ffffffff812b6974>] cow_file_range+0x6a/0x31e
[  172.889245] RSP: 0018:ffff88007aee1570  EFLAGS: 00010246
[  172.889247] RAX: ffff88007af28000 RBX: ffff88007aee7c00 RCX: 000000000000ffff
[  172.889249] RDX: 0000000000000000 RSI: ffffea0001dcc280 RDI: ffff88007abb14a0
[  172.889251] RBP: ffff88007aee1620 R08: ffff88007aee18dc R09: ffff88007aee18c0
[  172.889253] R10: 0000000000000000 R11: dead000000200200 R12: 0000000000000000
[  172.889255] R13: ffff88007abb14a0 R14: 0000000000001000 R15: ffff88007abb1310
[  172.889257] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[  172.889259] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  172.889261] CR2: ffffffffff600400 CR3: 000000007acdf000 CR4: 00000000000006e0
[  172.889263] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  172.889265] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  172.889268] Process btrfs-transacti (pid: 1812, threadinfo ffff88007aee0000, task ffff88007bccd040)
[  172.889269] Stack:
[  172.889271]  00000002ad0dbfff ffff88005d239448 0000000000000282 0000000000010000
[  172.889274]  0000000000010000 ffffea0001dcc280 ffff88007abb1330 ffff88007aee18dc
[  172.889277]  ffff88007aee18c0 0000000000000000 ffff880000000001 000000000000ffff
[  172.889280] Call Trace:
[  172.889285]  [<ffffffff812d72ab>] ? btrfs_tree_read_unlock_blocking+0x51/0x59
[  172.889288]  [<ffffffff812b71ed>] run_delalloc_nocow+0x5c5/0x654
[  172.889292]  [<ffffffff810e95ee>] ? kmem_cache_free+0x20/0xcf
[  172.889295]  [<ffffffff812b72f1>] run_delalloc_range+0x75/0x34f
[  172.889298]  [<ffffffff812c9926>] __extent_writepage+0x1fb/0x5e6
[  172.889302]  [<ffffffff81332915>] ? radix_tree_gang_lookup_tag_slot+0x81/0xa2
[  172.889306]  [<ffffffff810ad650>] ? find_get_pages_tag+0x43/0xfb
[  172.889309]  [<ffffffff812c9e8a>] extent_write_cache_pages.clone.10.clone.20+0x179/0x2b3
[  172.889312]  [<ffffffff812ca1cc>] extent_writepages+0x47/0x5c
[  172.889315]  [<ffffffff812c7290>] ? free_extent_state+0x48/0x4c
[  172.889318]  [<ffffffff812b45a3>] ? uncompress_inline.clone.36+0x148/0x148
[  172.889320]  [<ffffffff812c78ba>] ? clear_extent_bit+0x2b7/0x2f2
[  172.889323]  [<ffffffff812b3fdc>] btrfs_writepages+0x27/0x29
[  172.889326]  [<ffffffff810b6674>] do_writepages+0x21/0x2a
[  172.889328]  [<ffffffff810ae4d8>] __filemap_fdatawrite_range+0x53/0x55
[  172.889331]  [<ffffffff810af11d>] filemap_fdatawrite+0x1f/0x21
[  172.889334]  [<ffffffff810af13c>] filemap_write_and_wait+0x1d/0x38
[  172.889337]  [<ffffffff812de0f5>] __btrfs_write_out_cache+0x5a2/0x80e
[  172.889340]  [<ffffffff812e2b46>] ? btrfs_find_ref_cluster+0x113/0x12d
[  172.889343]  [<ffffffff812de3f2>] btrfs_write_out_cache+0x91/0xc0
[  172.889346]  [<ffffffff812a361f>] btrfs_write_dirty_block_groups+0x3ff/0x473
[  172.889349]  [<ffffffff812af8a6>] commit_cowonly_roots+0xc9/0x191
[  172.889352]  [<ffffffff812b0b5b>] btrfs_commit_transaction+0x3f5/0x6f3
[  172.889355]  [<ffffffff812b00b7>] ? join_transaction.clone.24+0x20/0x1f0
[  172.889359]  [<ffffffff810543cf>] ? wake_up_bit+0x2a/0x2a
[  172.889362]  [<ffffffff812ab421>] transaction_kthread+0x172/0x227
[  172.889365]  [<ffffffff812ab2af>] ? btrfs_congested_fn+0x86/0x86
[  172.889367]  [<ffffffff812ab2af>] ? btrfs_congested_fn+0x86/0x86
[  172.889370]  [<ffffffff81053f12>] kthread+0x82/0x8a
[  172.889373]  [<ffffffff81624914>] kernel_thread_helper+0x4/0x10
[  172.889376]  [<ffffffff81053e90>] ? kthread_worker_fn+0x13a/0x13a
[  172.889378]  [<ffffffff81624910>] ? gs_change+0xb/0xb
[  172.889379] Code: 20 01 00 00 48 89 4d a8 4c 89 45 88 4c 89 4d 90
44 8b b3 f8 02 00 00 48 3b 58 28 74 0e 48 83 bf 78 fe ff ff f4 0f 85
97 02 00 00 <0f> 0b 0f 0b 45 89 f6 48 8b 83 20 01 00 00 48 8b 55 a0 48
05 38
[  172.889402] RIP  [<ffffffff812b6974>] cow_file_range+0x6a/0x31e
[  172.889405]  RSP <ffff88007aee1570>
[  172.889408] ---[ end trace bd2a7fa17108e565 ]---

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-12 20:45             ` Mitch Harder
@ 2011-10-13 12:57               ` Josef Bacik
  2011-10-13 15:03                 ` Christian Brunner
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2011-10-13 12:57 UTC
  To: Mitch Harder; +Cc: Josef Bacik, linux-btrfs

On Wed, Oct 12, 2011 at 03:45:04PM -0500, Mitch Harder wrote:
> On Wed, Oct 12, 2011 at 12:50 PM, Josef Bacik <josef@redhat.com> wrote:
> > On Tue, Oct 11, 2011 at 03:45:45PM -0500, Mitch Harder wrote:
> >> On Tue, Oct 11, 2011 at 3:01 PM, Josef Bacik <josef@redhat.com> wrote:
> >> > On Tue, Oct 11, 2011 at 02:44:09PM -0500, Mitch Harder wrote:
> >> >> On Tue, Oct 11, 2011 at 2:00 PM, Josef Bacik <josef@redhat.com> wrote:
> >> >> > On Tue, Oct 11, 2011 at 12:33:48PM -0500, Mitch Harder wrote:
> >> >> >> On Mon, Sep 26, 2011 at 4:22 PM, Josef Bacik <josef@redhat.com> wrote:
> >> >> >> >
> >> >> >> > go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
> >> >> >> > file system.  This doesn't seem to break my other enospc tests, but could really
> >> >> >> > use some more testing as this is a super scary change.  Thanks,
> >> >> >> >
> >> >> >>
> >> >> >> I've been testing Josef's git.kernel.org testing tree, and I've
> >> >> >> bisected an error down to this commit.
> >> >> >>
> >> >> >> I'm triggering the error using a removedirs benchmark in filebench
> >> >> >> with the following profile:
> >> >> >> load removedirs
> >> >> >> set $dir=/mnt/benchmark/filebench
> >> >> >> set $ndirs=400000
> >> >> >> run
> >> >> >>
> >> >> >
> >> >> > Ok try this one, it will write out more and harder, see if that helps.  Thanks,
> >> >> >
> >> >>
> >> >> Still running into BUG at fs/btrfs/inode.c:2176!
> >> >
> >> > How about this one?
> >> >
> >>
> >> Sorry, still getting the same bug.
> >>
> >> [  175.956273] kernel BUG at fs/btrfs/inode.c:2176!
> >
> > Ok I think I see what's happening, this patch replaces the previous one, let me
> > know how it goes.  Thanks,
> >
>
> Getting a slightly different BUG this time:
>

Ok looks like I've fixed the original problem and now we're hitting a problem
with the free space cache.  This patch will replace the last one; it's all the
fixes up to now plus a new set of BUG_ON()s to figure out which free space cache
inode is screwing us up.  Thanks,

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc0de68..e595372 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3334,7 +3334,7 @@ out:
  * shrink metadata reservation for delalloc
  */
 static int shrink_delalloc(struct btrfs_trans_handle *trans,
-			   struct btrfs_root *root, u64 to_reclaim, int sync)
+			   struct btrfs_root *root, u64 to_reclaim, int retries)
 {
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_space_info *space_info;
@@ -3365,12 +3365,10 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 	}
 
 	max_reclaim = min(reserved, to_reclaim);
+	if (max_reclaim > (2 * 1024 * 1024))
+		nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
 
 	while (loops < 1024) {
-		/* have the flusher threads jump in and do some IO */
-		smp_mb();
-		nr_pages = min_t(unsigned long, nr_pages,
-		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
 		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages);
 
 		spin_lock(&space_info->lock);
@@ -3384,14 +3382,22 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		if (reserved == 0 || reclaimed >= max_reclaim)
 			break;
 
-		if (trans && trans->transaction->blocked)
+		if (trans)
 			return -EAGAIN;
 
-		time_left = schedule_timeout_interruptible(1);
+		if (!retries) {
+			time_left = schedule_timeout_interruptible(1);
 
-		/* We were interrupted, exit */
-		if (time_left)
-			break;
+			/* We were interrupted, exit */
+			if (time_left)
+				break;
+		} else {
+			/*
+			 * We've already done this song and dance once, let's
+			 * really wait for some work to get done.
+			 */
+			btrfs_wait_ordered_extents(root, 0, 0);
+		}
 
 		/* we've kicked the IO a few times, if anything has been freed,
 		 * exit.  There is no sense in looping here for a long time
@@ -3399,15 +3405,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
 		 * just too many writers without enough free space
 		 */
 
-		if (loops > 3) {
+		if (!retries && loops > 3) {
 			smp_mb();
 			if (progress != space_info->reservation_progress)
 				break;
 		}
 
 	}
-	if (reclaimed < to_reclaim && !trans)
-		btrfs_wait_ordered_extents(root, 0, 0);
 	return reclaimed >= to_reclaim;
 }
 
@@ -3552,7 +3556,7 @@ again:
 	 * We do synchronous shrinking since we don't actually unreserve
 	 * metadata until after the IO is completed.
 	 */
-	ret = shrink_delalloc(trans, root, num_bytes, 1);
+	ret = shrink_delalloc(trans, root, num_bytes, retries);
 	if (ret < 0)
 		goto out;
 
@@ -3568,17 +3572,6 @@ again:
 		goto again;
 	}
 
-	/*
-	 * Not enough space to be reclaimed, don't bother committing the
-	 * transaction.
-	 */
-	spin_lock(&space_info->lock);
-	if (space_info->bytes_pinned < orig_bytes)
-		ret = -ENOSPC;
-	spin_unlock(&space_info->lock);
-	if (ret)
-		goto out;
-
 	ret = -EAGAIN;
 	if (trans)
 		goto out;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d6ba353..cb63904 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -782,7 +782,8 @@ static noinline int cow_file_range(struct inode *inode,
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
 	int ret = 0;
 
-	BUG_ON(btrfs_is_free_space_inode(root, inode));
+	BUG_ON(root == root->fs_info->tree_root);
+	BUG_ON(BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID);
 	trans = btrfs_join_transaction(root);
 	BUG_ON(IS_ERR(trans));
 	trans->block_rsv = &root->fs_info->delalloc_block_rsv;
@@ -2790,7 +2791,8 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	trans = btrfs_start_transaction(root, 0);
+	/* 1 for the orphan item */
+	trans = btrfs_start_transaction(root, 1);
 	if (IS_ERR(trans)) {
 		btrfs_free_path(path);
 		root->fs_info->enospc_unlink = 0;
@@ -2901,6 +2903,11 @@ out:
 		return ERR_PTR(err);
 	}
 
+	ret = btrfs_block_rsv_migrate(trans->block_rsv,
+				      &root->fs_info->global_block_rsv,
+				      btrfs_calc_trans_metadata_size(root, 1));
+	BUG_ON(ret);
+
 	trans->block_rsv = &root->fs_info->global_block_rsv;
 	return trans;
 }

* Re: [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!!
  2011-10-13 12:57               ` Josef Bacik
@ 2011-10-13 15:03                 ` Christian Brunner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brunner @ 2011-10-13 15:03 UTC
  To: Josef Bacik; +Cc: Mitch Harder, linux-btrfs

2011/10/13 Josef Bacik <josef@redhat.com>:
[...]
>> >> [  175.956273] kernel BUG at fs/btrfs/inode.c:2176!
>> >
>> > Ok I think I see what's happening, this patch replaces the previous one, let me
>> > know how it goes.  Thanks,
>> >
>>
>> Getting a slightly different BUG this time:
>>
>
> Ok looks like I've fixed the original problem and now we're hitting a problem
> with the free space cache.  This patch will replace the last one, its all the
> fixes up to now and a new set of BUG_ON()'s to figure out which free space cache
> inode is screwing us up.  Thanks,
>
> Josef
>
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fc0de68..e595372 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3334,7 +3334,7 @@ out:
> =A0* shrink metadata reservation for delalloc
> =A0*/
> =A0static int shrink_delalloc(struct btrfs_trans_handle *trans,
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct btrfs_roo=
t *root, u64 to_reclaim, int sync)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct btrfs_roo=
t *root, u64 to_reclaim, int retries)
> =A0{
> =A0 =A0 =A0 =A0struct btrfs_block_rsv *block_rsv;
> =A0 =A0 =A0 =A0struct btrfs_space_info *space_info;
> @@ -3365,12 +3365,10 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
>  	}
>
>  	max_reclaim = min(reserved, to_reclaim);
> +	if (max_reclaim > (2 * 1024 * 1024))
> +		nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
>
>  	while (loops < 1024) {
> -		/* have the flusher threads jump in and do some IO */
> -		smp_mb();
> -		nr_pages = min_t(unsigned long, nr_pages,
> -		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
>  		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages);
>
>  		spin_lock(&space_info->lock);
> @@ -3384,14 +3382,22 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
>  		if (reserved == 0 || reclaimed >= max_reclaim)
>  			break;
>
> -		if (trans && trans->transaction->blocked)
> +		if (trans)
>  			return -EAGAIN;
>
> -		time_left = schedule_timeout_interruptible(1);
> +		if (!retries) {
> +			time_left = schedule_timeout_interruptible(1);
>
> -		/* We were interrupted, exit */
> -		if (time_left)
> -			break;
> +			/* We were interrupted, exit */
> +			if (time_left)
> +				break;
> +		} else {
> +			/*
> +			 * We've already done this song and dance once, let's
> +			 * really wait for some work to get done.
> +			 */
> +			btrfs_wait_ordered_extents(root, 0, 0);
> +		}
>
>  		/* we've kicked the IO a few times, if anything has been freed,
>  		 * exit.  There is no sense in looping here for a long time
> @@ -3399,15 +3405,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
>  		 * just too many writers without enough free space
>  		 */
>
> -		if (loops > 3) {
> +		if (!retries && loops > 3) {
>  			smp_mb();
>  			if (progress != space_info->reservation_progress)
>  				break;
>  		}
>
>  	}
> -	if (reclaimed < to_reclaim && !trans)
> -		btrfs_wait_ordered_extents(root, 0, 0);
>  	return reclaimed >= to_reclaim;
>  }
>
> @@ -3552,7 +3556,7 @@ again:
>  	 * We do synchronous shrinking since we don't actually unreserve
>  	 * metadata until after the IO is completed.
>  	 */
> -	ret = shrink_delalloc(trans, root, num_bytes, 1);
> +	ret = shrink_delalloc(trans, root, num_bytes, retries);
>  	if (ret < 0)
>  		goto out;
>
> @@ -3568,17 +3572,6 @@ again:
>  		goto again;
>  	}
>
> -	/*
> -	 * Not enough space to be reclaimed, don't bother committing the
> -	 * transaction.
> -	 */
> -	spin_lock(&space_info->lock);
> -	if (space_info->bytes_pinned < orig_bytes)
> -		ret = -ENOSPC;
> -	spin_unlock(&space_info->lock);
> -	if (ret)
> -		goto out;
> -
>  	ret = -EAGAIN;
>  	if (trans)
>  		goto out;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d6ba353..cb63904 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -782,7 +782,8 @@ static noinline int cow_file_range(struct inode *inode,
>  	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
>  	int ret = 0;
>
> -	BUG_ON(btrfs_is_free_space_inode(root, inode));
> +	BUG_ON(root == root->fs_info->tree_root);
> +	BUG_ON(BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID);
>  	trans = btrfs_join_transaction(root);
>  	BUG_ON(IS_ERR(trans));
>  	trans->block_rsv = &root->fs_info->delalloc_block_rsv;
> @@ -2790,7 +2791,8 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> -	trans = btrfs_start_transaction(root, 0);
> +	/* 1 for the orphan item */
> +	trans = btrfs_start_transaction(root, 1);
>  	if (IS_ERR(trans)) {
>  		btrfs_free_path(path);
>  		root->fs_info->enospc_unlink = 0;

Could it be that the missing space for the orphan item is the reason
for our warning?

[  105.209232] ------------[ cut here ]------------
[  105.214458] WARNING: at fs/btrfs/inode.c:2114 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[  105.223794] Hardware name: ProLiant DL180 G6
[  105.228930] Modules linked in: btrfs zlib_deflate libcrc32c bonding ipv6 serio_raw pcspkr ghes hed iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(P) hpsa squashfs [last unloaded: scsi_wait_scan]
[  105.253539] Pid: 1774, comm: kworker/0:2 Tainted: P 3.0.6-1.fits.2.el6.x86_64 #1
[  105.263015] Call Trace:
[  105.265956]  [<ffffffff8106344f>] warn_slowpath_common+0x7f/0xc0
[  105.272841]  [<ffffffff810634aa>] warn_slowpath_null+0x1a/0x20
[  105.279503]  [<ffffffffa022bef0>] btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]
[  105.287564]  [<ffffffffa0226ce5>] commit_fs_roots+0xc5/0x1b0 [btrfs]
[  105.294824]  [<ffffffffa0227c36>] btrfs_commit_transaction+0x3c6/0x820 [btrfs]
[  105.303044]  [<ffffffff810507c0>] ? __dequeue_entity+0x30/0x50
[  105.309745]  [<ffffffff81086410>] ? wake_up_bit+0x40/0x40
[  105.315944]  [<ffffffffa0228090>] ? btrfs_commit_transaction+0x820/0x820 [btrfs]
[  105.324404]  [<ffffffffa02280af>] do_async_commit+0x1f/0x30 [btrfs]
[  105.331590]  [<ffffffff8107e8b8>] process_one_work+0x128/0x450
[  105.338291]  [<ffffffff810816cb>] worker_thread+0x17b/0x3c0
[  105.344708]  [<ffffffff81081550>] ? manage_workers+0x220/0x220
[  105.351407]  [<ffffffff81085d96>] kthread+0x96/0xa0
[  105.357040]  [<ffffffff815639c4>] kernel_thread_helper+0x4/0x10
[  105.363824]  [<ffffffff81085d00>] ? kthread_worker_fn+0x1a0/0x1a0
[  105.370776]  [<ffffffff815639c0>] ? gs_change+0x13/0x13
[  105.376771] ---[ end trace 144230b62b45be67 ]---

Thanks,
Christian

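The control-flow change to shrink_delalloc() in the quoted extent-tree.c hunks can be reduced to three plain decisions. This is a hypothetical sketch, not btrfs code: the model_* names are invented, page_shift stands in for PAGE_CACHE_SHIFT, and default_nr_pages stands in for whatever nr_pages held before the hunk (its initial value is not shown in the quote). It captures what the diff shows: nr_pages is sized from max_reclaim only above 2 MiB, the first attempt (retries == 0) sleeps one tick per loop while later attempts wait for ordered extents, and the early exit on reservation progress applies only on the first attempt after more than three loops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for how one loop iteration waits after kicking writeback. */
enum model_wait {
	MODEL_WAIT_SLEEP_ONE_TICK,  /* models schedule_timeout_interruptible(1) */
	MODEL_WAIT_ORDERED_EXTENTS, /* models btrfs_wait_ordered_extents(root, 0, 0) */
};

/* Pages to write back: derived from max_reclaim only above 2 MiB. */
static unsigned long model_nr_pages(unsigned long long max_reclaim,
				    unsigned long default_nr_pages,
				    unsigned int page_shift)
{
	assert(page_shift < 64); /* larger shifts would be undefined behaviour */
	if (max_reclaim > 2ULL * 1024 * 1024)
		return (unsigned long)(max_reclaim >> page_shift);
	return default_nr_pages;
}

/* First attempt sleeps briefly; retries block on ordered extents instead. */
static enum model_wait model_wait_kind(int retries)
{
	return retries == 0 ? MODEL_WAIT_SLEEP_ONE_TICK
			    : MODEL_WAIT_ORDERED_EXTENTS;
}

/* Early exit when reservation progress moved: first attempt only, after 3 loops. */
static bool model_stop_on_progress(int retries, int loops,
				   unsigned long progress_at_start,
				   unsigned long progress_now)
{
	return retries == 0 && loops > 3 && progress_at_start != progress_now;
}
```

For example, with 4 KiB pages (page_shift = 12) a max_reclaim of 8 MiB gives 2048 pages, while 1 MiB keeps the default.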
> @@ -2901,6 +2903,11 @@ out:
>  		return ERR_PTR(err);
>  	}
>
> +	ret = btrfs_block_rsv_migrate(trans->block_rsv,
> +				      &root->fs_info->global_block_rsv,
> +				      btrfs_calc_trans_metadata_size(root, 1));
> +	BUG_ON(ret);
> +
>  	trans->block_rsv = &root->fs_info->global_block_rsv;
>  	return trans;
>  }

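The orphan-item question comes down to reservation arithmetic: the revised __unlink_start_trans() reserves one item when it starts the transaction and later migrates one item's worth of metadata from trans->block_rsv into the global reserve, with BUG_ON(ret) if that fails. A hypothetical user-space model of that step follows. model_item_cost() assumes the btrfs_calc_trans_metadata_size() formula of kernels from this period, (leafsize + nodesize * (BTRFS_MAX_LEVEL - 1)) * 3 per item; treat that formula, and the -ENOSPC-on-shortfall behaviour of model_rsv_migrate(), as assumptions rather than the kernel's exact code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a block reservation: bytes currently held. */
struct model_rsv {
	uint64_t reserved;
};

/*
 * Assumed per-transaction metadata cost (the formula is an assumption):
 * (leafsize + nodesize * (max_level - 1)) * 3 bytes per item.
 */
static uint64_t model_item_cost(uint64_t leafsize, uint64_t nodesize,
				unsigned int max_level, unsigned int num_items)
{
	assert(max_level >= 1);
	return (leafsize + nodesize * (max_level - 1)) * 3 * num_items;
}

/* Move bytes between reservations; -ENOSPC if the source holds too little. */
static int model_rsv_migrate(struct model_rsv *src, struct model_rsv *dst,
			     uint64_t bytes)
{
	if (src->reserved < bytes)
		return -ENOSPC;
	assert(dst->reserved + bytes >= dst->reserved); /* no overflow */
	src->reserved -= bytes;
	dst->reserved += bytes;
	return 0;
}
```

With 4 KiB leaves and nodes and a maximum tree depth of 8, one item costs 98304 bytes in this model. A transaction handle started with zero items has nothing to migrate, so in the model the migrate fails; that is consistent with the start_transaction(root, 0) to (root, 1) change accompanying the migrate call, but whether it explains the btrfs_orphan_commit_root() warning is still the open question above.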

end of thread, other threads:[~2011-10-13 15:03 UTC | newest]

Thread overview: 12+ messages
2011-09-26 21:22 [PATCH] Btrfs: allow us to overcommit our enospc reservations TEST THIS PLEASE!!! Josef Bacik
2011-10-11 17:33 ` Mitch Harder
2011-10-11 17:43   ` Josef Bacik
2011-10-11 18:27   ` Josef Bacik
2011-10-11 19:00   ` Josef Bacik
2011-10-11 19:44     ` Mitch Harder
2011-10-11 20:01       ` Josef Bacik
2011-10-11 20:45         ` Mitch Harder
2011-10-12 17:50           ` Josef Bacik
2011-10-12 20:45             ` Mitch Harder
2011-10-13 12:57               ` Josef Bacik
2011-10-13 15:03                 ` Christian Brunner
