linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mkfs.btrfs on 24 disks in parallel crashes kernel
@ 2010-11-08  7:22 Albert Strasheim
  2010-11-09  4:10 ` Ian Kent
  0 siblings, 1 reply; 7+ messages in thread
From: Albert Strasheim @ 2010-11-08  7:22 UTC (permalink / raw)
  To: linux-btrfs

Hello all

I did some experiments on Fedora 14 with 2.6.35.6, running mkfs.btrfs
followed by a mount in parallel on 24 disks.

This seems to crash reliably.

I reported the bug to Fedora here:

https://bugzilla.redhat.com/show_bug.cgi?id=650261

The bug report has some stacktraces and stuff attached.

I saw stuff like this:

[  203.861368] Btrfs loaded
[  203.871418] device label myvol-12b0f8ba-e1ef-4d00-aae4-578d59d955e4
devid 1 transid 7 /dev/sdv
[  203.981371] device label myvol-fe7fdbcd-8f2f-4b34-aa6b-0d3bb7582ebc
devid 1 transid 7 /dev/sdl
[  203.990234] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000128
[  204.001153] IP: [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
[  204.016840] PGD 0
[  204.018696] Oops: 0000 [#1] SMP
[  204.021264] last sysfs file:
/sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host7/port-7:0/expander-7:0/port-7:0:9/end_device-7:0:9/target7:0:9/7:0:9:0/uevent
[  204.045966] CPU 0
[  204.055933] Modules linked in: btrfs zlib_deflate libcrc32c ipv6
mlx4_ib ib_mad ib_core mlx4_en igb mlx4_core ses iTCO_wdt ioatdma dca
iTCO_vendor_support i7core_edac edac_core enclosure i2c_i801 i2c_core
microcode joydev serio_raw mptsas mptscsih mptbase scsi_transport_sas
[last unloaded: scsi_wait_scan]
[  204.100672]
[  204.103025] Pid: 2166, comm: mount Not tainted
2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270
SERVER MODULE
[  204.121895] RIP: 0010:[<ffffffffa019a010>]  [<ffffffffa019a010>]
btrfs_test_super+0x10/0x26 [btrfs]
[  204.139279] RSP: 0018:ffff8803768ddcd8  EFLAGS: 00010287
[  204.144675] RAX: 0000000000000000 RBX: ffff8803772df800 RCX: ffff8801f7d34480
[  204.160885] RDX: 0000000000000120 RSI: ffff8801f7d34480 RDI: ffff8803772df800
[  204.179473] RBP: ffff8803768ddcd8 R08: ffff8801f7d344f8 R09: ffff880375c78760
[  204.185673] R10: ffff8803768ddb68 R11: ffff8801f7d34480 R12: ffffffffa01f13d0
[  204.200174] R13: ffffffffa019a000 R14: ffff8801f7d34480 R15: ffffffffa01f1400
[  204.221347] FS:  00007f766050d800(0000) GS:ffff880022200000(0000)
knlGS:0000000000000000
[  204.226597] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  204.238144] CR2: 0000000000000128 CR3: 00000001f7504000 CR4: 00000000000006f0
[  204.256446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  204.263975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  204.280097] Process mount (pid: 2166, threadinfo ffff8803768dc000,
task ffff880376ab8000)
[  204.296529] Stack:
[  204.298975]  ffff8803768ddd38 ffffffff81118df6 ffffffffa01f13d0
0000000000000000
[  204.305955] <0> ffffffff811184a7 0000000000000000 ffff8803768ddd38
0000000000000003
[  204.320999] <0> ffffffffa01f13d0 0000000000000000 ffff880375c78680
ffff8801f4f59d00
[  204.338658] Call Trace:
[  204.341138]  [<ffffffff81118df6>] sget+0x54/0x367
[  204.355474]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
[  204.360487]  [<ffffffffa019a821>] btrfs_get_sb+0x108/0x3eb [btrfs]
[  204.375799]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
[  204.379952]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
[  204.395757]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
[  204.400676]  [<ffffffff810e5aec>] ? strndup_user+0x54/0x85
[  204.417219]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
[  204.420399]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[  204.435716] Code: <48> 8b 80 28 01 00 00 48 39 b0 28 22 00 00 c9 0f
94 c0 0f b6 c0 c3
[  204.447369] RIP  [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
[  204.458280]  RSP <ffff8803768ddcd8>
[  204.462330] CR2: 0000000000000128
[  204.466479] ---[ end trace c8bb842fa664d021 ]---
[  204.498273] device label myvol-11d50321-7482-4ee2-8da5-5ee5f623ae17
devid 1 transid 7 /dev/sdk

Regards,

Albert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.btrfs on 24 disks in parallel crashes kernel
  2010-11-08  7:22 mkfs.btrfs on 24 disks in parallel crashes kernel Albert Strasheim
@ 2010-11-09  4:10 ` Ian Kent
  2010-11-09  7:52   ` Ian Kent
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Kent @ 2010-11-09  4:10 UTC (permalink / raw)
  To: Albert Strasheim; +Cc: linux-btrfs

On Mon, 2010-11-08 at 09:22 +0200, Albert Strasheim wrote:
> Hello all
> 
> I did some experiments on Fedora 14 with 2.6.35.6, running mkfs.btrfs
> followed by a mount in parallel on 24 disks.
> 
> This seems to crash reliably.
> 
> I reported the bug to Fedora here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=650261
> 
> The bug report has some stacktraces and stuff attached.
> 
> I saw stuff like this:
> 
> [  203.861368] Btrfs loaded
> [  203.871418] device label myvol-12b0f8ba-e1ef-4d00-aae4-578d59d955e4
> devid 1 transid 7 /dev/sdv
> [  203.981371] device label myvol-fe7fdbcd-8f2f-4b34-aa6b-0d3bb7582ebc
> devid 1 transid 7 /dev/sdl
> [  203.990234] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000128
> [  204.001153] IP: [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> [  204.016840] PGD 0
> [  204.018696] Oops: 0000 [#1] SMP
> [  204.021264] last sysfs file:
> /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host7/port-7:0/expander-7:0/port-7:0:9/end_device-7:0:9/target7:0:9/7:0:9:0/uevent
> [  204.045966] CPU 0
> [  204.055933] Modules linked in: btrfs zlib_deflate libcrc32c ipv6
> mlx4_ib ib_mad ib_core mlx4_en igb mlx4_core ses iTCO_wdt ioatdma dca
> iTCO_vendor_support i7core_edac edac_core enclosure i2c_i801 i2c_core
> microcode joydev serio_raw mptsas mptscsih mptbase scsi_transport_sas
> [last unloaded: scsi_wait_scan]
> [  204.100672]
> [  204.103025] Pid: 2166, comm: mount Not tainted
> 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270
> SERVER MODULE
> [  204.121895] RIP: 0010:[<ffffffffa019a010>]  [<ffffffffa019a010>]
> btrfs_test_super+0x10/0x26 [btrfs]
> [  204.139279] RSP: 0018:ffff8803768ddcd8  EFLAGS: 00010287
> [  204.144675] RAX: 0000000000000000 RBX: ffff8803772df800 RCX: ffff8801f7d34480
> [  204.160885] RDX: 0000000000000120 RSI: ffff8801f7d34480 RDI: ffff8803772df800
> [  204.179473] RBP: ffff8803768ddcd8 R08: ffff8801f7d344f8 R09: ffff880375c78760
> [  204.185673] R10: ffff8803768ddb68 R11: ffff8801f7d34480 R12: ffffffffa01f13d0
> [  204.200174] R13: ffffffffa019a000 R14: ffff8801f7d34480 R15: ffffffffa01f1400
> [  204.221347] FS:  00007f766050d800(0000) GS:ffff880022200000(0000)
> knlGS:0000000000000000
> [  204.226597] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  204.238144] CR2: 0000000000000128 CR3: 00000001f7504000 CR4: 00000000000006f0
> [  204.256446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  204.263975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  204.280097] Process mount (pid: 2166, threadinfo ffff8803768dc000,
> task ffff880376ab8000)
> [  204.296529] Stack:
> [  204.298975]  ffff8803768ddd38 ffffffff81118df6 ffffffffa01f13d0
> 0000000000000000
> [  204.305955] <0> ffffffff811184a7 0000000000000000 ffff8803768ddd38
> 0000000000000003
> [  204.320999] <0> ffffffffa01f13d0 0000000000000000 ffff880375c78680
> ffff8801f4f59d00
> [  204.338658] Call Trace:
> [  204.341138]  [<ffffffff81118df6>] sget+0x54/0x367
> [  204.355474]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> [  204.360487]  [<ffffffffa019a821>] btrfs_get_sb+0x108/0x3eb [btrfs]
> [  204.375799]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> [  204.379952]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> [  204.395757]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
> [  204.400676]  [<ffffffff810e5aec>] ? strndup_user+0x54/0x85
> [  204.417219]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> [  204.420399]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
> [  204.435716] Code: <48> 8b 80 28 01 00 00 48 39 b0 28 22 00 00 c9 0f
> 94 c0 0f b6 c0 c3
> [  204.447369] RIP  [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> [  204.458280]  RSP <ffff8803768ddcd8>
> [  204.462330] CR2: 0000000000000128
> [  204.466479] ---[ end trace c8bb842fa664d021 ]---
> [  204.498273] device label myvol-11d50321-7482-4ee2-8da5-5ee5f623ae17
> devid 1 transid 7 /dev/sdk

This looks like a fairly obvious race.
How about some comments on the suitability of this totally untested
patch?

I thought about trying to merge the tests following sget() (those
surrounded by the mutex) into btrfs_fill_super() but that seemed a bit
too hard. Maybe someone can advise on how that should be done if this
approach is not OK.

btrfs - fix race in btrfs_get_sb()

From: Ian Kent <raven@themaw.net>

When mounting a btrfs file system btrfs_test_super() may attempt to
use sb->s_fs_info before it is set. This is because it isn't set until
btrfs_fill_super() is called but the super block creation locks are
dropped earlier during sget().

Also, for the same reason, it looks like there is a possible race
with the s->s_root check which should also be dealt with by this
patch.
---

 fs/btrfs/super.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)


diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 718b10d..9b463b9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -565,6 +565,8 @@ static int btrfs_test_super(struct super_block *s, void *data)
 	struct btrfs_fs_devices *test_fs_devices = data;
 	struct btrfs_root *root = btrfs_sb(s);
 
+	if (!root)
+		return 0;
 	return root->fs_info->fs_devices == test_fs_devices;
 }
 
@@ -585,6 +587,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
 	char *subvol_name = NULL;
 	u64 subvol_objectid = 0;
 	int error = 0;
+	DEFINE_MUTEX(super_mutex);
 
 	if (!(flags & MS_RDONLY))
 		mode |= FMODE_WRITE;
@@ -613,8 +616,10 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
 	if (IS_ERR(s))
 		goto error_s;
 
+	mutex_lock(&super_mutex);
 	if (s->s_root) {
 		if ((flags ^ s->s_flags) & MS_RDONLY) {
+			mutex_unlock(&super_mutex);
 			deactivate_locked_super(s);
 			error = -EBUSY;
 			goto error_close_devices;
@@ -629,6 +634,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
 		error = btrfs_fill_super(s, fs_devices, data,
 					 flags & MS_SILENT ? 1 : 0);
 		if (error) {
+			mutex_unlock(&super_mutex);
 			deactivate_locked_super(s);
 			goto error_free_subvol_name;
 		}
@@ -636,6 +642,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
 		btrfs_sb(s)->fs_info->bdev_holder = fs_type;
 		s->s_flags |= MS_ACTIVE;
 	}
+	mutex_unlock(&super_mutex);
 
 	root = get_default_root(s, subvol_objectid);
 	if (IS_ERR(root)) {



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: mkfs.btrfs on 24 disks in parallel crashes kernel
  2010-11-09  4:10 ` Ian Kent
@ 2010-11-09  7:52   ` Ian Kent
  2010-11-15 14:21     ` Ian Kent
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Kent @ 2010-11-09  7:52 UTC (permalink / raw)
  To: Albert Strasheim; +Cc: linux-btrfs

On Tue, 2010-11-09 at 12:10 +0800, Ian Kent wrote:
> On Mon, 2010-11-08 at 09:22 +0200, Albert Strasheim wrote:
> > Hello all
> > 
> > I did some experiments on Fedora 14 with 2.6.35.6, running mkfs.btrfs
> > followed by a mount in parallel on 24 disks.
> > 
> > This seems to crash reliably.
> > 
> > I reported the bug to Fedora here:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=650261
> > 
> > The bug report has some stacktraces and stuff attached.
> > 
> > I saw stuff like this:
> > 
> > [  203.861368] Btrfs loaded
> > [  203.871418] device label myvol-12b0f8ba-e1ef-4d00-aae4-578d59d955e4
> > devid 1 transid 7 /dev/sdv
> > [  203.981371] device label myvol-fe7fdbcd-8f2f-4b34-aa6b-0d3bb7582ebc
> > devid 1 transid 7 /dev/sdl
> > [  203.990234] BUG: unable to handle kernel NULL pointer dereference
> > at 0000000000000128
> > [  204.001153] IP: [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > [  204.016840] PGD 0
> > [  204.018696] Oops: 0000 [#1] SMP
> > [  204.021264] last sysfs file:
> > /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host7/port-7:0/expander-7:0/port-7:0:9/end_device-7:0:9/target7:0:9/7:0:9:0/uevent
> > [  204.045966] CPU 0
> > [  204.055933] Modules linked in: btrfs zlib_deflate libcrc32c ipv6
> > mlx4_ib ib_mad ib_core mlx4_en igb mlx4_core ses iTCO_wdt ioatdma dca
> > iTCO_vendor_support i7core_edac edac_core enclosure i2c_i801 i2c_core
> > microcode joydev serio_raw mptsas mptscsih mptbase scsi_transport_sas
> > [last unloaded: scsi_wait_scan]
> > [  204.100672]
> > [  204.103025] Pid: 2166, comm: mount Not tainted
> > 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270
> > SERVER MODULE
> > [  204.121895] RIP: 0010:[<ffffffffa019a010>]  [<ffffffffa019a010>]
> > btrfs_test_super+0x10/0x26 [btrfs]
> > [  204.139279] RSP: 0018:ffff8803768ddcd8  EFLAGS: 00010287
> > [  204.144675] RAX: 0000000000000000 RBX: ffff8803772df800 RCX: ffff8801f7d34480
> > [  204.160885] RDX: 0000000000000120 RSI: ffff8801f7d34480 RDI: ffff8803772df800
> > [  204.179473] RBP: ffff8803768ddcd8 R08: ffff8801f7d344f8 R09: ffff880375c78760
> > [  204.185673] R10: ffff8803768ddb68 R11: ffff8801f7d34480 R12: ffffffffa01f13d0
> > [  204.200174] R13: ffffffffa019a000 R14: ffff8801f7d34480 R15: ffffffffa01f1400
> > [  204.221347] FS:  00007f766050d800(0000) GS:ffff880022200000(0000)
> > knlGS:0000000000000000
> > [  204.226597] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  204.238144] CR2: 0000000000000128 CR3: 00000001f7504000 CR4: 00000000000006f0
> > [  204.256446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  204.263975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  204.280097] Process mount (pid: 2166, threadinfo ffff8803768dc000,
> > task ffff880376ab8000)
> > [  204.296529] Stack:
> > [  204.298975]  ffff8803768ddd38 ffffffff81118df6 ffffffffa01f13d0
> > 0000000000000000
> > [  204.305955] <0> ffffffff811184a7 0000000000000000 ffff8803768ddd38
> > 0000000000000003
> > [  204.320999] <0> ffffffffa01f13d0 0000000000000000 ffff880375c78680
> > ffff8801f4f59d00
> > [  204.338658] Call Trace:
> > [  204.341138]  [<ffffffff81118df6>] sget+0x54/0x367
> > [  204.355474]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> > [  204.360487]  [<ffffffffa019a821>] btrfs_get_sb+0x108/0x3eb [btrfs]
> > [  204.375799]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> > [  204.379952]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> > [  204.395757]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
> > [  204.400676]  [<ffffffff810e5aec>] ? strndup_user+0x54/0x85
> > [  204.417219]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> > [  204.420399]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
> > [  204.435716] Code: <48> 8b 80 28 01 00 00 48 39 b0 28 22 00 00 c9 0f
> > 94 c0 0f b6 c0 c3
> > [  204.447369] RIP  [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > [  204.458280]  RSP <ffff8803768ddcd8>
> > [  204.462330] CR2: 0000000000000128
> > [  204.466479] ---[ end trace c8bb842fa664d021 ]---
> > [  204.498273] device label myvol-11d50321-7482-4ee2-8da5-5ee5f623ae17
> > devid 1 transid 7 /dev/sdk
> 
> This looks like a fairly obvious race.
> How about some comments on the suitability of this totally untested
> patch?
> 
> I thought about trying to merge the tests following sget() (those
> surrounded by the mutex) into btrfs_fill_super() but that seemed a bit
> too hard. Maybe someone can advise on how that should be done if this
> approach is not OK.

Hahaha, excuse me, of course this is not ok!
A function local mutex won't do anything to resolve this.

For the sake of discussion let's just assume that it has been defined as
a static global in fs/btrfs/super.c instead.

> 
> btrfs - fix race in btrfs_get_sb()
> 
> From: Ian Kent <raven@themaw.net>
> 
> When mounting a btrfs file system btrfs_test_super() may attempt to
> use sb->s_fs_info before it is set. This is because it isn't set until
> btrfs_fill_super() is called but the super block creation locks are
> dropped earlier during sget().
> 
> Also, for the same reason, it looks like there is a possible race
> with the s->s_root check which should also be dealt with by this
> patch.
> ---
> 
>  fs/btrfs/super.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 718b10d..9b463b9 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -565,6 +565,8 @@ static int btrfs_test_super(struct super_block *s, void *data)
>  	struct btrfs_fs_devices *test_fs_devices = data;
>  	struct btrfs_root *root = btrfs_sb(s);
>  
> +	if (!root)
> +		return 0;
>  	return root->fs_info->fs_devices == test_fs_devices;
>  }
>  
> @@ -585,6 +587,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
>  	char *subvol_name = NULL;
>  	u64 subvol_objectid = 0;
>  	int error = 0;
> +	DEFINE_MUTEX(super_mutex);
>  
>  	if (!(flags & MS_RDONLY))
>  		mode |= FMODE_WRITE;
> @@ -613,8 +616,10 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
>  	if (IS_ERR(s))
>  		goto error_s;
>  
> +	mutex_lock(&super_mutex);
>  	if (s->s_root) {
>  		if ((flags ^ s->s_flags) & MS_RDONLY) {
> +			mutex_unlock(&super_mutex);
>  			deactivate_locked_super(s);
>  			error = -EBUSY;
>  			goto error_close_devices;
> @@ -629,6 +634,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
>  		error = btrfs_fill_super(s, fs_devices, data,
>  					 flags & MS_SILENT ? 1 : 0);
>  		if (error) {
> +			mutex_unlock(&super_mutex);
>  			deactivate_locked_super(s);
>  			goto error_free_subvol_name;
>  		}
> @@ -636,6 +642,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
>  		btrfs_sb(s)->fs_info->bdev_holder = fs_type;
>  		s->s_flags |= MS_ACTIVE;
>  	}
> +	mutex_unlock(&super_mutex);
>  
>  	root = get_default_root(s, subvol_objectid);
>  	if (IS_ERR(root)) {
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.btrfs on 24 disks in parallel crashes kernel
  2010-11-09  7:52   ` Ian Kent
@ 2010-11-15 14:21     ` Ian Kent
  2010-11-15 16:48       ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Kent @ 2010-11-15 14:21 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, Albert Strasheim

On Tue, 2010-11-09 at 15:52 +0800, Ian Kent wrote:

Chris,

> On Tue, 2010-11-09 at 12:10 +0800, Ian Kent wrote:
> > On Mon, 2010-11-08 at 09:22 +0200, Albert Strasheim wrote:
> > > Hello all
> > > 
> > > I did some experiments on Fedora 14 with 2.6.35.6, running mkfs.btrfs
> > > followed by a mount in parallel on 24 disks.
> > > 
> > > This seems to crash reliably.
> > > 
> > > I reported the bug to Fedora here:
> > > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=650261
> > > 
> > > The bug report has some stacktraces and stuff attached.
> > > 
> > > I saw stuff like this:
> > > 
> > > [  203.861368] Btrfs loaded
> > > [  203.871418] device label myvol-12b0f8ba-e1ef-4d00-aae4-578d59d955e4
> > > devid 1 transid 7 /dev/sdv
> > > [  203.981371] device label myvol-fe7fdbcd-8f2f-4b34-aa6b-0d3bb7582ebc
> > > devid 1 transid 7 /dev/sdl
> > > [  203.990234] BUG: unable to handle kernel NULL pointer dereference
> > > at 0000000000000128
> > > [  204.001153] IP: [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > > [  204.016840] PGD 0
> > > [  204.018696] Oops: 0000 [#1] SMP
> > > [  204.021264] last sysfs file:
> > > /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host7/port-7:0/expander-7:0/port-7:0:9/end_device-7:0:9/target7:0:9/7:0:9:0/uevent
> > > [  204.045966] CPU 0
> > > [  204.055933] Modules linked in: btrfs zlib_deflate libcrc32c ipv6
> > > mlx4_ib ib_mad ib_core mlx4_en igb mlx4_core ses iTCO_wdt ioatdma dca
> > > iTCO_vendor_support i7core_edac edac_core enclosure i2c_i801 i2c_core
> > > microcode joydev serio_raw mptsas mptscsih mptbase scsi_transport_sas
> > > [last unloaded: scsi_wait_scan]
> > > [  204.100672]
> > > [  204.103025] Pid: 2166, comm: mount Not tainted
> > > 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270
> > > SERVER MODULE
> > > [  204.121895] RIP: 0010:[<ffffffffa019a010>]  [<ffffffffa019a010>]
> > > btrfs_test_super+0x10/0x26 [btrfs]
> > > [  204.139279] RSP: 0018:ffff8803768ddcd8  EFLAGS: 00010287
> > > [  204.144675] RAX: 0000000000000000 RBX: ffff8803772df800 RCX: ffff8801f7d34480
> > > [  204.160885] RDX: 0000000000000120 RSI: ffff8801f7d34480 RDI: ffff8803772df800
> > > [  204.179473] RBP: ffff8803768ddcd8 R08: ffff8801f7d344f8 R09: ffff880375c78760
> > > [  204.185673] R10: ffff8803768ddb68 R11: ffff8801f7d34480 R12: ffffffffa01f13d0
> > > [  204.200174] R13: ffffffffa019a000 R14: ffff8801f7d34480 R15: ffffffffa01f1400
> > > [  204.221347] FS:  00007f766050d800(0000) GS:ffff880022200000(0000)
> > > knlGS:0000000000000000
> > > [  204.226597] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [  204.238144] CR2: 0000000000000128 CR3: 00000001f7504000 CR4: 00000000000006f0
> > > [  204.256446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  204.263975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > [  204.280097] Process mount (pid: 2166, threadinfo ffff8803768dc000,
> > > task ffff880376ab8000)
> > > [  204.296529] Stack:
> > > [  204.298975]  ffff8803768ddd38 ffffffff81118df6 ffffffffa01f13d0
> > > 0000000000000000
> > > [  204.305955] <0> ffffffff811184a7 0000000000000000 ffff8803768ddd38
> > > 0000000000000003
> > > [  204.320999] <0> ffffffffa01f13d0 0000000000000000 ffff880375c78680
> > > ffff8801f4f59d00
> > > [  204.338658] Call Trace:
> > > [  204.341138]  [<ffffffff81118df6>] sget+0x54/0x367
> > > [  204.355474]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> > > [  204.360487]  [<ffffffffa019a821>] btrfs_get_sb+0x108/0x3eb [btrfs]
> > > [  204.375799]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> > > [  204.379952]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> > > [  204.395757]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
> > > [  204.400676]  [<ffffffff810e5aec>] ? strndup_user+0x54/0x85
> > > [  204.417219]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> > > [  204.420399]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
> > > [  204.435716] Code: <48> 8b 80 28 01 00 00 48 39 b0 28 22 00 00 c9 0f
> > > 94 c0 0f b6 c0 c3
> > > [  204.447369] RIP  [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > > [  204.458280]  RSP <ffff8803768ddcd8>
> > > [  204.462330] CR2: 0000000000000128
> > > [  204.466479] ---[ end trace c8bb842fa664d021 ]---
> > > [  204.498273] device label myvol-11d50321-7482-4ee2-8da5-5ee5f623ae17
> > > devid 1 transid 7 /dev/sdk
> > 
> > This looks like a fairly obvious race.
> > How about some comments on the suitability of this totally untested
> > patch?
> > 
> > I thought about trying to merge the tests following sget() (those
> > surrounded by the mutex) into btrfs_fill_super() but that seemed a bit
> > too hard. Maybe someone can advise on how that should be done if this
> > approach is not OK.
> 
> Hahaha, excuse me, of course this is not ok!
> A function local mutex won't do anything to resolve this.
> 
> For the sake of discussion let's just assume that it has been defined as
> a static global in fs/btrfs/super.c instead.

In investigating the traces here I've discovered a few interesting
things.

First, most of these traces appear to be from a deadlock between the BKL
and dev->bd_mutex in blkdev_{get,put}. The BKL has been removed from
these functions in 2.6.36 and, hopefully a simple patch will fix the
problem for earlier kernels. Hence not a problem to concern yourself
with.

The second issue (actually two issues) is with btrfs_get_sb(). First the
one from the trace here which is a use before set (in btrfs_test_super)
of the btrfs tree root. Anything that is to be used in the test function
must be set before returning from sget() as the super block is added to
the file system list of supers at the end of sget() and can be used from
that point onward. I've been looking at the code and have concluded that
trying to move the needed code into a set function, that would be used
instead of set_anon_super(), and passed to sget() is difficult and would
be quite messy. This brings me back to using a static global mutex
around the block of code similar to what I have in the patch I posted
here, which I think should also resolve the problem.

Could you have a look at this area of the code and offer your advice on
how you think it would be best to resolve it please.

Oh and the second problem I noticed was a possible use after clear of
the tree root. The put_super super block method is call a little before
the super block is removed from the file system list of supers leaving a
very small window for this to be used after it has been cleared. I think
it's easily fixed by adding a check of s->s_active being 0 in
btrfs_test_super(), as that will tell us the super block about to go
away. So, again not something for you to worry about as I'll forward
patches when the above is resolved.

> 
> > 
> > btrfs - fix race in btrfs_get_sb()
> > 
> > From: Ian Kent <raven@themaw.net>
> > 
> > When mounting a btrfs file system btrfs_test_super() may attempt to
> > use sb->s_fs_info before it is set. This is because it isn't set until
> > btrfs_fill_super() is called but the super block creation locks are
> > dropped earlier during sget().
> > 
> > Also, for the same reason, it looks like there is a possible race
> > with the s->s_root check which should also be dealt with by this
> > patch.
> > ---
> > 
> >  fs/btrfs/super.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > 
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 718b10d..9b463b9 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -565,6 +565,8 @@ static int btrfs_test_super(struct super_block *s, void *data)
> >  	struct btrfs_fs_devices *test_fs_devices = data;
> >  	struct btrfs_root *root = btrfs_sb(s);
> >  
> > +	if (!root)
> > +		return 0;
> >  	return root->fs_info->fs_devices == test_fs_devices;
> >  }
> >  
> > @@ -585,6 +587,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
> >  	char *subvol_name = NULL;
> >  	u64 subvol_objectid = 0;
> >  	int error = 0;
> > +	DEFINE_MUTEX(super_mutex);
> >  
> >  	if (!(flags & MS_RDONLY))
> >  		mode |= FMODE_WRITE;
> > @@ -613,8 +616,10 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
> >  	if (IS_ERR(s))
> >  		goto error_s;
> >  
> > +	mutex_lock(&super_mutex);
> >  	if (s->s_root) {
> >  		if ((flags ^ s->s_flags) & MS_RDONLY) {
> > +			mutex_unlock(&super_mutex);
> >  			deactivate_locked_super(s);
> >  			error = -EBUSY;
> >  			goto error_close_devices;
> > @@ -629,6 +634,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
> >  		error = btrfs_fill_super(s, fs_devices, data,
> >  					 flags & MS_SILENT ? 1 : 0);
> >  		if (error) {
> > +			mutex_unlock(&super_mutex);
> >  			deactivate_locked_super(s);
> >  			goto error_free_subvol_name;
> >  		}
> > @@ -636,6 +642,7 @@ static int btrfs_get_sb(struct file_system_type *fs_type, int flags,
> >  		btrfs_sb(s)->fs_info->bdev_holder = fs_type;
> >  		s->s_flags |= MS_ACTIVE;
> >  	}
> > +	mutex_unlock(&super_mutex);
> >  
> >  	root = get_default_root(s, subvol_objectid);
> >  	if (IS_ERR(root)) {
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.btrfs on 24 disks in parallel crashes kernel
  2010-11-15 14:21     ` Ian Kent
@ 2010-11-15 16:48       ` Josef Bacik
  2010-11-15 17:52         ` Ian Kent
  2010-11-16  0:52         ` Li Zefan
  0 siblings, 2 replies; 7+ messages in thread
From: Josef Bacik @ 2010-11-15 16:48 UTC (permalink / raw)
  To: Ian Kent; +Cc: Chris Mason, linux-btrfs, Albert Strasheim

On Mon, Nov 15, 2010 at 10:21:42PM +0800, Ian Kent wrote:
> On Tue, 2010-11-09 at 15:52 +0800, Ian Kent wrote:
> 
> Chris,
> 
> > On Tue, 2010-11-09 at 12:10 +0800, Ian Kent wrote:
> > > On Mon, 2010-11-08 at 09:22 +0200, Albert Strasheim wrote:
> > > > Hello all
> > > > 
> > > > I did some experiments on Fedora 14 with 2.6.35.6, running mkfs.btrfs
> > > > followed by a mount in parallel on 24 disks.
> > > > 
> > > > This seems to crash reliably.
> > > > 
> > > > I reported the bug to Fedora here:
> > > > 
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=650261
> > > > 
> > > > The bug report has some stacktraces and stuff attached.
> > > > 
> > > > I saw stuff like this:
> > > > 
> > > > [  203.861368] Btrfs loaded
> > > > [  203.871418] device label myvol-12b0f8ba-e1ef-4d00-aae4-578d59d955e4
> > > > devid 1 transid 7 /dev/sdv
> > > > [  203.981371] device label myvol-fe7fdbcd-8f2f-4b34-aa6b-0d3bb7582ebc
> > > > devid 1 transid 7 /dev/sdl
> > > > [  203.990234] BUG: unable to handle kernel NULL pointer dereference
> > > > at 0000000000000128
> > > > [  204.001153] IP: [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > > > [  204.016840] PGD 0
> > > > [  204.018696] Oops: 0000 [#1] SMP
> > > > [  204.021264] last sysfs file:
> > > > /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host7/port-7:0/expander-7:0/port-7:0:9/end_device-7:0:9/target7:0:9/7:0:9:0/uevent
> > > > [  204.045966] CPU 0
> > > > [  204.055933] Modules linked in: btrfs zlib_deflate libcrc32c ipv6
> > > > mlx4_ib ib_mad ib_core mlx4_en igb mlx4_core ses iTCO_wdt ioatdma dca
> > > > iTCO_vendor_support i7core_edac edac_core enclosure i2c_i801 i2c_core
> > > > microcode joydev serio_raw mptsas mptscsih mptbase scsi_transport_sas
> > > > [last unloaded: scsi_wait_scan]
> > > > [  204.100672]
> > > > [  204.103025] Pid: 2166, comm: mount Not tainted
> > > > 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270
> > > > SERVER MODULE
> > > > [  204.121895] RIP: 0010:[<ffffffffa019a010>]  [<ffffffffa019a010>]
> > > > btrfs_test_super+0x10/0x26 [btrfs]
> > > > [  204.139279] RSP: 0018:ffff8803768ddcd8  EFLAGS: 00010287
> > > > [  204.144675] RAX: 0000000000000000 RBX: ffff8803772df800 RCX: ffff8801f7d34480
> > > > [  204.160885] RDX: 0000000000000120 RSI: ffff8801f7d34480 RDI: ffff8803772df800
> > > > [  204.179473] RBP: ffff8803768ddcd8 R08: ffff8801f7d344f8 R09: ffff880375c78760
> > > > [  204.185673] R10: ffff8803768ddb68 R11: ffff8801f7d34480 R12: ffffffffa01f13d0
> > > > [  204.200174] R13: ffffffffa019a000 R14: ffff8801f7d34480 R15: ffffffffa01f1400
> > > > [  204.221347] FS:  00007f766050d800(0000) GS:ffff880022200000(0000)
> > > > knlGS:0000000000000000
> > > > [  204.226597] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > [  204.238144] CR2: 0000000000000128 CR3: 00000001f7504000 CR4: 00000000000006f0
> > > > [  204.256446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  204.263975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > [  204.280097] Process mount (pid: 2166, threadinfo ffff8803768dc000,
> > > > task ffff880376ab8000)
> > > > [  204.296529] Stack:
> > > > [  204.298975]  ffff8803768ddd38 ffffffff81118df6 ffffffffa01f13d0
> > > > 0000000000000000
> > > > [  204.305955] <0> ffffffff811184a7 0000000000000000 ffff8803768ddd38
> > > > 0000000000000003
> > > > [  204.320999] <0> ffffffffa01f13d0 0000000000000000 ffff880375c78680
> > > > ffff8801f4f59d00
> > > > [  204.338658] Call Trace:
> > > > [  204.341138]  [<ffffffff81118df6>] sget+0x54/0x367
> > > > [  204.355474]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> > > > [  204.360487]  [<ffffffffa019a821>] btrfs_get_sb+0x108/0x3eb [btrfs]
> > > > [  204.375799]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> > > > [  204.379952]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> > > > [  204.395757]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
> > > > [  204.400676]  [<ffffffff810e5aec>] ? strndup_user+0x54/0x85
> > > > [  204.417219]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> > > > [  204.420399]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
> > > > [  204.435716] Code: <48> 8b 80 28 01 00 00 48 39 b0 28 22 00 00 c9 0f
> > > > 94 c0 0f b6 c0 c3
> > > > [  204.447369] RIP  [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > > > [  204.458280]  RSP <ffff8803768ddcd8>
> > > > [  204.462330] CR2: 0000000000000128
> > > > [  204.466479] ---[ end trace c8bb842fa664d021 ]---
> > > > [  204.498273] device label myvol-11d50321-7482-4ee2-8da5-5ee5f623ae17
> > > > devid 1 transid 7 /dev/sdk
> > > 
> > > This looks like a fairly obvious race.
> > > How about some comments on the suitability of this totally untested
> > > patch?
> > > 
> > > I thought about trying to merge the tests following sget() (those
> > > surrounded by the mutex) into btrfs_fill_super() but that seemed a bit
> > > too hard. Maybe someone can advise on how that should be done if this
> > > approach is not OK.
> > 
> > Hahaha, excuse me, of course this is not ok!
> > A function local mutex won't do anything to resolve this.
> > 
> > For the sake of discussion let's just assume that it has been defined as
> > a static global in fs/btrfs/super.c instead.
> 
> In investigating the traces here I've discovered a few interesting
> things.
> 
> First, most of these traces appear to be from a deadlock between the BKL
> and dev->bd_mutex in blkdev_{get,put}. The BKL has been removed from
> these functions in 2.6.36 and, hopefully a simple patch will fix the
> problem for earlier kernels. Hence not a problem to concern yourself
> with.
> 
> The second issue (actually two issues) is with btrfs_get_sb(). First the
> one from the trace here which is a use before set (in btrfs_test_super)
> of the btrfs tree root. Anything that is to be used in the test function
> must be set before returning from sget() as the super block is added to
> the file system list of supers at the end of sget() and can be used from
> that point onward. I've been looking at the code and have concluded that
> trying to move the needed code into a set function, that would be used
> instead of set_anon_super(), and passed to sget() is difficult and would
> be quite messy. This brings me back to using a static global mutex
> around the block of code similar to what I have in the patch I posted
> here, which I think should also resolve the problem.
> 

So I think we're on the same page for this problem, I've posted a test patch to
the bugzilla

https://bugzilla.redhat.com/show_bug.cgi?id=650261

Basically we just need to setup a blank root/fs_info so it can be set in
set_super, and then we use those blank structs in open_ctree.  Hopefully this
works, if it does I'll post it to the list for wider review.

> Could you have a look at this area of the code and offer your advice on
> how you think it would be best to resolve it please.
> 
> Oh and the second problem I noticed was a possible use after clear of
> the tree root. The put_super super block method is call a little before
> the super block is removed from the file system list of supers leaving a
> very small window for this to be used after it has been cleared. I think
> it's easily fixed by adding a check of s->s_active being 0 in
> btrfs_test_super(), as that will tell us the super block about to go
> away. So, again not something for you to worry about as I'll forward
> patches when the above is resolved.
> 

put_super should be called after kill_sb() which is what removes us from the
fs_supers list, so we should be ok here.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.btrfs on 24 disks in parallel crashes kernel
  2010-11-15 16:48       ` Josef Bacik
@ 2010-11-15 17:52         ` Ian Kent
  2010-11-16  0:52         ` Li Zefan
  1 sibling, 0 replies; 7+ messages in thread
From: Ian Kent @ 2010-11-15 17:52 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Chris Mason, linux-btrfs, Albert Strasheim

On Mon, 2010-11-15 at 11:48 -0500, Josef Bacik wrote:
> On Mon, Nov 15, 2010 at 10:21:42PM +0800, Ian Kent wrote:
> > On Tue, 2010-11-09 at 15:52 +0800, Ian Kent wrote:
> > 
> > Chris,
> > 
> > > On Tue, 2010-11-09 at 12:10 +0800, Ian Kent wrote:
> > > > On Mon, 2010-11-08 at 09:22 +0200, Albert Strasheim wrote:
> > > > > Hello all
> > > > > 
> > > > > I did some experiments on Fedora 14 with 2.6.35.6, running mkfs.btrfs
> > > > > followed by a mount in parallel on 24 disks.
> > > > > 
> > > > > This seems to crash reliably.
> > > > > 
> > > > > I reported the bug to Fedora here:
> > > > > 
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=650261
> > > > > 
> > > > > The bug report has some stacktraces and stuff attached.
> > > > > 
> > > > > I saw stuff like this:
> > > > > 
> > > > > [  203.861368] Btrfs loaded
> > > > > [  203.871418] device label myvol-12b0f8ba-e1ef-4d00-aae4-578d59d955e4
> > > > > devid 1 transid 7 /dev/sdv
> > > > > [  203.981371] device label myvol-fe7fdbcd-8f2f-4b34-aa6b-0d3bb7582ebc
> > > > > devid 1 transid 7 /dev/sdl
> > > > > [  203.990234] BUG: unable to handle kernel NULL pointer dereference
> > > > > at 0000000000000128
> > > > > [  204.001153] IP: [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > > > > [  204.016840] PGD 0
> > > > > [  204.018696] Oops: 0000 [#1] SMP
> > > > > [  204.021264] last sysfs file:
> > > > > /sys/devices/pci0000:00/0000:00:05.0/0000:0d:00.0/host7/port-7:0/expander-7:0/port-7:0:9/end_device-7:0:9/target7:0:9/7:0:9:0/uevent
> > > > > [  204.045966] CPU 0
> > > > > [  204.055933] Modules linked in: btrfs zlib_deflate libcrc32c ipv6
> > > > > mlx4_ib ib_mad ib_core mlx4_en igb mlx4_core ses iTCO_wdt ioatdma dca
> > > > > iTCO_vendor_support i7core_edac edac_core enclosure i2c_i801 i2c_core
> > > > > microcode joydev serio_raw mptsas mptscsih mptbase scsi_transport_sas
> > > > > [last unloaded: scsi_wait_scan]
> > > > > [  204.100672]
> > > > > [  204.103025] Pid: 2166, comm: mount Not tainted
> > > > > 2.6.35.6-48.fc14.x86_64 #1 ASSY,BLADE,X6270      /SUN BLADE X6270
> > > > > SERVER MODULE
> > > > > [  204.121895] RIP: 0010:[<ffffffffa019a010>]  [<ffffffffa019a010>]
> > > > > btrfs_test_super+0x10/0x26 [btrfs]
> > > > > [  204.139279] RSP: 0018:ffff8803768ddcd8  EFLAGS: 00010287
> > > > > [  204.144675] RAX: 0000000000000000 RBX: ffff8803772df800 RCX: ffff8801f7d34480
> > > > > [  204.160885] RDX: 0000000000000120 RSI: ffff8801f7d34480 RDI: ffff8803772df800
> > > > > [  204.179473] RBP: ffff8803768ddcd8 R08: ffff8801f7d344f8 R09: ffff880375c78760
> > > > > [  204.185673] R10: ffff8803768ddb68 R11: ffff8801f7d34480 R12: ffffffffa01f13d0
> > > > > [  204.200174] R13: ffffffffa019a000 R14: ffff8801f7d34480 R15: ffffffffa01f1400
> > > > > [  204.221347] FS:  00007f766050d800(0000) GS:ffff880022200000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  204.226597] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > [  204.238144] CR2: 0000000000000128 CR3: 00000001f7504000 CR4: 00000000000006f0
> > > > > [  204.256446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [  204.263975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > [  204.280097] Process mount (pid: 2166, threadinfo ffff8803768dc000,
> > > > > task ffff880376ab8000)
> > > > > [  204.296529] Stack:
> > > > > [  204.298975]  ffff8803768ddd38 ffffffff81118df6 ffffffffa01f13d0
> > > > > 0000000000000000
> > > > > [  204.305955] <0> ffffffff811184a7 0000000000000000 ffff8803768ddd38
> > > > > 0000000000000003
> > > > > [  204.320999] <0> ffffffffa01f13d0 0000000000000000 ffff880375c78680
> > > > > ffff8801f4f59d00
> > > > > [  204.338658] Call Trace:
> > > > > [  204.341138]  [<ffffffff81118df6>] sget+0x54/0x367
> > > > > [  204.355474]  [<ffffffff811184a7>] ? set_anon_super+0x0/0xe7
> > > > > [  204.360487]  [<ffffffffa019a821>] btrfs_get_sb+0x108/0x3eb [btrfs]
> > > > > [  204.375799]  [<ffffffff81118b99>] vfs_kern_mount+0xad/0x1ac
> > > > > [  204.379952]  [<ffffffff81118d00>] do_kern_mount+0x4d/0xef
> > > > > [  204.395757]  [<ffffffff8112e45a>] do_mount+0x700/0x75d
> > > > > [  204.400676]  [<ffffffff810e5aec>] ? strndup_user+0x54/0x85
> > > > > [  204.417219]  [<ffffffff8112e6e7>] sys_mount+0x88/0xc2
> > > > > [  204.420399]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
> > > > > [  204.435716] Code: <48> 8b 80 28 01 00 00 48 39 b0 28 22 00 00 c9 0f
> > > > > 94 c0 0f b6 c0 c3
> > > > > [  204.447369] RIP  [<ffffffffa019a010>] btrfs_test_super+0x10/0x26 [btrfs]
> > > > > [  204.458280]  RSP <ffff8803768ddcd8>
> > > > > [  204.462330] CR2: 0000000000000128
> > > > > [  204.466479] ---[ end trace c8bb842fa664d021 ]---
> > > > > [  204.498273] device label myvol-11d50321-7482-4ee2-8da5-5ee5f623ae17
> > > > > devid 1 transid 7 /dev/sdk
> > > > 
> > > > This looks like a fairly obvious race.
> > > > How about some comments on the suitability of this totally untested
> > > > patch?
> > > > 
> > > > I thought about trying to merge the tests following sget() (those
> > > > surrounded by the mutex) into btrfs_fill_super() but that seemed a bit
> > > > too hard. Maybe someone can advise on how that should be done if this
> > > > approach is not OK.
> > > 
> > > Hahaha, excuse me, of course this is not ok!
> > > A function local mutex won't do anything to resolve this.
> > > 
> > > For the sake of discussion let's just assume that it has been defined as
> > > a static global in fs/btrfs/super.c instead.
> > 
> > In investigating the traces here I've discovered a few interesting
> > things.
> > 
> > First, most of these traces appear to be from a deadlock between the BKL
> > and dev->bd_mutex in blkdev_{get,put}. The BKL has been removed from
> > these functions in 2.6.36 and, hopefully a simple patch will fix the
> > problem for earlier kernels. Hence not a problem to concern yourself
> > with.
> > 
> > The second issue (actually two issues) is with btrfs_get_sb(). First the
> > one from the trace here which is a use before set (in btrfs_test_super)
> > of the btrfs tree root. Anything that is to be used in the test function
> > must be set before returning from sget() as the super block is added to
> > the file system list of supers at the end of sget() and can be used from
> > that point onward. I've been looking at the code and have concluded that
> > trying to move the needed code into a set function, that would be used
> > instead of set_anon_super(), and passed to sget() is difficult and would
> > be quite messy. This brings me back to using a static global mutex
> > around the block of code similar to what I have in the patch I posted
> > here, which I think should also resolve the problem.
> > 
> 
> So I think we're on the same page for this problem, I've posted a test patch to
> the bugzilla
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=650261
> 
> Basically we just need to setup a blank root/fs_info so it can be set in
> set_super, and then we use those blank structs in open_ctree.  Hopefully this
> works, if it does I'll post it to the list for wider review.

Personally I'd prefer to do it this way.
I'll have a look at the patch.

> 
> > Could you have a look at this area of the code and offer your advice on
> > how you think it would be best to resolve it please.
> > 
> > Oh and the second problem I noticed was a possible use after clear of
> > the tree root. The put_super super block method is call a little before
> > the super block is removed from the file system list of supers leaving a
> > very small window for this to be used after it has been cleared. I think
> > it's easily fixed by adding a check of s->s_active being 0 in
> > btrfs_test_super(), as that will tell us the super block about to go
> > away. So, again not something for you to worry about as I'll forward
> > patches when the above is resolved.
> > 
> 
> put_super should be called after kill_sb() which is what removes us from the
> fs_supers list, so we should be ok here.  Thanks,

Sure, as I said, it's a (very) small window.

AFAICS, ->kill_sb == kill_anon_super() which calls ->put_super() which
clears the tree root, then removes it from the list at the end of the
function.

But hey, maybe these a lock in place that I missed.
Anyway, I still recommend adding the check in btrfs_test_super() since
->kill_sb() gets called when ->s_active == 0 and it's a small change.

> 
> Josef



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.btrfs on 24 disks in parallel crashes kernel
  2010-11-15 16:48       ` Josef Bacik
  2010-11-15 17:52         ` Ian Kent
@ 2010-11-16  0:52         ` Li Zefan
  1 sibling, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-11-16  0:52 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Ian Kent, Chris Mason, linux-btrfs, Albert Strasheim

> So I think we're on the same page for this problem, I've posted a test patch to
> the bugzilla
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=650261
> 
> Basically we just need to setup a blank root/fs_info so it can be set in
> set_super, and then we use those blank structs in open_ctree.  Hopefully this
> works, if it does I'll post it to the list for wider review.
> 

You can reproduce the bug easily by this simple script:

# cat test.sh
#! /bin/sh

for ((; ;))
{
    mount $1 $2
    umount $2
}

Now run the test (I was using loop devices):

# ./test.sh /dev/loop1 /mnt1 &
# ./test.sh /dev/loop2 /mnt2

(You'll get exactly the same crash)

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-11-16  0:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-08  7:22 mkfs.btrfs on 24 disks in parallel crashes kernel Albert Strasheim
2010-11-09  4:10 ` Ian Kent
2010-11-09  7:52   ` Ian Kent
2010-11-15 14:21     ` Ian Kent
2010-11-15 16:48       ` Josef Bacik
2010-11-15 17:52         ` Ian Kent
2010-11-16  0:52         ` Li Zefan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).