[PATCH] block: protect iterate_bdevs() against concurrent close

* [PATCH] block: protect iterate_bdevs() against concurrent close
@ 2016-03-10 12:26 Rabin Vincent
  2016-03-10 17:37 ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Rabin Vincent @ 2016-03-10 12:26 UTC
  To: viro; +Cc: jack, linux-fsdevel, linux-kernel, Rabin Vincent

From: Rabin Vincent <rabinv@axis.com>

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->bd_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
 PGD 9e62067 PUD 9ee8067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
 RIP: 0010:[<ffffffff81314790>]  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
 RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
 FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
 Stack:
  ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
  0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
  ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
 Call Trace:
  [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
  [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
  [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
  [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130
  [<ffffffff811b2763>] sys_sync+0x63/0x90
  [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
 RIP  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
  RSP <ffff880009f5fe68>
 CR2: 0000000000000508
 ---[ end trace 2487336ceb3de62d ]---
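
The Code: line ties the oops to the explanation above. Its bytes decode to
`mov 0xf0(%rdi),%rax` followed by the faulting `mov 0x508(%rax),%rax` (the
instruction marked `<48>`). RAX is 0 in the register dump, so the load
targets 0 + 0x508, which is exactly the CR2 value: a NULL structure pointer
faults at the offset of the member being read, not at address 0. Reading the
0xf0 load as `bdev->bd_disk` is an inference consistent with the description
above, not something the trace states. The sketch below only reproduces the
address arithmetic; `struct fake_gendisk` is hypothetical and padded purely
so that its member lands at 0x508.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct gendisk: the padding only
 * places 'queue' at the displacement seen in the faulting instruction. */
struct fake_gendisk {
	char other_fields[0x508];
	void *queue;
};

static_assert(offsetof(struct fake_gendisk, queue) == 0x508,
	      "queue must sit at the faulting displacement");

/* Address a load of base->queue touches, computed without dereferencing. */
static uintptr_t queue_load_address(uintptr_t base)
{
	return base + offsetof(struct fake_gendisk, queue);
}
```

With a base of 0, queue_load_address() yields 0x508, the CR2 value above.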

The crash is easily reproducible by running the following command, once an
msleep(100) has been inserted before the call to func() in iterate_bdevs():

 while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
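
The same two loops can also be driven from a small C program, which makes
it easier to bound the run. This is a sketch under assumptions: it expects
Linux, `/dev/nullb0` from the null_blk driver as in the command above, and
permission to open it; `run_race()` and `reader_loop()` are hypothetical
helpers. It runs a fixed number of iterations instead of looping forever,
and hitting the window reliably still needs the msleep(100) debugging
change in the kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Like `head -c1 path` in a loop; returns how many opens/reads failed. */
static int reader_loop(const char *path, int iters)
{
	char c;
	int failures = 0;

	for (int i = 0; i < iters; i++) {
		int fd = open(path, O_RDONLY);

		if (fd < 0) {
			failures++;
			continue;
		}
		if (read(fd, &c, 1) < 0)
			failures++;
		close(fd);	/* if this was the last opener, bd_openers drops to 0 */
	}
	return failures;
}

/* Reader in a child, sync() in the parent, both for 'iters' rounds.
 * Returns 0 if every open/read succeeded, 1 if any failed, -1 on error. */
static int run_race(const char *path, int iters)
{
	pid_t child;
	int status;

	assert(path != NULL && iters >= 0);
	child = fork();
	if (child < 0)
		return -1;
	if (child == 0)
		_exit(reader_loop(path, iters) ? 1 : 0);

	for (int i = 0; i < iters; i++)
		sync();		/* on the kernels in this thread: sys_sync -> iterate_bdevs */

	if (waitpid(child, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_race("/dev/nullb0", 100000)` mirrors the shell loops for a
bounded time; a return of 1 means some open() or read() failed.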

Fix it by holding bd_mutex across the func() call and only calling
func() if the bdev is open (bd_openers is non-zero).
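
To make the locking rule concrete, here is a userspace model of the patched
loop. It is a sketch, not kernel code: `struct bdev`, `bdev_close()`,
`count_queue()` and `iterate()` are hypothetical stand-ins for struct
block_device, the last-close path, fdatawrite_one_bdev() and
iterate_bdevs(), and a pthread mutex plays the role of bd_mutex. Because the
`openers` check and the callback run under the same mutex that the close
path holds while clearing `disk`, the callback cannot see a non-zero
`openers` together with a NULL `disk`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct disk { int queue_id; };		/* stand-in for struct gendisk */

struct bdev {
	pthread_mutex_t mutex;		/* models bd_mutex */
	int openers;			/* models bd_openers */
	struct disk *disk;		/* models bd_disk: NULL after last close */
};

/* Close path: the last opener clears disk while holding the mutex. */
static void bdev_close(struct bdev *b)
{
	pthread_mutex_lock(&b->mutex);
	assert(b->openers > 0);
	if (--b->openers == 0)
		b->disk = NULL;
	pthread_mutex_unlock(&b->mutex);
}

/* Stands in for fdatawrite_one_bdev(), whose call chain dereferences disk. */
static void count_queue(struct bdev *b, void *arg)
{
	assert(b->disk != NULL);
	*(int *)arg += b->disk->queue_id;
}

/* Patched loop: check openers and call func() under the same mutex. */
static void iterate(struct bdev **devs, size_t n,
		    void (*func)(struct bdev *, void *), void *arg)
{
	for (size_t i = 0; i < n; i++) {
		struct bdev *b = devs[i];

		pthread_mutex_lock(&b->mutex);
		if (b->openers)
			func(b, arg);
		pthread_mutex_unlock(&b->mutex);
	}
}
```

After one of two devices is closed, iterate() visits only the open one;
without the `openers` check, count_queue() would read through a NULL `disk`,
the userspace analogue of the oops above.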

Cc: stable@vger.kernel.org
Signed-off-by: Rabin Vincent <rabinv@axis.com>
---
 fs/block_dev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..78c9f2a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1884,6 +1884,7 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 	spin_lock(&blockdev_superblock->s_inode_list_lock);
 	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
 		struct address_space *mapping = inode->i_mapping;
+		struct block_device *bdev;
 
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
@@ -1904,8 +1905,12 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 		 */
 		iput(old_inode);
 		old_inode = inode;
+		bdev = I_BDEV(inode);
 
-		func(I_BDEV(inode), arg);
+		mutex_lock(&bdev->bd_mutex);
+		if (bdev->bd_openers)
+			func(bdev, arg);
+		mutex_unlock(&bdev->bd_mutex);
 
 		spin_lock(&blockdev_superblock->s_inode_list_lock);
 	}
-- 
2.7.0


* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-03-10 12:26 [PATCH] block: protect iterate_bdevs() against concurrent close Rabin Vincent
@ 2016-03-10 17:37 ` Jan Kara
  2016-03-14 17:42   ` Rabin Vincent
  2016-03-15  7:51   ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kara @ 2016-03-10 17:37 UTC
  To: Rabin Vincent
  Cc: viro, jack, linux-fsdevel, linux-kernel, Rabin Vincent, jaxboe,
	hch

On Thu 10-03-16 13:26:03, Rabin Vincent wrote:
> From: Rabin Vincent <rabinv@axis.com>
> 
> If a block device is closed while iterate_bdevs() is handling it, the
> following NULL pointer dereference occurs because bdev->bd_disk is NULL
> in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
> turn called by the mapping_cap_writeback_dirty() call in
> __filemap_fdatawrite_range()):
> 
> [... oops trace snipped ...]
> 
> The crash is easily reproducible by running the following command, if an
> msleep(100) is inserted before the call to func() in iterate_bdevs():
> 
>  while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
> 
> Fix it by holding the bd_mutex across the func() call and only calling
> func() if the bdev is opened.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Rabin Vincent <rabinv@axis.com>

Thanks for spotting the problem. The patch will fix the problem you found.
But what prevents, e.g., the flusher thread from trying to write back the
block device inode while it gets invalidated at the same moment? I.e., the
problem doesn't seem to be specific to iterate_bdevs() but rather hits
anything which writes back block device inodes, right?

Frankly, it seems like a catch that mapping_cap_foo() is only safe when the
block device is open (if the mapping belongs to a block device). The trouble
is that we'd need to keep the result of inode_to_bdi() stable while we work
with the block device inode. Christoph, do you have any idea how to cleanly
achieve that? All I'm able to come up with are ugly hacks...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-03-10 17:37 ` Jan Kara
@ 2016-03-14 17:42   ` Rabin Vincent
  2016-03-15  8:23     ` Jan Kara
  2016-03-15  7:51   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Rabin Vincent @ 2016-03-14 17:42 UTC
  To: Jan Kara; +Cc: Rabin Vincent, viro, linux-fsdevel, linux-kernel, axboe, hch

(fixed Jens' address)

On Thu, Mar 10, 2016 at 06:37:27PM +0100, Jan Kara wrote:
> On Thu 10-03-16 13:26:03, Rabin Vincent wrote:
> > If a block device is closed while iterate_bdevs() is handling it, the
> > following NULL pointer dereference occurs because bdev->bd_disk is NULL
> > in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
> > turn called by the mapping_cap_writeback_dirty() call in
> > __filemap_fdatawrite_range()):
> 
> Thanks for spotting the problem. The patch will fix the problem you
> found.  But what prevents e.g. flusher thread from trying to writeback
> the block device inode while that gets invalidated at the same moment?

Don't the sync_block_dev() / bdev_write_inode() calls in __blkdev_put()
prevent this?

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-03-10 17:37 ` Jan Kara
  2016-03-14 17:42   ` Rabin Vincent
@ 2016-03-15  7:51   ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-03-15  7:51 UTC
  To: Jan Kara
  Cc: Rabin Vincent, viro, linux-fsdevel, linux-kernel, Rabin Vincent,
	jaxboe, hch

On Thu, Mar 10, 2016 at 06:37:27PM +0100, Jan Kara wrote:
> Thanks for spotting the problem. The patch will fix the problem you found.
> But what prevents e.g. flusher thread from trying to writeback the block
> device inode while that gets invalidated at the same moment? I.e., the
> problem doesn't seem to be specific to iterate_bdevs() but rather hits
> anything which writes back block device inodes, right?
> 
> Frankly it seems like a catch that mapping_cap_foo() requires block device
> to be open (if mapping belongs to a block device) to be safe. The trouble
> is we'd need to make inode_to_bdi() result stable while we work with the
> block device inode. Christoph, do you have any idea how to cleanly achieve
> that? All I'm able to come up with are ugly hacks...

I think the whole idea of clearing a queue from a block device
just because it's not open is the root of the problem.  Or, to dive
deeper into the issue, our whole object model and lifetime rules
between struct block_device, struct gendisk, struct request_queue and
struct backing_dev_info are rather broken.  I've made a few attempts at
sorting this out, but it's a huge task and I haven't gotten far.

I think for now the patch from Rabin should be fine.

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-03-14 17:42   ` Rabin Vincent
@ 2016-03-15  8:23     ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2016-03-15  8:23 UTC
  To: Rabin Vincent
  Cc: Jan Kara, Rabin Vincent, viro, linux-fsdevel, linux-kernel, axboe,
	hch

On Mon 14-03-16 18:42:58, Rabin Vincent wrote:
> (fixed Jens' address)
> 
> On Thu, Mar 10, 2016 at 06:37:27PM +0100, Jan Kara wrote:
> > On Thu 10-03-16 13:26:03, Rabin Vincent wrote:
> > > If a block device is closed while iterate_bdevs() is handling it, the
> > > following NULL pointer dereference occurs because bdev->bd_disk is NULL
> > > in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
> > > turn called by the mapping_cap_writeback_dirty() call in
> > > __filemap_fdatawrite_range()):
> > 
> > Thanks for spotting the problem. The patch will fix the problem you
> > found.  But what prevents e.g. flusher thread from trying to writeback
> > the block device inode while that gets invalidated at the same moment?
> 
> Don't the sync_block_dev() / bdev_write_inode() calls in __blkdev_put()
> prevent this?

They don't. They make sure all dirty data is written, but the flusher thread
can still be in the middle of iterating over block device pages. So the
flusher thread may still hold a reference to the block device inode after
__blkdev_put() is done with it. Although the flusher has nothing left to
write, it can still call some function where using inode_to_bdi() will
result in a NULL pointer dereference...
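
The point is that holding a reference on the block device inode and having
the device open are tracked separately, so the first does not keep bd_disk
valid. A deterministic userspace model of that sequence follows;
`struct bdev_model` and its helpers are hypothetical names, with
`inode_refs` standing in for the flusher's inode reference and
`openers`/`disk` for bd_openers/bd_disk.

```c
#include <assert.h>
#include <stddef.h>

struct disk { int id; };

struct bdev_model {
	int inode_refs;		/* e.g. a reference held by the flusher thread */
	int openers;		/* models bd_openers */
	struct disk *disk;	/* models bd_disk, cleared on the last close */
};

static void grab_inode(struct bdev_model *b) { b->inode_refs++; }
static void drop_inode(struct bdev_model *b) { b->inode_refs--; }

/* Last-close path: clears disk no matter how many inode references exist. */
static void model_close(struct bdev_model *b)
{
	assert(b->openers > 0);
	if (--b->openers == 0)
		b->disk = NULL;
}

/* What any writeback helper would need to know before touching disk. */
static int disk_usable(const struct bdev_model *b)
{
	return b->disk != NULL;
}
```

After model_close() the flusher-like holder still has its reference
(`inode_refs == 1`), yet disk_usable() is false.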

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* [PATCH] block: protect iterate_bdevs() against concurrent close
@ 2016-12-01  8:18 Jan Kara
  2016-12-01 10:16 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jan Kara @ 2016-12-01  8:18 UTC
  To: Jens Axboe
  Cc: Andrew Morton, Wei Fang, linux-block, Christoph Hellwig,
	Rabin Vincent, stable, Jan Kara

From: Rabin Vincent <rabinv@axis.com>

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->bd_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
 PGD 9e62067 PUD 9ee8067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
 RIP: 0010:[<ffffffff81314790>]  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
 RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
 FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
 Stack:
  ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
  0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
  ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
 Call Trace:
  [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
  [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
  [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
  [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130
  [<ffffffff811b2763>] sys_sync+0x63/0x90
  [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
 RIP  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
  RSP <ffff880009f5fe68>
 CR2: 0000000000000508
 ---[ end trace 2487336ceb3de62d ]---

The crash is easily reproducible by running the following command, once an
msleep(100) has been inserted before the call to func() in iterate_bdevs():

 while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done

Fix it by holding bd_mutex across the func() call and only calling
func() if the bdev is open (bd_openers is non-zero).

Cc: stable@vger.kernel.org
Fixes: 5c0d6b60a0ba46d45020547eacf7199171920935
Reported-and-tested-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 05b553368bb4..899fa8ccc347 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1950,6 +1950,7 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 	spin_lock(&blockdev_superblock->s_inode_list_lock);
 	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
 		struct address_space *mapping = inode->i_mapping;
+		struct block_device *bdev;
 
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
@@ -1970,8 +1971,12 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 		 */
 		iput(old_inode);
 		old_inode = inode;
+		bdev = I_BDEV(inode);
 
-		func(I_BDEV(inode), arg);
+		mutex_lock(&bdev->bd_mutex);
+		if (bdev->bd_openers)
+			func(bdev, arg);
+		mutex_unlock(&bdev->bd_mutex);
 
 		spin_lock(&blockdev_superblock->s_inode_list_lock);
 	}
-- 
2.6.6

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-12-01  8:18 Jan Kara
@ 2016-12-01 10:16 ` Christoph Hellwig
  2016-12-01 15:27 ` Jens Axboe
  2017-01-06  0:03 ` Dan Williams
  2 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-12-01 10:16 UTC
  To: Jan Kara
  Cc: Jens Axboe, Andrew Morton, Wei Fang, linux-block,
	Christoph Hellwig, Rabin Vincent, stable

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-12-01  8:18 Jan Kara
  2016-12-01 10:16 ` Christoph Hellwig
@ 2016-12-01 15:27 ` Jens Axboe
  2017-01-06  0:03 ` Dan Williams
  2 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2016-12-01 15:27 UTC
  To: Jan Kara
  Cc: Andrew Morton, Wei Fang, linux-block, Christoph Hellwig,
	Rabin Vincent, stable

On 12/01/2016 01:18 AM, Jan Kara wrote:
> From: Rabin Vincent <rabinv@axis.com>
> 
> If a block device is closed while iterate_bdevs() is handling it, the
> following NULL pointer dereference occurs because bdev->bd_disk is NULL
> in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
> turn called by the mapping_cap_writeback_dirty() call in
> __filemap_fdatawrite_range()):
> 
> [... oops trace snipped ...]
> 
> The crash is easily reproducible by running the following command, if an
> msleep(100) is inserted before the call to func() in iterate_bdevs():
> 
>  while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
> 
> Fix it by holding the bd_mutex across the func() call and only calling
> func() if the bdev is opened.

I've added this to the 4.10 branch since it's a bit late in the cycle,
and the regression is really old.

-- 
Jens Axboe

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2016-12-01  8:18 Jan Kara
  2016-12-01 10:16 ` Christoph Hellwig
  2016-12-01 15:27 ` Jens Axboe
@ 2017-01-06  0:03 ` Dan Williams
  2017-01-06  0:19   ` Dan Williams
  2 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2017-01-06  0:03 UTC
  To: Jan Kara
  Cc: Jens Axboe, Andrew Morton, Wei Fang, linux-block,
	Christoph Hellwig, Rabin Vincent, stable

On Thu, Dec 1, 2016 at 12:18 AM, Jan Kara <jack@suse.cz> wrote:
> From: Rabin Vincent <rabinv@axis.com>
>
> If a block device is closed while iterate_bdevs() is handling it, the
> following NULL pointer dereference occurs because bdev->bd_disk is NULL
> in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
> turn called by the mapping_cap_writeback_dirty() call in
> __filemap_fdatawrite_range()):
>
> [... oops trace snipped ...]
>
> The crash is easily reproducible by running the following command, if an
> msleep(100) is inserted before the call to func() in iterate_bdevs():
>
>  while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
>
> Fix it by holding the bd_mutex across the func() call and only calling
> func() if the bdev is opened.
>
> Cc: stable@vger.kernel.org
> Fixes: 5c0d6b60a0ba46d45020547eacf7199171920935
> Reported-and-tested-by: Wei Fang <fangwei1@huawei.com>
> Signed-off-by: Rabin Vincent <rabinv@axis.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/block_dev.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 05b553368bb4..899fa8ccc347 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1950,6 +1950,7 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
>         spin_lock(&blockdev_superblock->s_inode_list_lock);
>         list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
>                 struct address_space *mapping = inode->i_mapping;
> +               struct block_device *bdev;
>
>                 spin_lock(&inode->i_lock);
>                 if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
> @@ -1970,8 +1971,12 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
>                  */
>                 iput(old_inode);
>                 old_inode = inode;
> +               bdev = I_BDEV(inode);
>
> -               func(I_BDEV(inode), arg);
> +               mutex_lock(&bdev->bd_mutex);
> +               if (bdev->bd_openers)
> +                       func(bdev, arg);
> +               mutex_unlock(&bdev->bd_mutex);
>
>                 spin_lock(&blockdev_superblock->s_inode_list_lock);
>         }
> --


Hi,

I hit a bug with a similar signature back in October:

    http://marc.info/?l=linux-block&m=147769594003740&w=2

...and still see it in 4.10-rc2.

My reproducer is not very reliable. I'm thinking this fix doesn't work
because it assumes the only race is close vs sync, but my case is
device-shutdown vs sync. In fact iterate_bdevs() is not in my
backtrace:

[ 5750.941063] BUG: unable to handle kernel NULL pointer dereference at 0000000000000568
[..]
[ 5750.959449] CPU: 1 PID: 8910 Comm: lt-libndctl Tainted: G           O    4.10.0-rc2+ #672
[ 5750.961283] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[ 5750.963538] task: ffff88033173c600 task.stack: ffffc90006a80000
[ 5750.964694] RIP: 0010:blk_get_backing_dev_info+0x10/0x20
[ 5750.965774] RSP: 0018:ffffc90006a83b00 EFLAGS: 00010246
[ 5750.966842] RAX: 0000000000000000 RBX: ffffc90006a83b60 RCX: 0000000000000000
[ 5750.968136] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88031b3a8040
[ 5750.969429] RBP: ffffc90006a83b00 R08: ffffffff82b013d0 R09: ffffffff81ea8107
[ 5750.970732] R10: 0000000000000001 R11: ffffffff82acd5c0 R12: ffff88031b3a83d0
[ 5750.972046] R13: ffff88031b3a81d0 R14: ffff8801efa934a0 R15: ffff88031b3a81d0
[ 5750.973344] FS:  00007f2cc4ed5d80(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
[ 5750.975171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5750.976295] CR2: 0000000000000568 CR3: 0000000333fcf000 CR4: 00000000000406e0
[ 5750.977608] Call Trace:
[ 5750.978345]  __inode_attach_wb+0x3a7/0x5d0
[ 5750.979281]  __filemap_fdatawrite_range+0xf8/0x100
[ 5750.980303]  filemap_write_and_wait+0x40/0x90
[ 5750.981270]  fsync_bdev+0x54/0x60
[ 5750.982110]  ? bdget_disk+0x30/0x40
[ 5750.982967]  invalidate_partition+0x24/0x50
[ 5750.983910]  del_gendisk+0xfa/0x230
[ 5750.984768]  pmem_release_disk+0x12/0x20 [nd_pmem]
[ 5750.985787]  devm_action_release+0xf/0x20
[ 5750.986697]  release_nodes+0x16d/0x2b0
[ 5750.987595]  devres_release_all+0x3c/0x60
[ 5750.988510]  device_release_driver_internal+0x16d/0x210
[ 5750.989586]  device_release_driver+0x12/0x20
[ 5750.990537]  unbind_store+0x10f/0x160
[ 5750.991425]  drv_attr_store+0x25/0x30
[ 5750.992301]  sysfs_kf_write+0x45/0x60
[ 5750.993176]  kernfs_fop_write+0x13c/0x1c0
[ 5750.994099]  __vfs_write+0x37/0x160
[ 5750.994967]  ? rcu_read_lock_sched_held+0x5d/0x70
[ 5750.995975]  ? rcu_sync_lockdep_assert+0x2f/0x60
[ 5750.996975]  ? __sb_start_write+0xce/0x1d0
[ 5750.997913]  ? vfs_write+0x17d/0x1a0
[ 5750.998790]  vfs_write+0xb5/0x1a0
[ 5750.999626]  SyS_write+0x58/0xc0
[ 5751.000458]  entry_SYSCALL_64_fastpath+0x1f/0xc2

...so I'm going to take a look at having blk_get_backing_dev_info()
take its own blkdev_get() reference.
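
That idea can be modelled the same way: the accessor takes its own
open-style reference under the mutex before touching the disk pointer and
drops it afterwards, so the pointer cannot be cleared underneath it, and it
fails cleanly once the disk is gone. This is a hypothetical userspace
sketch, not a proposed kernel change: `model_get()`/`model_put()` stand in
for blkdev_get()/blkdev_put(), and `registered` is an invented field marking
whether the disk still exists (set to NULL to model del_gendisk()).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct disk { int queue_id; };

struct bdev_model {
	pthread_mutex_t mutex;		/* models bd_mutex */
	int openers;			/* models bd_openers */
	struct disk *disk;		/* models bd_disk: set only while open */
	struct disk *registered;	/* hypothetical: NULL once disk is deleted */
};

/* Hypothetical get: like an open, attaches the disk if it still exists. */
static int model_get(struct bdev_model *b)
{
	int ret = -1;

	pthread_mutex_lock(&b->mutex);
	if (b->registered) {
		if (b->openers++ == 0)
			b->disk = b->registered;
		ret = 0;
	}
	pthread_mutex_unlock(&b->mutex);
	return ret;
}

/* Hypothetical put: the last reference detaches the disk again. */
static void model_put(struct bdev_model *b)
{
	pthread_mutex_lock(&b->mutex);
	assert(b->openers > 0);
	if (--b->openers == 0)
		b->disk = NULL;
	pthread_mutex_unlock(&b->mutex);
}

/* Read through disk only while holding our own reference. */
static int queue_id_pinned(struct bdev_model *b, int *out)
{
	if (model_get(b))
		return -1;		/* disk already gone: fail, don't dereference */
	*out = b->disk->queue_id;	/* safe: our reference keeps openers > 0 */
	model_put(b);
	return 0;
}
```

The design choice is that the reader never trusts someone else's open:
it either pins the device itself or gets an error.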

* Re: [PATCH] block: protect iterate_bdevs() against concurrent close
  2017-01-06  0:03 ` Dan Williams
@ 2017-01-06  0:19   ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2017-01-06  0:19 UTC
  To: Jan Kara
  Cc: Jens Axboe, Andrew Morton, Wei Fang, linux-block,
	Christoph Hellwig, Rabin Vincent, stable

On Thu, Jan 5, 2017 at 4:03 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Dec 1, 2016 at 12:18 AM, Jan Kara <jack@suse.cz> wrote:
>> From: Rabin Vincent <rabinv@axis.com>
>>
>> If a block device is closed while iterate_bdevs() is handling it, the
>> following NULL pointer dereference occurs because bdev->bd_disk is NULL
>> in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
>> turn called by the mapping_cap_writeback_dirty() call in
>> __filemap_fdatawrite_range()):
>>
>> [... oops trace snipped ...]
>>
>> The crash is easily reproducible by running the following command, if an
>> msleep(100) is inserted before the call to func() in iterate_bdevs():
>>
>>  while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
>>
>> Fix it by holding the bd_mutex across the func() call and only calling
>> func() if the bdev is opened.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 5c0d6b60a0ba46d45020547eacf7199171920935
>> Reported-and-tested-by: Wei Fang <fangwei1@huawei.com>
>> Signed-off-by: Rabin Vincent <rabinv@axis.com>
>> Signed-off-by: Jan Kara <jack@suse.cz>
>> ---
>>  fs/block_dev.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 05b553368bb4..899fa8ccc347 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1950,6 +1950,7 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
>>         spin_lock(&blockdev_superblock->s_inode_list_lock);
>>         list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
>>                 struct address_space *mapping = inode->i_mapping;
>> +               struct block_device *bdev;
>>
>>                 spin_lock(&inode->i_lock);
>>                 if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
>> @@ -1970,8 +1971,12 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
>>                  */
>>                 iput(old_inode);
>>                 old_inode = inode;
>> +               bdev = I_BDEV(inode);
>>
>> -               func(I_BDEV(inode), arg);
>> +               mutex_lock(&bdev->bd_mutex);
>> +               if (bdev->bd_openers)
>> +                       func(bdev, arg);
>> +               mutex_unlock(&bdev->bd_mutex);
>>
>>                 spin_lock(&blockdev_superblock->s_inode_list_lock);
>>         }
>> --
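A minimal userspace sketch of the locking pattern in the patch, using a pthread mutex as a stand-in for the kernel mutex. The names fake_disk, fake_bdev, fake_bdev_close(), touch_disk() and visit_bdev() are hypothetical. They model only bd_mutex, bd_openers and bd_disk, and this is not kernel code. Both the close path and the iterator take bd_mutex, so the callback either runs while bd_disk is still valid or is skipped once bd_openers has dropped to zero.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct gendisk / struct block_device.
 * Only the fields needed to show the locking pattern are modeled. */
struct fake_disk {
	int queue_id;                 /* stands in for disk->queue */
};

struct fake_bdev {
	pthread_mutex_t bd_mutex;     /* models bdev->bd_mutex */
	int bd_openers;               /* models bdev->bd_openers */
	struct fake_disk *bd_disk;    /* set to NULL when the last opener closes */
};

/* Models the last close: tear down bd_disk while holding bd_mutex. */
static void fake_bdev_close(struct fake_bdev *bdev)
{
	pthread_mutex_lock(&bdev->bd_mutex);
	if (--bdev->bd_openers == 0)
		bdev->bd_disk = NULL;
	pthread_mutex_unlock(&bdev->bd_mutex);
}

/* Callback in the role of fdatawrite_one_bdev(): it dereferences
 * bd_disk, so it must never run after the last close. */
static void touch_disk(struct fake_bdev *bdev, void *arg)
{
	int *sum = arg;

	*sum += bdev->bd_disk->queue_id;
}

/* The patched loop body: hold bd_mutex across func() and skip
 * devices that have no openers. Returns 1 if func() was called. */
static int visit_bdev(struct fake_bdev *bdev,
		      void (*func)(struct fake_bdev *, void *), void *arg)
{
	int called = 0;

	pthread_mutex_lock(&bdev->bd_mutex);
	if (bdev->bd_openers) {
		func(bdev, arg);
		called = 1;
	}
	pthread_mutex_unlock(&bdev->bd_mutex);
	return called;
}
```

With this pattern, a close that races with the loop blocks on bd_mutex until func() returns. The msleep(100) in the reproducer only widens the window in which that race can occur.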
>
>
> Hi,
>
> I hit a bug with a similar signature back in October:
>
>     http://marc.info/?l=linux-block&m=147769594003740&w=2
>
> ...and still see it in 4.10-rc2.
>
> My reproducer is not very reliable. I'm thinking this fix doesn't work
> because it assumes the only race is close vs sync, but my case is
> device-shutdown vs sync. In fact iterate_bdevs() is not in my
> backtrace:
>
> [ 5750.941063] BUG: unable to handle kernel NULL pointer dereference at 0000000000000568
> [..]
> [ 5750.959449] CPU: 1 PID: 8910 Comm: lt-libndctl Tainted: G O    4.10.0-rc2+ #672
> [ 5750.961283] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
> [ 5750.963538] task: ffff88033173c600 task.stack: ffffc90006a80000
> [ 5750.964694] RIP: 0010:blk_get_backing_dev_info+0x10/0x20
> [ 5750.965774] RSP: 0018:ffffc90006a83b00 EFLAGS: 00010246
> [ 5750.966842] RAX: 0000000000000000 RBX: ffffc90006a83b60 RCX: 0000000000000000
> [ 5750.968136] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88031b3a8040
> [ 5750.969429] RBP: ffffc90006a83b00 R08: ffffffff82b013d0 R09: ffffffff81ea8107
> [ 5750.970732] R10: 0000000000000001 R11: ffffffff82acd5c0 R12: ffff88031b3a83d0
> [ 5750.972046] R13: ffff88031b3a81d0 R14: ffff8801efa934a0 R15: ffff88031b3a81d0
> [ 5750.973344] FS:  00007f2cc4ed5d80(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
> [ 5750.975171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5750.976295] CR2: 0000000000000568 CR3: 0000000333fcf000 CR4: 00000000000406e0
> [ 5750.977608] Call Trace:
> [ 5750.978345]  __inode_attach_wb+0x3a7/0x5d0
> [ 5750.979281]  __filemap_fdatawrite_range+0xf8/0x100
> [ 5750.980303]  filemap_write_and_wait+0x40/0x90
> [ 5750.981270]  fsync_bdev+0x54/0x60
> [ 5750.982110]  ? bdget_disk+0x30/0x40
> [ 5750.982967]  invalidate_partition+0x24/0x50
> [ 5750.983910]  del_gendisk+0xfa/0x230
> [ 5750.984768]  pmem_release_disk+0x12/0x20 [nd_pmem]
> [ 5750.985787]  devm_action_release+0xf/0x20
> [ 5750.986697]  release_nodes+0x16d/0x2b0
> [ 5750.987595]  devres_release_all+0x3c/0x60
> [ 5750.988510]  device_release_driver_internal+0x16d/0x210
> [ 5750.989586]  device_release_driver+0x12/0x20
> [ 5750.990537]  unbind_store+0x10f/0x160
> [ 5750.991425]  drv_attr_store+0x25/0x30
> [ 5750.992301]  sysfs_kf_write+0x45/0x60
> [ 5750.993176]  kernfs_fop_write+0x13c/0x1c0
> [ 5750.994099]  __vfs_write+0x37/0x160
> [ 5750.994967]  ? rcu_read_lock_sched_held+0x5d/0x70
> [ 5750.995975]  ? rcu_sync_lockdep_assert+0x2f/0x60
> [ 5750.996975]  ? __sb_start_write+0xce/0x1d0
> [ 5750.997913]  ? vfs_write+0x17d/0x1a0
> [ 5750.998790]  vfs_write+0xb5/0x1a0
> [ 5750.999626]  SyS_write+0x58/0xc0
> [ 5751.000458]  entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> ...so I'm going to take a look at having blk_get_backing_dev_info()
> take its own blkdev_get() reference.

No, that's wrong. I think we just need to make bdev->bd_disk have the same
lifetime as bdev->bd_inode...
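Here is a minimal single-threaded sketch of one way to read that proposal. The block device takes its own reference on the disk the first time it is opened and drops it only when the device itself is evicted. Closing the last opener then no longer leaves a NULL or dangling disk pointer behind. The names disk_obj, bdev_obj, disk_get(), disk_put(), bdev_open(), bdev_close() and bdev_evict() are hypothetical. The plain int refcount assumes no concurrency, and kernel code would use an atomic or kref-style counter instead.

```c
#include <stdlib.h>

/* Hypothetical refcounted disk; plain int refcount, single-threaded only. */
struct disk_obj {
	int refcount;
	int queue_id;
};

static struct disk_obj *disk_get(struct disk_obj *d)
{
	d->refcount++;
	return d;
}

static void disk_put(struct disk_obj *d)
{
	if (--d->refcount == 0)
		free(d);
}

/* Hypothetical block device whose disk pointer lives as long as the bdev. */
struct bdev_obj {
	int openers;
	struct disk_obj *disk;
};

/* Pin the disk on first open. Assumes every reopen uses the same disk. */
static void bdev_open(struct bdev_obj *b, struct disk_obj *d)
{
	if (!b->disk)
		b->disk = disk_get(d);
	b->openers++;
}

/* Closing only drops the opener count; the disk pointer stays valid. */
static void bdev_close(struct bdev_obj *b)
{
	b->openers--;
}

/* Eviction of the bdev (its inode) is the only place the disk ref is dropped. */
static void bdev_evict(struct bdev_obj *b)
{
	if (b->disk) {
		disk_put(b->disk);
		b->disk = NULL;
	}
}
```

The design choice here ties the pointer's validity to the bdev's own lifetime rather than to the open count. A reader that only holds the bdev, like the del_gendisk() -> fsync_bdev() path in the second trace, then never needs to check bd_openers first.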


end of thread, other threads:[~2017-01-06  0:19 UTC | newest]

Thread overview: 10+ messages
2016-03-10 12:26 [PATCH] block: protect iterate_bdevs() against concurrent close Rabin Vincent
2016-03-10 17:37 ` Jan Kara
2016-03-14 17:42   ` Rabin Vincent
2016-03-15  8:23     ` Jan Kara
2016-03-15  7:51   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2016-12-01  8:18 Jan Kara
2016-12-01 10:16 ` Christoph Hellwig
2016-12-01 15:27 ` Jens Axboe
2017-01-06  0:03 ` Dan Williams
2017-01-06  0:19   ` Dan Williams
