Unable to handle kernel NULL pointer dereference in super

linux-raid.vger.kernel.org archive mirror

* Unable to handle kernel NULL pointer dereference in super_written
       [not found] <678678296.35099303.1459240762496.JavaMail.zimbra@redhat.com>
@ 2016-03-29 12:22 ` Xiao Ni
  2016-03-29 21:37   ` Shaohua Li
  0 siblings, 1 reply; 8+ messages in thread
From: Xiao Ni @ 2016-03-29 12:22 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Jes.Sorensen, Neil Brown

[-- Attachment #1: Type: text/plain, Size: 10953 bytes --]

Hi all

I encountered a NULL pointer dereference problem.

The environment:
latest linux-stable and mdadm code
aarch64 platform
the md device is created from loop devices

It's a test case to check data integrity. I have attached the test script.

[37158.968198] Unable to handle kernel NULL pointer dereference at virtual address 000002a8
[37158.976261] pgd = fffffe0001300000
[37158.979648] [000002a8] *pgd=00000043f9a50003, *pud=00000043f9a50003, *pmd=00000043f9a50003, *pte=00e8000078090707
[37158.989911] Internal error: Oops: 96000006 [#1] SMP
[37158.994766] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod loop vfat fat sg xgene_rng nfsd ip_tables xfs libcrc32c dm_mirror dm_region_hash dm_log dm_mod realtek(E)
[37159.016342] CPU: 0 PID: 1817 Comm: loop0 Tainted: G            E   4.5.0 #1
[37159.023271] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Oct 20 2015
[37159.030548] task: fffffe03dd7db300 ti: fffffe03d5fa4000 task.ti: fffffe03d5fa4000
[37159.038021] PC is at super_written+0x34/0x98 [md_mod]
[37159.043052] LR is at bio_endio+0x90/0xc4
[37159.046956] pc : [<fffffdfffc577984>] lr : [<fffffe0000360890>] pstate: 800001c5
[37159.054319] sp : fffffe03d5fa7b40
[37159.057617] x29: fffffe03d5fa7b40 x28: 0000000000000000 
[37159.062924] x27: 0000000000000000 x26: 0000000000000000 
[37159.068230] x25: fffffe03d848adf8 x24: 0000000000000000 
[37159.073535] x23: 0000000000000000 x22: fffffe03d876ce00 
[37159.078841] x21: fffffe00bc3a8c00 x20: fffffe019df0aa00 
[37159.084147] x19: 0000000000000000 x18: 000003ffe853c260 
[37159.089455] x17: 00000000004f0468 x16: fffffe0000219b20 
[37159.094760] x15: 0000000000000000 x14: 0000000000000000 
[37159.100066] x13: 0000000000000000 x12: 0000000000000000 
[37159.105371] x11: fffffe00007137b8 x10: 0000000000000aa0 
[37159.110677] x9 : fffffe03ffe3ca60 x8 : 0000000000000000 
[37159.115982] x7 : 00000003ff340000 x6 : 00000000032ff9d8 
[37159.121288] x5 : 0000000000000000 x4 : 0000000000000000 
[37159.126593] x3 : fffffe0000b43000 x2 : 00000000000002a8 
[37159.131899] x1 : 0000000000000000 x0 : fffffe0000360890 
[37159.137204] 
[37159.138688] Process loop0 (pid: 1817, stack limit = 0xfffffe03d5fa4020)
[37159.145272] Stack: (0xfffffe03d5fa7b40 to 0xfffffe03d5fa8000)
[37159.150991] 7b40: fffffe03d5fa7b70 fffffe0000360890 fffffe019df0aa00 fffffe00003608c4
[37159.158788] 7b60: 0000000000000000 dead000000000200 fffffe03d5fa7ba0 fffffe0000367c8c
[37159.166584] 7b80: fffffe019df0aa00 0000000000000000 0000000000000000 fffffe03d876ce00
[37159.174379] 7ba0: fffffe03d5fa7be0 fffffe00003715dc fffffe03d876ce00 0000000000000000
[37159.182174] 7bc0: fffffe00bc801200 0000000000000000 fffffdfffc533690 0000000000000140
[37159.189969] 7be0: fffffe03d5fa7c00 fffffe000036b5a4 fffffe03d876ce00 fffffe03d848ae80
[37159.197764] 7c00: fffffe03d5fa7c50 fffffe000036b960 fffffe03d848ae80 fffffe03d848aea0
[37159.205559] 7c20: 0000000000000001 0000000000000000 fffffe00bc801200 0000000000000140
[37159.213354] 7c40: fffffe03d848ae80 0000000000000004 fffffe03d5fa7ca0 fffffe0000371600
[37159.221148] 7c60: fffffe03dc904000 0000000000000000 fffffe03d5fa4000 0000000000000001
[37159.228945] 7c80: fffffe00be386f50 fffffe00011a4ff0 0000000000000000 0000000000000000
[37159.236739] 7ca0: fffffe03d5fa7cc0 fffffe0000371800 fffffe03dc904000 0000000000000000
[37159.244534] 7cc0: fffffe03d5fa7cf0 fffffe0000371848 fffffe03dc904000 0000000000000000
[37159.252330] 7ce0: fffffe03d5fa4000 fffffdfffc532948 fffffe03d5fa7d10 fffffdfffc532740
[37159.260125] 7d00: fffffe00be386e00 0000000000000000 fffffe03d5fa7df0 fffffe00000d4f78
[37159.267921] 7d20: fffffe00011a4000 fffffe00be386f48 fffffe03d5fa4000 0000000000000001
[37159.275717] 7d40: fffffe00be386f50 fffffe00011a4ff0 0000000000000000 0000000000000000
[37159.283513] 7d60: 0000000000000000 0000000000000000 fffffe03d5fa7dc0 fffffe03dc904170
[37159.291308] 7d80: fffffe03dc904000 fffffe00be386f48 fffffe03d5fa4000 0000000000000001
[37159.299104] 7da0: fffffe00be386f50 fffffe00011a4ff0 0000000000000000 0000000000000000
[37159.306900] 7dc0: fffffe03d5fa7de0 fffffe00006f64f0 fffffe03d5fa7df0 fffffe00000d4fb4
[37159.314695] 7de0: fffffe03d5fa7df0 fffffe00000d4fcc fffffe03d5fa7e30 fffffe00000d4f00
[37159.322491] 7e00: fffffe03d848c100 fffffe0001111e68 fffffe0000915ff8 fffffe00be386f48
[37159.330286] 7e20: fffffe00000d4f14 0000000000000000 0000000000000000 fffffe00000859c0
[37159.338082] 7e40: fffffe00000d4e24 fffffe03d848c100 0000000000000000 0000000000000000
[37159.345876] 7e60: 0000000000000000 fffffe00000e1b28 fffffe03dc0ecb00 0000000000000000
[37159.353673] 7e80: 0000000000000000 fffffe00be386f48 0000000000000000 0000000000000000
[37159.361469] 7ea0: fffffe03d5fa7ea0 fffffe03d5fa7ea0 0000000000000000 fffffe0000000000
[37159.369263] 7ec0: fffffe03d5fa7ec0 fffffe03d5fa7ec0 0000000000000000 0000000000000000
[37159.377059] 7ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.384855] 7f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.392651] 7f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.400446] 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.408240] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.416035] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.423831] 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.431625] 7fc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000005
[37159.439420] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.447214] Call trace:
[37159.449649] Exception stack(0xfffffe03d5fa7980 to 0xfffffe03d5fa7aa0)
[37159.456059] 7980: 0000000000000000 fffffe019df0aa00 fffffe03d5fa7b40 fffffdfffc577984
[37159.463854] 79a0: fffffe0000b43000 fffffe03ffe39400 fffffe03d5fa7a00 fffffe00006f6254
[37159.471648] 79c0: fffffe03d5fa4000 fffffe00006f5b20 7fffffffffffffff 0000000000000002
[37159.479445] 79e0: fffffe03d5fa4000 fffffe03d5fa7b48 7fffffffffffffff 0000000000000000
[37159.487240] 7a00: fffffe03d5fa7a20 fffffe00006f8ca4 fffffe03ffe39400 00000000fffffffb
[37159.495034] 7a20: fffffe0000360890 0000000000000000 00000000000002a8 fffffe0000b43000
[37159.502830] 7a40: 0000000000000000 0000000000000000 00000000032ff9d8 00000003ff340000
[37159.510625] 7a60: 0000000000000000 fffffe03ffe3ca60 0000000000000aa0 fffffe00007137b8
[37159.518421] 7a80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[37159.526237] [<fffffdfffc577984>] super_written+0x34/0x98 [md_mod]
[37159.532302] [<fffffe0000360890>] bio_endio+0x90/0xc4
[37159.537246] [<fffffe0000367c8c>] blk_update_request+0xb8/0x34c
[37159.543053] [<fffffe00003715dc>] blk_mq_end_request+0x2c/0x84
[37159.548773] [<fffffe000036b5a4>] blk_flush_complete_seq+0x1ac/0x308
[37159.555011] [<fffffe000036b960>] flush_end_io+0x124/0x1c8
[37159.560384] [<fffffe0000371600>] blk_mq_end_request+0x50/0x84
[37159.566104] [<fffffe0000371800>] __blk_mq_complete_request+0x108/0x118
[37159.572601] [<fffffe0000371848>] blk_mq_complete_request+0x38/0x44
[37159.578755] [<fffffdfffc532740>] loop_queue_work+0x368/0x870 [loop]
[37159.584995] [<fffffe00000d4f78>] kthread_worker_fn+0x64/0x160
[37159.590714] [<fffffe00000d4f00>] kthread+0xdc/0xf0
[37159.595483] [<fffffe00000859c0>] ret_from_fork+0x10/0x50
[37159.600771] Code: f9400eb3 35000281 910aa262 f9800051 (885f7c40)


I added BUG_ON(rdev->mddev == NULL) to super_write and super_written.
The panic happened in super_written:

[ 4829.714552] md: export_rdev(loop0)
[ 4829.850794] ------------[ cut here ]------------
[ 4829.855396] kernel BUG at /root/md/md.c:713!

 708 static void super_written(struct bio *bio)
 709 {
 710         struct md_rdev *rdev = bio->bi_private;
 711         struct mddev *mddev = rdev->mddev;
 712 
 713         BUG_ON(rdev->mddev == NULL);
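
A note on the faulting address: both oopses report 0x2a8 rather than 0, which
is what one would expect if a NULL struct mddev pointer is used to reach a
member at that offset. The BUG_ON above points the same way. Below is a minimal
userspace sketch of that arithmetic. The fake_mddev type and its padding are
hypothetical, chosen only so that a member lands at 0x2a8. Which member really
sits there is not stated in this report (pending_writes is only a guess), and
the real struct mddev layout depends on the kernel build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mddev; only the member offset matters. */
struct fake_mddev {
	char other_fields[0x2a8];   /* padding so the next member sits at 0x2a8 */
	int pending_writes;         /* assumed member, for illustration only */
};

/*
 * Address the CPU would touch for &mddev->pending_writes, computed
 * arithmetically so that nothing is actually dereferenced.
 */
uintptr_t member_address(uintptr_t base)
{
	const size_t off = offsetof(struct fake_mddev, pending_writes);

	assert(base <= UINTPTR_MAX - off);   /* guard against wrap-around */
	return base + off;
}
```

With a base of 0 (a NULL mddev), member_address() yields 0x2a8, matching the
address in both traces.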

I tried this on x86_64 too; it gave a different call trace:

[26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
[26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
[26396.349449] PGD 0 
[26396.351468] Oops: 0002 [#1] SMP 
[26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td
[26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1
[26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014
[26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000
[26396.429074] RIP: 0010:[<ffffffffa0425b00>]  [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
[26396.437952] RSP: 0018:ffff8808365f7c38  EFLAGS: 00010046
[26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198
[26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900
[26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7
[26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00
[26396.478849] FS:  00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000
[26396.486922] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0
[26396.499775] Stack:
[26396.501781]  ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68
[26396.509199]  ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637
[26396.516618]  00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000
[26396.524038] Call Trace:
[26396.526485]  [<ffffffff81308cd0>] bio_endio+0x40/0x60
[26396.531529]  [<ffffffff81310637>] blk_update_request+0x87/0x320
[26396.537439]  [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70
[26396.543261]  [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0
[26396.549517]  [<ffffffff81313ccf>] flush_end_io+0x15f/0x240
[26396.554993]  [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70
[26396.560815]  [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0
[26396.567246]  [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20
[26396.573506]  [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop]
[26396.579764]  [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0
[26396.585242]  [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170
[26396.591065]  [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0
[26396.597582]  [<ffffffff810a7238>] kthread+0xd8/0xf0
[26396.602453]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
[26396.607929]  [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70
[26396.613319]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60


Best Regards
Xiao

[-- Attachment #2: test.sh --]
[-- Type: application/x-shellscript, Size: 2240 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-29 12:22 ` Unable to handle kernel NULL pointer dereference in super_written Xiao Ni
@ 2016-03-29 21:37   ` Shaohua Li
  2016-03-29 22:23     ` NeilBrown
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Shaohua Li @ 2016-03-29 21:37 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid, Jes.Sorensen, Neil Brown

On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
> Hi all
> 
> I encountered one NULL pointer dereference problem.
> 
> The environment：
> latest linux-stable and mdadm codes
> aarch64 platform
> the md device is created with loop devices
> 
> It's a test case to check data integrity. I added the test script as the attachment.

Could you please try this patch:


From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
Message-Id: <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
From: Shaohua Li <shli@fb.com>
Date: Tue, 29 Mar 2016 14:00:19 -0700
Subject: [PATCH] MD: add rdev reference for super write

md_super_write() and the corresponding md_super_wait() are generally called
with reconfig_mutex locked, which prevents disks from disappearing. There is
one case where this rule is broken: write_sb_page() in bitmap.c doesn't hold
the mutex. next_active_rdev() does increase the rdev reference, but it
decreases the reference too early (e.g., before the IO finishes), so the disk
can disappear in that window. We unconditionally increase the rdev reference
in md_super_write() to avoid the race.

Reported-by: Xiao Ni <xni@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/md.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c068f17..bcfde333 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
 
 	if (atomic_dec_and_test(&mddev->pending_writes))
 		wake_up(&mddev->sb_wait);
+	rdev_dec_pending(rdev, mddev);
 	bio_put(bio);
 }
 
@@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
 	 */
 	struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
 
+	atomic_inc(&rdev->nr_pending);
+
 	bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
 	bio->bi_iter.bi_sector = sector;
 	bio_add_page(bio, page, size, 0);
-- 
2.8.0.rc2
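
The pairing this patch introduces (take a reference when the superblock write
is submitted, drop it in the completion handler) can be sketched in userspace
as follows. This is a simplified model under stated assumptions, not the md
code: model_rdev, model_super_write() and model_super_written() are
hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the
completion is called directly rather than from bio_endio().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of struct md_rdev. */
struct model_rdev {
	atomic_int nr_pending;   /* in-flight IO that still needs the rdev */
	void *mddev;             /* would become NULL once the rdev is removed */
};

/* Submit side: pin the rdev before the IO is handed to the block layer. */
void model_super_write(struct model_rdev *rdev)
{
	atomic_fetch_add(&rdev->nr_pending, 1);
	/* ... the bio would be built and submitted here ... */
}

/* Completion side: drop the pin only after the last use of the rdev. */
void model_super_written(struct model_rdev *rdev)
{
	int old;

	assert(rdev->mddev != NULL);   /* expected to hold while pinned */
	old = atomic_fetch_sub(&rdev->nr_pending, 1);
	assert(old > 0);               /* every put needs a prior get */
}
```

The point of the model is only the ordering: the count is raised before the
write can complete and lowered as the last step of the completion, so code
that waits for nr_pending to reach zero cannot tear the rdev down in between.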

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-29 21:37   ` Shaohua Li
@ 2016-03-29 22:23     ` NeilBrown
  2016-03-30  2:34     ` Guoqing Jiang
  2016-03-30  7:44     ` Xiao Ni
  2 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2016-03-29 22:23 UTC (permalink / raw)
  To: Shaohua Li, Xiao Ni; +Cc: linux-raid, Jes.Sorensen

[-- Attachment #1: Type: text/plain, Size: 2421 bytes --]

On Wed, Mar 30 2016, Shaohua Li wrote:

> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>> Hi all
>> 
>> I encountered one NULL pointer dereference problem.
>> 
>> The environment：
>> latest linux-stable and mdadm codes
>> aarch64 platform
>> the md device is created with loop devices
>> 
>> It's a test case to check data integrity. I added the test script as the attachment.
>
> Could you please try this patch:
>
>
> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> Message-Id: <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> From: Shaohua Li <shli@fb.com>
> Date: Tue, 29 Mar 2016 14:00:19 -0700
> Subject: [PATCH] MD: add rdev reference for super write
>
> md_super_write() and corresponding md_super_wait() generally are called
> with reconfig_mutex locked, which prevents disk disappears. There is one
> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> mutex. next_active_rdev does increase rdev reference, but it decreases
> the reference too early (eg, before IO finish). disk can disappear at
> the window. We unconditionally increase rdev reference in
> md_super_write() to avoid the race.
>

Yes, that makes sense.  Thanks.

Acked-by: NeilBrown <neilb@suse.com>


> Reported-by: Xiao Ni <xni@redhat.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  drivers/md/md.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c068f17..bcfde333 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
>  
>  	if (atomic_dec_and_test(&mddev->pending_writes))
>  		wake_up(&mddev->sb_wait);
> +	rdev_dec_pending(rdev, mddev);
>  	bio_put(bio);
>  }
>  
> @@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
>  	 */
>  	struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
>  
> +	atomic_inc(&rdev->nr_pending);
> +
>  	bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
>  	bio->bi_iter.bi_sector = sector;
>  	bio_add_page(bio, page, size, 0);
> -- 
> 2.8.0.rc2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-29 21:37   ` Shaohua Li
  2016-03-29 22:23     ` NeilBrown
@ 2016-03-30  2:34     ` Guoqing Jiang
  2016-03-30 17:16       ` Shaohua Li
  2016-03-30  7:44     ` Xiao Ni
  2 siblings, 1 reply; 8+ messages in thread
From: Guoqing Jiang @ 2016-03-30  2:34 UTC (permalink / raw)
  To: Shaohua Li, Xiao Ni; +Cc: linux-raid, Jes.Sorensen, Neil Brown



On 03/30/2016 05:37 AM, Shaohua Li wrote:
> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>> Hi all
>>
>> I encountered one NULL pointer dereference problem.
>>
>> The environment：
>> latest linux-stable and mdadm codes
>> aarch64 platform
>> the md device is created with loop devices
>>
>> It's a test case to check data integrity. I added the test script as the attachment.
> Could you please try this patch:
>
>
>  From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> Message-Id: <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> From: Shaohua Li <shli@fb.com>
> Date: Tue, 29 Mar 2016 14:00:19 -0700
> Subject: [PATCH] MD: add rdev reference for super write
>
> md_super_write() and corresponding md_super_wait() generally are called
> with reconfig_mutex locked, which prevents disk disappears.

Just out of curiosity: I found several paths that may also not hold
reconfig_mutex. Take the following as examples.

1.  md_run -> md_update_sb -> md_super_write/md_super_wait
2.  rdev_size_store -> rdev_size_change -> md_super_write/md_super_wait


Thanks,
Guoqing

> There is one
> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> mutex. next_active_rdev does increase rdev reference, but it decreases
> the reference too early (eg, before IO finish). disk can disappear at
> the window. We unconditionally increase rdev reference in
> md_super_write() to avoid the race.
>
> Reported-by: Xiao Ni <xni@redhat.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>   drivers/md/md.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c068f17..bcfde333 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
>   
>   	if (atomic_dec_and_test(&mddev->pending_writes))
>   		wake_up(&mddev->sb_wait);
> +	rdev_dec_pending(rdev, mddev);
>   	bio_put(bio);
>   }
>   
> @@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
>   	 */
>   	struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
>   
> +	atomic_inc(&rdev->nr_pending);
> +
>   	bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
>   	bio->bi_iter.bi_sector = sector;
>   	bio_add_page(bio, page, size, 0);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-30  2:34     ` Guoqing Jiang
@ 2016-03-30 17:16       ` Shaohua Li
  0 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2016-03-30 17:16 UTC (permalink / raw)
  To: Guoqing Jiang, Shaohua Li, Xiao Ni; +Cc: linux-raid, Jes.Sorensen, Neil Brown



On 03/29/2016 07:34 PM, Guoqing Jiang wrote:
>
>
> On 03/30/2016 05:37 AM, Shaohua Li wrote:
>> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>>> Hi all
>>>
>>> I encountered one NULL pointer dereference problem.
>>>
>>> The environment：
>>> latest linux-stable and mdadm codes
>>> aarch64 platform
>>> the md device is created with loop devices
>>>
>>> It's a test case to check data integrity. I added the test script as 
>>> the attachment.
>> Could you please try this patch:
>>
>>
>>  From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
>> Message-Id: 
>> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
>> From: Shaohua Li <shli@fb.com>
>> Date: Tue, 29 Mar 2016 14:00:19 -0700
>> Subject: [PATCH] MD: add rdev reference for super write
>>
>> md_super_write() and corresponding md_super_wait() generally are called
>> with reconfig_mutex locked, which prevents disk disappears.
>
> Just for curious, I find several paths maybe also don't hold 
> reconfig_mutex,
> take the followings as example.
>
> 1.  md_run -> md_update_sb -> md_super_write/md_super_wait
> 2.  rdev_size_store -> rdev_size_change -> md_super_write/md_super_wait
We do mddev_lock/unlock when calling these. rdev_size_store is a bit
tricky: the lock is held in rdev_attr_store.
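
The pattern described here, where the generic store entry point takes the
lock and the per-attribute handler runs with it already held, might look like
the sketch below. It is a userspace illustration under assumptions, not the
md.c code: model_mddev, model_attr_store() and model_size_store() are
hypothetical names, and a plain flag stands in for reconfig_mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a flag stands in for mddev->reconfig_mutex. */
struct model_mddev {
	bool reconfig_locked;
	unsigned long long dev_size;
};

typedef int (*model_store_fn)(struct model_mddev *mddev,
			      unsigned long long val);

/* Per-attribute handler: relies on the caller holding the lock. */
int model_size_store(struct model_mddev *mddev, unsigned long long val)
{
	if (!mddev->reconfig_locked)
		return -1;              /* called without the lock: refuse */
	mddev->dev_size = val;          /* a superblock write would go here */
	return 0;
}

/* Generic entry point: lock, dispatch to the handler, unlock. */
int model_attr_store(struct model_mddev *mddev, model_store_fn store,
		     unsigned long long val)
{
	int ret;

	assert(!mddev->reconfig_locked);   /* the model lock is not recursive */
	mddev->reconfig_locked = true;
	ret = store(mddev, val);
	mddev->reconfig_locked = false;
	return ret;
}
```

Reading only the handler, it looks as if no lock is taken; the lock shows up
one level higher, in the dispatcher.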

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-29 21:37   ` Shaohua Li
  2016-03-29 22:23     ` NeilBrown
  2016-03-30  2:34     ` Guoqing Jiang
@ 2016-03-30  7:44     ` Xiao Ni
  2016-03-30 17:27       ` Shaohua Li
  2 siblings, 1 reply; 8+ messages in thread
From: Xiao Ni @ 2016-03-30  7:44 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, Jes Sorensen, Neil Brown



----- Original Message -----
> From: "Shaohua Li" <shli@kernel.org>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen" <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
> Sent: Wednesday, March 30, 2016 5:37:31 AM
> Subject: Re: Unable to handle kernel NULL pointer dereference in super_written
> 
> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
> > Hi all
> > 
> > I encountered one NULL pointer dereference problem.
> > 
> > The environment：
> > latest linux-stable and mdadm codes
> > aarch64 platform
> > the md device is created with loop devices
> > 
> > It's a test case to check data integrity. I added the test script as the
> > attachment.
> 
> Could you please try this patch:

Thanks for the patch. I'm running the test and will report the result. It needs
to run more than 300 iterations to reproduce this.

> 
> 
> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> Message-Id:
> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> From: Shaohua Li <shli@fb.com>
> Date: Tue, 29 Mar 2016 14:00:19 -0700
> Subject: [PATCH] MD: add rdev reference for super write
> 
> md_super_write() and corresponding md_super_wait() generally are called
> with reconfig_mutex locked, which prevents disk disappears. There is one
> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> mutex. next_active_rdev does increase rdev reference, but it decreases
> the reference too early (eg, before IO finish). disk can disappear at
> the window. We unconditionally increase rdev reference in
> md_super_write() to avoid the race.

In the hot_remove_disk path, write_sb_page is protected by reconfig_mutex.
It shouldn't submit a bio to a leg that is already marked FAULTY. Could you
give an example to show how the bug happens?

Best Regards
Xiao
> 
> Reported-by: Xiao Ni <xni@redhat.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  drivers/md/md.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c068f17..bcfde333 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
>  
>  	if (atomic_dec_and_test(&mddev->pending_writes))
>  		wake_up(&mddev->sb_wait);
> +	rdev_dec_pending(rdev, mddev);
>  	bio_put(bio);
>  }
>  
> @@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev
> *rdev,
>  	 */
>  	struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
>  
> +	atomic_inc(&rdev->nr_pending);
> +
>  	bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
>  	bio->bi_iter.bi_sector = sector;
>  	bio_add_page(bio, page, size, 0);
> --
> 2.8.0.rc2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-30  7:44     ` Xiao Ni
@ 2016-03-30 17:27       ` Shaohua Li
  2016-03-31  3:30         ` Xiao Ni
  0 siblings, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2016-03-30 17:27 UTC (permalink / raw)
  To: Xiao Ni, Shaohua Li; +Cc: linux-raid, Jes Sorensen, Neil Brown



On 03/30/2016 12:44 AM, Xiao Ni wrote:
>
> ----- Original Message -----
>> From: "Shaohua Li" <shli@kernel.org>
>> To: "Xiao Ni" <xni@redhat.com>
>> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen" <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
>> Sent: Wednesday, March 30, 2016 5:37:31 AM
>> Subject: Re: Unable to handle kernel NULL pointer dereference in super_written
>>
>> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>>> Hi all
>>>
>>> I encountered one NULL pointer dereference problem.
>>>
>>> The environment：
>>> latest linux-stable and mdadm codes
>>> aarch64 platform
>>> the md device is created with loop devices
>>>
>>> It's a test case to check data integrity. I added the test script as the
>>> attachment.
>> Could you please try this patch:
> Thanks for the patch, I'm running test and will give the result. It need to run
> more than 300 iterations to reproduce this.
>
>>
>>  From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
>> Message-Id:
>> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
>> From: Shaohua Li <shli@fb.com>
>> Date: Tue, 29 Mar 2016 14:00:19 -0700
>> Subject: [PATCH] MD: add rdev reference for super write
>>
>> md_super_write() and corresponding md_super_wait() generally are called
>> with reconfig_mutex locked, which prevents disk disappears. There is one
>> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
>> mutex. next_active_rdev does increase rdev reference, but it decreases
>> the reference too early (eg, before IO finish). disk can disappear at
>> the window. We unconditionally increase rdev reference in
>> md_super_write() to avoid the race.
> In the path hot_remove_disk, the write_sb_page is protected by reconfig_mutex.
> It shouldn't submit bio to the leg which is already set FAULTY. Could you give
> an example to show how the bug happens?

I'm not sure I understand your question correctly, but I'll try to answer.
When a disk is reported faulty with md_error, we don't remove it
immediately, because there is a risk that, for example, some IO is still
running on the rdev. We increase the rdev reference for every IO and
decrease it after the IO finishes. You can find this in raid5.c, for
example. We only delete the rdev after the reference drops to 0; please see
remove_and_add_spares(). So it's possible to find a disk with FAULTY set
that is still in the rdev list.
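
As a rough illustration of that rule (a Faulty device stays on the list until
its reference count drops to zero), here is a simplified userspace model. It
is not the remove_and_add_spares() code: model_rdev, the faulty flag and all
model_* functions are hypothetical, and the model is single-threaded, so it
shows only the ordering, not the locking the kernel needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of an rdev that can be kicked from an array. */
struct model_rdev {
	atomic_int nr_pending;   /* in-flight IO holding a reference */
	bool faulty;             /* set when an error is reported */
	void *mddev;             /* NULL once removed from the array */
};

/* IO submit: take a reference. */
void model_get(struct model_rdev *rdev)
{
	atomic_fetch_add(&rdev->nr_pending, 1);
}

/* IO completion: drop the reference. */
void model_put(struct model_rdev *rdev)
{
	int old = atomic_fetch_sub(&rdev->nr_pending, 1);

	assert(old > 0);         /* every put needs a prior get */
}

/* Error path: only mark the device; in-flight IO may still use it. */
void model_mark_faulty(struct model_rdev *rdev)
{
	rdev->faulty = true;
}

/*
 * Removal path: detach only a Faulty device with no IO in flight.
 * Returns true if the device was removed, false if the caller must retry.
 */
bool model_try_remove(struct model_rdev *rdev)
{
	if (!rdev->faulty || atomic_load(&rdev->nr_pending) != 0)
		return false;
	rdev->mddev = NULL;
	return true;
}
```

In this model a removal attempt between model_get() and model_put() is
refused, which is the window where a Faulty disk is still on the list.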

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Unable to handle kernel NULL pointer dereference in super_written
  2016-03-30 17:27       ` Shaohua Li
@ 2016-03-31  3:30         ` Xiao Ni
  0 siblings, 0 replies; 8+ messages in thread
From: Xiao Ni @ 2016-03-31  3:30 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Jes Sorensen, Neil Brown



----- Original Message -----
> From: "Shaohua Li" <shlikernel@gmail.com>
> To: "Xiao Ni" <xni@redhat.com>, "Shaohua Li" <shli@kernel.org>
> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen" <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
> Sent: Thursday, March 31, 2016 1:27:19 AM
> Subject: Re: Unable to handle kernel NULL pointer dereference in super_written
> 
> 
> 
> On 03/30/2016 12:44 AM, Xiao Ni wrote:
> >
> > ----- Original Message -----
> >> From: "Shaohua Li" <shli@kernel.org>
> >> To: "Xiao Ni" <xni@redhat.com>
> >> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen"
> >> <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
> >> Sent: Wednesday, March 30, 2016 5:37:31 AM
> >> Subject: Re: Unable to handle kernel NULL pointer dereference in
> >> super_written
> >>
> >> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
> >>> Hi all
> >>>
> >>> I encountered one NULL pointer dereference problem.
> >>>
> >>> The environment：
> >>> latest linux-stable and mdadm codes
> >>> aarch64 platform
> >>> the md device is created with loop devices
> >>>
> > >>> It's a test case to check data integrity. I added the test script as the
> >>> attachment.
> >> Could you please try this patch:
> > Thanks for the patch, I'm running test and will give the result. It need to
> > run
> > more than 300 iterations to reproduce this.

Hi Shaohua

The test has run more than 1000 times. The patch fixes the bug.

> >
> >>
> >>  From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> >> Message-Id:
> >> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> >> From: Shaohua Li <shli@fb.com>
> >> Date: Tue, 29 Mar 2016 14:00:19 -0700
> >> Subject: [PATCH] MD: add rdev reference for super write
> >>
> >> md_super_write() and corresponding md_super_wait() generally are called
> >> with reconfig_mutex locked, which prevents disk disappears. There is one
> >> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> >> mutex. next_active_rdev does increase rdev reference, but it decreases
> >> the reference too early (eg, before IO finish). disk can disappear at
> >> the window. We unconditionally increase rdev reference in
> >> md_super_write() to avoid the race.
> > In the path hot_remove_disk, the write_sb_page is protected by
> > reconfig_mutex.
> > It shouldn't submit bio to the leg which is already set FAULTY. Could you
> > give
> > an example to show how the bug happens?
> 
> Not sure if I understand your question correctly, but I try to answer.
> When a disk is reported faulty with md_error we don't immediately remove
> the disk as there is risk for example some IO is running in the rdev. We
> increase rdev reference in every IO and decrease the reference after IO
> finishes. You can find this in raid5.c for example. We only delete the
> rdev after the reference is 0, please see remove_and_add_spares(). So
> it's possible you will find disk with FAULTY set, but it's still in rdev
> list.

I'm sorry that I didn't describe it clearly.

I just wanted to know how the bug happens. At first I focused my attention
on hot_remove_disk. I thought it shouldn't write the superblock to a device
that has already been removed by md_kick_rdev_from_array.

I read the comments in the patch and the code again. Now I think I
understand clearly.

It's because of bitmap_daemon_work->write_page->write_sb_page->md_super_write,
which is called by md_check_recovery. It isn't protected by reconfig_mutex.
So there is a chance that the disk is removed (rdev->mddev = NULL) while the
superblock IO is in flight. Is that right?

Regards
Xiao

^ permalink raw reply	[flat|nested] 8+ messages in thread

Thread overview: 8+ messages
     [not found] <678678296.35099303.1459240762496.JavaMail.zimbra@redhat.com>
2016-03-29 12:22 ` Unable to handle kernel NULL pointer dereference in super_written Xiao Ni
2016-03-29 21:37   ` Shaohua Li
2016-03-29 22:23     ` NeilBrown
2016-03-30  2:34     ` Guoqing Jiang
2016-03-30 17:16       ` Shaohua Li
2016-03-30  7:44     ` Xiao Ni
2016-03-30 17:27       ` Shaohua Li
2016-03-31  3:30         ` Xiao Ni
