* Re: Unable to handle kernel NULL pointer dereference in super_written
2016-03-29 21:37 ` Shaohua Li
@ 2016-03-29 22:23 ` NeilBrown
2016-03-30 2:34 ` Guoqing Jiang
2016-03-30 7:44 ` Xiao Ni
2 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2016-03-29 22:23 UTC (permalink / raw)
To: Shaohua Li, Xiao Ni; +Cc: linux-raid, Jes.Sorensen
On Wed, Mar 30 2016, Shaohua Li wrote:
> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>> Hi all
>>
>> I encountered one NULL pointer dereference problem.
>>
>> The environment:
>> latest linux-stable and mdadm codes
>> aarch64 platform
>> the md device is created with loop devices
>>
>> It's a test case to check data integrity. I added the test script as the attachment.
>
> Could you please try this patch:
>
>
> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> Message-Id: <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> From: Shaohua Li <shli@fb.com>
> Date: Tue, 29 Mar 2016 14:00:19 -0700
> Subject: [PATCH] MD: add rdev reference for super write
>
> md_super_write() and corresponding md_super_wait() generally are called
> with reconfig_mutex locked, which prevents disk disappears. There is one
> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> mutex. next_active_rdev does increase rdev reference, but it decreases
> the reference too early (eg, before IO finish). disk can disappear at
> the window. We unconditionally increase rdev reference in
> md_super_write() to avoid the race.
>
Yes, that makes sense. Thanks.
Acked-by: NeilBrown <neilb@suse.com>
> Reported-by: Xiao Ni <xni@redhat.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
> drivers/md/md.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c068f17..bcfde333 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
>
> if (atomic_dec_and_test(&mddev->pending_writes))
> wake_up(&mddev->sb_wait);
> + rdev_dec_pending(rdev, mddev);
> bio_put(bio);
> }
>
> @@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
> */
> struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
>
> + atomic_inc(&rdev->nr_pending);
> +
> bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
> bio->bi_iter.bi_sector = sector;
> bio_add_page(bio, page, size, 0);
> --
> 2.8.0.rc2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Unable to handle kernel NULL pointer dereference in super_written
2016-03-29 21:37 ` Shaohua Li
2016-03-29 22:23 ` NeilBrown
@ 2016-03-30 2:34 ` Guoqing Jiang
2016-03-30 17:16 ` Shaohua Li
2016-03-30 7:44 ` Xiao Ni
2 siblings, 1 reply; 8+ messages in thread
From: Guoqing Jiang @ 2016-03-30 2:34 UTC (permalink / raw)
To: Shaohua Li, Xiao Ni; +Cc: linux-raid, Jes.Sorensen, Neil Brown
On 03/30/2016 05:37 AM, Shaohua Li wrote:
> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>> Hi all
>>
>> I encountered one NULL pointer dereference problem.
>>
>> The environment:
>> latest linux-stable and mdadm codes
>> aarch64 platform
>> the md device is created with loop devices
>>
>> It's a test case to check data integrity. I added the test script as the attachment.
> Could you please try this patch:
>
>
> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> Message-Id: <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> From: Shaohua Li <shli@fb.com>
> Date: Tue, 29 Mar 2016 14:00:19 -0700
> Subject: [PATCH] MD: add rdev reference for super write
>
> md_super_write() and corresponding md_super_wait() generally are called
> with reconfig_mutex locked, which prevents disk disappears.
Just out of curiosity: I found several paths that may also not hold reconfig_mutex.
Take the following as examples.
1. md_run -> md_update_sb -> md_super_write/md_super_wait
2. rdev_size_store -> rdev_size_change -> md_super_write/md_super_wait
Thanks,
Guoqing
> There is one
> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> mutex. next_active_rdev does increase rdev reference, but it decreases
> the reference too early (eg, before IO finish). disk can disappear at
> the window. We unconditionally increase rdev reference in
> md_super_write() to avoid the race.
>
> Reported-by: Xiao Ni <xni@redhat.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
> drivers/md/md.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c068f17..bcfde333 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
>
> if (atomic_dec_and_test(&mddev->pending_writes))
> wake_up(&mddev->sb_wait);
> + rdev_dec_pending(rdev, mddev);
> bio_put(bio);
> }
>
> @@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
> */
> struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
>
> + atomic_inc(&rdev->nr_pending);
> +
> bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
> bio->bi_iter.bi_sector = sector;
> bio_add_page(bio, page, size, 0);
* Re: Unable to handle kernel NULL pointer dereference in super_written
2016-03-30 2:34 ` Guoqing Jiang
@ 2016-03-30 17:16 ` Shaohua Li
0 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2016-03-30 17:16 UTC (permalink / raw)
To: Guoqing Jiang, Shaohua Li, Xiao Ni; +Cc: linux-raid, Jes.Sorensen, Neil Brown
On 03/29/2016 07:34 PM, Guoqing Jiang wrote:
>
>
> On 03/30/2016 05:37 AM, Shaohua Li wrote:
>> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>>> Hi all
>>>
>>> I encountered one NULL pointer dereference problem.
>>>
>>> The environment:
>>> latest linux-stable and mdadm codes
>>> aarch64 platform
>>> the md device is created with loop devices
>>>
>>> It's a test case to check data integrity. I added the test script as
>>> the attachment.
>> Could you please try this patch:
>>
>>
>> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
>> Message-Id:
>> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
>> From: Shaohua Li <shli@fb.com>
>> Date: Tue, 29 Mar 2016 14:00:19 -0700
>> Subject: [PATCH] MD: add rdev reference for super write
>>
>> md_super_write() and corresponding md_super_wait() generally are called
>> with reconfig_mutex locked, which prevents disk disappears.
>
> Just out of curiosity: I found several paths that may also not hold
> reconfig_mutex.
> Take the following as examples.
>
> 1. md_run -> md_update_sb -> md_super_write/md_super_wait
> 2. rdev_size_store -> rdev_size_change -> md_super_write/md_super_wait
We do mddev_lock/unlock around the calls to these. rdev_size_store is a bit
tricky: the lock is held in rdev_attr_store.
Thanks,
Shaohua
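
Editor's note: the point about rdev_size_store can be illustrated with a small userspace sketch of the "wrapper takes the lock, handler relies on it" pattern. This is not kernel code. All lk_* names are hypothetical, and the mutex is modeled as a plain boolean flag so that the example stays single-threaded and self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; only what the sketch needs. */
struct lk_mddev {
    bool reconfig_locked;   /* models reconfig_mutex being held */
    int sb_writes;          /* counts simulated superblock writes */
};

/* Signature shared by all simulated attribute handlers. */
typedef int (*lk_store_fn)(struct lk_mddev *mddev, int value);

/*
 * Inner handler (plays the role of rdev_size_store in this sketch).
 * It takes no lock itself and relies on its caller holding it.
 */
int lk_size_store(struct lk_mddev *mddev, int value)
{
    assert(mddev->reconfig_locked);  /* caller must hold the lock */
    mddev->sb_writes += 1;           /* simulated superblock write */
    return value;
}

/*
 * Outer wrapper (plays the role of rdev_attr_store in this sketch).
 * It takes the lock, dispatches to the handler, then releases the lock.
 */
int lk_attr_store(struct lk_mddev *mddev, lk_store_fn store, int value)
{
    int ret;

    assert(!mddev->reconfig_locked); /* toy lock is not recursive */
    mddev->reconfig_locked = true;
    ret = store(mddev, value);
    mddev->reconfig_locked = false;
    return ret;
}
```

Calling lk_attr_store(&m, lk_size_store, 7) runs the handler with the flag set, which is why reading the handler alone can give the impression that the write happens unlocked.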
* Re: Unable to handle kernel NULL pointer dereference in super_written
2016-03-29 21:37 ` Shaohua Li
2016-03-29 22:23 ` NeilBrown
2016-03-30 2:34 ` Guoqing Jiang
@ 2016-03-30 7:44 ` Xiao Ni
2016-03-30 17:27 ` Shaohua Li
2 siblings, 1 reply; 8+ messages in thread
From: Xiao Ni @ 2016-03-30 7:44 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid, Jes Sorensen, Neil Brown
----- Original Message -----
> From: "Shaohua Li" <shli@kernel.org>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen" <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
> Sent: Wednesday, March 30, 2016 5:37:31 AM
> Subject: Re: Unable to handle kernel NULL pointer dereference in super_written
>
> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
> > Hi all
> >
> > I encountered one NULL pointer dereference problem.
> >
> > The environment:
> > latest linux-stable and mdadm codes
> > aarch64 platform
> > the md device is created with loop devices
> >
> > It's a test case to check data integrity. I added the test script as the
> > attachment.
>
> Could you please try this patch:
Thanks for the patch. I'm running the test and will report the result. It needs to run
more than 300 iterations to reproduce this.
>
>
> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> Message-Id:
> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> From: Shaohua Li <shli@fb.com>
> Date: Tue, 29 Mar 2016 14:00:19 -0700
> Subject: [PATCH] MD: add rdev reference for super write
>
> md_super_write() and corresponding md_super_wait() generally are called
> with reconfig_mutex locked, which prevents disk disappears. There is one
> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> mutex. next_active_rdev does increase rdev reference, but it decreases
> the reference too early (eg, before IO finish). disk can disappear at
> the window. We unconditionally increase rdev reference in
> md_super_write() to avoid the race.
In the hot_remove_disk path, write_sb_page is protected by reconfig_mutex.
It shouldn't submit a bio to a leg that is already marked FAULTY. Could you give
an example to show how the bug happens?
Best Regards
Xiao
>
> Reported-by: Xiao Ni <xni@redhat.com>
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
> drivers/md/md.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c068f17..bcfde333 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -718,6 +718,7 @@ static void super_written(struct bio *bio)
>
> if (atomic_dec_and_test(&mddev->pending_writes))
> wake_up(&mddev->sb_wait);
> + rdev_dec_pending(rdev, mddev);
> bio_put(bio);
> }
>
> @@ -732,6 +733,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev
> *rdev,
> */
> struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
>
> + atomic_inc(&rdev->nr_pending);
> +
> bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
> bio->bi_iter.bi_sector = sector;
> bio_add_page(bio, page, size, 0);
> --
> 2.8.0.rc2
>
* Re: Unable to handle kernel NULL pointer dereference in super_written
2016-03-30 7:44 ` Xiao Ni
@ 2016-03-30 17:27 ` Shaohua Li
2016-03-31 3:30 ` Xiao Ni
0 siblings, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2016-03-30 17:27 UTC (permalink / raw)
To: Xiao Ni, Shaohua Li; +Cc: linux-raid, Jes Sorensen, Neil Brown
On 03/30/2016 12:44 AM, Xiao Ni wrote:
>
> ----- Original Message -----
>> From: "Shaohua Li" <shli@kernel.org>
>> To: "Xiao Ni" <xni@redhat.com>
>> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen" <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
>> Sent: Wednesday, March 30, 2016 5:37:31 AM
>> Subject: Re: Unable to handle kernel NULL pointer dereference in super_written
>>
>> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
>>> Hi all
>>>
>>> I encountered one NULL pointer dereference problem.
>>>
>>> The environment:
>>> latest linux-stable and mdadm codes
>>> aarch64 platform
>>> the md device is created with loop devices
>>>
>>> It's a test case to check data integrity. I added the test script as the
>>> attachment.
>> Could you please try this patch:
> Thanks for the patch. I'm running the test and will report the result. It needs to run
> more than 300 iterations to reproduce this.
>
>>
>> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
>> Message-Id:
>> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
>> From: Shaohua Li <shli@fb.com>
>> Date: Tue, 29 Mar 2016 14:00:19 -0700
>> Subject: [PATCH] MD: add rdev reference for super write
>>
>> md_super_write() and corresponding md_super_wait() generally are called
>> with reconfig_mutex locked, which prevents disk disappears. There is one
>> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
>> mutex. next_active_rdev does increase rdev reference, but it decreases
>> the reference too early (eg, before IO finish). disk can disappear at
>> the window. We unconditionally increase rdev reference in
>> md_super_write() to avoid the race.
> In the hot_remove_disk path, write_sb_page is protected by reconfig_mutex.
> It shouldn't submit a bio to a leg that is already marked FAULTY. Could you give
> an example to show how the bug happens?
Not sure if I understand your question correctly, but I'll try to answer.
When a disk is reported faulty with md_error, we don't remove the disk
immediately, because some IO may still be running on the rdev, for example. We
increase the rdev reference for every IO and decrease it after the IO
finishes. You can find this in raid5.c, for example. We only delete the
rdev after the reference drops to 0; please see remove_and_add_spares(). So
it's possible to find a disk with FAULTY set that is still in the rdev
list.
Thanks,
Shaohua
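
Editor's note: the lifetime rule described above (take a reference per IO, drop it on completion, remove the device only at zero) can be sketched in userspace C. This is a simplified model, not the md implementation. The toy_* names and the struct layout are hypothetical, and the real removal path checks more conditions than the two shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct md_rdev; only what the sketch needs. */
struct toy_rdev {
    atomic_int nr_pending;  /* number of IOs still in flight */
    bool faulty;            /* set when the device is reported failed */
    bool in_array;          /* still linked into the array's device list */
};

/* Take a reference before submitting IO to the device. */
void toy_io_submit(struct toy_rdev *rdev)
{
    atomic_fetch_add(&rdev->nr_pending, 1);
}

/* Drop the reference from the IO completion path. */
void toy_io_complete(struct toy_rdev *rdev)
{
    int old = atomic_fetch_sub(&rdev->nr_pending, 1);

    assert(old > 0);  /* every completion must match an earlier submit */
}

/* Reporting a failure only marks the device; it does not remove it. */
void toy_mark_faulty(struct toy_rdev *rdev)
{
    rdev->faulty = true;
}

/* Removal succeeds only for a faulty device with no IO in flight. */
bool toy_try_remove(struct toy_rdev *rdev)
{
    if (!rdev->faulty || atomic_load(&rdev->nr_pending) != 0)
        return false;
    rdev->in_array = false;
    return true;
}
```

With one IO submitted and the device then marked faulty, toy_try_remove() refuses until toy_io_complete() has run. That intermediate state is the "FAULTY but still in the rdev list" case mentioned in the reply.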
* Re: Unable to handle kernel NULL pointer dereference in super_written
2016-03-30 17:27 ` Shaohua Li
@ 2016-03-31 3:30 ` Xiao Ni
0 siblings, 0 replies; 8+ messages in thread
From: Xiao Ni @ 2016-03-31 3:30 UTC (permalink / raw)
To: shli; +Cc: linux-raid, Jes Sorensen, Neil Brown
----- Original Message -----
> From: "Shaohua Li" <shlikernel@gmail.com>
> To: "Xiao Ni" <xni@redhat.com>, "Shaohua Li" <shli@kernel.org>
> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen" <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
> Sent: Thursday, March 31, 2016 1:27:19 AM
> Subject: Re: Unable to handle kernel NULL pointer dereference in super_written
>
>
>
> On 03/30/2016 12:44 AM, Xiao Ni wrote:
> >
> > ----- Original Message -----
> >> From: "Shaohua Li" <shli@kernel.org>
> >> To: "Xiao Ni" <xni@redhat.com>
> >> Cc: "linux-raid" <linux-raid@vger.kernel.org>, "Jes Sorensen"
> >> <Jes.Sorensen@redhat.com>, "Neil Brown" <neilb@suse.de>
> >> Sent: Wednesday, March 30, 2016 5:37:31 AM
> >> Subject: Re: Unable to handle kernel NULL pointer dereference in
> >> super_written
> >>
> >> On Tue, Mar 29, 2016 at 08:22:00AM -0400, Xiao Ni wrote:
> >>> Hi all
> >>>
> >>> I encountered one NULL pointer dereference problem.
> >>>
> >>> The environment:
> >>> latest linux-stable and mdadm codes
> >>> aarch64 platform
> >>> the md device is created with loop devices
> >>>
> >>> It's a test case to check data integrity. I added the test script as the
> >>> attachment.
> >> Could you please try this patch:
> > Thanks for the patch. I'm running the test and will report the result. It needs to
> > run more than 300 iterations to reproduce this.
Hi Shaohua
The test has run more than 1000 times. The patch fixed the bug.
> >
> >>
> >> From b86d9e1724184c79ad1ea63901aec802492b861c Mon Sep 17 00:00:00 2001
> >> Message-Id:
> >> <b86d9e1724184c79ad1ea63901aec802492b861c.1459285706.git.shli@fb.com>
> >> From: Shaohua Li <shli@fb.com>
> >> Date: Tue, 29 Mar 2016 14:00:19 -0700
> >> Subject: [PATCH] MD: add rdev reference for super write
> >>
> >> md_super_write() and corresponding md_super_wait() generally are called
> >> with reconfig_mutex locked, which prevents disk disappears. There is one
> >> case this rule is broken. write_sb_page of bitmap.c doesn't hold the
> >> mutex. next_active_rdev does increase rdev reference, but it decreases
> >> the reference too early (eg, before IO finish). disk can disappear at
> >> the window. We unconditionally increase rdev reference in
> >> md_super_write() to avoid the race.
> > In the hot_remove_disk path, write_sb_page is protected by reconfig_mutex.
> > It shouldn't submit a bio to a leg that is already marked FAULTY. Could you
> > give an example to show how the bug happens?
>
> Not sure if I understand your question correctly, but I'll try to answer.
> When a disk is reported faulty with md_error, we don't remove the disk
> immediately, because some IO may still be running on the rdev, for example. We
> increase the rdev reference for every IO and decrease it after the IO
> finishes. You can find this in raid5.c, for example. We only delete the
> rdev after the reference drops to 0; please see remove_and_add_spares(). So
> it's possible to find a disk with FAULTY set that is still in the rdev
> list.
I'm sorry that I didn't describe it clearly.
I just wanted to know how the bug happens. At first I focused only
on hot_remove_disk. I thought it shouldn't write the superblock to a device
that has already been removed by md_kick_rdev_from_array.
I read the patch comments and the code again, and now I think I understand.
The cause is the path bitmap_daemon_work->write_page->write_sb_page->md_super_write,
which is called from md_check_recovery. It isn't protected by reconfig_mutex.
So there is a chance that the disk is removed (rdev->mddev = NULL) while the
superblock IO is in flight. Is that right?
Regards
Xiao
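
Editor's note: the window described above can be replayed deterministically in a single-threaded userspace model. This is only a sketch under simplifying assumptions. The sk_* names are hypothetical, the hold_ref flag stands in for "patch applied", and the completion handler returns false at the point where, per this thread, the real one would dereference a NULL rdev->mddev.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mddev and struct md_rdev. */
struct sk_mddev {
    atomic_int pending_writes;  /* superblock writes not yet completed */
};

struct sk_rdev {
    struct sk_mddev *mddev;     /* NULL once the device has been removed */
    atomic_int nr_pending;      /* references held by in-flight IO */
};

/* Submit a superblock write; hold_ref models the patched behavior. */
void sk_super_write(struct sk_mddev *mddev, struct sk_rdev *rdev,
                    bool hold_ref)
{
    assert(rdev->mddev == mddev);  /* device must belong to this array */
    if (hold_ref)
        atomic_fetch_add(&rdev->nr_pending, 1);
    atomic_fetch_add(&mddev->pending_writes, 1);
}

/* Removal path: detach the device only if nothing references it. */
bool sk_kick_rdev(struct sk_rdev *rdev)
{
    if (atomic_load(&rdev->nr_pending) != 0)
        return false;
    rdev->mddev = NULL;
    return true;
}

/*
 * Completion handler. Returns false in the situation where the real
 * handler would follow a NULL rdev->mddev pointer.
 */
bool sk_super_written(struct sk_rdev *rdev, bool hold_ref)
{
    struct sk_mddev *mddev = rdev->mddev;

    if (mddev == NULL)
        return false;
    atomic_fetch_sub(&mddev->pending_writes, 1);
    if (hold_ref)
        atomic_fetch_sub(&rdev->nr_pending, 1);
    return true;
}
```

Without the reference, the order submit, kick, complete lets the kick succeed and the completion then finds mddev == NULL. With the reference held, the same order makes the kick fail until the completion has dropped it.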