From: Coly Li <colyli@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
Neil Brown <neilb@suse.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v3] md linear: fix a race between linear_add() and linear_congested()
Date: Sun, 29 Jan 2017 14:39:11 +0800 [thread overview]
Message-ID: <ecda9771-dc76-66e9-862a-e8b953d94a0b@suse.de> (raw)
In-Reply-To: <20170129014555.lgmk2ivste2xttxq@kernel.org>
On 2017/1/29 上午9:45, Shaohua Li wrote:
> On Sat, Jan 28, 2017 at 09:11:49PM +0800, colyli@suse.de wrote:
>> Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
>> disk to a md linear device causes kernel crash at linear_congested(). From
>> the crash image analysis, I find in linear_congested(), mddev->raid_disks
>> contains value N, but conf->disks[] only has N-1 pointers available. Then
>> a NULL pointer deference crashes the kernel.
>>
>> There is a race between linear_add() and linear_congested(), RCU stuffs
>> used in these two functions cannot avoid the race. Since Linuv v4.0
>> RCU code is replaced by introducing mddev_suspend(). After checking the
>> upstream code, it seems linear_congested() is not called in
>> generic_make_request() code patch, so mddev_suspend() cannot provent it
>> from being called. The possible race still exists.
>>
>> Here I explain how the race still exists in current code. For a machine
>> has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
>> md linear device; at the same time on other CPU, linear_congested() is
>> called to detect whether this md linear device is congested before issuing
>> an I/O request onto it.
>>
>> Now I use a possible code execution time sequence to demo how the possible
>> race happens,
>>
>> seq linear_add() linear_congested()
>> 0 conf=mddev->private
>> 1 oldconf=mddev->private
>> 2 mddev->raid_disks++
>> 3 for (i=0; i<mddev->raid_disks;i++)
>> 4 bdev_get_queue(conf->disks[i].rdev->bdev)
>> 5 mddev->private=newconf
>>
>> In linear_add() mddev->raid_disks is increased in time seq 2, and on
>> another CPU in linear_congested() the for-loop iterates conf->disks[i] by
>> the increased mddev->raid_disks in time seq 3,4. But conf with one more
>> element (which is a pointer to struct dev_info type) to conf->disks[] is
>> not updated yet, accessing its structure member in time seq 4 will cause a
>> NULL pointer deference fault.
>>
>> To fix this race, there are 2 parts of modification in the patch,
>> 1) Add 'int raid_disks' in struct linear_conf, as a copy of
>> mddev->raid_disks. It is initialized in linear_conf(), always being
>> consistent with pointers number of 'struct dev_info disks[]'. When
>> iterating conf->disks[] in linear_congested(), use conf->raid_disks to
>> replace mddev->raid_disks in the for-loop, then NULL pointer deference
>> will not happen again.
>> 2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
>> free oldconf memory. Because oldconf may be referenced as mddev->private
>> in linear_congested(), kfree_rcu() makes sure that its memory will not
>> be released until no one uses it any more.
>> Also some code comments are added in this patch, to make this modification
>> to be easier understandable.
>>
>> This patch can be applied for kernels since v4.0 after commit:
>> 3be260cc18f8 ("md/linear: remove rcu protections in favour of
>> suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
>> people who maintain kernels before Linux v4.0, they need to do some back
>> back port to this patch.
>>
>> Changelog:
>> - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
>> replace rcu_call() in linear_add().
>> - v2: add RCU stuffs by suggestion from Shaohua and Neil.
>> - v1: initial effort.
>>
>> Signed-off-by: Coly Li <colyli@suse.de>
>> Cc: Shaohua Li <shli@fb.com>
>> Cc: Neil Brown <neilb@suse.com>
>> Cc: stable@vger.kernel.org
> Hi,
>
> Happy new year! Applied, though I changed the format of comments. It should be:
> /*
> * comments
> */
I just use this comment format follow the existing comments in linear.c
file. OK I will follow the above format in future.
Thank you! And Happy New Year :-)
Coly
prev parent reply other threads:[~2017-01-29 6:39 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-28 13:11 [PATCH v3] md linear: fix a race between linear_add() and linear_congested() colyli
2017-01-29 1:45 ` Shaohua Li
2017-01-29 6:39 ` Coly Li [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ecda9771-dc76-66e9-862a-e8b953d94a0b@suse.de \
--to=colyli@suse.de \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.com \
--cc=shli@fb.com \
--cc=shli@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox