From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Coly Li <colyli@suse.de>,
Shaohua Li <shli@fb.com>, Neil Brown <neilb@suse.com>
Subject: [PATCH 4.4 85/91] md linear: fix a race between linear_add() and linear_congested()
Date: Fri, 10 Mar 2017 10:09:24 +0100 [thread overview]
Message-ID: <20170310083905.041590216@linuxfoundation.org> (raw)
In-Reply-To: <20170310083900.730556986@linuxfoundation.org>
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: colyli@suse.de <colyli@suse.de>
commit 03a9e24ef2aaa5f1f9837356aed79c860521407a upstream.
Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev->raid_disks
contains value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.
There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend(). After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.
Here I explain how the race still exists in current code. For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.
Now I use a possible code execution time sequence to demo how the possible
race happens,
seq linear_add() linear_congested()
0 conf=mddev->private
1 oldconf=mddev->private
2 mddev->raid_disks++
3 for (i=0; i<mddev->raid_disks;i++)
4 bdev_get_queue(conf->disks[i].rdev->bdev)
5 mddev->private=newconf
In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf->disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.
To fix this race, there are 2 parts of modification in the patch,
1) Add 'int raid_disks' in struct linear_conf, as a copy of
mddev->raid_disks. It is initialized in linear_conf(), always being
consistent with pointers number of 'struct dev_info disks[]'. When
iterating conf->disks[] in linear_congested(), use conf->raid_disks to
replace mddev->raid_disks in the for-loop, then NULL pointer deference
will not happen again.
2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
free oldconf memory. Because oldconf may be referenced as mddev->private
in linear_congested(), kfree_rcu() makes sure that its memory will not
be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.
This patch can be applied for kernels since v4.0 after commit:
3be260cc18f8 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.
Changelog:
- V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
replace rcu_call() in linear_add().
- v2: add RCU stuffs by suggestion from Shaohua and Neil.
- v1: initial effort.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/md/linear.c | 39 ++++++++++++++++++++++++++++++++++-----
drivers/md/linear.h | 1 +
2 files changed, 35 insertions(+), 5 deletions(-)
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -52,18 +52,26 @@ static inline struct dev_info *which_dev
return conf->disks + lo;
}
+/*
+ * In linear_congested() conf->raid_disks is used as a copy of
+ * mddev->raid_disks to iterate conf->disks[], because conf->raid_disks
+ * and conf->disks[] are created in linear_conf(), they are always
+ * consitent with each other, but mddev->raid_disks does not.
+ */
static int linear_congested(struct mddev *mddev, int bits)
{
struct linear_conf *conf;
int i, ret = 0;
- conf = mddev->private;
+ rcu_read_lock();
+ conf = rcu_dereference(mddev->private);
- for (i = 0; i < mddev->raid_disks && !ret ; i++) {
+ for (i = 0; i < conf->raid_disks && !ret ; i++) {
struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
ret |= bdi_congested(&q->backing_dev_info, bits);
}
+ rcu_read_unlock();
return ret;
}
@@ -143,6 +151,19 @@ static struct linear_conf *linear_conf(s
conf->disks[i-1].end_sector +
conf->disks[i].rdev->sectors;
+ /*
+ * conf->raid_disks is copy of mddev->raid_disks. The reason to
+ * keep a copy of mddev->raid_disks in struct linear_conf is,
+ * mddev->raid_disks may not be consistent with pointers number of
+ * conf->disks[] when it is updated in linear_add() and used to
+ * iterate old conf->disks[] earray in linear_congested().
+ * Here conf->raid_disks is always consitent with number of
+ * pointers in conf->disks[] array, and mddev->private is updated
+ * with rcu_assign_pointer() in linear_addr(), such race can be
+ * avoided.
+ */
+ conf->raid_disks = raid_disks;
+
return conf;
out:
@@ -195,15 +216,23 @@ static int linear_add(struct mddev *mdde
if (!newconf)
return -ENOMEM;
+ /* newconf->raid_disks already keeps a copy of * the increased
+ * value of mddev->raid_disks, WARN_ONCE() is just used to make
+ * sure of this. It is possible that oldconf is still referenced
+ * in linear_congested(), therefore kfree_rcu() is used to free
+ * oldconf until no one uses it anymore.
+ */
mddev_suspend(mddev);
- oldconf = mddev->private;
+ oldconf = rcu_dereference(mddev->private);
mddev->raid_disks++;
- mddev->private = newconf;
+ WARN_ONCE(mddev->raid_disks != newconf->raid_disks,
+ "copied raid_disks doesn't match mddev->raid_disks");
+ rcu_assign_pointer(mddev->private, newconf);
md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
set_capacity(mddev->gendisk, mddev->array_sectors);
mddev_resume(mddev);
revalidate_disk(mddev->gendisk);
- kfree(oldconf);
+ kfree_rcu(oldconf, rcu);
return 0;
}
--- a/drivers/md/linear.h
+++ b/drivers/md/linear.h
@@ -10,6 +10,7 @@ struct linear_conf
{
struct rcu_head rcu;
sector_t array_sectors;
+ int raid_disks; /* a copy of mddev->raid_disks */
struct dev_info disks[0];
};
#endif
next prev parent reply other threads:[~2017-03-10 9:16 UTC|newest]
Thread overview: 101+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-10 9:07 [PATCH 4.4 00/91] 4.4.53-stable review Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 01/91] MIPS: Fix special case in 64 bit IP checksumming Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 03/91] MIPS: OCTEON: Fix copy_from_user fault handling for large buffers Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 04/91] MIPS: Lantiq: Keep ethernet enabled during boot Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 05/91] MIPS: Clear ISA bit correctly in get_frame_info() Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 06/91] MIPS: Prevent unaligned accesses during stack unwinding Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 07/91] MIPS: Fix get_frame_info() handling of microMIPS function size Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 08/91] MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 09/91] MIPS: Calculate microMIPS ra properly when unwinding the stack Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 10/91] MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 11/91] [media] am437x-vpfe: always assign bpp variable Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 12/91] [media] uvcvideo: Fix a wrong macro Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 13/91] [media] media: fix dm1105.c build error Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 14/91] ARM: at91: define LPDDR types Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 15/91] ARM: dts: at91: Enable DMA on sama5d4_xplained console Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 16/91] ARM: dts: at91: Enable DMA on sama5d2_xplained console Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 17/91] ALSA: hda/realtek - Cannot adjust speakers volume on a Dell AIO Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 18/91] ALSA: hda - fix Lewisburg audio issue Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 19/91] ALSA: timer: Reject user params with too small ticks Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 20/91] ALSA: ctxfi: Fallback DMA mask to 32bit Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 21/91] ALSA: seq: Fix link corruption by event error handling Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 22/91] ALSA: hda - Add subwoofer support for Dell Inspiron 17 7000 Gaming Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 23/91] ALSA: hda - Fix micmute hotkey problem for a lenovo AIO machine Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 24/91] staging: rtl: fix possible NULL pointer dereference Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 25/91] regulator: Fix regulator_summary for deviceless consumers Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 26/91] iommu/vt-d: Fix some macros that are incorrectly specified in intel-iommu Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 27/91] iommu/vt-d: Tylersburg isoch identity map check is done too late Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 28/91] mm/page_alloc: fix nodes for reclaim in fast path Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 29/91] mm: vmpressure: fix sending wrong events on underflow Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 30/91] mm: do not access page->mapping directly on page_endio Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 31/91] ipc/shm: Fix shmat mmap nil-page protection Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 32/91] dm cache: fix corruption seen when using cache > 2TB Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 33/91] dm stats: fix a leaked s->histogram_boundaries array Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 34/91] scsi: storvsc: use tagged SRB requests if supported by the device Greg Kroah-Hartman
2017-03-10 14:56 ` Ben Hutchings
2017-03-10 15:21 ` Greg Kroah-Hartman
2017-03-10 15:29 ` KY Srinivasan
2017-03-10 9:08 ` [PATCH 4.4 35/91] scsi: storvsc: properly handle SRB_ERROR when sense message is present Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 36/91] scsi: storvsc: properly set residual data length on errors Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 37/91] scsi: aacraid: Reorder Adapter status check Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 38/91] scsi: use scsi_device_from_queue() for scsi_dh Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 39/91] sd: get disk reference in sd_check_events() Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 40/91] Fix: Disable sys_membarrier when nohz_full is enabled Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 41/91] jbd2: dont leak modified metadata buffers on an aborted journal Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 42/91] block/loop: fix race between I/O and set_status Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 43/91] loop: fix LO_FLAGS_PARTSCAN hang Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 44/91] ext4: Include forgotten start block on fallocate insert range Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 45/91] ext4: do not polute the extents cache while shifting extents Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 46/91] ext4: trim allocation requests to group size Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 47/91] ext4: fix data corruption in data=journal mode Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 48/91] ext4: fix inline data error paths Greg Kroah-Hartman
2017-03-10 16:48 ` Ben Hutchings
2017-03-12 5:22 ` Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 49/91] ext4: preserve the needs_recovery flag when the journal is aborted Greg Kroah-Hartman
2017-03-10 16:58 ` Ben Hutchings
2017-03-10 20:14 ` Theodore Ts'o
2017-03-11 5:27 ` Ben Hutchings
2017-03-10 9:08 ` [PATCH 4.4 50/91] ext4: return EROFS if device is r/o and journal replay is needed Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 52/91] target: Obtain se_node_acl->acl_kref during get_initiator_node_acl Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 53/91] target: Fix multi-session dynamic se_node_acl double free OOPs Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 54/91] ath5k: drop bogus warning on drv_set_key with unsupported cipher Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 55/91] ath9k: fix race condition in enabling/disabling IRQs Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 56/91] ath9k: use correct OTP register offsets for the AR9340 and AR9550 Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 57/91] crypto: testmgr - Pad aes_ccm_enc_tv_template vector Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 58/91] fuse: add missing FR_FORCE Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 59/91] arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2 Greg Kroah-Hartman
2017-03-10 9:08 ` [PATCH 4.4 60/91] iio: pressure: mpl115: do not rely on structure field ordering Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 61/91] iio: pressure: mpl3115: " Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 62/91] can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 63/91] w1: dont leak refcount on slave attach failure in w1_attach_slave_device() Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 64/91] w1: ds2490: USB transfer buffers need to be DMAable Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 65/91] usb: musb: da8xx: Remove CPPI 3.0 quirk and methods Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 66/91] usb: host: xhci: plat: check hcc_params after add hcd Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 67/91] usb: gadget: udc: fsl: Add missing complete function Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 68/91] hv: allocate synic pages for all present CPUs Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 69/91] hv: init percpu_list in hv_synic_alloc() Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 70/91] Drivers: hv: util: kvp: Fix a rescind processing issue Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 71/91] Drivers: hv: util: Fcopy: " Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 72/91] Drivers: hv: util: Backup: " Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 73/91] RDMA/core: Fix incorrect structure packing for booleans Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 74/91] rdma_cm: fail iwarp accepts w/o connection params Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 75/91] gfs2: Add missing rcu locking for glock lookup Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 76/91] rtlwifi: Fix alignment issues Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 77/91] rtlwifi: rtl8192c-common: Fix "BUG: KASAN: Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 78/91] nfsd: minor nfsd_setattr cleanup Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 79/91] nfsd: special case truncates some more Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 80/91] NFSv4: Fix memory and state leak in _nfs4_open_and_get_state Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 81/91] NFSv4: fix getacl head length estimation Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 82/91] NFSv4: fix getacl ERANGE for some ACL buffer sizes Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 83/91] rtc: sun6i: Add some locking Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 84/91] rtc: sun6i: Switch to the external oscillator Greg Kroah-Hartman
2017-03-10 9:09 ` Greg Kroah-Hartman [this message]
2017-03-10 9:09 ` [PATCH 4.4 87/91] dmaengine: ipu: Make sure the interrupt routine checks all interrupts Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 88/91] powerpc/xmon: Fix data-breakpoint Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 89/91] MIPS: IP22: Reformat inline assembler code to modern standards Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 90/91] MIPS: IP22: Fix build error due to binutils 2.25 uselessnes Greg Kroah-Hartman
2017-03-10 9:09 ` [PATCH 4.4 91/91] scsi: lpfc: Correct WQ creation for pagesize Greg Kroah-Hartman
2017-03-10 18:35 ` [PATCH 4.4 00/91] 4.4.53-stable review Guenter Roeck
2017-03-10 19:15 ` Shuah Khan
[not found] ` <58c2d01c.cdd8190a.421eb.b1d4@mx.google.com>
[not found] ` <m2pohoes9u.fsf@baylibre.com>
2017-03-13 8:56 ` Thomas Petazzoni
2017-03-14 17:08 ` Kevin Hilman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170310083905.041590216@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=colyli@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.com \
--cc=shli@fb.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).