From: NeilBrown <neilb@suse.de>
To: linux-raid@vger.kernel.org
Subject: [md PATCH 01/16] md: Fix race when creating a new md device.
Date: Wed, 11 May 2011 16:30:30 +1000 [thread overview]
Message-ID: <20110511063030.21263.73208.stgit@notabene.brown> (raw)
In-Reply-To: <20110511062743.21263.72802.stgit@notabene.brown>
There is a race when creating an md device by opening /dev/mdXX.
If two processes do this at much the same time they will follow the
call path
__blkdev_get -> get_gendisk -> kobj_lookup
The first will call
-> md_probe -> md_alloc -> add_disk -> blk_register_region
and the race happens when the second gets to kobj_lookup after
add_disk has called blk_register_region but before it returns to
md_alloc.
In the case the second will not call md_probe (as the probe is already
done) but will get a handle on the gendisk, return to __blkdev_get
which will then call md_open (via the ->open) pointer.
As mddev->gendisk hasn't been set yet, md_open will think something is
wrong an return with ERESTARTSYS.
This can loop endlessly while the first thread makes no progress
through add_disk. Nothing is blocking it, but due to scheduler
behaviour it doesn't get a turn.
So this is essentially a live-lock.
We fix this by simply moving the assignment to mddev->gendisk before
the call the add_disk() so md_open doesn't get confused.
Also move blk_queue_flush earlier because add_disk should be as late
as possible.
To make sure that md_open doesn't complete until md_alloc has done all
that is needed, we take mddev->open_mutex during the last part of
md_alloc. md_open will wait for this.
This can cause a lock-up on boot so Cc:ing for stable.
For 2.6.36 and earlier a different patch will be needed as the
'blk_queue_flush' call isn't there.
Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: stable@kernel.org
---
drivers/md/md.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7d6f7f1..4a4c0f8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4347,13 +4347,19 @@ static int md_alloc(dev_t dev, char *name)
disk->fops = &md_fops;
disk->private_data = mddev;
disk->queue = mddev->queue;
+ blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);
/* Allow extended partitions. This makes the
* 'mdp' device redundant, but we can't really
* remove it now.
*/
disk->flags |= GENHD_FL_EXT_DEVT;
- add_disk(disk);
mddev->gendisk = disk;
+ /* As soon as we call add_disk(), another thread could get
+ * through to md_open, so make sure it doesn't get too far
+ */
+ mutex_lock(&mddev->open_mutex);
+ add_disk(disk);
+
error = kobject_init_and_add(&mddev->kobj, &md_ktype,
&disk_to_dev(disk)->kobj, "%s", "md");
if (error) {
@@ -4367,8 +4373,7 @@ static int md_alloc(dev_t dev, char *name)
if (mddev->kobj.sd &&
sysfs_create_group(&mddev->kobj, &md_bitmap_group))
printk(KERN_DEBUG "pointless warning\n");
-
- blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);
+ mutex_unlock(&mddev->open_mutex);
abort:
mutex_unlock(&disks_mutex);
if (!error && mddev->kobj.sd) {
next prev parent reply other threads:[~2011-05-11 6:30 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-11 6:30 [md PATCH 00/16] md patches for 2.6.40 NeilBrown
2011-05-11 6:30 ` [md PATCH 04/16] md: simplify raid10 read_balance NeilBrown
2011-05-11 6:30 ` [md PATCH 03/16] md/bitmap: fix saving of events_cleared and other state NeilBrown
2011-05-11 6:30 ` NeilBrown [this message]
2011-05-11 6:30 ` [md PATCH 02/16] md: reject a re-add request that cannot be honoured NeilBrown
2011-05-11 6:30 ` [md PATCH 07/16] md: make error_handler functions more uniform and correct NeilBrown
2011-05-11 6:30 ` [md PATCH 15/16] md/raid10: reformat some loops with less indenting NeilBrown
2011-05-11 6:30 ` [md PATCH 11/16] md/raid1: improve handling of pages allocated for write-behind NeilBrown
2011-05-11 6:30 ` [md PATCH 12/16] md/raid10: some tidying up in fix_read_error NeilBrown
2011-05-11 6:30 ` [md PATCH 16/16] md: allow resync_start to be set while an array is active NeilBrown
2011-05-11 6:30 ` [md PATCH 09/16] md/raid1: tidy up new functions: process_checks and fix_sync_read_error NeilBrown
2011-05-11 6:30 ` [md PATCH 06/16] md/multipath: discard ->working_disks in favour of ->degraded NeilBrown
2011-05-11 6:30 ` [md PATCH 08/16] md/raid1: split out two sub-functions from sync_request_write NeilBrown
2011-05-11 6:30 ` [md PATCH 05/16] md/raid1: clean up read_balance NeilBrown
2011-05-11 6:30 ` [md PATCH 13/16] md/raid10: make more use of 'slot' in raid10d NeilBrown
2011-05-11 6:30 ` [md PATCH 14/16] md/raid10: remove unused variable NeilBrown
2011-05-11 6:30 ` [md PATCH 10/16] md/raid1: try fix_sync_read_error before process_checks NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110511063030.21263.73208.stgit@notabene.brown \
--to=neilb@suse.de \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).