From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, axboe@kernel.dk,
dan.j.williams@intel.com, shli@fusionio.com
Subject: Re: [patch 08/10 v3] raid5: make_request use batch stripe release
Date: Mon, 2 Jul 2012 12:31:12 +1000
Message-ID: <20120702123112.795e1db3@notabene.brown>
In-Reply-To: <20120625072702.921605418@kernel.org>
On Mon, 25 Jun 2012 15:24:55 +0800 Shaohua Li <shli@kernel.org> wrote:
> make_request() does stripe release for every stripe and the stripe usually has
> count 1, which makes previous release_stripe() optimization not work. In my
> test, this release_stripe() becomes the heaviest place to take
> conf->device_lock after previous patches applied.
>
> Below patch makes stripe release batch. All the stripes will be released in
> unplug. The STRIPE_ON_UNPLUG_LIST bit is to protect concurrent access stripe
> lru.
>
I've applied this patch, but I'm afraid I butchered it a bit first :-)
> @@ -3984,6 +3985,51 @@ static struct stripe_head *__get_priorit
> return sh;
> }
>
> +#define raid5_unplug_list(mdcb) (struct list_head *)(mdcb + 1)
I really don't like this sort of construct. It is much cleaner (I think) to
add to a structure by embedding it in a larger structure, then using
"container_of" to map from the inner to the outer structure. So I have
changed that.
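
As an illustration of the embedding approach described above, here is a minimal userspace sketch. It is not kernel code: the `container_of` macro is a simplified stand-in for the kernel's (which adds type checking), `struct md_plug_cb` is reduced to a hypothetical placeholder field, and `to_raid5_cb()` and `demo_round_trip()` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real md structures have more fields. */
struct list_head {
	struct list_head *next, *prev;
};

struct md_plug_cb {
	int placeholder;	/* stands in for the generic callback state */
};

/* The outer structure embeds the generic one instead of trailing it. */
struct raid5_plug_cb {
	struct md_plug_cb cb;	/* generic part, known to the md core */
	struct list_head list;	/* raid5-private part */
};

/* Map from the generic pointer back to the enclosing raid5 structure. */
struct raid5_plug_cb *to_raid5_cb(struct md_plug_cb *mdcb)
{
	return container_of(mdcb, struct raid5_plug_cb, cb);
}

/* Returns 1 if the round trip recovers the original outer pointer. */
int demo_round_trip(void)
{
	struct raid5_plug_cb outer = { .cb = { 0 }, .list = { NULL, NULL } };
	struct raid5_plug_cb *found = to_raid5_cb(&outer.cb);

	/* The private list is reached by name, not by pointer arithmetic. */
	assert(&found->list == &outer.list);
	return found == &outer;
}
```

Unlike the `(mdcb + 1)` cast, this form keeps working if fields are added to or reordered within the outer structure, because the compiler computes the offset.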
> @@ -4114,7 +4161,14 @@ static void make_request(struct mddev *m
> if ((bi->bi_rw & REQ_SYNC) &&
> !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> atomic_inc(&conf->preread_active_stripes);
> - release_stripe(sh);
> + /*
> + * We must recheck here. schedule() might be called
> + * above which makes unplug invoked already, so the old
> + * mdcb is invalid
> + */
I agree that this is an important check, but as a 'schedule()' can
theoretically happen at any time that preempt isn't explicitly disabled, we
really need to be even more careful. So I have changed the md code to
disable preempt, and require the caller to re-enable preempt after it has
used the returned value.
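
The calling convention described above can be modelled in userspace as follows. This is only a sketch under stated assumptions: a plain counter stands in for the kernel's preempt count, and every `model_*` name is hypothetical and exists only for this example. The point it shows is that a NULL return leaves preemption untouched, while a non-NULL return obliges the caller to re-enable preemption once it has finished with the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a counter standing in for the preempt count. */
static int preempt_depth;

static void model_preempt_disable(void)
{
	preempt_depth++;
}

static void model_preempt_enable(void)
{
	assert(preempt_depth > 0);	/* catch unbalanced enables */
	preempt_depth--;
}

struct model_plug_cb {
	int in_use;
};

static struct model_plug_cb model_cb;

/*
 * Returns NULL with preemption untouched when there is no plug,
 * otherwise returns the callback with preemption disabled.
 */
struct model_plug_cb *model_check_plugged(int have_plug)
{
	if (!have_plug)
		return NULL;
	model_preempt_disable();
	model_cb.in_use = 1;
	return &model_cb;
}

/*
 * Caller pattern: returns 1 if the thread had to be woken directly,
 * 0 if the work was left for the unplug callback.
 */
int model_make_request(int have_plug)
{
	struct model_plug_cb *cb = model_check_plugged(have_plug);

	if (!cb)
		return 1;	/* no plug: wake the thread now */

	/* cb may be used here; no schedule() can invalidate it yet. */
	model_preempt_enable();
	return 0;
}

int model_preempt_depth(void)
{
	return preempt_depth;
}
```

In both branches the depth returns to zero, which is the balance the caller is expected to maintain.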
The resulting series should appear in my for-next shortly. However, for
easier review I'll include two patches below. The first changes
mddev_check_plugged to disable preemption.
The second is a diff against your patch which changes it to use an embedded
structure and container_of.
I haven't actually tested this yet, so there may be further changes.
Thanks,
NeilBrown
From 04b7dd7d0ad4a21622cad7c10821f914a8d9ccd3 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 2 Jul 2012 12:14:49 +1000
Subject: [PATCH] md/plug: disable preempt when reporting that a plug is present.
As 'schedule' will unplug a queue, a plug added by
mddev_check_plugged is only valid until the next schedule().
So call preempt_disable before installing the plug, and require the
caller to call preempt_enable once the value has been used.
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1369c9d..63ea6d6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -512,6 +512,10 @@ static void plugger_unplug(struct blk_plug_cb *cb)
/* Check that an unplug wakeup will come shortly.
* If not, wakeup the md thread immediately
+ * Note that the structure returned is only valid until
+ * the next schedule(), so preemption is disabled when it
+ * is not NULL, and must be re-enabled after the value
+ * has been used.
*/
struct md_plug_cb *mddev_check_plugged(struct mddev *mddev,
md_unplug_func_t unplug, size_t size)
@@ -522,6 +526,7 @@ struct md_plug_cb *mddev_check_plugged(struct mddev *mddev,
if (!plug)
return NULL;
+ preempt_disable();
list_for_each_entry(mdcb, &plug->cb_list, cb.list) {
if (mdcb->cb.callback == plugger_unplug &&
mdcb->mddev == mddev) {
@@ -533,6 +538,7 @@ struct md_plug_cb *mddev_check_plugged(struct mddev *mddev,
return mdcb;
}
}
+ preempt_enable();
/* Not currently on the callback list */
if (size < sizeof(*mdcb))
size = sizeof(*mdcb);
@@ -540,6 +546,7 @@ struct md_plug_cb *mddev_check_plugged(struct mddev *mddev,
if (!mdcb)
return NULL;
+ preempt_disable();
mdcb->mddev = mddev;
mdcb->cb.callback = plugger_unplug;
atomic_inc(&mddev->plug_cnt);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ebce488..2e19b68 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -883,7 +883,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
struct md_rdev *blocked_rdev;
- int plugged;
int first_clone;
int sectors_handled;
int max_sectors;
@@ -1034,8 +1033,6 @@ read_again:
* the bad blocks. Each set of writes gets it's own r1bio
* with a set of bios attached.
*/
- plugged = !!mddev_check_plugged(mddev, NULL, 0);
-
disks = conf->raid_disks * 2;
retry_write:
blocked_rdev = NULL;
@@ -1214,8 +1211,11 @@ read_again:
/* In case raid1d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
- if (do_sync || !bitmap || !plugged)
+ if (do_sync ||
+ !mddev_check_plugged(mddev, NULL, 0))
md_wakeup_thread(mddev->thread);
+ else
+ preempt_enable();
}
static void status(struct seq_file *seq, struct mddev *mddev)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 978a996..54f4b33 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1039,7 +1039,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
unsigned long flags;
struct md_rdev *blocked_rdev;
- int plugged;
int sectors_handled;
int max_sectors;
int sectors;
@@ -1239,7 +1238,6 @@ read_again:
* of r10_bios is recored in bio->bi_phys_segments just as with
* the read case.
*/
- plugged = !!mddev_check_plugged(mddev, NULL, 0);
r10_bio->read_slot = -1; /* make sure repl_bio gets freed */
raid10_find_phys(conf, r10_bio);
@@ -1449,8 +1447,11 @@ retry_write:
/* In case raid10d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
- if (do_sync || !mddev->bitmap || !plugged)
+ if (do_sync ||
+ !mddev_check_plugged(mddev, NULL, 0))
md_wakeup_thread(mddev->thread);
+ else
+ preempt_enable();
}
static void status(struct seq_file *seq, struct mddev *mddev)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0a23037..e9e920c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4004,7 +4004,6 @@ static void make_request(struct mddev *mddev, struct bio * bi)
struct stripe_head *sh;
const int rw = bio_data_dir(bi);
int remaining;
- int plugged;
if (unlikely(bi->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bi);
@@ -4023,7 +4022,6 @@ static void make_request(struct mddev *mddev, struct bio * bi)
bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
- plugged = !!mddev_check_plugged(mddev, NULL, 0);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
DEFINE_WAIT(w);
int previous;
@@ -4134,8 +4132,10 @@ static void make_request(struct mddev *mddev, struct bio * bi)
}
}
- if (!plugged)
+ if (!mddev_check_plugged(mddev, NULL, 0))
md_wakeup_thread(mddev->thread);
+ else
+ preempt_enable();
spin_lock_irq(&conf->device_lock);
remaining = raid5_dec_bi_phys_segments(bi);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index fc98086..ef3baa4 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3996,18 +3996,23 @@ static struct stripe_head *__get_priority_stripe(struct r5conf *conf)
return sh;
}
-#define raid5_unplug_list(mdcb) (struct list_head *)(mdcb + 1)
+struct raid5_plug_cb {
+ struct md_plug_cb cb;
+ struct list_head list;
+};
+
static void raid5_unplug(struct md_plug_cb *mdcb)
{
- struct list_head *list = raid5_unplug_list(mdcb);
+ struct raid5_plug_cb *cb = container_of(
+ mdcb, struct raid5_plug_cb, cb);
struct stripe_head *sh;
struct r5conf *conf = mdcb->mddev->private;
- if (list->next == NULL || list_empty(list))
+ if (cb->list.next == NULL || list_empty(&cb->list))
return;
spin_lock_irq(&conf->device_lock);
- while (!list_empty(list)) {
- sh = list_entry(list->next, struct stripe_head, lru);
+ while (!list_empty(&cb->list)) {
+ sh = list_first_entry(&cb->list, struct stripe_head, lru);
list_del_init(&sh->lru);
/*
* avoid race release_stripe_plug() sees STRIPE_ON_UNPLUG_LIST
@@ -4024,20 +4029,20 @@ static void release_stripe_plug(struct mddev *mddev,
struct stripe_head *sh)
{
struct md_plug_cb *mdcb = mddev_check_plugged(mddev, raid5_unplug,
- sizeof(struct list_head));
- struct list_head *list = raid5_unplug_list(mdcb);
+ sizeof(struct raid5_plug_cb));
+ struct raid5_plug_cb *cb;
if (!mdcb) {
release_stripe(sh);
return;
}
- if (list->next == NULL) {
- INIT_LIST_HEAD(list);
- mdcb->unplug = raid5_unplug;
- }
+ cb = container_of(mdcb, struct raid5_plug_cb, cb);
+
+ if (cb->list.next == NULL)
+ INIT_LIST_HEAD(&cb->list);
if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
- list_add_tail(&sh->lru, list);
+ list_add_tail(&sh->lru, &cb->list);
else
release_stripe(sh);
preempt_enable();
@@ -4180,7 +4185,7 @@ static void make_request(struct mddev *mddev, struct bio * bi)
}
}
- if (!mddev_check_plugged(mddev, raid5_unplug, sizeof(struct list_head)))
+ if (!mddev_check_plugged(mddev, raid5_unplug, sizeof(struct raid5_plug_cb)))
md_wakeup_thread(mddev->thread);
else
preempt_enable();