From: Jens Axboe <jens.axboe@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>, jurriaan <thunder7@xs4all.nl>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with software-raid1
Date: Fri, 29 Aug 2008 09:11:53 +0200
Message-ID: <20080829071153.GM20055@kernel.dk>
In-Reply-To: <18615.36284.491484.331385@notabene.brown>

On Fri, Aug 29 2008, Neil Brown wrote:
> On Friday August 29, neilb@suse.de wrote:
> >
> > Here is my (untested yet) patch to address the problem.
> > I'll try to get some testing done and push it out early next week, but
> > if anyone could review and/or test that would be a great help.
>
> Actually, that was my "haven't even compiled it yet" patch. The one I
> meant to send was this one.
>
> Sorry for the noise.
>
> NeilBrown
>
>
> From 3a0646137016c69dbcaeed0114558f67daa4a6f0 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Fri, 29 Aug 2008 15:46:38 +1000
> Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c
>
> A recent patch to protect the rdev list with rcu locking leaves us
> with a problem because we can sleep on memalloc while holding the
> rcu lock.
>
> The rcu lock is only needed while walking the linked list as
> uninteresting devices (failed or spares) can be removed at any time.
>
> So only take the rcu lock while actually walking the linked list.
> Take a refcount on the rdev during the time when we drop the lock
> and do the memalloc to start IO.
> When we return to the locked code, all the interesting devices
> on the list will not have moved, so we can simply use
> list_for_each_continue_rcu to pick up where we left off.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> drivers/md/bitmap.c | 44 +++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 7e65bad..c17eb4f 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -238,15 +238,46 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
>
> }
>
> +static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
> +{
> + /* Iterate the disks of an mddev, using rcu to protect access to the
> + * linked list, and raising the refcount of devices we return to ensure
> + * they don't disappear while in use.
> + * As devices are only added or removed when raid_disk is < 0 and
> + * nr_pending is 0 and In_sync is clear, the entries we return will
> + * still be in the same position on the list when we re-enter
> + * list_for_each_continue_rcu.
> + */
> + struct list_head *pos;
> + rcu_read_lock();
> + if (rdev == NULL)
> + /* start at the beginning */
> + pos = &mddev->disks;
> + else
> + /* release the previous rdev */
> + rdev_dec_pending(rdev, mddev);
> +
> + list_for_each_continue_rcu(pos, &mddev->disks) {
> + rdev = list_entry(pos, mdk_rdev_t, same_set);
> + if (rdev->raid_disk >= 0 &&
> + test_bit(In_sync, &rdev->flags) &&
> + !test_bit(Faulty, &rdev->flags)) {
> + /* this is a usable device */
> + atomic_inc(&rdev->nr_pending);
> + rcu_read_unlock();
> + return rdev;
> + }
> + }
> + rcu_read_unlock();
> + return NULL;
> +}
> +
> static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
> {
> - mdk_rdev_t *rdev;
> + mdk_rdev_t *rdev = NULL;
> mddev_t *mddev = bitmap->mddev;
>
> - rcu_read_lock();
> - rdev_for_each_rcu(rdev, mddev)
> - if (test_bit(In_sync, &rdev->flags)
> - && !test_bit(Faulty, &rdev->flags)) {
> + while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
> int size = PAGE_SIZE;
> if (page->index == bitmap->file_pages-1)
> size = roundup(bitmap->last_page_size,
> @@ -281,8 +312,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
> + page->index * (PAGE_SIZE/512),
> size,
> page);
> - }
> - rcu_read_unlock();
> + }
>
> if (wait)
> md_super_wait(mddev);
> --
> 1.5.6.5

Looks like an elegant solution to the problem Neil, good stuff!
--
Jens Axboe
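
For readers following the thread, the pattern the patch describes (hold the lock only while walking the list, pin the entry with a reference count, drop the lock for the work that may sleep, then resume from the pinned entry) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the md code: a pthread mutex stands in for rcu_read_lock(), and struct dev, struct dev_set, next_usable() and write_to_all() are hypothetical names. As in the patch comment, it assumes a pinned entry is never removed from the list. Note that in the quoted hunk as it appears here, pos is only assigned in the rdev == NULL branch; the sketch sets the resume position explicitly in both branches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mdk_rdev_t: a list node with a pin count. */
struct dev {
    struct dev *next;   /* singly linked, NULL-terminated */
    bool usable;        /* stands in for the raid_disk/In_sync/Faulty tests */
    int pending;        /* stands in for nr_pending */
    int writes;         /* counts how often the "I/O" ran on this node */
};

struct dev_set {
    struct dev *head;
    pthread_mutex_t lock;   /* assumption: a mutex in place of RCU */
};

/*
 * Return the next usable device after prev (or the first one if prev is
 * NULL), pinned with a reference. The reference on prev is dropped.
 * The lock is held only while walking the list, never across the
 * caller's blocking work.
 */
static struct dev *next_usable(struct dev *prev, struct dev_set *set)
{
    struct dev *d;

    pthread_mutex_lock(&set->lock);
    if (prev == NULL) {
        d = set->head;              /* start at the beginning */
    } else {
        d = prev->next;             /* resume right after the pinned node */
        assert(prev->pending > 0);
        prev->pending--;            /* release the previous node */
    }
    for (; d != NULL; d = d->next) {
        if (d->usable) {
            d->pending++;           /* pin before dropping the lock */
            break;
        }
    }
    pthread_mutex_unlock(&set->lock);
    return d;
}

/* Caller side: the work that may sleep runs with the lock dropped. */
static int write_to_all(struct dev_set *set)
{
    struct dev *d = NULL;
    int visited = 0;

    while ((d = next_usable(d, set)) != NULL) {
        d->writes++;    /* placeholder for I/O submission that may block */
        visited++;
    }
    return visited;
}
```

The caller loop has the same shape as the while loop in write_sb_page() above: each iteration hands the previous entry back so its reference is dropped, and the loop ends with no references held once NULL is returned.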