public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>, jurriaan <thunder7@xs4all.nl>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with  software-raid1
Date: Fri, 29 Aug 2008 09:11:53 +0200	[thread overview]
Message-ID: <20080829071153.GM20055@kernel.dk> (raw)
In-Reply-To: <18615.36284.491484.331385@notabene.brown>

On Fri, Aug 29 2008, Neil Brown wrote:
> On Friday August 29, neilb@suse.de wrote:
> > 
> > Here is my (untested yet) patch to address the problem.
> > I'll try to get some testing done and push it out early next week, but
> > if anyone could review and/or test that would be a great help.
> 
> Actually, that was my "haven't even compiled it yet" patch.  The one I
> meant to send was this one.
> 
> Sorry for the noise.
> 
> NeilBrown
> 
> 
> From 3a0646137016c69dbcaeed0114558f67daa4a6f0 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Fri, 29 Aug 2008 15:46:38 +1000
> Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c
> 
> A recent patch to protect the rdev list with rcu locking leaves us
> with a problem because we can sleep on memalloc while holding the
> rcu lock.
> 
> The rcu lock is only needed while walking the linked list as
> uninteresting devices (failed or spares) can be removed at any time.
> 
> So only take the rcu lock while actually walking the linked list.
> Take a refcount on the rdev during the time when we drop the lock
> and do the memalloc to start IO.
> When we return to the locked code, all the interesting devices
> on the list will not have moved, so we can simply use
> list_for_each_continue_rcu to pick up where we left off.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  drivers/md/bitmap.c |   44 +++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 37 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 7e65bad..c17eb4f 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -238,15 +238,46 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
>  
>  }
>  
> +static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
> +{
> +	/* Iterate the disks of an mddev, using rcu to protect access to the
> +	 * linked list, and raising the refcount of devices we return to ensure
> +	 * they don't disappear while in use.
> +	 * As devices are only added or removed when raid_disk is < 0 and
> +	 * nr_pending is 0 and In_sync is clear, the entries we return will
> +	 * still be in the same position on the list when we re-enter
> +	 * list_for_each_continue_rcu.
> +	 */
> +	struct list_head *pos;
> +	rcu_read_lock();
> +	if (rdev == NULL)
> +		/* start at the beginning */
> +		pos = &mddev->disks;
> +	else
> +		/* release the previous rdev */
> +		rdev_dec_pending(rdev, mddev);
> +	
> +	list_for_each_continue_rcu(pos, &mddev->disks) {
> +		rdev = list_entry(pos, mdk_rdev_t, same_set);
> +		if (rdev->raid_disk >= 0 &&
> +		    test_bit(In_sync, &rdev->flags) &&
> +		    !test_bit(Faulty, &rdev->flags)) {
> +			/* this is a usable devices */
> +			atomic_inc(&rdev->nr_pending);
> +			rcu_read_unlock();
> +			return rdev;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return NULL;
> +}
> +
>  static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>  {
> -	mdk_rdev_t *rdev;
> +	mdk_rdev_t *rdev = NULL;
>  	mddev_t *mddev = bitmap->mddev;
>  
> -	rcu_read_lock();
> -	rdev_for_each_rcu(rdev, mddev)
> -		if (test_bit(In_sync, &rdev->flags)
> -		    && !test_bit(Faulty, &rdev->flags)) {
> +	while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
>  			int size = PAGE_SIZE;
>  			if (page->index == bitmap->file_pages-1)
>  				size = roundup(bitmap->last_page_size,
> @@ -281,8 +312,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>  				       + page->index * (PAGE_SIZE/512),
>  				       size,
>  				       page);
> -		}
> -	rcu_read_unlock();
> +	}
>  
>  	if (wait)
>  		md_super_wait(mddev);
> -- 
> 1.5.6.5

Looks like an elegant solution to the problem Neil, good stuff!

-- 
Jens Axboe


      reply	other threads:[~2008-08-29  7:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-27 17:05 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with software-raid1 jurriaan
2008-08-27 21:47 ` Rafael J. Wysocki
2008-08-28  7:33   ` Jens Axboe
2008-08-28  7:45     ` Andrew Morton
2008-08-28  7:48       ` Jens Axboe
2008-08-28  7:56         ` Andre Noll
2008-08-28  8:11           ` Jens Axboe
2008-08-28  8:04             ` Andre Noll
2008-08-28  8:27         ` Neil Brown
2008-08-28  8:36           ` Jens Axboe
2008-08-28  9:00           ` Andrew Morton
2008-08-29  7:36             ` Neil Brown
2008-08-29  7:47               ` Jens Axboe
2008-08-29  8:14               ` Andrew Morton
2008-08-29  5:44           ` Neil Brown
2008-08-29  5:48             ` Neil Brown
2008-08-29  7:11               ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080829071153.GM20055@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=rjw@sisk.pl \
    --cc=thunder7@xs4all.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox