All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>, jurriaan <thunder7@xs4all.nl>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with  software-raid1
Date: Fri, 29 Aug 2008 09:11:53 +0200	[thread overview]
Message-ID: <20080829071153.GM20055@kernel.dk> (raw)
In-Reply-To: <18615.36284.491484.331385@notabene.brown>

On Fri, Aug 29 2008, Neil Brown wrote:
> On Friday August 29, neilb@suse.de wrote:
> > 
> > Here is my (untested yet) patch to address the problem.
> > I'll try to get some testing done and push it out early next week, but
> > if anyone could review and/or test that would be a great help.
> 
> Actually, that was my "haven't even compiled it yet" patch.  The one I
> meant to send was this one.
> 
> Sorry for the noise.
> 
> NeilBrown
> 
> 
> From 3a0646137016c69dbcaeed0114558f67daa4a6f0 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Fri, 29 Aug 2008 15:46:38 +1000
> Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c
> 
> A recent patch to protect the rdev list with rcu locking leaves us
> with a problem because we can sleep on memalloc while holding the
> rcu lock.
> 
> The rcu lock is only needed while walking the linked list as
> uninteresting devices (failed or spares) can be removed at any time.
> 
> So only take the rcu lock while actually walking the linked list.
> Take a refcount on the rdev during the time when we drop the lock
> and do the memalloc to start IO.
> When we return to the locked code, all the interesting devices
> on the list will not have moved, so we can simply use
> list_for_each_continue_rcu to pick up where we left off.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  drivers/md/bitmap.c |   44 +++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 37 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 7e65bad..c17eb4f 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -238,15 +238,46 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
>  
>  }
>  
> +static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
> +{
> +	/* Iterate the disks of an mddev, using rcu to protect access to the
> +	 * linked list, and raising the refcount of devices we return to ensure
> +	 * they don't disappear while in use.
> +	 * As devices are only added or removed when raid_disk is < 0 and
> +	 * nr_pending is 0 and In_sync is clear, the entries we return will
> +	 * still be in the same position on the list when we re-enter
> +	 * list_for_each_continue_rcu.
> +	 */
> +	struct list_head *pos;
> +	rcu_read_lock();
> +	if (rdev == NULL)
> +		/* start at the beginning */
> +		pos = &mddev->disks;
> +	else
> +		/* release the previous rdev */
> +		rdev_dec_pending(rdev, mddev);
> +	
> +	list_for_each_continue_rcu(pos, &mddev->disks) {
> +		rdev = list_entry(pos, mdk_rdev_t, same_set);
> +		if (rdev->raid_disk >= 0 &&
> +		    test_bit(In_sync, &rdev->flags) &&
> +		    !test_bit(Faulty, &rdev->flags)) {
> +			/* this is a usable devices */
> +			atomic_inc(&rdev->nr_pending);
> +			rcu_read_unlock();
> +			return rdev;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return NULL;
> +}
> +
>  static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>  {
> -	mdk_rdev_t *rdev;
> +	mdk_rdev_t *rdev = NULL;
>  	mddev_t *mddev = bitmap->mddev;
>  
> -	rcu_read_lock();
> -	rdev_for_each_rcu(rdev, mddev)
> -		if (test_bit(In_sync, &rdev->flags)
> -		    && !test_bit(Faulty, &rdev->flags)) {
> +	while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
>  			int size = PAGE_SIZE;
>  			if (page->index == bitmap->file_pages-1)
>  				size = roundup(bitmap->last_page_size,
> @@ -281,8 +312,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>  				       + page->index * (PAGE_SIZE/512),
>  				       size,
>  				       page);
> -		}
> -	rcu_read_unlock();
> +	}
>  
>  	if (wait)
>  		md_super_wait(mddev);
> -- 
> 1.5.6.5

Looks like an elegant solution to the problem Neil, good stuff!

-- 
Jens Axboe


      reply	other threads:[~2008-08-29  7:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-27 17:05 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with software-raid1 jurriaan
2008-08-27 21:47 ` Rafael J. Wysocki
2008-08-28  7:33   ` Jens Axboe
2008-08-28  7:45     ` Andrew Morton
2008-08-28  7:48       ` Jens Axboe
2008-08-28  7:56         ` Andre Noll
2008-08-28  8:11           ` Jens Axboe
2008-08-28  8:04             ` Andre Noll
2008-08-28  8:27         ` Neil Brown
2008-08-28  8:36           ` Jens Axboe
2008-08-28  9:00           ` Andrew Morton
2008-08-29  7:36             ` Neil Brown
2008-08-29  7:47               ` Jens Axboe
2008-08-29  8:14               ` Andrew Morton
2008-08-29  5:44           ` Neil Brown
2008-08-29  5:48             ` Neil Brown
2008-08-29  7:11               ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080829071153.GM20055@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=rjw@sisk.pl \
    --cc=thunder7@xs4all.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.