public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Jens Axboe <jens.axboe@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>, jurriaan <thunder7@xs4all.nl>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with  software-raid1
Date: Fri, 29 Aug 2008 15:44:09 +1000	[thread overview]
Message-ID: <18615.36009.954947.757412@notabene.brown> (raw)
In-Reply-To: message from Neil Brown on Thursday August 28

On Thursday August 28, neilb@suse.de wrote:
> 
> I think I'll have to think about it a bit more.

It is amazing what a good nights sleep can do for clear thinking!

Here is my (untested yet) patch to address the problem.
I'll try to get some testing done and push it out early next week, but
if anyone could review and/or test that would be a great help.

Thanks,
NeilBrown

>From 8ce1dd0fe4e42b4019824f73f24e4b3d91ecde4c Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Fri, 29 Aug 2008 15:40:23 +1000
Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c

A recent patch to protect the rdev list with rcu locking leaves us
with a problem because we can sleep on memalloc while holding the
rcu lock.

The rcu lock is only needed while walking the linked list as
uninteresting devices (failed or spares) can be removed at any time.

So only take the rcu lock while actually walking the linked list.
Take a refcount on the rdev during the time when we drop the lock
and do the memalloc to start IO.
When we return to the locked code, all the interesting devices
on the list will not have moved, so we can simply use
list_for_each_continue_rcu to pick up where we left off.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/bitmap.c |   43 ++++++++++++++++++++++++++++++++++++-------
 1 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 7e65bad..92250d5 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -238,15 +238,45 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
 
 }
 
+static mdk_rdev_t *next_active_rdev(struct mdk_rdev_t *rdev, mddev_t *mddev)
+{
+	/* Iterate the disks of an mddev, using rcu to protect access to the
+	 * linked list, and raising the refcount of devices we return to ensure
+	 * they don't disappear while in use.
+	 * As devices are only added or removed when raid_disk is < 0 and
+	 * nr_pending is 0 and In_sync is clear, the entries we return will
+	 * still be in the same position on the list when we re-enter
+	 * list_for_each_continue_rcu.
+	 */
+	rcu_read_lock();
+	if (rdev == NULL)
+		/* start at the beginning */
+		pos = &mddev->disks;
+	else
+		/* release the previous rdev */
+		rdev_dec_pending(rdev);
+	
+	list_for_each_continue_rcu(pos, &mddev->head) {
+		rdev = list_entry(pos, struct mdk_rdev_t, same_set);
+		if (rdev->raid_disk >= 0 &&
+		    test_bit(In_sync, &rdev->flags) &&
+		    !test_bit(Faulty, &rdev->flags)) {
+			/* this is a usable devices */
+			atomic_inc(&rdev->nr_pending);
+			rcu_read_unlock();
+			return rdev;
+		}
+	}
+	rcu_read_unlock();
+	return NULL;
+}
+
 static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
 {
-	mdk_rdev_t *rdev;
+	mdk_rdev_t *rdev = NULL;
 	mddev_t *mddev = bitmap->mddev;
 
-	rcu_read_lock();
-	rdev_for_each_rcu(rdev, mddev)
-		if (test_bit(In_sync, &rdev->flags)
-		    && !test_bit(Faulty, &rdev->flags)) {
+	while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
 			int size = PAGE_SIZE;
 			if (page->index == bitmap->file_pages-1)
 				size = roundup(bitmap->last_page_size,
@@ -281,8 +311,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
 				       + page->index * (PAGE_SIZE/512),
 				       size,
 				       page);
-		}
-	rcu_read_unlock();
+	}
 
 	if (wait)
 		md_super_wait(mddev);
-- 
1.5.6.5


  parent reply	other threads:[~2008-08-29  5:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-27 17:05 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with software-raid1 jurriaan
2008-08-27 21:47 ` Rafael J. Wysocki
2008-08-28  7:33   ` Jens Axboe
2008-08-28  7:45     ` Andrew Morton
2008-08-28  7:48       ` Jens Axboe
2008-08-28  7:56         ` Andre Noll
2008-08-28  8:11           ` Jens Axboe
2008-08-28  8:04             ` Andre Noll
2008-08-28  8:27         ` Neil Brown
2008-08-28  8:36           ` Jens Axboe
2008-08-28  9:00           ` Andrew Morton
2008-08-29  7:36             ` Neil Brown
2008-08-29  7:47               ` Jens Axboe
2008-08-29  8:14               ` Andrew Morton
2008-08-29  5:44           ` Neil Brown [this message]
2008-08-29  5:48             ` Neil Brown
2008-08-29  7:11               ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18615.36009.954947.757412@notabene.brown \
    --to=neilb@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=thunder7@xs4all.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox