From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753437AbYH2HML (ORCPT ); Fri, 29 Aug 2008 03:12:11 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751451AbYH2HL5 (ORCPT ); Fri, 29 Aug 2008 03:11:57 -0400 Received: from pasmtpa.tele.dk ([80.160.77.114]:42480 "EHLO pasmtpA.tele.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751091AbYH2HL4 (ORCPT ); Fri, 29 Aug 2008 03:11:56 -0400 Date: Fri, 29 Aug 2008 09:11:53 +0200 From: Jens Axboe To: Neil Brown Cc: Andrew Morton , "Rafael J. Wysocki" , jurriaan , linux-kernel@vger.kernel.org Subject: Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with software-raid1 Message-ID: <20080829071153.GM20055@kernel.dk> References: <20080827170538.GA24393@amd64.of.nowhere> <200808272347.43577.rjw@sisk.pl> <20080828073324.GR20055@kernel.dk> <20080828004532.45d8b8c9.akpm@linux-foundation.org> <20080828074830.GV20055@kernel.dk> <18614.24929.952454.829021@notabene.brown> <18615.36009.954947.757412@notabene.brown> <18615.36284.491484.331385@notabene.brown> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <18615.36284.491484.331385@notabene.brown> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 29 2008, Neil Brown wrote: > On Friday August 29, neilb@suse.de wrote: > > > > Here is my (untested yet) patch to address the problem. > > I'll try to get some testing done and push it out early next week, but > > if anyone could review and/or test that would be a great help. > > Actually, that was my "haven't even compiled it yet" patch. The one I > meant to send was this one. > > Sorry for the noise. > > NeilBrown > > > From 3a0646137016c69dbcaeed0114558f67daa4a6f0 Mon Sep 17 00:00:00 2001 > From: NeilBrown > Date: Fri, 29 Aug 2008 15:46:38 +1000 > Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c > > A recent patch to protect the rdev list with rcu locking leaves us > with a problem because we can sleep on memalloc while holding the > rcu lock. > > The rcu lock is only needed while walking the linked list as > uninteresting devices (failed or spares) can be removed at any time. > > So only take the rcu lock while actually walking the linked list. > Take a refcount on the rdev during the time when we drop the lock > and do the memalloc to start IO. > When we return to the locked code, all the interesting devices > on the list will not have moved, so we can simply use > list_for_each_continue_rcu to pick up where we left off. > > Signed-off-by: NeilBrown > --- > drivers/md/bitmap.c | 44 +++++++++++++++++++++++++++++++++++++------- > 1 files changed, 37 insertions(+), 7 deletions(-) > > diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c > index 7e65bad..c17eb4f 100644 > --- a/drivers/md/bitmap.c > +++ b/drivers/md/bitmap.c > @@ -238,15 +238,46 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde > > } > > +static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev) > +{ > + /* Iterate the disks of an mddev, using rcu to protect access to the > + * linked list, and raising the refcount of devices we return to ensure > + * they don't disappear while in use. > + * As devices are only added or removed when raid_disk is < 0 and > + * nr_pending is 0 and In_sync is clear, the entries we return will > + * still be in the same position on the list when we re-enter > + * list_for_each_continue_rcu. > + */ > + struct list_head *pos; > + rcu_read_lock(); > + if (rdev == NULL) > + /* start at the beginning */ > + pos = &mddev->disks; > + else > + /* release the previous rdev */ > + rdev_dec_pending(rdev, mddev); > + > + list_for_each_continue_rcu(pos, &mddev->disks) { > + rdev = list_entry(pos, mdk_rdev_t, same_set); > + if (rdev->raid_disk >= 0 && > + test_bit(In_sync, &rdev->flags) && > + !test_bit(Faulty, &rdev->flags)) { > + /* this is a usable devices */ > + atomic_inc(&rdev->nr_pending); > + rcu_read_unlock(); > + return rdev; > + } > + } > + rcu_read_unlock(); > + return NULL; > +} > + > static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) > { > - mdk_rdev_t *rdev; > + mdk_rdev_t *rdev = NULL; > mddev_t *mddev = bitmap->mddev; > > - rcu_read_lock(); > - rdev_for_each_rcu(rdev, mddev) > - if (test_bit(In_sync, &rdev->flags) > - && !test_bit(Faulty, &rdev->flags)) { > + while ((rdev = next_active_rdev(rdev, mddev)) != NULL) { > int size = PAGE_SIZE; > if (page->index == bitmap->file_pages-1) > size = roundup(bitmap->last_page_size, > @@ -281,8 +312,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) > + page->index * (PAGE_SIZE/512), > size, > page); > - } > - rcu_read_unlock(); > + } > > if (wait) > md_super_wait(mddev); > -- > 1.5.6.5 Looks like an elegant solution to the problem Neil, good stuff! -- Jens Axboe