Re: raid1_end_read_request does not retry failed READ from a recovering drive

All of lore.kernel.org
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.com>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: raid1_end_read_request does not retry failed READ from a recovering drive
Date: Mon, 27 Jul 2015 11:56:53 +1000	[thread overview]
Message-ID: <20150727115653.530daba6@noble> (raw)
In-Reply-To: <CAGRgLy7qkyQquRDGCShjJsuREQXNN+WaNGaq5wK2psMLQ6BGtw@mail.gmail.com>

On Sun, 26 Jul 2015 10:15:05 +0200 Alexander Lyakas
<alex.bolshoy@gmail.com> wrote:

> Hi Neil,
> 
> On Fri, Jul 24, 2015 at 1:24 AM, NeilBrown <neilb@suse.com> wrote:
> > Hi Alex
> > thanks for noticing!
> > Just to be sure we mean the same thing: this is the patch which is
> > missing - correct?
> >
> > Thanks,
> > NeilBrown
> >
> > From: NeilBrown <neilb@suse.com>
> > Date: Fri, 24 Jul 2015 09:22:16 +1000
> > Subject: [PATCH] md/raid1: fix test for 'was read error from last working
> >  device'.
> >
> > When we get a read error from the last working device, we don't
> > try to repair it, and don't fail the device.  We simple report a
> > read error to the caller.
> >
> > However the current test for 'is this the last working device' is
> > wrong.
> > When there is only one fully working device, it assumes that a
> > non-faulty device is that device.  However a spare which is rebuilding
> > would be non-faulty but so not the only working device.
> >
> > So change the test from "!Faulty" to "In_sync".  If ->degraded says
> > there is only one fully working device and this device is in_sync,
> > this must be the one.
> >
> > This bug has existed since we allowed read_balance to read from
> > a recovering spare in v3.0
> >
> > Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
> > Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
> > Cc: stable@vger.kernel.org (v3.0+)
> > Signed-off-by: NeilBrown <neilb@suse.com>
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 166616411215..b368307a9651 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -336,7 +336,7 @@ static void raid1_end_read_request(struct bio *bio, int error)
> >                 spin_lock_irqsave(&conf->device_lock, flags);
> >                 if (r1_bio->mddev->degraded == conf->raid_disks ||
> >                     (r1_bio->mddev->degraded == conf->raid_disks-1 &&
> > -                    !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags)))
> > +                    test_bit(In_sync, &conf->mirrors[mirror].rdev->flags)))
> >                         uptodate = 1;
> >                 spin_unlock_irqrestore(&conf->device_lock, flags);
> >         }
> 
> Yes, this was the missing piece.
> 
> But you also advised to put the whole of "raid1_spare_active" under
> the spinlock:
> > 2/ extend the spinlock in raid1_spare_active to cover the whole function.
> So we did this bit too. Can you pls comment if this is needed?

When you say "we did this bit" it would help a lot to actually show the
patch - helps avoid misunderstanding.  In fact I was probably hoping
that you would post a patch once you had it fixed.... no mater.
See below for that patch I have just queue.  There is an extra place
were a spinlock is probably helpful.

> 
> Also, will you be sending this patch to "stable"? We are moving to
> kernel 3.18, so we should get this then.

Putting "cc: stable@vger.kernel.org" means that it will automatically
migrated to the stable kernels once it has appeared in Linus' kernel.
So yes: it will go to -stable

> 
> Thanks!
> Alex.

Thanks,
NeilBrown

From 04f58ef505b9d354dd06c477a94a7e314a38cb72 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Mon, 27 Jul 2015 11:48:52 +1000
Subject: [PATCH] md/raid1: extend spinlock to protect raid1_end_read_request
 against inconsistencies

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
  Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
  last working device'.")
as it addresses the same issue.  It fixes the same bug and should go
to -stable for same reasons.

Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b368307a9651..742b50794dfd 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1476,6 +1476,7 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
 {
 	char b[BDEVNAME_SIZE];
 	struct r1conf *conf = mddev->private;
+	unsigned long flags;
 
 	/*
 	 * If it is not operational, then we have already marked it as dead
@@ -1495,14 +1496,13 @@ static void error(struct mddev *mddev, struct md_rdev *rdev)
 		return;
 	}
 	set_bit(Blocked, &rdev->flags);
+	spin_lock_irqsave(&conf->device_lock, flags);
 	if (test_and_clear_bit(In_sync, &rdev->flags)) {
-		unsigned long flags;
-		spin_lock_irqsave(&conf->device_lock, flags);
 		mddev->degraded++;
 		set_bit(Faulty, &rdev->flags);
-		spin_unlock_irqrestore(&conf->device_lock, flags);
 	} else
 		set_bit(Faulty, &rdev->flags);
+	spin_unlock_irqrestore(&conf->device_lock, flags);
 	/*
 	 * if recovery is running, make sure it aborts.
 	 */
@@ -1568,7 +1568,10 @@ static int raid1_spare_active(struct mddev *mddev)
 	 * Find all failed disks within the RAID1 configuration
 	 * and mark them readable.
 	 * Called under mddev lock, so rcu protection not needed.
+	 * device_lock used to avoid races with raid1_end_read_request
+	 * which expects 'In_sync' flags and ->degraded to be consistent.
 	 */
+	spin_lock_irqsave(&conf->device_lock, flags);
 	for (i = 0; i < conf->raid_disks; i++) {
 		struct md_rdev *rdev = conf->mirrors[i].rdev;
 		struct md_rdev *repl = conf->mirrors[conf->raid_disks + i].rdev;
@@ -1599,7 +1602,6 @@ static int raid1_spare_active(struct mddev *mddev)
 			sysfs_notify_dirent_safe(rdev->sysfs_state);
 		}
 	}
-	spin_lock_irqsave(&conf->device_lock, flags);
 	mddev->degraded -= count;
 	spin_unlock_irqrestore(&conf->device_lock, flags);

next prev parent reply	other threads:[~2015-07-27  1:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-07 14:18 raid1_end_read_request does not retry failed READ from a recovering drive Alexander Lyakas
2014-09-08  7:17 ` NeilBrown
2014-09-17 17:57   ` Alexander Lyakas
2014-09-18  1:05     ` NeilBrown
     [not found]       ` <CAGRgLy7+bN8ztaaN+oh6uATE7=Z=8LnU=R0m2CnVff4ZqgJFKg@mail.gmail.com>
     [not found]         ` <20140922101704.53be2056@notabene.brown>
2015-07-22 16:10           ` Alexander Lyakas
2015-07-23 23:24             ` NeilBrown
2015-07-26  8:15               ` Alexander Lyakas
2015-07-27  1:56                 ` NeilBrown [this message]
2015-07-27  8:07                   ` Alexander Lyakas
2015-07-31  0:12                     ` NeilBrown

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:b368307a965 dfblob:742b50794df )
 OR (
bs:"md/raid1: extend spinlock to protect raid1_end_read_request" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150727115653.530daba6@noble \
    --to=neilb@suse.com \
    --cc=alex.bolshoy@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.