linux-raid.vger.kernel.org archive mirror
* [GIT PULL REQUEST] late md/raid1 bug fixes for 3.17
@ 2014-09-24  2:18 NeilBrown
  2014-09-26 19:08 ` BillStuff
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2014-09-24  2:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux RAID, lkml, Alexander Lyakas, Bassow Jonathan, majianpeng



Hi Linus,
 it is amazing how much easier it is to find bugs when you know one is there.
Two bug reports resulted in finding 7 bugs!!

All are tagged for -stable.  Those that can't cause (rare) data corruption
can cause lockups.

Thanks,
NeilBrown


The following changes since commit d030671f3f261e528dc6e396a13f10859a74ae7c:

  Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup (2014-09-07 20:20:16 -0700)

are available in the git repository at:

  git://git.neil.brown.name/md/ tags/md/3.17-more-fixes

for you to fetch changes up to b8cb6b4c121e1bf1963c16ed69e7adcb1bc301cd:

  md/raid1: fix_read_error should act on all non-faulty devices. (2014-09-22 11:26:01 +1000)

----------------------------------------------------------------
Bugfixes for md/raid1

particularly, but not only, fixing new "resync" code.

----------------------------------------------------------------
NeilBrown (8):
      md/raid1: intialise start_next_window for READ case to avoid hang
      md/raid1:  be more cautious where we read-balance during resync.
      md/raid1: clean up request counts properly in close_sync()
      md/raid1: make sure resync waits for conflicting writes to complete.
      md/raid1: Don't use next_resync to determine how far resync has progressed
      md/raid1: update next_resync under resync_lock.
      md/raid1: count resync requests in nr_pending.
      md/raid1: fix_read_error should act on all non-faulty devices.

 drivers/md/raid1.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)



* Re: [GIT PULL REQUEST] late md/raid1 bug fixes for 3.17
  2014-09-24  2:18 [GIT PULL REQUEST] late md/raid1 bug fixes for 3.17 NeilBrown
@ 2014-09-26 19:08 ` BillStuff
  2014-09-27  0:09   ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: BillStuff @ 2014-09-26 19:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux RAID

On 09/23/2014 09:18 PM, NeilBrown wrote:
[snip]
>        md/raid1: intialise start_next_window for READ case to avoid hang
>

Neil, I've been testing these patches for the past week or two to see if 
they help a raid1 "check" hang I had.

They seem to help, but I noticed the above patch is different from what 
you originally sent on the list.

The original patch has an extra chunk:

@@ -1444,6 +1445,7 @@ read_again:
		r1_bio->state = 0;
		r1_bio->mddev = mddev;
		r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
+		start_next_window = wait_barrier(conf, bio);
		goto retry_write;
	}

Is the correct patch with or without this chunk?

Thanks,
Bill


* Re: [GIT PULL REQUEST] late md/raid1 bug fixes for 3.17
  2014-09-26 19:08 ` BillStuff
@ 2014-09-27  0:09   ` NeilBrown
  0 siblings, 0 replies; 3+ messages in thread
From: NeilBrown @ 2014-09-27  0:09 UTC (permalink / raw)
  To: BillStuff; +Cc: linux RAID


On Fri, 26 Sep 2014 14:08:08 -0500 BillStuff <billstuff2001@sbcglobal.net>
wrote:

> On 09/23/2014 09:18 PM, NeilBrown wrote:
> [snip]
> >        md/raid1: intialise start_next_window for READ case to avoid hang
> >
> 
> Neil, I've been testing these patches for the past week or two to see if 
> they help a raid1 "check" hang I had.
> 
> They seem to help, but I noticed the above patch is different from what 
> you originally sent on the list.
> 
> The original patch has an extra chunk:
> 
> @@ -1444,6 +1445,7 @@ read_again:
> 		r1_bio->state = 0;
> 		r1_bio->mddev = mddev;
> 		r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
> +		start_next_window = wait_barrier(conf, bio);
> 		goto retry_write;
> 	}
> 
> Is the correct patch with or without this chunk?
> 
> Thanks,
> Bill

That hunk was wrong.
The new r1_bio is attached to the previous one, and they all complete (and,
in particular, all "allow_barrier") as a unit, so only one wait_barrier is
needed.

That hunk only has any effect if you have a bad-blocks list with bad blocks
in it, and you try to write a range of the device that includes a bad block.

Thanks,
NeilBrown


