linux-raid.vger.kernel.org archive mirror
* raid1 repair: sync_request() aborts if one of the drives has bad block recorded
@ 2012-07-12 15:38 Alexander Lyakas
  2012-07-16  3:37 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Lyakas @ 2012-07-12 15:38 UTC (permalink / raw)
  To: linux-raid, NeilBrown

Hi Neil,
I am testing the following simple scenario:
- RAID1 with two drives
- First drive has a bad block marked in the bad block list
- "repair" is triggered

The code in raid1.c:sync_request() selects candidates for reading.
If it finds a recorded bad block at the current sector on a drive,
it skips that drive:
			if (is_badblock(rdev, sector_nr, good_sectors,
					&first_bad, &bad_sectors)) {
				if (first_bad > sector_nr)
					... /* we don't go here */
				else {
					/* we go here */
					bad_sectors -= (sector_nr - first_bad);
					if (min_bad == 0 ||
					    min_bad > bad_sectors)
						min_bad = bad_sectors;
				}
			}
			if (sector_nr < first_bad) {
				/* we don't go here */
				...
But the second drive has no bad blocks recorded, so it is selected.
So we end up with read_targets=1 and min_bad>0.

Then the following code:
	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0)
		/* extra read targets are also write targets */
		write_targets += read_targets-1;
leaves write_targets=0, because read_targets-1 is 0.

Then the following code:
	if (write_targets == 0 || read_targets == 0) {
		/* There is nowhere to write, so all non-sync
		 * drives must be failed - so we are finished
		 */
		sector_t rv = max_sector - sector_nr;
aborts the resync.

Is this the intended behavior? It looks like this check does not take
the existence of bad blocks into account.
I am not sure which logic would solve it, but something like "If we
did not select a drive for reading because of bad blocks, and as a
result we have only one read_target, and MD_RECOVERY_REQUESTED is set,
then let's report only the bad block as skipped and not the whole
drive". Something like that, but there are probably many cases I am
not thinking about.

What do you think?

Thanks,
Alex.


Thread overview: 8+ messages
2012-07-12 15:38 raid1 repair: sync_request() aborts if one of the drives has bad block recorded Alexander Lyakas
2012-07-16  3:37 ` NeilBrown
2012-07-16  8:45   ` Alexander Lyakas
2012-07-17  1:17     ` NeilBrown
2012-07-17 13:17       ` Alexander Lyakas
2012-07-24 19:30         ` Alexander Lyakas
2012-07-31  2:11           ` NeilBrown
2012-07-31  5:56             ` Alexander Lyakas
