* Resync issue in RAID1
@ 2016-10-21 15:53 V
2016-10-28 4:01 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: V @ 2016-10-21 15:53 UTC (permalink / raw)
To: linux-raid
Hi,
I am facing an issue during RAID1 resync. I am running Ubuntu with the
4.4.0-31-generic kernel, with RAID1 configured with 2 disks as active
and 2 as spares. On the first power cycle after installing the RAID, I
see the following messages in kern.log:
My disks are configured with a 4K sector size (both logical and
physical); sda and sdb are the active disks for this RAID.
===========
Oct 18 03:52:56 kernel: [ 52.869113] md: using 128k window, over a
total of 51167104k.
Oct 18 03:52:56 kernel: [ 52.869114] md: resuming resync of md2
from checkpoint.
Oct 18 03:52:56 kernel: [ 52.869378] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869414] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869436] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869465] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869503] sd 0:0:1:0: [sdb] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869536] md/raid1:md2: sda:
unrecoverable I/O read error for block 3
Oct 18 03:52:56 kernel: [ 52.869581] md: md2: resync interrupted.
Oct 18 03:52:56 kernel: [ 52.869584] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869609] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869633] sd 0:0:1:0: [sdb] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869692] md/raid1:md2: sda:
unrecoverable I/O read error for block 131
Oct 18 03:52:56 kernel: [ 52.869735] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869808] sd 0:0:1:0: [sdb] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869837] md/raid1:md2: sda:
unrecoverable I/O read error for block 259
Oct 18 03:52:56 kernel: [ 52.869908] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869958] sd 0:0:1:0: [sdb] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.870022] md/raid1:md2: sda:
unrecoverable I/O read error for block 387
===========
This seems to be an unaligned access from md/raid1 during resync. Do you
have any idea why we are seeing it? How do I debug this issue?
I was looking at the following patch by Nate
(http://marc.info/?l=linux-raid&m=142405887609959&w=2) to the function
narrow_write_error(), where the request is aligned correctly to the disk
sector size. Do you think similar round-ups are required in the resync
path as well?
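
To make the question concrete, the kind of round-up I have in mind looks
roughly like the sketch below. This is only an illustration of the idea,
not actual md/raid1 code; the helper name is made up, and it only assumes
the existing kernel helpers bdev_logical_block_size(), round_down() and
round_up():

/*
 * Hypothetical helper (illustration only): widen a resync range so it
 * starts and ends on the device's logical block boundary, so that a
 * 4K-logical-sector disk never sees a request starting at sector 3.
 */
#include <linux/blkdev.h>
#include <linux/kernel.h>

static void align_range_to_logical_block(struct block_device *bdev,
                                         sector_t *start, sector_t *nr_sectors)
{
        /* logical block size in 512-byte sectors, e.g. 8 for a 4K disk */
        unsigned int lbs = bdev_logical_block_size(bdev) >> 9;
        sector_t end = *start + *nr_sectors;

        *start = round_down(*start, lbs);       /* e.g. sector 3 -> 0 */
        end = round_up(end, lbs);               /* e.g. sector 131 -> 136 */
        *nr_sectors = end - *start;
}
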
More info:
1) We check for previous RAID configurations on the device partitions
(we run mdadm --examine and, if one is present, we zero out the
superblocks).
2) Each device is partitioned into 4 partitions, used for 4 different RAID devices.
3) For example, sda is partitioned into sda1, sda2, sda3 and sda4, and
each partition is part of one RAID device.
4) RAID1 create command:
/sbin/mdadm --quiet --create /dev/md1 --force --level=1
--metadata=default --raid-devices=2 /dev/sda1 /dev/sdb1
--spare-devices=2 /dev/sdc1 /dev/sdd1
Thanks in advance,
Viswesh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Resync issue in RAID1
2016-10-21 15:53 Resync issue in RAID1 V
@ 2016-10-28 4:01 ` NeilBrown
2016-10-28 5:33 ` V
0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2016-10-28 4:01 UTC (permalink / raw)
To: V, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2057 bytes --]

On Sat, Oct 22 2016, V wrote:

> Hi,
>
> I am facing an issue during RAID1 resync. I have an ubuntu
> 4.4.0-31-generic running with raid1 configured with 2 disks as active
> and 2 as spares. On the first powercycle, after installing RAID, i see
> the following messages in kern.log
>
>
> My disks are configured with 4K sector size (both logical and
> physical) (sda and sdb are active disks for this raid)
>
>
> ===========
> Oct 18 03:52:56 kernel: [ 52.869113] md: using 128k window, over a
> total of 51167104k.
> Oct 18 03:52:56 kernel: [ 52.869114] md: resuming resync of md2 from checkpoint.

This line (above) combined with ...

> Oct 18 03:52:56 kernel: [ 52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3

this line suggests that when you shut down, md had already started a
resync, and it had checkpointed at block '3'.

The subsequent errors are:

> Oct 18 03:52:56 kernel: [ 52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
> Oct 18 03:52:56 kernel: [ 52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
> Oct 18 03:52:56 kernel: [ 52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387

which are every 128 blocks (aka sectors) from '3'.
I know what caused that. The patch below will stop it happening again.

You might be able to get your array working again by stopping it
and assembling with --update=resync.
That will reset the checkpoint to 0.

NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2cf0e1c00b9a..aa2ca23463f4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
 	    mddev->curr_resync > 2) {
 		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
 			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
-				if (mddev->curr_resync >= mddev->recovery_cp) {
+				if (mddev->curr_resync >= mddev->recovery_cp &&
+				    mddev->curr_resync > 3) {
 					printk(KERN_INFO
 					       "md: checkpointing %s of %s.\n",
 					       desc, mdname(mddev));

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Resync issue in RAID1
2016-10-28 4:01 ` NeilBrown
@ 2016-10-28 5:33 ` V
2016-10-28 6:01 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: V @ 2016-10-28 5:33 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

Hi Neil,

Thanks for the response. But during this phase, why is the SCSI driver
complaining about a bad block number?

Oct 18 03:52:56 kernel: [ 52.869378] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869414] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869436] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869465] sd 0:0:0:0: [sda] Bad block
number requested
Oct 18 03:52:56 kernel: [ 52.869503] sd 0:0:1:0: [sdb] Bad block
number requested

Thanks,
V

On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
> On Sat, Oct 22 2016, V wrote:
>
>> Hi,
>>
>> I am facing an issue during RAID1 resync. I have an ubuntu
>> 4.4.0-31-generic running with raid1 configured with 2 disks as active
>> and 2 as spares. On the first powercycle, after installing RAID, i see
>> the following messages in kern.log
>>
>>
>> My disks are configured with 4K sector size (both logical and
>> physical) (sda and sdb are active disks for this raid)
>>
>>
>> ===========
>> Oct 18 03:52:56 kernel: [ 52.869113] md: using 128k window, over a
>> total of 51167104k.
>> Oct 18 03:52:56 kernel: [ 52.869114] md: resuming resync of md2 from checkpoint.
>
> This line (above) combined with ...
>
>> Oct 18 03:52:56 kernel: [ 52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>
> this line suggests that when you shut down, md had already started a
> resync, and it had checkpointed at block '3'.
>
> The subsequent error are:
>
>> Oct 18 03:52:56 kernel: [ 52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>> Oct 18 03:52:56 kernel: [ 52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>> Oct 18 03:52:56 kernel: [ 52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>
> which are every 128 blocks (aka sectors) from '3'.
> I know what caused that. The patch below will stop it happening again.
>
> You might be able get your array working again by stopping it
> and assembling with --update=resync.
> That will reset the checkpoint to 0.
>
> NeilBrown
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2cf0e1c00b9a..aa2ca23463f4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>  	    mddev->curr_resync > 2) {
>  		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>  			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
> -				if (mddev->curr_resync >= mddev->recovery_cp) {
> +				if (mddev->curr_resync >= mddev->recovery_cp &&
> +				    mddev->curr_resync > 3) {
>  					printk(KERN_INFO
>  					       "md: checkpointing %s of %s.\n",
>  					       desc, mdname(mddev));

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Resync issue in RAID1
2016-10-28 5:33 ` V
@ 2016-10-28 6:01 ` NeilBrown
2016-10-28 6:07 ` V
0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2016-10-28 6:01 UTC (permalink / raw)
To: V; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3402 bytes --]

On Fri, Oct 28 2016, V wrote:

> Hi Neil,
>
> Thanks for the response. But during this phase, why is the scsi driver
> complaining about bad block number ?
>
> Oct 18 03:52:56 kernel: [ 52.869378] sd 0:0:0:0: [sda] Bad block
> number requested

Because md is asking to read blocks at offsets which are not a multiple
of 8 sectors.

NeilBrown

> Oct 18 03:52:56 kernel: [ 52.869414] sd 0:0:0:0: [sda] Bad block
> number requested
> Oct 18 03:52:56 kernel: [ 52.869436] sd 0:0:0:0: [sda] Bad block
> number requested
> Oct 18 03:52:56 kernel: [ 52.869465] sd 0:0:0:0: [sda] Bad block
> number requested
> Oct 18 03:52:56 kernel: [ 52.869503] sd 0:0:1:0: [sdb] Bad block
> number requested
>
> Thanks,
> V
>
> On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
>> On Sat, Oct 22 2016, V wrote:
>>
>>> Hi,
>>>
>>> I am facing an issue during RAID1 resync. I have an ubuntu
>>> 4.4.0-31-generic running with raid1 configured with 2 disks as active
>>> and 2 as spares. On the first powercycle, after installing RAID, i see
>>> the following messages in kern.log
>>>
>>>
>>> My disks are configured with 4K sector size (both logical and
>>> physical) (sda and sdb are active disks for this raid)
>>>
>>>
>>> ===========
>>> Oct 18 03:52:56 kernel: [ 52.869113] md: using 128k window, over a
>>> total of 51167104k.
>>> Oct 18 03:52:56 kernel: [ 52.869114] md: resuming resync of md2 from checkpoint.
>>
>> This line (above) combined with ...
>>
>>> Oct 18 03:52:56 kernel: [ 52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>>
>> this line suggests that when you shut down, md had already started a
>> resync, and it had checkpointed at block '3'.
>>
>> The subsequent error are:
>>
>>> Oct 18 03:52:56 kernel: [ 52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>>> Oct 18 03:52:56 kernel: [ 52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>>> Oct 18 03:52:56 kernel: [ 52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>>
>> which are every 128 blocks (aka sectors) from '3'.
>> I know what caused that. The patch below will stop it happening again.
>>
>> You might be able get your array working again by stopping it
>> and assembling with --update=resync.
>> That will reset the checkpoint to 0.
>>
>> NeilBrown
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 2cf0e1c00b9a..aa2ca23463f4 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>>  	    mddev->curr_resync > 2) {
>>  		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>>  			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>> -				if (mddev->curr_resync >= mddev->recovery_cp) {
>> +				if (mddev->curr_resync >= mddev->recovery_cp &&
>> +				    mddev->curr_resync > 3) {
>>  					printk(KERN_INFO
>>  					       "md: checkpointing %s of %s.\n",
>>  					       desc, mdname(mddev));
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
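
That explanation matches the numbers in the log exactly. A quick standalone
check (ordinary userspace C, nothing kernel-specific; the 128-sector stride
is taken directly from the "every 128 blocks" observation above) shows that
a resync resumed at sector 3 never produces a request aligned to the
8-sector (4K) logical block:

/*
 * Illustration only: walk the resync start sectors implied by a
 * checkpoint of 3 and a 128-sector stride, and test 4K alignment.
 */
#include <stdio.h>

int main(void)
{
        unsigned long long sector = 3;          /* bad resync checkpoint */
        const unsigned int chunk = 128;         /* sectors per resync chunk */
        const unsigned int align = 4096 / 512;  /* 8 sectors per 4K block */

        for (int i = 0; i < 4; i++, sector += chunk)
                printf("resync request at sector %llu -> %s\n", sector,
                       sector % align ? "misaligned" : "aligned");
        return 0;
}

This prints sectors 3, 131, 259 and 387 as misaligned, which are exactly
the blocks reported in the md/raid1 read errors.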

* Re: Resync issue in RAID1
2016-10-28 6:01 ` NeilBrown
@ 2016-10-28 6:07 ` V
2016-11-04 3:33 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: V @ 2016-10-28 6:07 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

Is there any reason why this happens in the resync flow? Normally the
upper-layer driver tries to align the request with the device block
size. So could there be an issue in this path?

Thanks,
V

On Thu, Oct 27, 2016 at 11:01 PM, NeilBrown <neilb@suse.com> wrote:
> On Fri, Oct 28 2016, V wrote:
>
>> Hi Neil,
>>
>> Thanks for the response. But during this phase, why is the scsi driver
>> complaining about bad block number ?
>>
>> Oct 18 03:52:56 kernel: [ 52.869378] sd 0:0:0:0: [sda] Bad block
>> number requested
>
> Because md is asking to read blocks are offsets which are not a multiple
> of 8 sectors.
>
> NeilBrown
>
>> Oct 18 03:52:56 kernel: [ 52.869414] sd 0:0:0:0: [sda] Bad block
>> number requested
>> Oct 18 03:52:56 kernel: [ 52.869436] sd 0:0:0:0: [sda] Bad block
>> number requested
>> Oct 18 03:52:56 kernel: [ 52.869465] sd 0:0:0:0: [sda] Bad block
>> number requested
>> Oct 18 03:52:56 kernel: [ 52.869503] sd 0:0:1:0: [sdb] Bad block
>> number requested
>>
>> Thanks,
>> V
>>
>> On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Sat, Oct 22 2016, V wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing an issue during RAID1 resync. I have an ubuntu
>>>> 4.4.0-31-generic running with raid1 configured with 2 disks as active
>>>> and 2 as spares. On the first powercycle, after installing RAID, i see
>>>> the following messages in kern.log
>>>>
>>>>
>>>> My disks are configured with 4K sector size (both logical and
>>>> physical) (sda and sdb are active disks for this raid)
>>>>
>>>>
>>>> ===========
>>>> Oct 18 03:52:56 kernel: [ 52.869113] md: using 128k window, over a
>>>> total of 51167104k.
>>>> Oct 18 03:52:56 kernel: [ 52.869114] md: resuming resync of md2 from checkpoint.
>>>
>>> This line (above) combined with ...
>>>
>>>> Oct 18 03:52:56 kernel: [ 52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>>>
>>> this line suggests that when you shut down, md had already started a
>>> resync, and it had checkpointed at block '3'.
>>>
>>> The subsequent error are:
>>>
>>>> Oct 18 03:52:56 kernel: [ 52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>>>> Oct 18 03:52:56 kernel: [ 52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>>>> Oct 18 03:52:56 kernel: [ 52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>>>
>>> which are every 128 blocks (aka sectors) from '3'.
>>> I know what caused that. The patch below will stop it happening again.
>>>
>>> You might be able get your array working again by stopping it
>>> and assembling with --update=resync.
>>> That will reset the checkpoint to 0.
>>>
>>> NeilBrown
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 2cf0e1c00b9a..aa2ca23463f4 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>>>  	    mddev->curr_resync > 2) {
>>>  		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>>>  			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>>> -				if (mddev->curr_resync >= mddev->recovery_cp) {
>>> +				if (mddev->curr_resync >= mddev->recovery_cp &&
>>> +				    mddev->curr_resync > 3) {
>>>  					printk(KERN_INFO
>>>  					       "md: checkpointing %s of %s.\n",
>>>  					       desc, mdname(mddev));
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Resync issue in RAID1
2016-10-28 6:07 ` V
@ 2016-11-04 3:33 ` NeilBrown
0 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2016-11-04 3:33 UTC (permalink / raw)
To: V; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4105 bytes --]

On Fri, Oct 28 2016, V wrote:

> Is there any reason, why this happens in the resync flow. Normally the
> upper layer driver tries to align with device block size for the
> request. So could there be an issue in this path ?

This happens in the resync flow because there is a bug which lets the
number "3" escape and be used incorrectly as a device address.
The same bug wouldn't affect data from any upper level driver.

NeilBrown

>
> Thanks,
> V
>
> On Thu, Oct 27, 2016 at 11:01 PM, NeilBrown <neilb@suse.com> wrote:
>> On Fri, Oct 28 2016, V wrote:
>>
>>> Hi Neil,
>>>
>>> Thanks for the response. But during this phase, why is the scsi driver
>>> complaining about bad block number ?
>>>
>>> Oct 18 03:52:56 kernel: [ 52.869378] sd 0:0:0:0: [sda] Bad block
>>> number requested
>>
>> Because md is asking to read blocks are offsets which are not a multiple
>> of 8 sectors.
>>
>> NeilBrown
>>
>>> Oct 18 03:52:56 kernel: [ 52.869414] sd 0:0:0:0: [sda] Bad block
>>> number requested
>>> Oct 18 03:52:56 kernel: [ 52.869436] sd 0:0:0:0: [sda] Bad block
>>> number requested
>>> Oct 18 03:52:56 kernel: [ 52.869465] sd 0:0:0:0: [sda] Bad block
>>> number requested
>>> Oct 18 03:52:56 kernel: [ 52.869503] sd 0:0:1:0: [sdb] Bad block
>>> number requested
>>>
>>> Thanks,
>>> V
>>>
>>> On Thu, Oct 27, 2016 at 9:01 PM, NeilBrown <neilb@suse.com> wrote:
>>>> On Sat, Oct 22 2016, V wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am facing an issue during RAID1 resync. I have an ubuntu
>>>>> 4.4.0-31-generic running with raid1 configured with 2 disks as active
>>>>> and 2 as spares. On the first powercycle, after installing RAID, i see
>>>>> the following messages in kern.log
>>>>>
>>>>>
>>>>> My disks are configured with 4K sector size (both logical and
>>>>> physical) (sda and sdb are active disks for this raid)
>>>>>
>>>>>
>>>>> ===========
>>>>> Oct 18 03:52:56 kernel: [ 52.869113] md: using 128k window, over a
>>>>> total of 51167104k.
>>>>> Oct 18 03:52:56 kernel: [ 52.869114] md: resuming resync of md2 from checkpoint.
>>>>
>>>> This line (above) combined with ...
>>>>
>>>>> Oct 18 03:52:56 kernel: [ 52.869536] md/raid1:md2: sda: unrecoverable I/O read error for block 3
>>>>
>>>> this line suggests that when you shut down, md had already started a
>>>> resync, and it had checkpointed at block '3'.
>>>>
>>>> The subsequent error are:
>>>>
>>>>> Oct 18 03:52:56 kernel: [ 52.869692] md/raid1:md2: sda: unrecoverable I/O read error for block 131
>>>>> Oct 18 03:52:56 kernel: [ 52.869837] md/raid1:md2: sda: unrecoverable I/O read error for block 259
>>>>> Oct 18 03:52:56 kernel: [ 52.870022] md/raid1:md2: sda: unrecoverable I/O read error for block 387
>>>>
>>>> which are every 128 blocks (aka sectors) from '3'.
>>>> I know what caused that. The patch below will stop it happening again.
>>>>
>>>> You might be able get your array working again by stopping it
>>>> and assembling with --update=resync.
>>>> That will reset the checkpoint to 0.
>>>>
>>>> NeilBrown
>>>>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 2cf0e1c00b9a..aa2ca23463f4 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -8099,7 +8099,8 @@ void md_do_sync(struct md_thread *thread)
>>>>  	    mddev->curr_resync > 2) {
>>>>  		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>>>>  			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>>>> -				if (mddev->curr_resync >= mddev->recovery_cp) {
>>>> +				if (mddev->curr_resync >= mddev->recovery_cp &&
>>>> +				    mddev->curr_resync > 3) {
>>>>  					printk(KERN_INFO
>>>>  					       "md: checkpointing %s of %s.\n",
>>>>  					       desc, mdname(mddev));
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-11-04 3:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-21 15:53 Resync issue in RAID1 V
2016-10-28 4:01 ` NeilBrown
2016-10-28 5:33 ` V
2016-10-28 6:01 ` NeilBrown
2016-10-28 6:07 ` V
2016-11-04 3:33 ` NeilBrown