* Can raid1-resync cause a corruption?
@ 2013-12-29 11:27 Alexander Lyakas
2013-12-29 21:41 ` NeilBrown
[not found] ` <CA+res+QwDuaJTf1FVNtb--nTcoRjmFM4T6AadEt88UxPiG=EUw@mail.gmail.com>
0 siblings, 2 replies; 8+ messages in thread
From: Alexander Lyakas @ 2013-12-29 11:27 UTC (permalink / raw)
To: NeilBrown, linux-raid; +Cc: yair
Hi Neil,
I see in the code that read_balance is careful not to do
read-balancing if the area in question is under resync:
if (conf->mddev->recovery_cp < MaxSector &&
    (this_sector + sectors >= conf->next_resync)) {
        choose_first = 1;
        start_disk = 0;
} else {
        choose_first = 0;
        start_disk = conf->last_used;
}
However, I see that next_resync is updated in sync_request:
    conf->next_resync = sector_nr;
before the resync of that area has actually completed, before the
read-bios have even been submitted. Is this correct? Should we perhaps
update next_resync at the point where we update curr_resync_completed
in md_do_sync?
The kernel I am looking at is Ubuntu-Precise 3.2.0-29.46.
Below is the sequence of events that led us to this question.
We have a 3-disk RAID1 with ext4 mounted on top, and mysql running
with its files on that ext4.
# 3-disk RAID1 was recovering from an unclean shutdown (crash) of the
machine, it started resync:
Dec 26 08:05:42 vc kernel: [11576870.520393] md: md1 stopped.
Dec 26 08:05:42 vc kernel: [11576870.526898] md: bind<dm-2>
Dec 26 08:05:42 vc kernel: [11576870.534819] md: bind<dm-3>
Dec 26 08:05:42 vc kernel: [11576870.535908] md: bind<dm-1>
Dec 26 08:05:42 vc kernel: [11576870.580730] md/raid1:md1: not clean
-- starting background reconstruction
Dec 26 08:05:42 vc kernel: [11576870.580735] md/raid1:md1: active with
3 out of 3 mirrors
Dec 26 08:05:42 vc kernel: [11576870.581399] created bitmap (1 pages)
for device md1
Dec 26 08:05:42 vc kernel: [11576870.582386] md1: bitmap initialized
from disk: read 1/1 pages, set 5 of 5 bits
Dec 26 08:05:42 vc kernel: [11576870.583335] md1: detected capacity
change from 0 to 320773120
Dec 26 08:05:42 vc kernel: [11576870.585593] md: resync of RAID array md1
Dec 26 08:05:42 vc kernel: [11576870.585597] md: minimum _guaranteed_
speed: 10000 KB/sec/disk.
Dec 26 08:05:42 vc kernel: [11576870.585601] md: using maximum
available idle IO bandwidth (but not more than 200000 KB/sec) for
resync.
Dec 26 08:05:42 vc kernel: [11576870.585617] md: using 128k window,
over a total of 313255k.
Dec 26 08:05:42 vc kernel: [11576870.687868] md1: unknown partition table
# meanwhile, ext4 was mounted on top of it (dm-4 is a devicemapper
target on top of MD1, which collects some stats, but otherwise does
not interfere with bios in any way). ext4 also realized the unclean
shutdown:
Dec 26 08:05:42 vc kernel: [11576870.936166] [14831]dm_iostat
[dm_iostat_ctr] Successfully set up dm (252:4) iostats on top of
/dev/md1 (9:1), flags=0x80
Dec 26 08:05:43 vc kernel: [11576871.159955] [EXT4 FS bs=1024, gc=39,
bpg=8192, ipg=2016, mo=9c02c818, mo2=0000]
Dec 26 08:05:43 vc kernel: [11576871.160092] EXT4-fs (dm-4): recovery complete
Dec 26 08:05:43 vc kernel: [11576871.161099] EXT4-fs (dm-4): mounted
filesystem with ordered data mode. Opts: debug,data_err=abort
# mysql was started on top of ext4, and also realized the unclean shutdown:
InnoDB: Log scan progressed past the checkpoint lsn 18148639
131226 8:05:44 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 18154016
131226 8:05:44 InnoDB: Starting an apply batch of log records to the
database...
InnoDB: Progress in percents: 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99
InnoDB: Apply batch completed
131226 8:05:44 InnoDB: Waiting for the background threads to start
# 8 seconds later, raid1-resync completed:
Dec 26 08:05:52 vc kernel: [11576880.161306] md: md1: resync done.
Dec 26 08:05:52 vc kernel: [11576880.221701] RAID1 conf printout:
Dec 26 08:05:52 vc kernel: [11576880.221705] --- wd:3 rd:3
Dec 26 08:05:52 vc kernel: [11576880.221709] disk 0, wo:0, o:1, dev:dm-1
Dec 26 08:05:52 vc kernel: [11576880.221712] disk 1, wo:0, o:1, dev:dm-2
Dec 26 08:05:52 vc kernel: [11576880.221714] disk 2, wo:0, o:1, dev:dm-3
# The system ran normally for about 5 minutes, and then was shutdown cleanly:
# mysql was stopped cleanly
131226 8:11:28 [Note] /usr/sbin/mysqld: Normal shutdown
131226 8:11:28 [Note] Event Scheduler: Purging the queue. 0 events
131226 8:11:30 [Warning] /usr/sbin/mysqld: Forcing close of thread 68
user: 'change_vsa_user'
131226 8:11:30 [Warning] /usr/sbin/mysqld: Forcing close of thread 59
user: 'standbyVc0'
131226 8:11:30 InnoDB: Starting shutdown...
131226 8:11:31 InnoDB: Waiting for 3 pages to be flushed
131226 8:11:31 InnoDB: Shutdown completed; log sequence number 18626878
131226 8:11:31 [Note] /usr/sbin/mysqld: Shutdown complete
# ext4 was cleanly unmounted (no print in the kernel for that)
# md1 was stopped cleanly
Dec 26 08:11:31 vc kernel: [11577219.716151] md1: detected capacity
change from 320773120 to 0
Dec 26 08:11:31 vc kernel: [11577219.716158] md: md1 stopped.
Dec 26 08:11:31 vc kernel: [11577219.716167] md: unbind<dm-1>
Dec 26 08:11:31 vc kernel: [11577219.740152] md: export_rdev(dm-1)
Dec 26 08:11:31 vc kernel: [11577219.740174] md: unbind<dm-3>
Dec 26 08:11:31 vc kernel: [11577219.772137] md: export_rdev(dm-3)
Dec 26 08:11:31 vc kernel: [11577219.772158] md: unbind<dm-2>
Dec 26 08:11:31 vc kernel: [11577219.810269] md: export_rdev(dm-2)
# Then the system was brought up again
# md1 was assembled with a clean bitmap
Dec 26 08:11:40 vc kernel: [ 337.918676] md: md1 stopped.
Dec 26 08:11:40 vc kernel: [ 337.923588] md: bind<dm-2>
Dec 26 08:11:40 vc kernel: [ 337.924085] md: bind<dm-3>
Dec 26 08:11:40 vc kernel: [ 337.924511] md: bind<dm-1>
Dec 26 08:11:40 vc kernel: [ 337.927127] md: raid1 personality
registered for level 1
Dec 26 08:11:40 vc kernel: [ 337.927264] bio: create slab <bio-1> at 1
Dec 26 08:11:40 vc kernel: [ 337.927435] md/raid1:md1: active with 3
out of 3 mirrors
Dec 26 08:11:40 vc kernel: [ 337.927826] created bitmap (1 pages) for
device md1
Dec 26 08:11:40 vc kernel: [ 337.928341] md1: bitmap initialized from
disk: read 1/1 pages, set 0 of 5 bits
Dec 26 08:11:40 vc kernel: [ 337.929316] md1: detected capacity
change from 0 to 320773120
Dec 26 08:11:40 vc kernel: [ 337.934610] md1: unknown partition table
# ext4 was mounted cleanly
Dec 26 08:11:40 vc kernel: [ 338.036300] [6536]dm_iostat
[dm_iostat_ctr] Successfully set up dm (252:4) iostats on top of
/dev/md1 (9:1), flags=0x80
Dec 26 08:11:40 vc kernel: [ 338.063921] [EXT4 FS bs=1024, gc=39,
bpg=8192, ipg=2016, mo=9c02c818, mo2=0000]
Dec 26 08:11:40 vc kernel: [ 338.064104] EXT4-fs (dm-4): mounted
filesystem with ordered data mode. Opts: debug,data_err=abort
# but then mysql detected a corruption
131226 8:11:40 InnoDB: Error: space id and page n:o stored in the page
InnoDB: read in are 543256175:543236195, should be 0:595!
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 595.
InnoDB: You may have to recover from a backup.
131226 8:11:40 InnoDB: Page dump in ascii and hex (16384 bytes):
It happened only once, and we do a lot of testing of such unclean
shutdowns. The only theory we have is that the raid1 resync somehow
changed the underlying data of mysql/ext4.
Thanks,
Alex.
^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Can raid1-resync cause a corruption?
  2013-12-29 11:27 Can raid1-resync cause a corruption? Alexander Lyakas
@ 2013-12-29 21:41 ` NeilBrown
  2014-01-02 16:52   ` Alexander Lyakas
  2014-01-06 16:58   ` Alexander Lyakas
  [not found] ` <CA+res+QwDuaJTf1FVNtb--nTcoRjmFM4T6AadEt88UxPiG=EUw@mail.gmail.com>
  1 sibling, 2 replies; 8+ messages in thread
From: NeilBrown @ 2013-12-29 21:41 UTC (permalink / raw)
  To: Alexander Lyakas; +Cc: linux-raid, yair

[-- Attachment #1: Type: text/plain, Size: 8150 bytes --]

On Sun, 29 Dec 2013 13:27:35 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> [...]
>
> However, I see that next_resync is updated in sync_request:
>     conf->next_resync = sector_nr;
> before the resync of that area has actually completed, before the
> read-bios have even been submitted. Is this correct? Should we perhaps
> update next_resync at the point where we update curr_resync_completed
> in md_do_sync?

Your analysis appears correct.  Thanks!
Can you send a patch?  If so, could you check if raid10 needs a similar fix?

Thanks,
NeilBrown

> [...]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Can raid1-resync cause a corruption?
  2013-12-29 21:41 ` NeilBrown
@ 2014-01-02 16:52   ` Alexander Lyakas
  2014-01-06 16:58   ` Alexander Lyakas
  1 sibling, 0 replies; 8+ messages in thread
From: Alexander Lyakas @ 2014-01-02 16:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Yes, Neil, I will work next week to provide a fix.

On Sun, Dec 29, 2013 at 11:41 PM, NeilBrown <neilb@suse.de> wrote:
> [...]
>
> Your analysis appears correct.  Thanks!
> Can you send a patch?  If so, could you check if raid10 needs a similar fix?
>
> Thanks,
> NeilBrown
>
> [...]

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Can raid1-resync cause a corruption?
  2013-12-29 21:41 ` NeilBrown
  2014-01-02 16:52   ` Alexander Lyakas
@ 2014-01-06 16:58   ` Alexander Lyakas
  2014-01-06 23:56     ` NeilBrown
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Lyakas @ 2014-01-06 16:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, yair

Hi Neil,

looking again at the resync/rebuild flow in raid1, I see that
sync_request() calls
    raise_barrier(conf);
and only then sets
    conf->next_resync = sector_nr;

while a READ request calls:
    wait_barrier(conf);
and only later:
    rdisk = read_balance(conf, r1_bio, &max_sectors);
which is the one looking at next_resync.

Basically, this guarantees that while a READ is looking at next_resync,
there are no sync requests in flight, so the area up to next_resync has
for sure already been synced. So my previous analysis seems to be
incorrect? Or can the next_resync race still somehow happen?

Thanks,
Alex.

On Sun, Dec 29, 2013 at 11:41 PM, NeilBrown <neilb@suse.de> wrote:
> [...]
>
> Your analysis appears correct.  Thanks!
> Can you send a patch?  If so, could you check if raid10 needs a similar fix?
>
> Thanks,
> NeilBrown
>
> [...]

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Can raid1-resync cause a corruption?
  2014-01-06 16:58 ` Alexander Lyakas
@ 2014-01-06 23:56   ` NeilBrown
  2014-01-07 15:01     ` Alexander Lyakas
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2014-01-06 23:56 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid, yair

[-- Attachment #1: Type: text/plain, Size: 9981 bytes --]

On Mon, 6 Jan 2014 18:58:30 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hi Neil,
>
> looking again at the resync/rebuild flow in raid1, I see that sync_request() calls
>         raise_barrier(conf);
> and only then sets
>         conf->next_resync = sector_nr;
>
> while a READ request calls:
>         wait_barrier(conf);
> and only later:
>         rdisk = read_balance(conf, r1_bio, &max_sectors);
> which is the one looking at next_resync.
>
> Basically, this guarantees that while a READ is looking at next_resync,
> there are no sync requests in flight, so the area up to next_resync has
> certainly already been synced. So my previous analysis seems to be
> incorrect? Or can the next_resync race still somehow happen?

I know I agreed with you last time when you were wrong, so agreeing with you
again probably would sound very convincing, but it seems you are right.
The RAID1 code doesn't seem to have a race here (which is good news to me of
course).

So you have a corruption which could be caused by md/raid1, or by ext4fs, or
by mysql. Or maybe by your dm layer. And you cannot easily reproduce it.
That is not going to be easy to fix :-( I'm afraid there is nothing I can
suggest.

NeilBrown

> Thanks,
> Alex.
>
>
> On Sun, Dec 29, 2013 at 11:41 PM, NeilBrown <neilb@suse.de> wrote:
> > On Sun, 29 Dec 2013 13:27:35 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Hi Neil,
> >> I see in the code, that read_balance is careful not to do
> >> read-balancing, if the area in question is under resync:
> >>         if (conf->mddev->recovery_cp < MaxSector &&
> >>             (this_sector + sectors >= conf->next_resync)) {
> >>                 choose_first = 1;
> >>                 start_disk = 0;
> >>         } else {
> >>                 choose_first = 0;
> >>                 start_disk = conf->last_used;
> >>         }
> >>
> >> However, I see that next_resync is updated in sync_request:
> >>         conf->next_resync = sector_nr;
> >> before actually completing the resync of that area, before even
> >> submitting the read-bios. Is this correct? Should we perhaps update
> >> next_resync at the point where we update curr_resync_completed in
> >> md_do_sync?
> >
> > Your analysis appears correct. Thanks!
> > Can you send a patch? If so, could you check if raid10 needs a similar fix?
> >
> > Thanks,
> > NeilBrown
> >
> >
> >
> >> Kernel that I am looking at is Ubuntu-Precise 3.2.0-29.46.
> >>
> >> Below is a sequence of events that led us to this question.
> >>
> >> We have a 3-disk RAID1, with ext4 mounted on top and mysql running,
> >> whose files are on that ext4.
> >> # 3-disk RAID1 was recovering from an unclean shutdown (crash) of the
> >> machine, it started resync:
> >>
> >> Dec 26 08:05:42 vc kernel: [11576870.520393] md: md1 stopped.
> >> Dec 26 08:05:42 vc kernel: [11576870.526898] md: bind<dm-2>
> >> Dec 26 08:05:42 vc kernel: [11576870.534819] md: bind<dm-3>
> >> Dec 26 08:05:42 vc kernel: [11576870.535908] md: bind<dm-1>
> >> Dec 26 08:05:42 vc kernel: [11576870.580730] md/raid1:md1: not clean -- starting background reconstruction
> >> Dec 26 08:05:42 vc kernel: [11576870.580735] md/raid1:md1: active with 3 out of 3 mirrors
> >> Dec 26 08:05:42 vc kernel: [11576870.581399] created bitmap (1 pages) for device md1
> >> Dec 26 08:05:42 vc kernel: [11576870.582386] md1: bitmap initialized from disk: read 1/1 pages, set 5 of 5 bits
> >> Dec 26 08:05:42 vc kernel: [11576870.583335] md1: detected capacity change from 0 to 320773120
> >> Dec 26 08:05:42 vc kernel: [11576870.585593] md: resync of RAID array md1
> >> Dec 26 08:05:42 vc kernel: [11576870.585597] md: minimum _guaranteed_ speed: 10000 KB/sec/disk.
> >> Dec 26 08:05:42 vc kernel: [11576870.585601] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> >> Dec 26 08:05:42 vc kernel: [11576870.585617] md: using 128k window, over a total of 313255k.
> >> Dec 26 08:05:42 vc kernel: [11576870.687868] md1: unknown partition table
> >>
> >> # meanwhile, ext4 was mounted on top of it (dm-4 is a devicemapper
> >> target on top of MD1, which collects some stats, but otherwise does
> >> not interfere with bios in any way). ext4 also realized the unclean
> >> shutdown:
> >> Dec 26 08:05:42 vc kernel: [11576870.936166] [14831]dm_iostat [dm_iostat_ctr] Successfully set up dm (252:4) iostats on top of /dev/md1 (9:1), flags=0x80
> >> Dec 26 08:05:43 vc kernel: [11576871.159955] [EXT4 FS bs=1024, gc=39, bpg=8192, ipg=2016, mo=9c02c818, mo2=0000]
> >> Dec 26 08:05:43 vc kernel: [11576871.160092] EXT4-fs (dm-4): recovery complete
> >> Dec 26 08:05:43 vc kernel: [11576871.161099] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: debug,data_err=abort
> >>
> >> # mysql was started on top of ext4, and also realized the unclean shutdown:
> >> InnoDB: Log scan progressed past the checkpoint lsn 18148639
> >> 131226 8:05:44 InnoDB: Database was not shut down normally!
> >> InnoDB: Starting crash recovery.
> >> InnoDB: Reading tablespace information from the .ibd files...
> >> InnoDB: Restoring possible half-written data pages from the doublewrite
> >> InnoDB: buffer...
> >> InnoDB: Doing recovery: scanned up to log sequence number 18154016
> >> 131226 8:05:44 InnoDB: Starting an apply batch of log records to the
> >> database...
> >> InnoDB: Progress in percents: 11 12 13 14 15 16 17 18 19 20 21 22 23
> >> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
> >> 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
> >> 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
> >> 93 94 95 96 97 98 99
> >> InnoDB: Apply batch completed
> >> 131226 8:05:44 InnoDB: Waiting for the background threads to start
> >>
> >> 8 seconds later, raid1-resync completed:
> >> Dec 26 08:05:52 vc kernel: [11576880.161306] md: md1: resync done.
> >> Dec 26 08:05:52 vc kernel: [11576880.221701] RAID1 conf printout:
> >> Dec 26 08:05:52 vc kernel: [11576880.221705] --- wd:3 rd:3
> >> Dec 26 08:05:52 vc kernel: [11576880.221709] disk 0, wo:0, o:1, dev:dm-1
> >> Dec 26 08:05:52 vc kernel: [11576880.221712] disk 1, wo:0, o:1, dev:dm-2
> >> Dec 26 08:05:52 vc kernel: [11576880.221714] disk 2, wo:0, o:1, dev:dm-3
> >>
> >> # The system ran normally for about 5 minutes, and then was shutdown cleanly:
> >> # mysql was stopped cleanly
> >> 131226 8:11:28 [Note] /usr/sbin/mysqld: Normal shutdown
> >> 131226 8:11:28 [Note] Event Scheduler: Purging the queue.
> >> 0 events
> >> [clean-shutdown, restart, and corruption logs identical to those
> >> quoted earlier in the thread, trimmed]
> >>
> >> It happened only once, and we do a lot of testing of such unclean
> >> shutdowns. The only theory that we have is that raid1 resync somehow
> >> changed the underlying data of mysql/ext4.
> >>
> >> Thanks,
> >> Alex.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread
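For readers following along, the read_balance() guard discussed in this thread can be restated as a tiny standalone predicate. This is a simplified model of the quoted 3.2-era logic, with `must_choose_first` and the local `sector_t`/`MaxSector` definitions invented for illustration; it is not the kernel source:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;
#define MaxSector (~(sector_t)0)

/* Model of the guard: while the array is still resyncing
 * (recovery_cp < MaxSector) and the read overlaps or lies beyond
 * next_resync, skip load-balancing and read from the first disk. */
static bool must_choose_first(sector_t recovery_cp, sector_t next_resync,
                              sector_t this_sector, sector_t sectors)
{
    return recovery_cp < MaxSector &&
           this_sector + sectors >= next_resync;
}
```

The thread's question is about *when* next_resync is advanced relative to this check, not about the predicate itself.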
* Re: Can raid1-resync cause a corruption?
  2014-01-06 23:56   ` NeilBrown
@ 2014-01-07 15:01     ` Alexander Lyakas
  2014-01-09 3:10       ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Lyakas @ 2014-01-07 15:01 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, yair

Hi Neil,
Thank you for your comments. Yes, apparently in this case md/raid1 was
not the cause. I studied the code further by adding prints and following
the resync flow, but cannot see any obvious problem.

I did find another case in which a raid1 resync can read phantom data,
although this is not what happened to us:
# raid1 has 3 disks A, B and C, and is resyncing after an unclean
  shutdown; sync_request always selects disk A as the read_disk
# an application reads from a far sector (beyond next_resync), so
  read_balance() selects disk A to read from (it is the first one)
# disk A fails
# the resync aborts and restarts; now sync_request reads from B and
  syncs into C
# the application reads again from the same far sector; now
  read_balance() selects disk B to read from

So potentially we could get different data from these two reads. In our
case, though, there were no disk failures.

FWIW, the raid1 code I was once responsible for treated this situation
as follows:
# a READ comes from the application
# raid1 sees that it is resyncing, so it locks the relevant area of the
  raid1 and syncs it; then it unlocks and proceeds to serve the READ
  normally
# the resync thread comes to the appropriate area, locks it, and sees
  that it has already been synced (bits are off in the bitmap), so it
  proceeds further

However, in md/raid1 there is currently no mechanism that can lock only
part of the array; we only have raise_barrier/wait_barrier, which
effectively lock the whole capacity.

Is it, for example, reasonable to READ the data as you normally do, then
to trigger a WRITE with the same data and only then to complete the
original READ?
There are a lot of inefficiencies here, I know, like re-writing the same
data again on read_disk, and syncing this data again later. (I know,
patches are welcome...)

Thanks,
Alex.

On Tue, Jan 7, 2014 at 1:56 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 6 Jan 2014 18:58:30 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Hi Neil,
>>
>> looking again at the resync/rebuild flow in raid1, I see that sync_request() calls
>>         raise_barrier(conf);
>> and only then sets
>>         conf->next_resync = sector_nr;
>>
>> while a READ request calls:
>>         wait_barrier(conf);
>> and only later:
>>         rdisk = read_balance(conf, r1_bio, &max_sectors);
>> which is the one looking at next_resync.
>>
>> Basically, this guarantees that while a READ is looking at next_resync,
>> there are no sync requests in flight, so the area up to next_resync has
>> certainly already been synced. So my previous analysis seems to be
>> incorrect? Or can the next_resync race still somehow happen?
>
> I know I agreed with you last time when you were wrong, so agreeing with you
> again probably would sound very convincing, but it seems you are right.
> The RAID1 code doesn't seem to have a race here (which is good news to me of
> course).
>
> So you have a corruption which could be caused by md/raid1, or by ext4fs, or
> by mysql. Or maybe by your dm layer. And you cannot easily reproduce it.
> That is not going to be easy to fix :-( I'm afraid there is nothing I can
> suggest.
>
> NeilBrown
>
>> Thanks,
>> Alex.
>> [the remainder of the earlier exchange, quoted in full, trimmed]

^ permalink raw reply	[flat|nested] 8+ messages in thread
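Alex's phantom-read sequence can be reduced to a toy model: three mirrors hold divergent data in a not-yet-resynced region, and reads in that region always go to the first working mirror, so the answer changes when that mirror fails. Everything here (`struct mirror`, `read_unsynced`) is a hypothetical illustration, not md code:

```c
#include <assert.h>

enum { NDISKS = 3 };

struct mirror {
    int data;    /* contents of the divergent, unsynced sector */
    int faulty;  /* nonzero once the disk has failed */
};

/* choose_first behaviour for an unsynced area: always read from the
 * first non-faulty mirror. */
static int read_unsynced(const struct mirror m[NDISKS])
{
    for (int i = 0; i < NDISKS; i++)
        if (!m[i].faulty)
            return m[i].data;
    return -1;  /* no working mirror left */
}
```

The two asserted reads below model the "two reads of the same far sector" in the scenario: the first is served from disk A, the second, after A fails and the resync restarts from B, returns different data.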
* Re: Can raid1-resync cause a corruption?
  2014-01-07 15:01     ` Alexander Lyakas
@ 2014-01-09 3:10       ` NeilBrown
  0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2014-01-09 3:10 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid, yair

[-- Attachment #1: Type: text/plain, Size: 2810 bytes --]

On Tue, 7 Jan 2014 17:01:23 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hi Neil,
> Thank you for your comments. Yes, apparently in this case md/raid1 was
> not the cause.
> [phantom-read scenario and region-locking description quoted in full
> from the previous message, trimmed]
> However, in md/raid1 there is currently no mechanism that can lock only
> part of the array; we only have raise_barrier/wait_barrier, which
> effectively lock the whole capacity.
> Is it, for example, reasonable to READ the data as you normally do, then
> to trigger a WRITE with the same data and only then to complete the
> original READ? There are a lot of inefficiencies here, I know, like
> re-writing the same data again on read_disk, and syncing this data
> again later. (I know, patches are welcome...)

Hmmm.. yes, that could conceivably cause a problem. It would apply to
RAID6 too.

To "fix" it we would have to either read-and-check the replicas or
parity whenever we read from a block that is not "in-sync", and/or
write them out. This could be rather expensive for fairly little gain.

If we were doing a bitmap-based resync, then we could maybe expedite the
resync of any region before reading from it, i.e. before reading from a
block which is not known to be in-sync, we wait for it to be in-sync,
but also signal the resync process to do this 'bit' worth next. That
could be rather messy... but might not be too bad.

Rather than using the resync thread to handle the extra bits, maybe we
could have a work-queue which just handled specifically requested
regions... Patches certainly welcome :-)

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread
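The expedite-on-read idea Neil sketches can be modelled with a toy region bitmap: a READ into a dirty region forces that region to be synced first, and the background pass then skips regions whose bits are already clear. All names here (`r1_model`, `sync_region`, and so on) are invented for illustration; md/raid1 has no such mechanism today:

```c
#include <assert.h>
#include <stdbool.h>

#define NREGIONS 8

struct r1_model {
    bool dirty[NREGIONS];  /* bitmap: region still needs resync */
    int  syncs;            /* number of region syncs performed */
};

/* Sync one region at most once and clear its bit. */
static void sync_region(struct r1_model *m, int r)
{
    if (m->dirty[r]) {
        m->dirty[r] = false;
        m->syncs++;
    }
}

/* Serve a READ: expedite the resync of the target region first. */
static void read_region(struct r1_model *m, int r)
{
    sync_region(m, r);
}

/* Background resync: walks all regions, skipping already-clean ones. */
static void resync_all(struct r1_model *m)
{
    for (int r = 0; r < NREGIONS; r++)
        sync_region(m, r);
}
```

The point of the model is the invariant that each region is synced exactly once, whether a read expedited it or the background pass reached it, which is what keeps the extra work bounded.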
[parent not found: <CA+res+QwDuaJTf1FVNtb--nTcoRjmFM4T6AadEt88UxPiG=EUw@mail.gmail.com>]
* Re: Can raid1-resync cause a corruption?
       [not found] ` <CA+res+QwDuaJTf1FVNtb--nTcoRjmFM4T6AadEt88UxPiG=EUw@mail.gmail.com>
@ 2014-01-02 15:17   ` Jack Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Jack Wang @ 2014-01-02 15:17 UTC (permalink / raw)
To: alex.bolshoy, NeilBrown, linux-raid

>
> Hi Neil,
> [Alexander's original report and logs quoted in full, trimmed]
>
> Thanks,
> Alex.

Hi Alex & Neil,

Happy New Year :)

We also saw data corruption on top of raid1 when the storage was
occasionally busy: the sd error handler offlined raid member devices,
and when the sd devices came back, raid1 resynced and ext4fs reported
errors, remounting the filesystem read-only with many messages like:

  ext4_lookup:1050: inode #263664: comm chef-client: deleted inode
  referenced: 263699

It's possible we hit the same bug. I'm trying to reproduce it with the
script below, but with no luck so far. Could you share any insight on
this? Thanks.
mdadm -C /dev/md1 --force --run -l1 -n2 /dev/sdb /dev/sdc -b internal --assume-clean
mkfs.ext4 /dev/md1
for ((i=0; i<=100; i++)); do
    mount /dev/md1 /test
    # /dev/urandom instead of /dev/random: /dev/random blocks when the
    # entropy pool runs dry, so dd would stall instead of generating I/O
    dd of=/test/img if=/dev/urandom bs=4K count=1k oflag=direct &
    #cat /proc/mdstat &
    mdadm -D /dev/md1 &
    echo offline > /sys/block/sdb/device/state
    sleep 10
    echo running > /sys/block/sdb/device/state
    mdadm /dev/md1 -r /dev/sdb -a /dev/sdb
    sleep 10
    pkill dd
    sleep 10
    umount /dev/md1
    sleep 10
    fsck.ext4 -f /dev/md1
done
mdadm -S /dev/md1

Regards,
Jack

> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
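[Editor's note: the window Alexander describes — next_resync advanced in sync_request at submission time, before the resync of that range has actually completed — can be sketched with a toy model. This is an illustration under that assumption only; the sector numbers and names below are invented, and it is not kernel code.]

```python
# Toy model of the suspected read_balance() window -- illustration only.
# The quoted raid1 code forces reads from the first disk only when the
# read lands at or above conf->next_resync.  If next_resync runs ahead
# of the point where resync has actually completed, reads in the gap
# are still read-balanced and can be served from a stale mirror.

def read_balance(sector, next_resync, last_used):
    """Mirrors the quoted check: choose_first only at/above next_resync."""
    if sector >= next_resync:
        return 0           # choose_first = 1: read from the resync source
    return last_used       # below next_resync: normal read-balancing

# disk 0 is the resync source; disk 1 still has stale data in sectors 5..9
disks = [
    [b"new"] * 10,                    # disk 0 (authoritative copy)
    [b"new"] * 5 + [b"old"] * 5,      # disk 1 (not yet resynced past sector 4)
]

actually_synced = 5   # resync has really completed sectors 0..4 only
next_resync = 8       # ...but was already advanced to the submitted sector_nr

d = read_balance(6, next_resync, last_used=1)
print(d, disks[d][6])   # sector 6 < next_resync -> disk 1 -> stale b"old"
```

If next_resync instead tracked the point where curr_resync_completed is updated (sector 5 in this model), the read at sector 6 would take the choose_first path and come back from disk 0 with the correct data.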
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-29 11:27 Can raid1-resync cause a corruption? Alexander Lyakas
2013-12-29 21:41 ` NeilBrown
2014-01-02 16:52 ` Alexander Lyakas
2014-01-06 16:58 ` Alexander Lyakas
2014-01-06 23:56 ` NeilBrown
2014-01-07 15:01 ` Alexander Lyakas
2014-01-09 3:10 ` NeilBrown
[not found] ` <CA+res+QwDuaJTf1FVNtb--nTcoRjmFM4T6AadEt88UxPiG=EUw@mail.gmail.com>
2014-01-02 15:17 ` Jack Wang