* [PATCH] md/raid10: fix data corruption and crash during resync
@ 2015-11-04 16:30 Artur Paszkiewicz
2015-11-04 22:33 ` Shaohua Li
0 siblings, 1 reply; 4+ messages in thread
From: Artur Paszkiewicz @ 2015-11-04 16:30 UTC (permalink / raw)
To: neilb; +Cc: linux-raid, pawel.baldysiak, Artur Paszkiewicz
The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
bio_copy_data()") replaced manual data copying with bio_copy_data() but
it doesn't work as intended. The source bio (fbio) is already processed,
so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because of
this, bio_copy_data() either does not copy anything, or worse, copies
data from the ->bi_next bio if it is set. This causes wrong data to be
written to drives during resync and sometimes lockups/crashes in
bio_copy_data():
[ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
[ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
[ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
[ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
[ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
[ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246
[ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
[ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
[ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
[ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
[ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
[ 517.525412] FS: 0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
[ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
[ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 517.566144] Stack:
[ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
[ 517.577659] 0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
[ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
[ 517.595773] Call Trace:
[ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
[ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
[ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140
[ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
[ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520
[ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0
[ 517.636506] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
[ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
[ 517.649929] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
---
drivers/md/raid10.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 96f3659..23bbe61 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1944,6 +1944,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 
 	first = i;
 	fbio = r10_bio->devs[i].bio;
+	fbio->bi_iter.bi_size = r10_bio->sectors << 9;
+	fbio->bi_iter.bi_idx = 0;
 
 	vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
 	/* now find blocks with errors */
@@ -1987,7 +1989,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		bio_reset(tbio);
 
 		tbio->bi_vcnt = vcnt;
-		tbio->bi_iter.bi_size = r10_bio->sectors << 9;
+		tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
 		tbio->bi_rw = WRITE;
 		tbio->bi_private = r10_bio;
 		tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
--
2.1.4
* Re: [PATCH] md/raid10: fix data corruption and crash during resync
2015-11-04 16:30 [PATCH] md/raid10: fix data corruption and crash during resync Artur Paszkiewicz
@ 2015-11-04 22:33 ` Shaohua Li
2015-12-14 14:22 ` Baldysiak, Pawel
2015-12-16 1:29 ` NeilBrown
0 siblings, 2 replies; 4+ messages in thread
From: Shaohua Li @ 2015-11-04 22:33 UTC (permalink / raw)
To: Artur Paszkiewicz; +Cc: neilb, linux-raid, pawel.baldysiak
On Wed, Nov 04, 2015 at 05:30:30PM +0100, Artur Paszkiewicz wrote:
> The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
> bio_copy_data()") replaced manual data copying with bio_copy_data() but
> it doesn't work as intended. The source bio (fbio) is already processed,
> so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because of
> this, bio_copy_data() either does not copy anything, or worse, copies
> data from the ->bi_next bio if it is set. This causes wrong data to be
> written to drives during resync and sometimes lockups/crashes in
> bio_copy_data():
>
> [ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
> [ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
> [ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
> [ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
> [ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
> [ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
> [ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246
> [ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
> [ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
> [ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
> [ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
> [ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
> [ 517.525412] FS: 0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
> [ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
> [ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 517.566144] Stack:
> [ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
> [ 517.577659] 0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
> [ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
> [ 517.595773] Call Trace:
> [ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
> [ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
> [ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140
> [ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
> [ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520
> [ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0
> [ 517.636506] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
> [ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
> [ 517.649929] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
>
> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> ---
> drivers/md/raid10.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 96f3659..23bbe61 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1944,6 +1944,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>
> first = i;
> fbio = r10_bio->devs[i].bio;
> + fbio->bi_iter.bi_size = r10_bio->sectors << 9;
> + fbio->bi_iter.bi_idx = 0;
>
> vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
> /* now find blocks with errors */
> @@ -1987,7 +1989,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> bio_reset(tbio);
>
> tbio->bi_vcnt = vcnt;
> - tbio->bi_iter.bi_size = r10_bio->sectors << 9;
> + tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
> tbio->bi_rw = WRITE;
> tbio->bi_private = r10_bio;
> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
Looks good. Reviewed-by: Shaohua Li <shli@kernel.org>
A nitpick, I'm wondering if we should do a full reset like raid1 does to make this more clear.
* Re: [PATCH] md/raid10: fix data corruption and crash during resync
2015-11-04 22:33 ` Shaohua Li
@ 2015-12-14 14:22 ` Baldysiak, Pawel
2015-12-16 1:29 ` NeilBrown
1 sibling, 0 replies; 4+ messages in thread
From: Baldysiak, Pawel @ 2015-12-14 14:22 UTC (permalink / raw)
To: neilb@suse.de; +Cc: linux-raid@vger.kernel.org, Paszkiewicz, Artur
Hi Neil,
Please look at the patch below.
Thanks,
Pawel Baldysiak
On Wed, 2015-11-04 at 14:33 -0800, Shaohua Li wrote:
> On Wed, Nov 04, 2015 at 05:30:30PM +0100, Artur Paszkiewicz wrote:
> > The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
> > bio_copy_data()") replaced manual data copying with bio_copy_data()
> > but
> > it doesn't work as intended. The source bio (fbio) is already
> > processed,
> > so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because
> > of
> > this, bio_copy_data() either does not copy anything, or worse,
> > copies
> > data from the ->bi_next bio if it is set. This causes wrong data
> > to be
> > written to drives during resync and sometimes lockups/crashes in
> > bio_copy_data():
> >
> > [ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for
> > 22s! [md126_raid10:3319]
> > [ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE
> > nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> > ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute
> > bridge stp llc ebtable_filter ebtables ip6table_nat
> > nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
> > ip6table_security ip6table_raw ip6table_filter ip6_tables
> > iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> > nf_conntrack iptable_mangle iptable_security iptable_raw
> > iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel
> > kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si
> > ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache
> > jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci
> > ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi
> > sunrpc dm_mirror dm_region_hash dm_log dm_mod
> > [ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted
> > 4.3.0-rc6+ #1
> > [ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS
> > PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
> > [ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti:
> > ffff880150df8000
> > [ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>]
> > bio_copy_data+0xc8/0x3c0
> > [ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246
> > [ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX:
> > 0000000000000000
> > [ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI:
> > ffffea0000d835c0
> > [ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09:
> > ffff880153773980
> > [ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12:
> > 0000000000010000
> > [ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15:
> > 0000000000001000
> > [ 517.525412] FS: 0000000000000000(0000)
> > GS:ffff880174a00000(0000) knlGS:0000000000000000
> > [ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4:
> > 00000000001406f0
> > [ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 517.566144] Stack:
> > [ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600
> > 0000000000000000
> > [ 517.577659] 0000000000000001 0000000000000001 ffff880153773980
> > ffff88016a61a800
> > [ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0
> > 0000000000001000
> > [ 517.595773] Call Trace:
> > [ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
> > [ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
> > [ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140
> > [ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
> > [ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520
> > [ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0
> > [ 517.636506] [<ffffffff8109eec0>] ?
> > flush_kthread_worker+0x70/0x70
> > [ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
> > [ 517.649929] [<ffffffff8109eec0>] ?
> > flush_kthread_worker+0x70/0x70
> >
> > Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> > ---
> > drivers/md/raid10.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 96f3659..23bbe61 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -1944,6 +1944,8 @@ static void sync_request_write(struct mddev
> > *mddev, struct r10bio *r10_bio)
> >
> > first = i;
> > fbio = r10_bio->devs[i].bio;
> > + fbio->bi_iter.bi_size = r10_bio->sectors << 9;
> > + fbio->bi_iter.bi_idx = 0;
> >
> > vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >>
> > (PAGE_SHIFT - 9);
> > /* now find blocks with errors */
> > @@ -1987,7 +1989,7 @@ static void sync_request_write(struct mddev
> > *mddev, struct r10bio *r10_bio)
> > bio_reset(tbio);
> >
> > tbio->bi_vcnt = vcnt;
> > - tbio->bi_iter.bi_size = r10_bio->sectors << 9;
> > + tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
> > tbio->bi_rw = WRITE;
> > tbio->bi_private = r10_bio;
> > tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
>
> Looks good. Reviewed-by: Shaohua Li <shli@kernel.org>
>
> A nitpick, I'm wondering if we should do a full reset like raid1 does
> to make this more clear.
* Re: [PATCH] md/raid10: fix data corruption and crash during resync
2015-11-04 22:33 ` Shaohua Li
2015-12-14 14:22 ` Baldysiak, Pawel
@ 2015-12-16 1:29 ` NeilBrown
1 sibling, 0 replies; 4+ messages in thread
From: NeilBrown @ 2015-12-16 1:29 UTC (permalink / raw)
To: Shaohua Li, Artur Paszkiewicz; +Cc: linux-raid, pawel.baldysiak
On Thu, Nov 05 2015, Shaohua Li wrote:
> On Wed, Nov 04, 2015 at 05:30:30PM +0100, Artur Paszkiewicz wrote:
>> The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
>> bio_copy_data()") replaced manual data copying with bio_copy_data() but
>> it doesn't work as intended. The source bio (fbio) is already processed,
>> so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because of
>> this, bio_copy_data() either does not copy anything, or worse, copies
>> data from the ->bi_next bio if it is set. This causes wrong data to be
>> written to drives during resync and sometimes lockups/crashes in
>> bio_copy_data():
>>
>> [ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
>> [ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
>> [ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
>> [ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
>> [ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
>> [ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
>> [ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246
>> [ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
>> [ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
>> [ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
>> [ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
>> [ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
>> [ 517.525412] FS: 0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
>> [ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
>> [ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 517.566144] Stack:
>> [ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
>> [ 517.577659] 0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
>> [ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
>> [ 517.595773] Call Trace:
>> [ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
>> [ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
>> [ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140
>> [ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
>> [ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520
>> [ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0
>> [ 517.636506] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
>> [ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
>> [ 517.649929] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
>>
>> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
>> ---
>> drivers/md/raid10.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 96f3659..23bbe61 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -1944,6 +1944,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>>
>> first = i;
>> fbio = r10_bio->devs[i].bio;
>> + fbio->bi_iter.bi_size = r10_bio->sectors << 9;
>> + fbio->bi_iter.bi_idx = 0;
>>
>> vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
>> /* now find blocks with errors */
>> @@ -1987,7 +1989,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>> bio_reset(tbio);
>>
>> tbio->bi_vcnt = vcnt;
>> - tbio->bi_iter.bi_size = r10_bio->sectors << 9;
>> + tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
>> tbio->bi_rw = WRITE;
>> tbio->bi_private = r10_bio;
>> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
>
> Looks good. Reviewed-by: Shaohua Li <shli@kernel.org>
Thanks for the patch and the review.
I've added:
Cc: stable@vger.kernel.org (v4.2+)
Fixes: c31df25f20e3 ("md/raid10: make sync_request_write() call bio_copy_data()")
Signed-off-by: NeilBrown <neilb@suse.com>
and will hopefully submit to Linus in a day or so.
>
> A nitpick, I'm wondering if we should do a full reset like raid1 does to make this more clear.
Might make sense. I'm happy with the code as it is, but if doing a full
reset makes the code clearer I'd accept that too.
Thanks,
NeilBrown