From: Shaohua Li <shli@kernel.org>
To: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: neilb@suse.de, linux-raid@vger.kernel.org, pawel.baldysiak@intel.com
Subject: Re: [PATCH] md/raid10: fix data corruption and crash during resync
Date: Wed, 4 Nov 2015 14:33:53 -0800
Message-ID: <20151104223353.GA99478@kernel.org>
In-Reply-To: <1446654630-24067-1-git-send-email-artur.paszkiewicz@intel.com>

On Wed, Nov 04, 2015 at 05:30:30PM +0100, Artur Paszkiewicz wrote:
> The commit c31df25f20e3 ("md/raid10: make sync_request_write() call
> bio_copy_data()") replaced manual data copying with bio_copy_data() but
> it doesn't work as intended. The source bio (fbio) is already processed,
> so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt. Because of
> this, bio_copy_data() either does not copy anything, or worse, copies
> data from the ->bi_next bio if it is set. This causes wrong data to be
> written to drives during resync and sometimes lockups/crashes in
> bio_copy_data():
>
> [ 517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
> [ 517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
> [ 517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
> [ 517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
> [ 517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
> [ 517.468529] RIP: 0010:[<ffffffff812e1888>] [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
> [ 517.478164] RSP: 0018:ffff880150dfbc98 EFLAGS: 00000246
> [ 517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
> [ 517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
> [ 517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
> [ 517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
> [ 517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
> [ 517.525412] FS: 0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
> [ 517.534844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
> [ 517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 517.566144] Stack:
> [ 517.568626] ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
> [ 517.577659] 0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
> [ 517.586715] ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
> [ 517.595773] Call Trace:
> [ 517.598747] [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
> [ 517.605610] [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
> [ 517.611987] [<ffffffff814ff206>] md_thread+0x106/0x140
> [ 517.618072] [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
> [ 517.624252] [<ffffffff814ff100>] ? super_1_load+0x520/0x520
> [ 517.630817] [<ffffffff8109ef89>] kthread+0xc9/0xe0
> [ 517.636506] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
> [ 517.643653] [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
> [ 517.649929] [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
>
> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> ---
> drivers/md/raid10.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 96f3659..23bbe61 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1944,6 +1944,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>
>  	first = i;
>  	fbio = r10_bio->devs[i].bio;
> +	fbio->bi_iter.bi_size = r10_bio->sectors << 9;
> +	fbio->bi_iter.bi_idx = 0;
>
>  	vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
>  	/* now find blocks with errors */
> @@ -1987,7 +1989,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>  		bio_reset(tbio);
>
>  		tbio->bi_vcnt = vcnt;
> -		tbio->bi_iter.bi_size = r10_bio->sectors << 9;
> +		tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
>  		tbio->bi_rw = WRITE;
>  		tbio->bi_private = r10_bio;
>  		tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
Looks good.

Reviewed-by: Shaohua Li <shli@kernel.org>

A nitpick: I wonder whether we should do a full reset here, as raid1 does, to make this clearer.
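
To illustrate the failure mode described in the commit message, here is a minimal userspace sketch. It is not kernel code: the model_iter and model_bio types and the model_* helpers are hypothetical stand-ins for struct bvec_iter, struct bio and bio_copy_data(). They only mimic the relevant behavior, assuming a copy that is driven entirely by the iterator state: once the source iterator has been consumed (size 0, index equal to the segment count), the copy moves no data until the iterator is rewound.

```c
#include <string.h>

/* Hypothetical userspace model; names and sizes are illustrative only. */
#define SEG_SIZE 4u	/* bytes per segment (a page in the real code) */
#define NSEGS    2u	/* segments per model bio */

struct model_iter {
	unsigned int size;	/* bytes remaining, like bi_size */
	unsigned int idx;	/* current segment, like bi_idx */
};

struct model_bio {
	struct model_iter iter;
	unsigned int vcnt;	/* number of valid segments, like bi_vcnt */
	char seg[NSEGS][SEG_SIZE];
};

/* Leave the iterator in the state it has after the I/O was processed. */
void model_complete(struct model_bio *b)
{
	b->iter.size = 0;
	b->iter.idx = b->vcnt;
}

/* Rewind the iterator so that 'bytes' bytes are visible from segment 0. */
void model_rewind(struct model_bio *b, unsigned int bytes)
{
	b->iter.size = bytes;
	b->iter.idx = 0;
}

/*
 * Copy whole segments from src to dst, driven only by the iterators.
 * Works on copies of the iterators and returns the number of bytes copied.
 */
unsigned int model_copy(struct model_bio *dst, const struct model_bio *src)
{
	struct model_iter s = src->iter;
	struct model_iter d = dst->iter;
	unsigned int copied = 0;

	while (s.size >= SEG_SIZE && d.size >= SEG_SIZE &&
	       s.idx < src->vcnt && d.idx < dst->vcnt) {
		memcpy(dst->seg[d.idx], src->seg[s.idx], SEG_SIZE);
		s.idx++;
		d.idx++;
		s.size -= SEG_SIZE;
		d.size -= SEG_SIZE;
		copied += SEG_SIZE;
	}
	return copied;
}
```

In this model, calling model_copy() on a source that went through model_complete() copies 0 bytes, and the same call after model_rewind(&src, NSEGS * SEG_SIZE) copies all of them. The model does not reproduce the worse case from the commit message, where the real bio_copy_data() moves on to the ->bi_next bio.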