All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Xiao Ni <xni@redhat.com>,
	David Jeffery <djeffery@redhat.com>,
	Nigel Croxon <ncroxon@redhat.com>,
	Song Liu <songliubraving@fb.com>, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5.1 13/46] Dont jump to compute_result state from check_result state
Date: Wed, 15 May 2019 12:56:37 +0200	[thread overview]
Message-ID: <20190515090622.314017827@linuxfoundation.org> (raw)
In-Reply-To: <20190515090616.670410738@linuxfoundation.org>

From: Nigel Croxon <ncroxon@redhat.com>

commit 4f4fd7c5798bbdd5a03a60f6269cf1177fbd11ef upstream.

Changing state from check_state_check_result to
check_state_compute_result not only is unsafe but also doesn't
appear to serve a valid purpose.  A raid6 check should only be
pushing out extra writes if doing repair and a mis-match occurs.
The stripe dev management will already try and do repair writes
for failing sectors.

This patch makes the raid6 check_state_check_result handling
work more like raid5's.  If somehow too many failures for a
check, just quit the check operation for the stripe.  When any
checks pass, don't try and use check_state_compute_result for
a purpose it isn't needed for and is unsafe for.  Just mark the
stripe as in sync for passing its parity checks and let the
stripe dev read/write code and the bad blocks list do their
job handling I/O errors.

Repro steps from Xiao:

These are the steps to reproduce this problem:
1. redefined OPT_MEDIUM_ERR_ADDR to 12000 in scsi_debug.c
2. insmod scsi_debug.ko dev_size_mb=11000  max_luns=1 num_tgts=1
3. mdadm --create /dev/md127 --level=6 --raid-devices=5 /dev/sde1 /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
sde is the disk created by scsi_debug
4. echo "2" >/sys/module/scsi_debug/parameters/opts
5. raid-check

It panic:
[ 4854.730899] md: data-check of RAID array md127
[ 4854.857455] sd 5:0:0:0: [sdr] tag#80 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.859246] sd 5:0:0:0: [sdr] tag#80 Sense Key : Medium Error [current]
[ 4854.860694] sd 5:0:0:0: [sdr] tag#80 Add. Sense: Unrecovered read error
[ 4854.862207] sd 5:0:0:0: [sdr] tag#80 CDB: Read(10) 28 00 00 00 2d 88 00 04 00 00
[ 4854.864196] print_req_error: critical medium error, dev sdr, sector 11656 flags 0
[ 4854.867409] sd 5:0:0:0: [sdr] tag#100 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.869469] sd 5:0:0:0: [sdr] tag#100 Sense Key : Medium Error [current]
[ 4854.871206] sd 5:0:0:0: [sdr] tag#100 Add. Sense: Unrecovered read error
[ 4854.872858] sd 5:0:0:0: [sdr] tag#100 CDB: Read(10) 28 00 00 00 2e e0 00 00 08 00
[ 4854.874587] print_req_error: critical medium error, dev sdr, sector 12000 flags 4000
[ 4854.876456] sd 5:0:0:0: [sdr] tag#101 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.878552] sd 5:0:0:0: [sdr] tag#101 Sense Key : Medium Error [current]
[ 4854.880278] sd 5:0:0:0: [sdr] tag#101 Add. Sense: Unrecovered read error
[ 4854.881846] sd 5:0:0:0: [sdr] tag#101 CDB: Read(10) 28 00 00 00 2e e8 00 00 08 00
[ 4854.883691] print_req_error: critical medium error, dev sdr, sector 12008 flags 4000
[ 4854.893927] sd 5:0:0:0: [sdr] tag#166 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4854.896002] sd 5:0:0:0: [sdr] tag#166 Sense Key : Medium Error [current]
[ 4854.897561] sd 5:0:0:0: [sdr] tag#166 Add. Sense: Unrecovered read error
[ 4854.899110] sd 5:0:0:0: [sdr] tag#166 CDB: Read(10) 28 00 00 00 2e e0 00 00 10 00
[ 4854.900989] print_req_error: critical medium error, dev sdr, sector 12000 flags 0
[ 4854.902757] md/raid:md127: read error NOT corrected!! (sector 9952 on sdr1).
[ 4854.904375] md/raid:md127: read error NOT corrected!! (sector 9960 on sdr1).
[ 4854.906201] ------------[ cut here ]------------
[ 4854.907341] kernel BUG at drivers/md/raid5.c:4190!

raid5.c:4190 above is this BUG_ON:

    handle_parity_checks6()
        ...
        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */

Cc: <stable@vger.kernel.org> # v3.16+
OriginalAuthor: David Jeffery <djeffery@redhat.com>
Cc: Xiao Ni <xni@redhat.com>
Tested-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: David Jeffy <djeffery@redhat.com>
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/raid5.c |   19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4223,26 +4223,15 @@ static void handle_parity_checks6(struct
 	case check_state_check_result:
 		sh->check_state = check_state_idle;
 
+		if (s->failed > 1)
+			break;
 		/* handle a successful check operation, if parity is correct
 		 * we are done.  Otherwise update the mismatch count and repair
 		 * parity if !MD_RECOVERY_CHECK
 		 */
 		if (sh->ops.zero_sum_result == 0) {
-			/* both parities are correct */
-			if (!s->failed)
-				set_bit(STRIPE_INSYNC, &sh->state);
-			else {
-				/* in contrast to the raid5 case we can validate
-				 * parity, but still have a failure to write
-				 * back
-				 */
-				sh->check_state = check_state_compute_result;
-				/* Returning at this point means that we may go
-				 * off and bring p and/or q uptodate again so
-				 * we make sure to check zero_sum_result again
-				 * to verify if p or q need writeback
-				 */
-			}
+			/* Any parity checked was correct */
+			set_bit(STRIPE_INSYNC, &sh->state);
 		} else {
 			atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {



  parent reply	other threads:[~2019-05-15 11:36 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-15 10:56 [PATCH 5.1 00/46] 5.1.3-stable review Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 01/46] platform/x86: sony-laptop: Fix unintentional fall-through Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 02/46] platform/x86: thinkpad_acpi: Disable Bluetooth for some machines Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 03/46] platform/x86: dell-laptop: fix rfkill functionality Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 04/46] hwmon: (pwm-fan) Disable PWM if fetching cooling data fails Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 05/46] hwmon: (occ) Fix extended status bits Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 06/46] selftests/seccomp: Handle namespace failures gracefully Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 07/46] i2c: core: ratelimit transfer when suspended errors Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 08/46] kernfs: fix barrier usage in __kernfs_new_node() Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 09/46] virt: vbox: Sanity-check parameter types for hgcm-calls coming from userspace Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 10/46] USB: serial: fix unthrottle races Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 11/46] mwl8k: Fix rate_idx underflow Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 12/46] rtlwifi: rtl8723ae: Fix missing break in switch statement Greg Kroah-Hartman
2019-05-15 10:56 ` Greg Kroah-Hartman [this message]
2019-05-15 10:56 ` [PATCH 5.1 14/46] bonding: fix arp_validate toggling in active-backup mode Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 15/46] bridge: Fix error path for kobject_init_and_add() Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 16/46] dpaa_eth: fix SG frame cleanup Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 17/46] fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 18/46] ipv4: Fix raw socket lookup for local traffic Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 19/46] net: dsa: Fix error cleanup path in dsa_init_module Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 20/46] net: ethernet: stmmac: dwmac-sun8i: enable support of unicast filtering Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 21/46] net: macb: Change interrupt and napi enable order in open Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 22/46] net: seeq: fix crash caused by not set dev.parent Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 23/46] net: ucc_geth - fix Oops when changing number of buffers in the ring Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 24/46] packet: Fix error path in packet_init Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 25/46] selinux: do not report error on connect(AF_UNSPEC) Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 26/46] tipc: fix hanging clients using poll with EPOLLOUT flag Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 27/46] vlan: disable SIOCSHWTSTAMP in container Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 28/46] vrf: sit mtu should not be updated when vrf netdev is the link Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 29/46] aqc111: fix endianness issue in aqc111_change_mtu Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 30/46] aqc111: fix writing to the phy on BE Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 31/46] aqc111: fix double endianness swap " Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 32/46] tuntap: fix dividing by zero in ebpf queue selection Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 33/46] tuntap: synchronize through tfiles array instead of tun->numqueues Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 34/46] net: phy: fix phy_validate_pause Greg Kroah-Hartman
2019-05-15 10:56 ` [PATCH 5.1 35/46] flow_dissector: disable preemption around BPF calls Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 36/46] isdn: bas_gigaset: use usb_fill_int_urb() properly Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 37/46] drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 38/46] drivers/virt/fsl_hypervisor.c: prevent integer overflow " Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 39/46] powerpc/book3s/64: check for NULL pointer in pgd_alloc() Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 40/46] powerpc/powernv/idle: Restore IAMR after idle Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 41/46] powerpc/booke64: set RI in default MSR Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 42/46] virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packed Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 43/46] PCI: hv: Fix a memory leak in hv_eject_device_work() Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 44/46] PCI: hv: Add hv_pci_remove_slots() when we unload the driver Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 45/46] PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary Greg Kroah-Hartman
2019-05-15 10:57 ` [PATCH 5.1 46/46] f2fs: Fix use of number of devices Greg Kroah-Hartman
2019-05-15 13:56 ` [PATCH 5.1 00/46] 5.1.3-stable review Igor Russkikh
2019-05-15 14:18   ` Greg Kroah-Hartman
2019-05-15 13:58 ` Igor Russkikh
2019-05-15 19:56 ` Naresh Kamboju
2019-05-16  6:21   ` Greg Kroah-Hartman
2019-05-16  3:38 ` Guenter Roeck
2019-05-16  6:20   ` Greg Kroah-Hartman
2019-05-16 11:04 ` Jon Hunter
2019-05-16 11:04   ` Jon Hunter
2019-05-16 16:50   ` Greg Kroah-Hartman
2019-05-16 13:55 ` shuah
2019-05-16 16:49   ` Greg Kroah-Hartman
2019-05-17  6:34 ` Kelsey Skunberg
2019-05-17  7:25   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190515090622.314017827@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=djeffery@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncroxon@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=stable@vger.kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.