stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Coly Li <colyli@suse.de>,
	NeilBrown <neilb@suse.com>,
	Jack Wang <jinpu.wang@profitbricks.com>, Shaohua Li <shli@fb.com>
Subject: [PATCH 4.4 22/30] md/raid1/10: fix potential deadlock
Date: Fri, 24 Mar 2017 18:59:01 +0100
Message-ID: <20170324151222.149528027@linuxfoundation.org>
In-Reply-To: <20170324151220.759111698@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shaohua Li <shli@fb.com>

commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream.

Neil Brown pointed out a potential deadlock in the raid10 code with
bio_split/chain. The raid1 code could have the same issue, but the recent
barrier rework makes it less likely to happen. The deadlock happens in
the following sequence:

1. generic_make_request(bio), which sets current->bio_list
2. raid10_make_request splits bio into bio1 and bio2
3. __make_request(bio1), wait_barrier, add the underlying disk bio to
current->bio_list
4. __make_request(bio2), wait_barrier

If raise_barrier happens between 3 & 4, then since wait_barrier already
ran at 3, raise_barrier waits for the IO from 3 to complete. And since
raise_barrier sets the barrier, 4 waits for raise_barrier. But the IO from
3 can't be dispatched because raid10_make_request() hasn't finished yet.

The solution is to adjust the IO ordering. Quoting Neil:
"
It is much safer to:

    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
   } else
        make_request_fn(mddev, bio);

This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices.  These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first.  Then it will process the remainder
of the original bio once the first section has been fully processed.
"
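
The deadlock sequence and the reordering described above can be
illustrated with a small user-space toy model. This is only a sketch
under simplifying assumptions, not kernel code: every toy_* name is
hypothetical, the chunk size of 8 sectors is made up, a FIFO array
stands in for current->bio_list, the barrier is assumed to be raised
right after each part passes wait_barrier (the worst case), and only
reads are modelled. Where the real wait_barrier() would sleep, the
model returns false so the stall can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model. Every name here is hypothetical, not a kernel API. */
enum toy_kind { TOY_MD_READ, TOY_DISK_READ };

struct toy_bio {
    enum toy_kind kind;
    int sectors;
};

#define TOY_CHUNK 8   /* assumed chunk size, in sectors */
#define TOY_QMAX  32

static struct toy_bio toy_queue[TOY_QMAX]; /* stands in for current->bio_list */
static int toy_head, toy_tail;
static int toy_nr_pending;  /* IO that passed the barrier, not yet completed */
static bool toy_barrier;    /* set by a simulated raise_barrier() */
int toy_completed;          /* sectors completed by the "disks" */

static void toy_submit(struct toy_bio b)
{
    assert(toy_tail < TOY_QMAX);
    toy_queue[toy_tail++] = b;  /* queued only, not dispatched yet */
}

/* Returns false where the real wait_barrier() would sleep forever. */
static bool toy_wait_barrier(void)
{
    if (toy_barrier)
        return false;
    toy_nr_pending++;
    return true;
}

/* Returns false if the request stalls on the barrier. */
static bool toy_md_make_request(struct toy_bio bio, bool requeue_rest)
{
    while (bio.sectors > 0) {
        int len = bio.sectors > TOY_CHUNK ? TOY_CHUNK : bio.sectors;

        if (!toy_wait_barrier())
            return false;
        toy_submit((struct toy_bio){ TOY_DISK_READ, len });
        bio.sectors -= len;

        /* Worst case: resync raises the barrier right after each part. */
        toy_barrier = true;

        if (requeue_rest && bio.sectors > 0) {
            toy_submit(bio);    /* remainder goes behind the disk bio */
            return true;
        }
    }
    return true;
}

/* Drains the queue in FIFO order; returns true if everything completes. */
bool toy_run(int sectors, bool requeue_rest)
{
    toy_head = toy_tail = 0;
    toy_nr_pending = 0;
    toy_barrier = false;
    toy_completed = 0;

    toy_submit((struct toy_bio){ TOY_MD_READ, sectors });
    while (toy_head < toy_tail) {
        struct toy_bio b = toy_queue[toy_head++];

        if (b.kind == TOY_MD_READ) {
            if (!toy_md_make_request(b, requeue_rest))
                return false;
        } else {
            toy_completed += b.sectors;
            /* The barrier drops once pending IO has drained. */
            if (--toy_nr_pending == 0)
                toy_barrier = false;
        }
    }
    return true;
}
```

In this model, toy_run(16, false) handles both halves in one call and
stalls on the second wait with nothing completed, while
toy_run(16, true) requeues the remainder behind the first disk bio and
completes all 16 sectors.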

Note that this only happens in the read path. In the write path, the bio
is flushed to the underlying disks either by a blk plug flush (from
schedule()) or offloaded to raid1d/raid10d. It is the read bio that is
queued in current->bio_list.

Cc: Coly Li <colyli@suse.de>
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/raid10.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1477,7 +1477,25 @@ static void make_request(struct mddev *m
 			split = bio;
 		}
 
+		/*
+		 * If a bio is splitted, the first part of bio will pass
+		 * barrier but the bio is queued in current->bio_list (see
+		 * generic_make_request). If there is a raise_barrier() called
+		 * here, the second part of bio can't pass barrier. But since
+		 * the first part bio isn't dispatched to underlaying disks
+		 * yet, the barrier is never released, hence raise_barrier will
+		 * alays wait. We have a deadlock.
+		 * Note, this only happens in read path. For write path, the
+		 * first part of bio is dispatched in a schedule() call
+		 * (because of blk plug) or offloaded to raid10d.
+		 * Quitting from the function immediately can change the bio
+		 * order queued in bio_list and avoid the deadlock.
+		 */
 		__make_request(mddev, split);
+		if (split != bio && bio_data_dir(bio) == READ) {
+			generic_make_request(bio);
+			break;
+		}
 	} while (split != bio);
 
 	/* In case raid10d snuck in to freeze_array */
