From: Jinpu Wang <jinpu.wang@profitbricks.com>
To: NeilBrown <neilb@suse.com>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
Nate Dailey <nate.dailey@stratus.com>
Subject: Re: [BUG] MD/RAID1 hung forever on freeze_array
Date: Wed, 14 Dec 2016 13:13:27 +0100 [thread overview]
Message-ID: <CAMGffE=KoVdoYRzkHdRMuCopjmUdcrP9-woFFr-4-VszGsSHRQ@mail.gmail.com> (raw)
In-Reply-To: <CAMGffEnCesgUp4gBsPN2L9qg3WSxNXsCcYEPWH-BaeEEktaqcw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 8255 bytes --]
On Wed, Dec 14, 2016 at 11:22 AM, Jinpu Wang
<jinpu.wang@profitbricks.com> wrote:
> Thanks Neil,
>
> On Tue, Dec 13, 2016 at 11:18 PM, NeilBrown <neilb@suse.com> wrote:
>> On Wed, Dec 14 2016, Jinpu Wang wrote:
>>
>>>
>>> As you suggested, I re-ran the same test on 4.4.36 without any of our own patches on MD.
>>> I can still reproduce the same bug; nr_pending on the healthy leg (loop1) is still 1.
>>>
>>
>> Thanks.
>>
>> I have a hypothesis.
>>
>> md_make_request() calls blk_queue_split().
>> If that does split the request, it will call generic_make_request()
>> on the first half. That will call back into md_make_request() and
>> raid1_make_request(), which will submit requests to the underlying
>> devices. These will get caught on the bio_list_on_stack queue in
>> generic_make_request().
>> This is a queue which is not accounted for in nr_queued.
>>
>> When blk_queue_split() completes, 'bio' will be the second half of the
>> bio.
>> This enters raid1_make_request(), and by this time the array has been
>> frozen.
>> So wait_barrier() has to wait for pending requests to complete, and that
>> includes the one that is stuck on bio_list_on_stack, which will never
>> complete now.
>>
>> To see if this might be happening, please change the
>>
>> blk_queue_split(q, &bio, q->bio_split);
>>
>> call in md_make_request() to
>>
>> struct bio *tmp = bio;
>> blk_queue_split(q, &bio, q->bio_split);
>> WARN_ON_ONCE(bio != tmp);
>>
>> If that ever triggers, then the above is a real possibility.
>
> I triggered the warning as you expected, so we can confirm the bug is
> caused by the scenario in your hypothesis above.
> [ 429.282235] ------------[ cut here ]------------
> [ 429.282407] WARNING: CPU: 2 PID: 4139 at drivers/md/md.c:262 md_set_array_sectors+0xac0/0xc30 [md_mod]()
> [ 429.285288] Modules linked in: raid1 ibnbd_client(O) ibtrs_client(O)
> dm_service_time dm_multipath rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib
> ib_cm ib_uverbs ib_umad mlx5_ib mlx5_core vxlan ip6_udp_tunnel
> udp_tunnel mlx4_ib ib_sa ib_mad ib_core ib_addr ib_netlink mlx4_core
> mlx_compat loop md_mod kvm_amd edac_mce_amd kvm edac_core irqbypass
> acpi_cpufreq tpm_infineon tpm_tis i2c_piix4 tpm serio_raw evdev
> k10temp processor button fam15h_power crct10dif_pclmul crc32_pclmul sg
> sd_mod ahci libahci libata scsi_mod crc32c_intel r8169 psmouse
> xhci_pci xhci_hcd [last unloaded: mlx_compat]
> [ 429.288543] CPU: 2 PID: 4139 Comm: fio Tainted: G O 4.4.36-1-pserver #1
> [ 429.288825] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./M5A97 R2.0, BIOS 2501 04/07/2014
> [ 429.289113] 0000000000000000 ffff8801f64ff8f0 ffffffff81424486
> 0000000000000000
> [ 429.289538] ffffffffa0561938 ffff8801f64ff928 ffffffff81058a60
> ffff8800b8f3e000
> [ 429.290157] 0000000000000000 ffff8800b51f4100 ffff880234f9a700
> ffff880234f9a700
> [ 429.290594] Call Trace:
> [ 429.290743] [<ffffffff81424486>] dump_stack+0x4d/0x67
> [ 429.290893] [<ffffffff81058a60>] warn_slowpath_common+0x90/0xd0
> [ 429.291046] [<ffffffff81058b55>] warn_slowpath_null+0x15/0x20
> [ 429.291202] [<ffffffffa0550740>] md_set_array_sectors+0xac0/0xc30 [md_mod]
> [ 429.291358] [<ffffffff813fd3de>] generic_make_request+0xfe/0x1e0
> [ 429.291540] [<ffffffff813fd522>] submit_bio+0x62/0x150
> [ 429.291693] [<ffffffff813f53d9>] ? bio_set_pages_dirty+0x49/0x60
> [ 429.291847] [<ffffffff811d32a7>] do_blockdev_direct_IO+0x2317/0x2ba0
> [ 429.292011] [<ffffffffa0834f64>] ? ib_post_rdma_write_imm+0x24/0x30 [ibtrs_client]
> [ 429.292271] [<ffffffff811cdc40>] ? I_BDEV+0x10/0x10
> [ 429.292417] [<ffffffff811d3b6e>] __blockdev_direct_IO+0x3e/0x40
> [ 429.292566] [<ffffffff811ce2d7>] blkdev_direct_IO+0x47/0x50
> [ 429.292746] [<ffffffff81132abf>] generic_file_read_iter+0x45f/0x580
> [ 429.292894] [<ffffffff811ce620>] ? blkdev_write_iter+0x110/0x110
> [ 429.293073] [<ffffffff811ce652>] blkdev_read_iter+0x32/0x40
> [ 429.293284] [<ffffffff811deb86>] aio_run_iocb+0x116/0x2a0
> [ 429.293492] [<ffffffff813fed52>] ? blk_flush_plug_list+0xc2/0x200
> [ 429.293703] [<ffffffff81183ac6>] ? kmem_cache_alloc+0xb6/0x180
> [ 429.293901] [<ffffffff811dfaf4>] ? do_io_submit+0x184/0x4d0
> [ 429.294047] [<ffffffff811dfbaa>] do_io_submit+0x23a/0x4d0
> [ 429.294194] [<ffffffff811dfe4b>] SyS_io_submit+0xb/0x10
> [ 429.294375] [<ffffffff81815497>] entry_SYSCALL_64_fastpath+0x12/0x6a
> [ 429.294610] ---[ end trace 25d1cece0e01494b ]---
>
> I double-checked: nr_pending on the healthy leg is still 1, as before.
>
>>
>> Fixing the problem isn't very easy...
>>
>> You could try:
>> 1/ write a function in raid1.c which calls punt_bios_to_rescuer()
>> (which you will need to export from block/bio.c),
>> passing mddev->queue->bio_split as the bio_set.
>>
>> 2/ change the wait_event_lock_irq() call in wait_barrier() to
>> wait_event_lock_irq_cmd(), and pass the new function as the command.
>>
>> That way, if wait_barrier() ever blocks, all the requests in
>> bio_list_on_stack will be handled by a separate thread.
>>
>> NeilBrown
>
> I will try your suggested approach to see if it fixes the bug; I will report back soon.
>
Hi Neil,
Sorry, bad news: with the two patches attached, I can still reproduce the same bug.
nr_pending on the healthy leg is still 1, as before.
crash> struct r1conf 0xffff8800b7176100
struct r1conf {
  mddev = 0xffff8800b59b0000,
  mirrors = 0xffff88022bab7900,
  raid_disks = 2,
  next_resync = 18446744073709527039,
  start_next_window = 18446744073709551615,
  current_window_requests = 0,
  next_window_requests = 0,
  device_lock = {
    {
      rlock = {
        raw_lock = {
          val = {
            counter = 0
          }
        }
      }
    }
  },
  retry_list = {
    next = 0xffff880211b2ec40,
    prev = 0xffff88022819ad40
  },
  bio_end_io_list = {
    next = 0xffff880227e9a9c0,
    prev = 0xffff8802119c6140
  },
  pending_bio_list = {
    head = 0x0,
    tail = 0x0
  },
  pending_count = 0,
  wait_barrier = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    task_list = {
      next = 0xffff8800adf3b818,
      prev = 0xffff88021180f7a8
    }
  },
  resync_lock = {
    {
      rlock = {
        raw_lock = {
          val = {
            counter = 0
          }
        }
      }
    }
  },
  nr_pending = 1675,
  nr_waiting = 100,
  nr_queued = 1673,
  barrier = 0,
  array_frozen = 1,
  fullsync = 0,
  recovery_disabled = 1,
  poolinfo = 0xffff88022c80f640,
  r1bio_pool = 0xffff88022b8b6a20,
  r1buf_pool = 0x0,
  tmppage = 0xffffea0008a90c80,
  thread = 0x0,
  cluster_sync_low = 0,
  cluster_sync_high = 0
}

  kobj = {
    name = 0xffff88022b7194a0 "dev-loop1",
    entry = {
      next = 0xffff880231495280,
      prev = 0xffff880231495280
    },
    parent = 0xffff8800b59b0050,
    kset = 0x0,
    ktype = 0xffffffffa0564060 <rdev_ktype>,
    sd = 0xffff8800b6510960,
    kref = {
      refcount = {
        counter = 1
      }
    },
    state_initialized = 1,
    state_in_sysfs = 1,
    state_add_uevent_sent = 0,
    state_remove_uevent_sent = 0,
    uevent_suppress = 0
  },
  flags = 2,
  blocked_wait = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    task_list = {
      next = 0xffff8802314952c8,
      prev = 0xffff8802314952c8
    }
  },
  desc_nr = 1,
  raid_disk = 1,
  new_raid_disk = 0,
  saved_raid_disk = -1,
  {
    recovery_offset = 0,
    journal_tail = 0
  },
  nr_pending = {
    counter = 1
  },
--
Jinpu Wang
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: jinpu.wang@profitbricks.com
URL: https://www.profitbricks.de
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss
[-- Attachment #2: 0001-block-export-punt_bios_to_rescuer.patch --]
[-- Type: text/x-patch, Size: 1566 bytes --]
From e7adbbb1a8d542ea68ada5996e0f9ffe87c479b6 Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Wed, 14 Dec 2016 11:26:23 +0100
Subject: [PATCH 1/2] block: export punt_bios_to_rescuer
We will need it in the following raid1 patch to punt deferred bios to the
rescue workqueue.
Suggested-by: Neil Brown <neil.brown@suse.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
block/bio.c | 3 ++-
include/linux/bio.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/bio.c b/block/bio.c
index 46e2cc1..f6a250d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -354,7 +354,7 @@ static void bio_alloc_rescue(struct work_struct *work)
}
}
-static void punt_bios_to_rescuer(struct bio_set *bs)
+void punt_bios_to_rescuer(struct bio_set *bs)
{
struct bio_list punt, nopunt;
struct bio *bio;
@@ -384,6 +384,7 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
queue_work(bs->rescue_workqueue, &bs->rescue_work);
}
+EXPORT_SYMBOL(punt_bios_to_rescuer);
/**
* bio_alloc_bioset - allocate a bio for I/O
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 42e4e3c..6256ba7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -479,6 +479,7 @@ extern void bio_advance(struct bio *, unsigned);
extern void bio_init(struct bio *);
extern void bio_reset(struct bio *);
void bio_chain(struct bio *, struct bio *);
+void punt_bios_to_rescuer(struct bio_set *);
extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
--
2.7.4
[-- Attachment #3: 0002-raid1-fix-deadlock.patch --]
[-- Type: text/x-patch, Size: 1420 bytes --]
From 2ad4cc5e8b5d7ec9db7a6fffaa2fdcd5f20419bf Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Wed, 14 Dec 2016 11:35:52 +0100
Subject: [PATCH 2/2] raid1: fix deadlock
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
drivers/md/raid1.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 478223c..61dafb1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -190,6 +190,11 @@ static void put_all_bios(struct r1conf *conf, struct r1bio *r1_bio)
}
}
+static void raid1_punt_bios_to_rescuer(struct r1conf *conf)
+{
+ punt_bios_to_rescuer(conf->mddev->queue->bio_split);
+}
+
static void free_r1bio(struct r1bio *r1_bio)
{
struct r1conf *conf = r1_bio->mddev->private;
@@ -871,14 +876,15 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
* that queue to allow conf->start_next_window
* to increase.
*/
- wait_event_lock_irq(conf->wait_barrier,
+ wait_event_lock_irq_cmd(conf->wait_barrier,
!conf->array_frozen &&
(!conf->barrier ||
((conf->start_next_window <
conf->next_resync + RESYNC_SECTORS) &&
current->bio_list &&
!bio_list_empty(current->bio_list))),
- conf->resync_lock);
+ conf->resync_lock,
+ raid1_punt_bios_to_rescuer(conf));
conf->nr_waiting--;
}
--
2.7.4