From: NeilBrown <neilb@suse.com>
To: Xiao Ni <xni@redhat.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
Date: Mon, 09 Oct 2017 15:57:43 +1100 [thread overview]
Message-ID: <87a810zznc.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <1345780738.18087591.1507512089744.JavaMail.zimbra@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5608 bytes --]
On Sun, Oct 08 2017, Xiao Ni wrote:
> ----- Original Message -----
>> From: "NeilBrown" <neilb@suse.com>
>> To: "Xiao Ni" <xni@redhat.com>
>> Cc: linux-raid@vger.kernel.org
>> Sent: Friday, October 6, 2017 12:32:19 PM
>> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>>
>> On Fri, Oct 06 2017, Xiao Ni wrote:
>>
>> > On 10/05/2017 01:17 PM, NeilBrown wrote:
>> >> On Thu, Sep 14 2017, Xiao Ni wrote:
>> >>
>> >>>> What do
>> >>>> cat /proc/8987/stack
>> >>>> cat /proc/8983/stack
>> >>>> cat /proc/8966/stack
>> >>>> cat /proc/8381/stack
>> >>>>
>> >>>> show??
>> >> ...
>> >>
>> >>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
>> >>> lockdep_assert_held(&mddev->reconfig_mutex)?
>> >>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
>> >>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
>> >>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
>> >>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
>> >>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
>> >>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
>> >>> [<ffffffff81260457>] __vfs_write+0x37/0x170
>> >>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
>> >>> [<ffffffff81263015>] SyS_write+0x55/0xc0
>> >>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
>> >>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
>> >>> [<ffffffffffffffff>] 0xffffffffffffffff
>> >>>
>> >>> [jbd2/md0-8]
>> >>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
>> >>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
>> >>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
>> >>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
>> >>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
>> >>> [<ffffffff81376675>] submit_bio+0x75/0x150
>> >>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
>> >>> [<ffffffff8129e683>] submit_bh+0x13/0x20
>> >>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
>> >>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
>> >>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
>> >>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
>> >>> [<ffffffff810c73f9>] kthread+0x109/0x140
>> >>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
>> >>> [<ffffffffffffffff>] 0xffffffffffffffff
>> >> Thanks for this (and sorry it took so long to get to it).
>> >> It looks like
>> >>
>> >> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
>> >> md_write_start()")
>> >>
>> >> is badly broken. I wonder how it ever passed testing.
>> >>
>> >> In write_start() is change the wait_event() call to
>> >>
>> >> wait_event(mddev->sb_wait,
>> >> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
>> >> !mddev->suspended);
>> >>
>> >>
>> >> That should be
>> >>
>> >> wait_event(mddev->sb_wait,
>> >> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
>> >> mddev->suspended);
>> > Hi Neil
>> >
>> > Do we want write bio can be handled when mddev->suspended is 1? After
>> > changing to this,
>> > write bio can be handled when mddev->suspended is 1.
>>
>> This is OK.
>> New write bios will not get past md_handle_request().
>> A write bios that did get past md_handle_request() is still allowed
>> through md_write_start(). The mddev_suspend() call won't complete until
>> that write bio has finished.
>
> Hi Neil
>
> Thanks for the explanation. I took some time to read the emails about the
> patch cc27b0c78 which introduced this. It's similar with this problem I
> countered. But there is a call of function mddev_suspend in level_store.
> So add the check of mddev->suspended in md_write_start can fix the problem
> "reshape raid5 -> raid6 atop bcache deadlocks at start on md_attr_store /
> raid5_make_request".
>
> In function suspend_lo_store it doesn't call mddev_suspend under mddev->reconfig_mutex.
It would if you had applied
[PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
Did you apply all 4 patches?
> So there is still a race possibility as you said at first analysis.
>>
>> >
>> > When the stuck happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
>> > is set. So
>> > the patch can't fix this problem. I tried the patch, the problem still
>> > exists.
>> >
>>
>>
>> I need to see all the stack traces.
>
> I've added the calltrace as a attachment.
Thanks. I looks suspend_lo_store() is calling raid5_quiesce() directly
as you say - so a patch is missing.
>
>>
>>
>> > [ 7710.589274] mddev suspend : 0
>> > [ 7710.592228] mddev ro : 0
>> > [ 7710.594746] mddev insync : 0
>> > [ 7710.597620] mddev SB CHANGE PENDING is set
>> > [ 7710.601698] mddev SB CHANGE CLEAN is set
>> > [ 7710.605601] mddev->persistent : 1
>> > [ 7710.608905] mddev->external : 0
>> > [ 7710.612030] conf quiesce : 2
>> >
>> > raid5 is still spinning.
>> >
>> > Hmm, I have a question. Why can't call md_check_recovery when
>> > MD_SB_CHANGE_PENDING
>> > is set in raid5d?
>>
>> When MD_SB_CHANGE_PENDING is not set, there is no need to call
>> md_check_recovery(). I wouldn't hurt except that it would be a waste of
>> time.
>
> I'm confused. If we want to call md_check_recovery when MD_SB_CHANGE_PENDING
> is set, it should be
Sorry, I described the condition wrongly.
If any bit is set in ->sb_flags (except MD_SB_CHANGE_PENDING), then
we need to call md_check_recovery(). If none of those other bits
are set, there is no need.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-10-09 4:57 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
2017-09-12 1:49 ` [PATCH 1/4] md: always hold reconfig_mutex when calling mddev_suspend() NeilBrown
2017-09-12 1:49 ` [PATCH 2/4] md: don't call bitmap_create() while array is quiesced NeilBrown
2017-09-12 1:49 ` [PATCH 4/4] md: allow metadata update while suspending NeilBrown
2017-09-12 1:49 ` [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce() NeilBrown
2017-09-12 2:51 ` [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without Xiao Ni
2017-09-13 2:11 ` Xiao Ni
2017-09-13 15:09 ` Xiao Ni
2017-09-13 23:05 ` NeilBrown
2017-09-14 4:55 ` Xiao Ni
2017-09-14 5:32 ` NeilBrown
2017-09-14 7:57 ` Xiao Ni
2017-09-16 13:15 ` Xiao Ni
2017-10-05 5:17 ` NeilBrown
2017-10-06 3:53 ` Xiao Ni
2017-10-06 4:32 ` NeilBrown
2017-10-09 1:21 ` Xiao Ni
2017-10-09 4:57 ` NeilBrown [this message]
2017-10-09 5:32 ` Xiao Ni
2017-10-09 5:52 ` NeilBrown
2017-10-10 6:05 ` Xiao Ni
2017-10-10 21:20 ` NeilBrown
[not found] ` <960568852.19225619.1507689864371.JavaMail.zimbra@redhat.com>
2017-10-13 3:48 ` NeilBrown
2017-10-16 4:43 ` Xiao Ni
2017-09-30 9:46 ` Xiao Ni
2017-10-05 5:03 ` NeilBrown
2017-10-06 3:40 ` Xiao Ni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a810zznc.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=linux-raid@vger.kernel.org \
--cc=xni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).