From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync
Date: Thu, 25 Aug 2016 14:59:13 +1000 [thread overview]
Message-ID: <87bn0hfnq6.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20160824052512.GA1921@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 3542 bytes --]
On Wed, Aug 24 2016, Shaohua Li wrote:
> On Wed, Aug 24, 2016 at 02:49:57PM +1000, Neil Brown wrote:
>> On Wed, Aug 17 2016, Shaohua Li wrote:
>> >> >
>> >> > We will have the same deadlock issue with just stopping/restarting the reclaim
>> >> > thread. As stopping the thread will wait for the thread, which probably is
>> >> > doing r5l_do_reclaim and writting the superblock. Since we are writting the
>> >> > superblock, we must hold the reconfig_mutex.
>> >>
>> >> When you say "writing the superblock" you presumably mean "blocked in
>> >> r5l_write_super_and_discard_space(), waiting for MD_CHANGE_PENDING to
>> >> be cleared" ??
>> > right
>> >> With a bit of care you could wait for MD_CHANGE_PENDING to clear, or for
>> >> ->quiesce to be set, and then exit gracefully.
>> >
>> > Can you give details about this please? .quiesce is called with reconfig_mutex
>> > hold, so the MD_CHANGE_PENDING will never get cleared.
>>
>> raid5_quiesce(mddev, 1) sets conf->quiesce and then calls r5l_quiesce().
>>
>> r5l_quiesce() tells the reclaim_thread to exit and waits for it to do so.
>>
>> But the reclaim thread might be in
>> r5l_do_reclaim() -> r5l_write_super_and_discard_space()
>> waiting for MD_CHANGE_PENDING to clear. That will only get cleared when
>> the main thread can get the reconfig_mutex, which the thread calling
>> raid5_quiesce() might hold. So we get a deadlock.
>>
>> My suggestion is to change r5l_write_super_and_discard_space() so that
>> it waits for *either* MD_CHANGE_PENDING to be clear, or conf->quiesce
>> to be set. That will avoid the deadlock.
>>
>> Whatever thread called raid5_quiesce() will now be in control of the
>> array without any async IO going on. If it needs the metadata to be
>> sync, it can do that itself. If not, then it doesn't really matter that
>> r5l_write_super_and_discard_space() didn't wait.
>
> I'm afraid waiting conf->quiesce set isn't safe. The reason to wait for
> superblock write isn't because of async IO. discard could zero data, so before
> we do discard, we must make sure superblock points to correct log tail,
> otherwise recovery will not work. This is the reason we wait for superblock
> write.
>
>> r5l_write_super_and_discard_space() shouldn't call discard if the
>> superblock write didn't complete, and probably r5l_do_reclaim()
>> shouldn't update last_checkpoint and last_cp_seq in that case.
>> This is what I mean by "with a bit of care" and "exit gracefully".
>> Maybe I should have said "abort cleanly". The goal is to get the thread
>> to exit. It doesn't need to complete what it was doing, it just needs
>> to make sure that it leaves things in a tidy state so that when it
>> starts up again, it can pick up where it left off.
>
> Agree, we could ignore discard sometime, which happens occasionally, so impact
> is little. I tested something like below recently. Assume this is the solution
> we agree on?
Yes, this definitely looks like it is heading in the right direction.
I thought that
> - set_mask_bits(&mddev->flags, 0,
> - BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
> - md_wakeup_thread(mddev->thread);
would still be there in the case that the lock cannot be claimed.
You could even record the ->events value before setting the flags,
and record the range that needs to be discarded. Next time
r5l_do_reclaim is entered, if ->events has moved on, then it should be
safe to discard the recorded range. Maybe.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
next prev parent reply other threads:[~2016-08-25 4:59 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-30 23:54 [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync shli
2016-07-30 23:54 ` [PATCH 2/3] MD: hold mddev lock to change bitmap location shli
2016-08-03 0:03 ` NeilBrown
2016-07-30 23:54 ` [PATCH 3/3] MD: hold mddev lock for md-cluster receive thread shli
2016-08-01 8:38 ` Guoqing Jiang
2016-08-01 21:45 ` Shaohua Li
2016-08-02 9:52 ` Guoqing Jiang
2016-08-02 22:44 ` Shaohua Li
2016-08-03 3:18 ` Guoqing Jiang
2016-08-03 0:09 ` NeilBrown
2016-08-03 3:42 ` Guoqing Jiang
2016-07-31 6:03 ` [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync yizhan
2016-08-02 23:47 ` NeilBrown
2016-08-04 3:16 ` NeilBrown
2016-08-06 4:14 ` Shaohua Li
2016-08-12 0:04 ` NeilBrown
2016-08-17 1:28 ` Shaohua Li
2016-08-24 4:49 ` NeilBrown
2016-08-24 5:25 ` Shaohua Li
2016-08-25 4:59 ` NeilBrown [this message]
2016-08-25 17:17 ` Shaohua Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bn0hfnq6.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=linux-raid@vger.kernel.org \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).