From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>
Subject: Re: [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync
Date: Fri, 12 Aug 2016 10:04:05 +1000
Message-ID: <87h9aqluoa.fsf@notabene.neil.brown.name>
In-Reply-To: <20160806041428.GB9281@kernel.org>
On Sat, Aug 06 2016, Shaohua Li wrote:
> On Thu, Aug 04, 2016 at 01:16:49PM +1000, Neil Brown wrote:
>> On Wed, Aug 03 2016, NeilBrown wrote:
>>
>> > On Sun, Jul 31 2016, shli@kernel.org wrote:
>> >
>> >> From: Shaohua Li <shli@fb.com>
>> >>
>> >> .quiesce is called with the mddev lock held in most places. There are a few
>> >> exceptions. Calling .quiesce without the lock held could create races. For
>> >> example, the .quiesce of raid1 can't be called recursively. The purpose of these
>> >> patches is to fix a race in raid5-cache: the raid5-cache .quiesce will write the
>> >> md superblock and so should be called with the mddev lock held.
>> >>
>> >> Cc: NeilBrown <neilb@suse.com>
>> >> Signed-off-by: Shaohua Li <shli@fb.com>
>> >
>> > Acked-by: NeilBrown <neilb@suse.com>
>> >
>> > This should be safe but I'm not sure I really like it.
>> > The raid1 quiesce could be changed so that it can be called recursively.
>> > The raid5-cache situation would be harder to get right and maybe this is
>> > the best solution... It's just that 'quiesce' should be a fairly
>> > light-weight operation, just waiting for pending requests to flush. It
>> > shouldn't really *need* a lock.
>>
>> Actually, the more I think about this, the less I like it.
>>
>> I would much rather make .quiesce lighter weight so that no locking is
>> needed.
>>
>> For r5l_quiesce, that probably means removing the "r5l_do_reclaim()" call.
>> Stopping and restarting the reclaim thread seems reasonable, but calling
>> r5l_do_reclaim() should not be needed. It should be done periodically
>> by the thread, and at 'stop' time, but otherwise isn't needed.
>> You would need to hold some mutex while calling md_register_thread, but
>> that could probably be log->io_mutex, or maybe even some other new
>> mutex.
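To illustrate, a lighter-weight r5l_quiesce might look something like the
sketch below. This is only an illustration of the idea, not tested code; it
assumes the existing log->io_mutex is an acceptable lock here, and it
deliberately drops the r5l_do_reclaim() call:

	/* Sketch: ->quiesce only stops/restarts the reclaim thread;
	 * the reclaim work itself is left to the thread's periodic
	 * runs and to 'stop' time. */
	static void r5l_quiesce(struct r5l_log *log, int state)
	{
		if (!log || state == 2)
			return;

		mutex_lock(&log->io_mutex);
		if (state == 1) {
			/* Suspend: stripes have drained, so just stop
			 * the thread; no r5l_do_reclaim() here. */
			md_unregister_thread(&log->reclaim_thread);
		} else if (state == 0) {
			/* Resume: restart the reclaim thread. */
			log->reclaim_thread =
				md_register_thread(r5l_reclaim_thread,
						   log->rdev->mddev,
						   "reclaim");
		}
		mutex_unlock(&log->io_mutex);
	}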
>
> We will have the same deadlock issue with just stopping/restarting the reclaim
> thread, as stopping the thread will wait for it, and it is probably in the
> middle of r5l_do_reclaim(), writing the superblock. Since we are writing the
> superblock, we must hold the reconfig_mutex.
When you say "writing the superblock" you presumably mean "blocked in
r5l_write_super_and_discard_space(), waiting for MD_CHANGE_PENDING to
be cleared" ??
With a bit of care you could wait for MD_CHANGE_PENDING to clear, or for
->quiesce to be set, and then exit gracefully.
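Something along these lines in r5l_write_super_and_discard_space(), say
(only a sketch; 'in_teardown' is a hypothetical flag that ->quiesce would
set before stopping the reclaim thread):

	set_bit(MD_CHANGE_DEVS, &mddev->flags);
	set_bit(MD_CHANGE_PENDING, &mddev->flags);
	md_wakeup_thread(mddev->thread);
	/* Give up the wait once teardown begins, so that stopping the
	 * reclaim thread cannot deadlock on the superblock write. */
	wait_event(mddev->sb_wait,
		   !test_bit(MD_CHANGE_PENDING, &mddev->flags) ||
		   log->in_teardown);

Then ->quiesce just sets the flag, stops the thread, and the thread exits
cleanly whether or not the superblock write has completed.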
>
> Letting raid5_quiesce call r5l_do_reclaim gives us a clean log. Just
> stopping/restarting the reclaim thread can't guarantee this, as it's possible
> some space isn't reclaimed yet. A clean log will simplify a lot of things, for
> example when we change the layout of the array: the log doesn't need to
> remember which part is for the old layout and which part is for the new layout.
I really think you are putting too much functionality into quiesce.
When we change the shape of the array, we do much more than just
quiesce it. We also call check_reshape and start_reshape etc.
They are called with reconfig_mutex held, and it would be perfectly
appropriate to finish off the r5l_do_reclaim() work in there.
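For example (sketch only; r5l_flush_log() is a made-up name for "run the
reclaim to completion so the log is clean"):

	static int raid5_start_reshape(struct mddev *mddev)
	{
		struct r5conf *conf = mddev->private;

		/* reconfig_mutex is already held on this path, so any
		 * superblock writes done while draining the log are safe. */
		if (conf->log)
			r5l_flush_log(conf->log);	/* hypothetical */

		/* ... the existing reshape setup continues unchanged ... */
		return 0;
	}

That keeps the "make the log clean" work with the code that actually needs a
clean log, instead of making every quiesce pay for it.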
>
> I think we can add a new parameter for .quiesce to indicate whether
> reconfig_mutex is held. raid5_quiesce can check the parameter and take
> reconfig_mutex if necessary.
Adding a new parameter because it happens to be convenient in one case
is not necessarily a good idea. It is often a sign that the interface
isn't well designed, or isn't well understood, or is being used poorly.
I really really don't think ->quiesce() should care about whether
reconfig_mutex is held. All it should do is drain all IO and stop new
IO so that other threads can do unusual things in race-free ways.
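In other words, the whole contract is no more than this (a schematic sketch
for an imaginary personality, with made-up conf fields):

	static void example_quiesce(struct mddev *mddev, int state)
	{
		struct example_conf *conf = mddev->private;

		if (state == 1) {		/* block and drain IO */
			conf->quiesce = 1;
			wait_event(conf->wait_for_quiescent,
				   atomic_read(&conf->active_io) == 0);
		} else if (state == 0) {	/* re-enable IO */
			conf->quiesce = 0;
			wake_up(&conf->wait_for_quiescent);
		}
	}

No superblock writes and no locking assumptions; anything heavier belongs
with the caller that already holds reconfig_mutex.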
NeilBrown