linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>
Subject: Re: [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync
Date: Fri, 12 Aug 2016 10:04:05 +1000	[thread overview]
Message-ID: <87h9aqluoa.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20160806041428.GB9281@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3748 bytes --]

On Sat, Aug 06 2016, Shaohua Li wrote:

> On Thu, Aug 04, 2016 at 01:16:49PM +1000, Neil Brown wrote:
>> On Wed, Aug 03 2016, NeilBrown wrote:
>> 
>> > [ Unknown signature status ]
>> > On Sun, Jul 31 2016, shli@kernel.org wrote:
>> >
>> >> From: Shaohua Li <shli@fb.com>
>> >>
>> >> .quiesce is called with mddev lock hold at most places. There are few
>> >> exceptions. Calling .quesce without the lock hold could create races. For
>> >> example, the .quesce of raid1 can't be recursively. The purpose of the patches
>> >> is to fix a race in raid5-cache. The raid5-cache .quesce will write md
>> >> superblock and should be called with mddev lock hold.
>> >>
>> >> Cc: NeilBrown <neilb@suse.com>
>> >> Signed-off-by: Shaohua Li <shli@fb.com>
>> >
>> > Acked-by: NeilBrown <neilb@suse.com>
>> >
>> > This should be safe but I'm not sure I really like it.
>> > The raid1 quiesce could be changed so that it can be called recursively.
>> > The raid5-cache situation would be harder to get right and maybe this is
>> > the best solution... It's just that 'quiesce' should be a fairly
>> > light-weight operation, just waiting for pending requests to flush.  It
>> > shouldn't really *need* a lock.
>> 
>> Actually, the more I think about this, the less I like it.
>> 
>> I would much rather make .quiesce lighter weight so that no locking was
>> needed.
>> 
>> For r5l_quiesce, that probable means removed the "r5l_do_reclaim()".
>> Stopping and restarting the reclaim thread seems reasonable, but calling
>> r5l_do_reclaim() should not be needed.  It should be done periodically
>> by the thread, and at 'stop' time, but otherwise isn't needed.
>> You would need to hold some mutex while calling md_register_thread, but
>> that could be probably be log->io_mutex, or maybe even some other new
>> mutex
>
> We will have the same deadlock issue with just stopping/restarting the reclaim
> thread. As stopping the thread will wait for the thread, which probably is
> doing r5l_do_reclaim and writting the superblock. Since we are writting the
> superblock, we must hold the reconfig_mutex.

When you say "writing the superblock" you presumably mean "blocked in
r5l_write_super_and_discard_space(), waiting for  MD_CHANGE_PENDING to
be cleared" ??
With a bit of care you could wait for MD_CHANGE_PENDING to clear, or for
->quiesce to be set, and then exit gracefully.

>
> Letting raid5_quiesce call r5l_do_reclaim gives us a clean log. Just
> stop/restart the reclaim thread can't guarantee this, as it's possible some
> space aren't reclaimed yet. A clean log will simplify a lot of things, for
> example we change the layout of the array. The log doesn't need to remember
> which part is for the old layout and which part is the new layout.

I really think you are putting too much functionality into quiesce.
When we change the shape of the array, we do much more than just
quiesce it.  We also call check_reshape and start_reshape etc.
They are called with reconfig_mutex held and it would be perfectly
appropriate to finish of the r5l_do_reclaim() work in there.

>
> I think we can add a new parameter for .quiesce to indicate if reconfig_mutex
> is hold. raid5_quiesce can check the parameter and hold reconfig_mutex if
> necessary.

Adding a new parameter because it happens to be convenient in one case
is not necessarily a good idea.  It is often a sign that the interface
isn't well designed, or isn't well understood, or is being used poorly.

I really really don't think ->quiesce() should care about whether
reconfig_mutex is held.  All it should do is drain all IO and stop new
IO so that other threads can do unusually things in race-free ways.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

  reply	other threads:[~2016-08-12  0:04 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-30 23:54 [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync shli
2016-07-30 23:54 ` [PATCH 2/3] MD: hold mddev lock to change bitmap location shli
2016-08-03  0:03   ` NeilBrown
2016-07-30 23:54 ` [PATCH 3/3] MD: hold mddev lock for md-cluster receive thread shli
2016-08-01  8:38   ` Guoqing Jiang
2016-08-01 21:45     ` Shaohua Li
2016-08-02  9:52       ` Guoqing Jiang
2016-08-02 22:44         ` Shaohua Li
2016-08-03  3:18           ` Guoqing Jiang
2016-08-03  0:09         ` NeilBrown
2016-08-03  3:42           ` Guoqing Jiang
2016-07-31  6:03 ` [PATCH 1/3] MD: hold mddev lock for .quiesce in md_do_sync yizhan
2016-08-02 23:47 ` NeilBrown
2016-08-04  3:16   ` NeilBrown
2016-08-06  4:14     ` Shaohua Li
2016-08-12  0:04       ` NeilBrown [this message]
2016-08-17  1:28         ` Shaohua Li
2016-08-24  4:49           ` NeilBrown
2016-08-24  5:25             ` Shaohua Li
2016-08-25  4:59               ` NeilBrown
2016-08-25 17:17                 ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h9aqluoa.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@fb.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).