Re: [PATCH V3 08/13] md: set MD_CHANGE_PENDING in a spinlocked region

From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>, Guoqing Jiang <gqjiang@suse.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH V3 08/13] md: set MD_CHANGE_PENDING in a spinlocked region
Date: Fri, 29 Apr 2016 11:26:32 +1000
Message-ID: <87eg9purw7.fsf@notabene.neil.brown.name>
In-Reply-To: <20160428035802.GA90901@kernel.org>

On Thu, Apr 28 2016, Shaohua Li wrote:

> On Wed, Apr 27, 2016 at 10:55:43PM -0400, Guoqing Jiang wrote:
>> 
>> 
>> On 04/27/2016 11:27 AM, Shaohua Li wrote:
>> >On Tue, Apr 26, 2016 at 09:56:26PM -0400, Guoqing Jiang wrote:
>> >>Some code waits for a metadata update by:
>> >>
>> >>1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
>> >>2. setting MD_CHANGE_PENDING and waking the management thread
>> >>3. waiting for MD_CHANGE_PENDING to be cleared
>> >>
>> >>If the first two are done without locking, the code in md_update_sb()
>> >>which checks if it needs to repeat might test if an update is needed
>> >>before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
>> >>in the wait returning early.
>> >>
>> >>So make sure all places that set MD_CHANGE_PENDING are protected by
>> >>mddev->lock.
>> >>
>> >>Reviewed-by: NeilBrown <neilb@suse.com>
>> >>Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
>> >>---
>> >>V3 changes:
>> >>1. use spin_lock_irqsave/spin_unlock_irqrestore in error funcs and
>> >>    raid10's __make_request
>> >shouldn't other places use spin_lock_irq/spin_unlock_irq? interrupt can occur
>> >after you do spin_lock(), and if it's md_error, we deadlock.
>> 
>> It could be possible in theory if the func was interrupted by md_error
>> after it called spin_lock, but it seems lots of places in md.c also use
>> spin_lock/unlock for mddev->lock. Take md_do_sync and md_update_sb as
>> examples: both of them used spin_lock(&mddev->lock) and
>> spin_unlock(&mddev->lock) before.
>> 
>> So I guess it will not cause trouble. Otherwise, we need to change all
>> the usages of spin_lock/unlock(&mddev->lock), or introduce a new lock for
>> this scenario. I am not sure which one is more acceptable.
>
> It doesn't cause trouble, because no interrupt/bh used the lock before. But
> now we use it in softirq, that's the difference. Please enable lockdep, I
> think it will complain. Either we change all the locking to irqsave or
> introduce a new lock. Either is ok.

Thanks for catching this!

As you say, the straightforward solution is to change all the current
  spin_lock(&mddev->lock);
to
  spin_lock_irq(&mddev->lock);

and similarly for unlock.  Where it can be called from interrupt context
we have to use spin_lock_irqsave() instead.

There is another option that occurs to me.  Not sure if it is elegant or
ugly, so I'm keen to see what you think.

In the places where we set MD_CHANGE_PENDING and one other bit -
either MD_CHANGE_DEVS or MD_CHANGE_CLEAN - we could use set_mask_bits
to set them both atomically.

  set_mask_bits(&mddev->flags, 0,
                BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS));
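
To illustrate the idea outside the kernel, here is a user-space sketch,
assuming C11 atomics in place of the kernel's cmpxchg. The bit numbers
and the names set_mask_bits_sketch() and demo_set_both() are
hypothetical and chosen only for this example; it is not the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers, for illustration only. */
enum { CHANGE_DEVS = 0, CHANGE_CLEAN = 1, CHANGE_PENDING = 2 };
#define BIT(n) (1UL << (n))

/*
 * Clear the bits in 'mask' and set the bits in 'bits' as one atomic
 * update, returning the value that was stored.
 */
static unsigned long set_mask_bits_sketch(atomic_ulong *flags,
					  unsigned long mask,
					  unsigned long bits)
{
	unsigned long old = atomic_load(flags);
	unsigned long new;

	do {
		new = (old & ~mask) | bits;
		/* On failure 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(flags, &old, new));

	return new;
}

/* Both bits become visible together; no observer sees only one. */
static void demo_set_both(void)
{
	atomic_ulong flags = 0;

	set_mask_bits_sketch(&flags, 0,
			     BIT(CHANGE_PENDING) | BIT(CHANGE_DEVS));
	assert(atomic_load(&flags) ==
	       (BIT(CHANGE_PENDING) | BIT(CHANGE_DEVS)));
}
```

Because the "update needed" bit and the PENDING bit land in a single
compare-and-swap, there is no window between step 1 and step 2 for
md_update_sb() to observe.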

Then in md_update_sb, when deciding whether to loop back to "repeat:" we
use a new "bit_clear_unless".

#define bit_clear_unless(ptr, _clear, _test)			\
({								\
	const typeof(*ptr) clear = (_clear), test = (_test);	\
	typeof(*ptr) old, new;					\
								\
	do {							\
		old = ACCESS_ONCE(*ptr);			\
		new = old & ~clear;				\
	} while (!(old & test) && cmpxchg(ptr, old, new) != old);\
								\
	!(old & test);						\
})

The code in md_update_sb() would be

 if (mddev->in_sync != sync_req ||
     !bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
                       BIT(MD_CHANGE_DEVS)|BIT(MD_CHANGE_CLEAN)))
        goto repeat;


So if either DEVS or CLEAN were set, PENDING would not be cleared and the
code would goto repeat.
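
The intended behaviour can be checked in user space. The following is
only a sketch, assuming C11 atomics in place of cmpxchg/ACCESS_ONCE;
the bit numbers and the names bit_clear_unless_sketch() and
demo_clear_unless() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, for illustration only. */
enum { CHANGE_DEVS = 0, CHANGE_CLEAN = 1, CHANGE_PENDING = 2 };
#define BIT(n) (1UL << (n))

/*
 * Atomically clear 'clear' unless any bit in 'test' is set.
 * Returns true if the bits were cleared, false if 'test' blocked it.
 */
static bool bit_clear_unless_sketch(atomic_ulong *flags,
				    unsigned long clear,
				    unsigned long test)
{
	unsigned long old = atomic_load(flags);

	do {
		if (old & test)
			return false;	/* new request seen: leave flags alone */
		/* On failure 'old' is reloaded and the test is repeated. */
	} while (!atomic_compare_exchange_weak(flags, &old, old & ~clear));

	return true;
}

static void demo_clear_unless(void)
{
	const unsigned long need = BIT(CHANGE_DEVS) | BIT(CHANGE_CLEAN);
	atomic_ulong flags = BIT(CHANGE_PENDING);

	/* No further update requested: PENDING is cleared. */
	assert(bit_clear_unless_sketch(&flags, BIT(CHANGE_PENDING), need));
	assert(atomic_load(&flags) == 0);

	/* A waiter set DEVS and PENDING together: nothing is cleared. */
	atomic_store(&flags, BIT(CHANGE_PENDING) | BIT(CHANGE_DEVS));
	assert(!bit_clear_unless_sketch(&flags, BIT(CHANGE_PENDING), need));
	assert(atomic_load(&flags) ==
	       (BIT(CHANGE_PENDING) | BIT(CHANGE_DEVS)));
}
```

In the second case the caller would see "false" and loop back, which
corresponds to the "goto repeat" above.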

What do you think?

NeilBrown
