From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: mc@linux.vnet.ibm.com, Stephen Boyd <sboyd@codeaurora.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Nick Piggin <npiggin@kernel.dk>,
david@fromorbit.com,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Maciej Rutecki <maciej.rutecki@gmail.com>
Subject: Re: [PATCH] VFS: br_write_lock locks on possible CPUs other than online CPUs
Date: Wed, 21 Dec 2011 00:42:04 +0530 [thread overview]
Message-ID: <4EF0DE04.6030604@linux.vnet.ibm.com> (raw)
In-Reply-To: <20111220175919.GE23916@ZenIV.linux.org.uk>
On 12/20/2011 11:29 PM, Al Viro wrote:
> On Tue, Dec 20, 2011 at 08:05:32PM +0530, Srivatsa S. Bhat wrote:
>
>> Sorry but I didn't quite get your point...
>> No two cpu hotplug operations can race because of the cpu_hotplug lock they
>> use. Hence, if a cpu online operation begins, it has to succeed or fail
>> eventually. No other cpu hotplug operation can intervene. Ditto for cpu offline
>> operations.
>>
>> Hence a CPU_UP_PREPARE event *will* be followed by a corresponding
>> CPU_UP_CANCELED or CPU_ONLINE event for the same cpu. (And we ignore the
>> CPU_STARTING event that comes in between, on purpose, so as to avoid the race
>> with cpu_online_mask). Similar is the story for offline operation.
>>
>> And if the notifier grabs the spinlock and keeps it locked between these 2
>> points of a cpu hotplug operation, it ensures that our br locks will spin,
>> instead of block till the cpu hotplug operation is complete. Isn't this what
>> we desired all along? "A non-blocking way to sync br locks with cpu hotplug"?
>>
>> Or am I missing something?
>
> The standard reason why losing the timeslice while holding a spinlock means
> deadlocks?
> CPU1: grabs spinlock
> CPU[2..n]: tries to grab the same spinlock, spins
> CPU1: does something blocking, process loses timeslice
> CPU1: whatever got scheduled there happens to to try and grab the same
> spinlock and you are stuck. At that point *all* CPUs are spinning on
> that spinlock and your code that would eventually unlock it has no chance
> to get any CPU to run on.
>
> Having the callback grab and release a spinlock is fine (as long as you
> don't do anything blocking between these spin_lock/spin_unlock). Having
> it leave with spinlock held, though, means that the area where you can't
> block has expanded a whole lot. As I said, brittle...
>
Ah, now I see your point! Thanks for the explanation.
> A quick grep through the actual callbacks immediately shows e.g.
> del_timer_sync() done on CPU_DOWN_PREPARE. And sysfs_remove_group(),
> which leads to outright mutex_lock(). And sysfs_remove_link() (ditto).
> And via_cputemp_device_remove() (again, mutex_lock()). And free_irq().
> And perf_event_exit_cpu() (mutex_lock()). And...
>
> IOW, there are shitloads of deadlocks right there. If your callback's
> position in the chain is earlier than any of those, you are screwed.
>
The thought makes me shudder!
> No, what I had in mind was different - use the callbacks to maintain a
> bitmap that would contain
> a) all CPUs that were online at the moment
> b) ... and not too much else
> Updates protected by spinlock; in all cases it gets dropped before the
> callback returns. br_write_lock() grabs that spinlock and iterates over
> the set; it *does* leave the spinlock grabbed - that's OK, since all
> code between br_write_lock() and br_write_unlock() must be non-blocking
> anyway. br_write_unlock() iterates over the same bitmap (unchanged since
> br_write_lock()) and finally drops the spinlock.
>
I had this same thing in mind when I started out to write the patch.. but
after Cong raised that concern, I changed track and in the meantime, tried
to get rid of maintaining our own bitmap as well...
But unfortunately that turned out to be disastrous!
> AFAICS, what we want in callback is
> CPU_DEAD, CPU_DEAD_FROZEN, CPU_UP_CANCELLED, CPU_UP_CANCELLED_FROZEN:
> grab spinlock
> remove cpu from bitmap
> drop spinlock
> CPU_UP_PREPARE, CPU_UP_PREPARE_FROZEN
> grab spinlock
> add cpu to bitmap
> drop spinlock
> That ought to keep bitmap close to cpu_online_mask, which is enough for
> our purposes.
>
Yes, that should do. And while initializing our bitmap, we could use
get_online_cpus()
make a copy of cpu_online_mask
put_online_cpus()
since blocking here is acceptable, as this is done in the lock_init() code.
Right?
That would be better than
register_hotcpu_notifier(...);
grab spinlock
for_each_online_cpu(N)
add N to bitmap
release spinlock
because the latter code is not fully race-free (because we don't handle
CPU_DOWN_PREPARE event in the callback and hence cpu_online_mask can get
updated in-between). But it would still work since cpus going down don't
really pose problems for us.
However the former code is race-free, and we can afford it since we are
free to block at that point.
Regards,
Srivatsa S. Bhat
next prev parent reply other threads:[~2011-12-20 19:12 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-19 3:36 [PATCH] VFS: br_write_lock locks on possible CPUs other than online CPUs mengcong
2011-12-19 4:11 ` Al Viro
2011-12-19 5:00 ` Dave Chinner
2011-12-19 6:07 ` mengcong
2011-12-19 7:31 ` Srivatsa S. Bhat
2011-12-19 9:12 ` Stephen Boyd
2011-12-19 11:03 ` Srivatsa S. Bhat
2011-12-19 12:11 ` Al Viro
2011-12-19 20:23 ` Srivatsa S. Bhat
2011-12-19 20:52 ` Al Viro
2011-12-20 4:56 ` Srivatsa S. Bhat
2011-12-20 6:27 ` Al Viro
2011-12-20 7:28 ` Srivatsa S. Bhat
2011-12-20 9:37 ` mengcong
2011-12-20 10:36 ` Srivatsa S. Bhat
2011-12-20 11:08 ` Srivatsa S. Bhat
2011-12-20 12:50 ` Srivatsa S. Bhat
2011-12-20 14:06 ` Al Viro
2011-12-20 14:35 ` Srivatsa S. Bhat
2011-12-20 17:59 ` Al Viro
2011-12-20 19:12 ` Srivatsa S. Bhat [this message]
2011-12-20 19:58 ` Al Viro
2011-12-20 22:27 ` Dave Chinner
2011-12-20 23:31 ` Al Viro
2011-12-21 21:15 ` Srivatsa S. Bhat
2011-12-21 22:02 ` Al Viro
2011-12-21 22:12 ` Andrew Morton
2011-12-22 7:02 ` Al Viro
2011-12-22 7:20 ` Andrew Morton
2011-12-22 8:08 ` Al Viro
2011-12-22 8:17 ` Andi Kleen
2011-12-22 8:39 ` Al Viro
2011-12-22 8:22 ` Andi Kleen
2011-12-20 7:30 ` mengcong
2011-12-20 7:37 ` Srivatsa S. Bhat
2011-12-19 23:56 ` Dave Chinner
2011-12-20 4:05 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4EF0DE04.6030604@linux.vnet.ibm.com \
--to=srivatsa.bhat@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.rutecki@gmail.com \
--cc=mc@linux.vnet.ibm.com \
--cc=npiggin@kernel.dk \
--cc=sboyd@codeaurora.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.