public inbox for linux-kernel@vger.kernel.org
* crazy idea: big percpu lock (Re: task isolation)
@ 2015-10-08 21:25 Andy Lutomirski
From: Andy Lutomirski @ 2015-10-08 21:25 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Luiz Capitulino, Gilad Ben Yossef, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Andrew Morton, Rik van Riel, Tejun Heo,
	Frederic Weisbecker, Thomas Gleixner, Paul E. McKenney,
	Christoph Lameter, Viresh Kumar, Catalin Marinas, Will Deacon,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linus Torvalds

This whole isolation vs vmstat, etc thing made me think:

It seems to me that a big part of the problem is that there's all
kinds of per-cpu deferred housekeeping work that can be done on the
CPU in question without any complicated or heavyweight locking but
that can't be done remotely without a mess.  This presumably includes
vmstat, draining the LRU list, etc.  This is a problem for allowing
CPUs to spend a long time without any interrupts.

I want to propose a new primitive that might go a long way toward
solving this issue.  The new primitive would be called the "big percpu
lock".  Non-nohz CPUs would hold their big percpu lock all the time.
Nohz CPUs would hold it all the time unless idle.  Full nohz CPUs
would hold it all the time except when idle or in user mode.  No CPU
promises to hold it while processing an NMI or similar NMI-like work.
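
The holding rules above can be written down as a small truth function.
This is only a userspace sketch of the proposal, not kernel code: the
enum names and holds_big_percpu_lock() are hypothetical, and NMI context
is left out because no CPU promises anything there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of CPUs, following the three cases above. */
enum cpu_kind  { CPU_NON_NOHZ, CPU_NOHZ, CPU_NOHZ_FULL };

/* Hypothetical execution states; NMI-like context is deliberately absent. */
enum cpu_state { STATE_KERNEL, STATE_USER, STATE_IDLE };

/* Returns true if a CPU of this kind holds its own big percpu lock
 * while in the given state, under the rules proposed in the text. */
static bool holds_big_percpu_lock(enum cpu_kind kind, enum cpu_state state)
{
	switch (kind) {
	case CPU_NON_NOHZ:
		return true;                   /* held all the time */
	case CPU_NOHZ:
		return state != STATE_IDLE;    /* dropped only while idle */
	case CPU_NOHZ_FULL:
		return state == STATE_KERNEL;  /* dropped while idle or in user mode */
	}
	return true;
}
```

Read this way, a remote CPU can only ever acquire the lock of a CPU that
is idle or, for full nohz, running user code, which is exactly when an
interrupt would be unwelcome.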

This should help in a ton of cases.

For vunmap global kernel TLB flushes, we could stick the flushes in a
list of deferred flushes to be processed on entry, and that list would
be protected by the big percpu lock.  For any kind of draining of
non-NMI-safe percpu data (LRU, vmstat, whatever), we could have a
housekeeping CPU try to do it using the big percpu lock.
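
A rough userspace model of the deferred-flush idea, assuming the lock is
a simple trylock-able flag.  Everything here (struct cpu_sim,
defer_flush_remote(), kernel_entry(), exit_to_user(), the fixed-size
list) is made up for illustration; a real implementation would need
proper percpu data and would actually flush the TLB.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 16

/* Hypothetical per-CPU state; none of these names exist in the kernel. */
struct cpu_sim {
	atomic_bool big_lock_held;            /* the "big percpu lock" */
	unsigned long deferred[MAX_DEFERRED]; /* addresses awaiting a flush */
	size_t nr_deferred;
	size_t nr_flushed;                    /* flushes processed so far */
};

static bool big_lock_trylock(struct cpu_sim *c)
{
	/* exchange returns the old value: false means we just took the lock */
	return !atomic_exchange(&c->big_lock_held, true);
}

static void big_lock_unlock(struct cpu_sim *c)
{
	atomic_store(&c->big_lock_held, false);
}

/* Remote side: queue a flush on the target if its lock is free.
 * Returns false if the caller must fall back to interrupting the target. */
static bool defer_flush_remote(struct cpu_sim *target, unsigned long addr)
{
	if (!big_lock_trylock(target))
		return false;           /* target is in the kernel and owns it */
	bool queued = target->nr_deferred < MAX_DEFERRED;
	if (queued)
		target->deferred[target->nr_deferred++] = addr;
	big_lock_unlock(target);
	return queued;
}

/* Target side: on kernel entry, take our own lock and process the list.
 * The lock then stays held for as long as we remain in kernel mode. */
static void kernel_entry(struct cpu_sim *self)
{
	while (!big_lock_trylock(self))
		;                       /* a remote CPU holds it only briefly */
	self->nr_flushed += self->nr_deferred;  /* stand-in for real flushes */
	self->nr_deferred = 0;
}

/* Target side: drop the lock on the way back to user mode. */
static void exit_to_user(struct cpu_sim *self)
{
	big_lock_unlock(self);
}
```

In this model a housekeeping CPU never interrupts a target that is in
user mode: it either queues the work under the target's lock or learns,
from the failed trylock, that the target is in the kernel anyway.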

There's a race here that affects task isolation.  On exit to user
mode, there's no obvious way to tell that an IPI is already pending.
We could add that, too: whenever we send an IPI to a nohz_full CPU, we
increment a percpu pending IPI count, then try to get the big percpu
lock, and then, if we fail, send the IPI.  IOW, we might want a helper
that takes a remote big percpu lock or calls a remote function that
guards against this race.
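
One possible shape for such a helper, again as a single-threaded
userspace model rather than kernel code.  The names (struct iso_cpu,
lock_or_ipi(), handle_ipi(), try_exit_to_user()) are hypothetical, and
the check on the exit path is one interpretation of how the pending
count would be used: the target drops its lock first and then looks at
the count, so a sender whose trylock failed is always noticed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one nohz_full CPU. */
struct iso_cpu {
	atomic_bool big_lock_held;  /* the "big percpu lock" */
	atomic_int  pending_ipis;   /* senders that have announced themselves */
	int         ipis_delivered; /* simulated interrupts actually taken */
	int         calls_run;      /* requested functions executed so far */
};

typedef void (*remote_fn)(struct iso_cpu *target);

/* Example payload: just counts how often it ran. */
static void count_call(struct iso_cpu *target)
{
	target->calls_run++;
}

/* Sender side.  Returns true if fn ran remotely under the target's lock,
 * false if the lock was busy and a (simulated) IPI is now in flight. */
static bool lock_or_ipi(struct iso_cpu *target, remote_fn fn)
{
	atomic_fetch_add(&target->pending_ipis, 1);      /* announce first */
	if (!atomic_exchange(&target->big_lock_held, true)) {
		fn(target);                              /* no interrupt needed */
		atomic_store(&target->big_lock_held, false);
		atomic_fetch_sub(&target->pending_ipis, 1);
		return true;
	}
	return false;   /* target is in the kernel: the IPI must be delivered */
}

/* Target side: the simulated IPI handler retires one pending request. */
static void handle_ipi(struct iso_cpu *self, remote_fn fn)
{
	fn(self);
	self->ipis_delivered++;
	atomic_fetch_sub(&self->pending_ipis, 1);
}

/* Target side: returns true if it is safe to enter user mode.  If any
 * sender has announced itself, retake the lock and stay in the kernel. */
static bool try_exit_to_user(struct iso_cpu *self)
{
	atomic_store(&self->big_lock_held, false);
	if (atomic_load(&self->pending_ipis) == 0)
		return true;
	while (atomic_exchange(&self->big_lock_held, true))
		;       /* a remote caller may hold the lock briefly */
	return false;
}
```

The ordering is what matters: the sender increments before trying the
lock, and the target unlocks before reading the count, so at least one
side always sees the other.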

Thoughts?  Am I nuts?

--Andy


