Re: [rfc] "fair" rw spinlocks

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Christoph Hellwig <hch@infradead.org>,
	Nick Piggin <npiggin@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [rfc] "fair" rw spinlocks
Date: Mon, 7 Dec 2009 18:37:03 -0800
Message-ID: <20091208023703.GY6808@linux.vnet.ibm.com>
In-Reply-To: <m1hbs2cize.fsf@fess.ebiederm.org>

On Mon, Dec 07, 2009 at 06:11:49PM -0800, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> 
> > On Mon, Dec 07, 2009 at 03:19:59PM -0800, Eric W. Biederman wrote:
> >> Andi Kleen <andi@firstfloor.org> writes:
> >> 
> >> > ebiederm@xmission.com (Eric W. Biederman) writes:
> >> >
> >> >> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> >> >>>
> >> >>> Is it required that all of the processes see the signal before the
> >> >>> corresponding interrupt handler returns?  (My guess is "no", which
> >> >>> enables a trick or two, but thought I should ask.)
> >> >>
> >> >> Not that I recall.  I think it is just an I/O completed signal.
> >> >
> >> > Wasn't there the sysrq SAK too? That one definitely would need
> >> > to be careful about synchronicity.
> >> 
> >> SAK from sysrq is done through schedule_work; I seem to recall the
> >> locking being impossible otherwise.  There is also send_sig_all and a
> >> few others from sysrq.  I expect we could legitimately make them
> >> schedule_work as well if needed.
> >
> > OK, I will chance it...  Here is one possible trick:
> >
> > o	Maintain a list of ongoing group-signal operations, protected
> > 	by some suitable lock.  These could be in a per-chain-locked
> > 	hash table, hashed by the signal target (e.g., pgrp).
> >
> > o	When a task is created, it scans the above list, committing
> > 	suicide (or doing whatever the signal requires) if appropriate.
> >
> > o	When creating a child task, the parent holds an SRCU across
> > 	creation.  It acquires SRCU before starting creation, and
> > 	releases it when it knows that the child has completed
> > 	scanning the above list.
> >
> > o	The updater does the following:
> >
> > 	o	Add its request to the above list.
> >
> > 	o	Wait for an SRCU grace period to elapse.
> >
> > 	o	Kill off everything currently in the task list,
> > 		and then wait for each such task to get to a point
> > 		where it can be guaranteed not to spawn additional
> > 		tasks.  (This might be mediated via a reference
> > 		count in the corresponding list element, or by
> > 		rescanning the task list, or any of a number of
> > 		similar tricks.)
> >
> > 		Of course, if the signal is non-fatal, then it is
> > 		necessary only to wait until the child has taken
> > 		the signal.
> >
> > 	o	If it is possible for a given task's children to
> > 		outlive it, despite the fact that the children must
> > 		commit suicide upon finding themselves indicated by the
> > 		list, wait for another SRCU grace period to elapse.
> > 		(This additional SRCU grace period would be required
> > 		for a non-fatal pgrp signal, for example.)
> >
> > 	o	Remove the element from the list.
> >
> > Does this approach make sense, or am I misunderstanding the problem?
> 
> I think that is about right.  I played with that idea a little bit.
> I was thinking of simply having new children return -ERESTARTSYS, and
> retry the fork.  I put it down because I decided that seems like a
> very twisted implementation of a read/write lock.
> 
> If we can scale noticeably better than tasklist_lock it is
> definitely worth doing.  I think it is really easy to tie yourself up
> in pretzels thinking about this.

No argument here!!!

> An srcu in the pid structure that we hold while signaling tasks.
> Interesting.

;-)

> > Either way, one additional question...  It seems to me that non-fatal
> > signals really don't require the above mechanism, because if a task
> > handles the signal, and then spawns a child, one can argue that the
> > child came after the signal and should thus be unaffected.  Right?
> > Or more confusion on my part?
> 
> SIGSTOP also seems pretty important not to escape.  I'm not certain of
> the others.  I think I would get a bit upset if job control signals in
> the shell stopped working properly.  I think asking the question "did
> that app do something wrong with SIGTERM, or did the kernel drop it?"
> would drive me a bit batty.

Good point!!!  It does indeed seem to me that SIGSTOP needs to be
handled as carefully as does a fatal signal.  I guess that SIGCONT
is easier, at least unless there is some tricky way that a stopped
task can nevertheless spawn a new task.  ;-)

> It is hard to tell what breaks because most buggy implementations will
> work correctly most of the time.

Indeed you are quite right, and it is thus worthwhile burning a few
extra CPU cycles to faithfully emulate the old behavior.
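
For concreteness, the list-plus-SRCU trick quoted earlier in this message
might be modeled roughly as follows.  This is a single-threaded Python toy,
not kernel code: ToySrcu, ToyKernel, and every method name are hypothetical,
and the toy polls for the grace period where real code would block in
something like synchronize_srcu().  It covers only the fatal-signal case.

```python
from collections import defaultdict


class ToySrcu:
    """Toy stand-in for SRCU: tracks which read-side sections are open."""

    def __init__(self):
        self._next_token = 0
        self._open = set()

    def read_lock(self):
        token = self._next_token
        self._next_token += 1
        self._open.add(token)
        return token

    def read_unlock(self, token):
        self._open.discard(token)

    def start_grace_period(self):
        # Only readers already open at this instant must be waited for.
        return frozenset(self._open)

    def grace_period_elapsed(self, snapshot):
        return not (snapshot & self._open)


class Task:
    def __init__(self, pid, pgrp):
        self.pid, self.pgrp, self.alive = pid, pgrp, True


class ToyKernel:
    def __init__(self):
        self.srcu = ToySrcu()
        self.pending = defaultdict(list)  # pgrp -> ongoing group-signal requests
        self.tasks = []
        self._next_pid = 1

    def spawn_init(self, pgrp):
        task = Task(self._next_pid, pgrp)
        self._next_pid += 1
        self.tasks.append(task)
        return task

    def fork_begin(self, parent):
        if not parent.alive:
            raise RuntimeError("dead tasks cannot fork")
        # Parent enters the SRCU read-side section before creating the child.
        return self.srcu.read_lock()

    def fork_finish(self, parent, token):
        child = self.spawn_init(parent.pgrp)
        # The new task scans the list and acts on any pending fatal signal.
        if self.pending[child.pgrp]:
            child.alive = False
        # Parent leaves the read-side section only after the scan is done.
        self.srcu.read_unlock(token)
        return child

    def post_group_kill(self, pgrp):
        request = object()
        self.pending[pgrp].append(request)
        return request, self.srcu.start_grace_period()

    def finish_group_kill(self, pgrp, request, snapshot):
        if not self.srcu.grace_period_elapsed(snapshot):
            return False  # a fork that began earlier is still in flight
        for task in self.tasks:
            if task.pgrp == pgrp:
                task.alive = False
        self.pending[pgrp].remove(request)
        return True


kernel = ToyKernel()
parent = kernel.spawn_init(pgrp=7)
token = kernel.fork_begin(parent)                     # fork starts first
request, snap = kernel.post_group_kill(7)             # signal arrives mid-fork
blocked = kernel.finish_group_kill(7, request, snap)  # updater must wait
child = kernel.fork_finish(parent, token)             # child finds the request
done = kernel.finish_group_kill(7, request, snap)     # grace period is over
survivors = [t.pid for t in kernel.tasks if t.alive]
```

In this interleaving the updater cannot reach the kill phase while the
earlier fork is in flight, and the child removes itself on finding the
request, so nothing in the process group escapes.  The second grace period
and the wait for tasks to stop spawning are left out of the toy.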

							Thanx, Paul