From: ebiederm@xmission.com (Eric W. Biederman)
To: paulmck@linux.vnet.ibm.com
Cc: Andi Kleen <andi@firstfloor.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Christoph Hellwig <hch@infradead.org>,
Nick Piggin <npiggin@suse.de>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [rfc] "fair" rw spinlocks
Date: Mon, 07 Dec 2009 18:11:49 -0800
Message-ID: <m1hbs2cize.fsf@fess.ebiederm.org>
In-Reply-To: <20091208013900.GU6808@linux.vnet.ibm.com> (Paul E. McKenney's message of "Mon\, 7 Dec 2009 17\:39\:00 -0800")
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> On Mon, Dec 07, 2009 at 03:19:59PM -0800, Eric W. Biederman wrote:
>> Andi Kleen <andi@firstfloor.org> writes:
>>
>> > ebiederm@xmission.com (Eric W. Biederman) writes:
>> >
>> >> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>> >>>
>> >>> Is it required that all of the processes see the signal before the
>> >>> corresponding interrupt handler returns? (My guess is "no", which
>> >>> enables a trick or two, but thought I should ask.)
>> >>
>> >> Not that I recall. I think it is just an I/O completed signal.
>> >
>> > Wasn't there the sysrq SAK too? That one definitely would need
>> > to be careful about synchronicity.
>>
>> SAK from sysrq is done through schedule work, I seem to recall the
>> locking being impossible otherwise. There is also send_sig_all and a
>> few others from sysrq. I expect we could legitimately make them
>> schedule_work as well if needed.
>
> OK, I will chance it... Here is one possible trick:
>
> o Maintain a list of ongoing group-signal operations, protected
> by some suitable lock. These could be in a per-chain-locked
> hash table, hashed by the signal target (e.g., pgrp).
>
> o When a task is created, it scans the above list, committing
> suicide (or doing whatever the signal requires) if appropriate.
>
> o When creating a child task, the parent holds an SRCU across
> creation. It acquires SRCU before starting creation, and
> releases it when it knows that the child has completed
> scanning the above list.
>
> o The updater does the following:
>
> o Add its request to the above list.
>
> o Wait for an SRCU grace period to elapse.
>
> o Kill off everything currently in the task list,
> and then wait for each such task to get to a point
> where it can be guaranteed not to spawn additional
> tasks. (This might be mediated via a reference
> count in the corresponding list element, or by
> rescanning the task list, or any of a number of
> similar tricks.)
>
> Of course, if the signal is non-fatal, then it is
> necessary only to wait until the child has taken
> the signal.
>
> o If it is possible for a given task's children to
> outlive it, despite the fact that the children must
> commit suicide upon finding themselves indicated by the
> list, wait for another SRCU grace period to elapse.
> (This additional SRCU grace period would be required
> for a non-fatal pgrp signal, for example.)
>
> o Remove the element from the list.
>
> Does this approach make sense, or am I misunderstanding the problem?
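To make the ordering in the steps quoted above concrete, here is a rough single-threaded model in C. It is only a sketch under stated assumptions: the toy_* helpers and pending_pgrp are hypothetical names, the two-counter epoch scheme is a crude stand-in for SRCU (the real kernel calls are srcu_read_lock(), srcu_read_unlock() and synchronize_srcu()), the "list" holds a single entry, and only one grace period is modelled at a time.

```c
#include <stdbool.h>

/* Toy stand-in for SRCU: two reader counters selected by an epoch bit.
 * Single-threaded model only, one grace period at a time. */
static int toy_epoch;
static int toy_readers[2];

/* Parent side: enter the read-side section before starting creation. */
static int toy_read_lock(void)
{
    int idx = toy_epoch & 1;

    toy_readers[idx]++;
    return idx;
}

/* Parent side: leave once the child has finished scanning the list. */
static void toy_read_unlock(int idx)
{
    toy_readers[idx]--;
}

/* Updater side: start a grace period by flipping the epoch.
 * Returns the counter index that has to drain. */
static int toy_gp_start(void)
{
    int old = toy_epoch & 1;

    toy_epoch++;
    return old;
}

/* The grace period is over once every pre-existing reader has left. */
static bool toy_gp_done(int old)
{
    return toy_readers[old] == 0;
}

/* Hypothetical single-entry "list" of ongoing group-signal operations. */
static int pending_pgrp = -1;	/* -1 means no operation in flight */

/* Child side: scan the list; true means the child must take the signal. */
static bool toy_child_scan(int pgrp)
{
    return pending_pgrp == pgrp;
}
```

In this model a fork that took toy_read_lock() before the updater set pending_pgrp keeps toy_gp_done() false until it unlocks, so the updater cannot start walking the task list while such a child might still be missing from it. A fork that starts after the flip lands on the other counter and does not delay the updater, but its child finds the request when it scans.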
I think that is about right. I played with that idea a little bit.
I was thinking of simply having new children return -ERESTARTSYS, and
retry the fork. I put it down because I decided that seems like a
very twisted implementation of a read/write lock.
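
Roughly what I had in mind, as a user-space toy rather than real kernel code. Everything named toy_* is made up for illustration, the fixed-size table stands in for whatever list or hash would really be used, and there is no locking at all. TOY_ERESTARTSYS mirrors the kernel-internal ERESTARTSYS value (512), which user space never sees.

```c
#include <stdbool.h>

/* Mirrors the kernel-internal ERESTARTSYS (include/linux/errno.h). */
#define TOY_ERESTARTSYS 512

#define TOY_MAX_OPS 4

/* Hypothetical record of one in-flight group-signal operation. */
struct toy_sig_op {
    bool active;
    int pgrp;
};

static struct toy_sig_op toy_ops[TOY_MAX_OPS];

/* Signaller: publish the operation before touching any task.
 * Returns a slot index, or -1 if the table is full. */
static int toy_sig_begin(int pgrp)
{
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (!toy_ops[i].active) {
            toy_ops[i].active = true;
            toy_ops[i].pgrp = pgrp;
            return i;
        }
    }
    return -1;
}

/* Signaller: retire the operation once every task has been signalled. */
static void toy_sig_end(int slot)
{
    toy_ops[slot].active = false;
}

/* Fork path: refuse to create a child while its pgrp is being signalled.
 * The caller is expected to handle the signal and restart the fork. */
static int toy_fork(int pgrp)
{
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (toy_ops[i].active && toy_ops[i].pgrp == pgrp)
            return -TOY_ERESTARTSYS;
    }
    return 0;
}
```

Forks in the targeted pgrp bounce until toy_sig_end() runs, while forks in other pgrps go straight through. That is exactly the reader/writer exclusion I was complaining about, just spelled differently.
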
If we can scale noticeably better than tasklist_lock, it is
definitely worth doing. I think it is really easy to tie yourself up
in pretzels thinking about this.

An srcu in the pid structure that we hold while signaling tasks.
Interesting.
> Either way, one additional question... It seems to me that non-fatal
> signals really don't require the above mechanism, because if a task
> handles the signal, and then spawns a child, one can argue that the
> child came after the signal and should thus be unaffected. Right?
> Or more confusion on my part?
SIGSTOP also seems pretty important not to escape. I'm not certain of
the others. I think I would get a bit upset if job control signals in
the shell stopped working properly. I think asking the question "did
that app do something wrong with SIGTERM, or did the kernel drop it?"
would drive me a bit batty.

It is hard to tell what breaks because most buggy implementations will
work correctly most of the time.

Eric