* perf_counters issue with self-sampling threads
@ 2009-07-27 16:51 stephane eranian
2009-07-27 16:56 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: stephane eranian @ 2009-07-27 16:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Andrew Morton, Thomas Gleixner, Robert Richter,
Peter Zijlstra, Paul Mackerras, Andi Kleen, Maynard Johnson,
Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
perfmon2-devel
Hi,
I believe there is a problem with the current perf_counters (PCL)
code for self-sampling threads. The problem is related to sample
notifications via signal.
PCL (just like perfmon) is using SIGIO, an asynchronous signal,
to notify user applications of the availability of data in the event
buffer.
POSIX does not mandate that asynchronous signals be delivered
to the thread in which they originated. Any thread in the process
may process the signal, assuming it does not have the signal
blocked.
This is a serious problem with any self-sampling program such as
those built on top of PAPI. When sampling, you do want the signal
to be delivered to the thread in which the counter overflowed. This is
not just for convenience but it is required if the signal handler needs
to operate on the thread's machine state. Although, there is always
a possibility of forwarding the signal via tkill() to the right thread, I
do not think this is the right solution as it incurs additional latency
and therefore skid.
Looking at the kernel source code related to that, it seems that
kill_fasync() ends up calling group_send_sig_info(). This function
adds the signal to the process SHARED sigpending queue. Then,
it picks a thread to "wakeup". It first tries the thread in which the
signal originated with the following selection test (wants_signal):
- signal is not blocked
- thread is not exiting
- no signal private pending for this thread
If that does not work, it iterates over the other threads of the process.
This explains why in trivial tests, the SIGIO is always delivered
to the right thread. However it the monitored thread is using any
other signals, e.g., SIGALRM, then the SIGIO signal can go to the
wrong thread. The problem also arises if the first SIGIO is not already
processed by the time a 2nd is pended.
For self-sampling, we want (and in fact require) asynchronous notifications.
But we want the extra guarantee that the signal is ALWAYS delivered to
the thread in which the event occurred.
It seems like we could either create a different version of kill_fasync() or
pass an extra parameter to force this function to use specific_send_sig_info().
This would be only when self-monitoring. When a tool is monitoring another
thread, it is probably okay to have the signal delivered to any threads. Most
likely, the tool is setup such that threads not processing notifications have
the signal blocked.
^ permalink raw reply [flat|nested] 49+ messages in thread* Re: perf_counters issue with self-sampling threads 2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian @ 2009-07-27 16:56 ` Peter Zijlstra 2009-07-27 21:25 ` Andi Kleen 2009-07-29 12:19 ` Peter Zijlstra 2 siblings, 0 replies; 49+ messages in thread From: Peter Zijlstra @ 2009-07-27 16:56 UTC (permalink / raw) To: eranian Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote: > Hi, > > I believe there is a problem with the current perf_counters (PCL) > code for self-sampling threads. The problem is related to sample > notifications via signal. > > PCL (just like perfmon) is using SIGIO, an asynchronous signal, > to notify user applications of the availability of data in the event > buffer. > > POSIX does not mandate that asynchronous signals be delivered > to the thread in which they originated. Any thread in the process > may process the signal, assuming it does not have the signal > blocked. Bugger, you're right. /me kicks POSIX again for creating these crazy ass semantics. I'll look at fixing this. Thanks! ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian 2009-07-27 16:56 ` Peter Zijlstra @ 2009-07-27 21:25 ` Andi Kleen [not found] ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com> 2009-07-29 12:19 ` Peter Zijlstra 2 siblings, 1 reply; 49+ messages in thread From: Andi Kleen @ 2009-07-27 21:25 UTC (permalink / raw) To: eranian Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Peter Zijlstra, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel I wonder how SIGPROF gets around the same problem, or is it buggy too? -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com>]
* Re: perf_counters issue with self-sampling threads [not found] ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com> @ 2009-07-28 8:51 ` stephane eranian 2009-07-28 8:56 ` Andi Kleen 2009-08-04 16:09 ` stephane eranian 1 sibling, 1 reply; 49+ messages in thread From: stephane eranian @ 2009-07-28 8:51 UTC (permalink / raw) To: Andi Kleen Cc: perfmon2-devel, Maynard Johnson, Robert Richter, LKML, Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford, Andrew Morton, Peter Zijlstra, Philip Mucci, Dan Terpstra, Ingo Molnar [Reposting the message because of stupid MIME-encoding error on my part] Andi, Looks like SIGPROF is calling _group_send_sig_info(), so I think it is subject to the same problem. I have built a perfmon example to test the problem, it is relatively easy to trigger by simply self-monitoring a thread which is using setitimer() and thus SIGALRM. You just have to increase the timer frequency. At some point, the SIGPROF signal will be delivered to the wrong thread. One thing I did not mention in my message is that one would think that forcing a specific signal via F_SETSIG could be a workaround if that signal is synchronous, e.g., SIGFPE. F_SETSIG looks like a Linux extension but it does not solve the problem. Linux determines the mode of delivery not on the signal number but on the code path, it seems. If you use F_ASYNC+F_SETOWN, then this is systematically considered asynchronous regardless of the signal used.If you come from traps.c, then the signal is delivered to the correct thread. All of this does not look unreasonable to me. I believe sampling, be it timer or PMU-based, may be a special case here. We want asynchronous + specific thread (only) when self- sampling. An alternative may be to choose the pending queue based on the signal type (synchronous, asynchronous). Then, we could use F_SETSIG to override SIGIO with a synchronous signal instead. But I am not a POSIX signal expert. On Jul 27, 2009 11:25 PM, "Andi Kleen" <andi@firstfloor.org> wrote: I wonder how SIGPROF gets around the same problem, or is it buggy too? -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-28 8:51 ` stephane eranian @ 2009-07-28 8:56 ` Andi Kleen 2009-07-28 9:13 ` stephane eranian 0 siblings, 1 reply; 49+ messages in thread From: Andi Kleen @ 2009-07-28 8:56 UTC (permalink / raw) To: eranian Cc: Andi Kleen, perfmon2-devel, Maynard Johnson, Robert Richter, LKML, Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford, Andrew Morton, Peter Zijlstra, Philip Mucci, Dan Terpstra, Ingo Molnar On Tue, Jul 28, 2009 at 10:51:04AM +0200, stephane eranian wrote: > [Reposting the message because of stupid MIME-encoding error on my part] > Andi, > > Looks like SIGPROF is calling _group_send_sig_info(), so I > think it is subject to the same problem. So sounds like a whole class of signals can't be POSIX compliant. Likely others will run into the same problem. Perhaps should define a new sigaction flag for this to make it all explicit. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-28 8:56 ` Andi Kleen @ 2009-07-28 9:13 ` stephane eranian 0 siblings, 0 replies; 49+ messages in thread From: stephane eranian @ 2009-07-28 9:13 UTC (permalink / raw) To: Andi Kleen Cc: perfmon2-devel, Maynard Johnson, Robert Richter, LKML, Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford, Andrew Morton, Peter Zijlstra, Philip Mucci, Dan Terpstra, Ingo Molnar Andi, On Tue, Jul 28, 2009 at 10:56 AM, Andi Kleen<andi@firstfloor.org> wrote: > On Tue, Jul 28, 2009 at 10:51:04AM +0200, stephane eranian wrote: >> [Reposting the message because of stupid MIME-encoding error on my part] >> Andi, >> >> Looks like SIGPROF is calling _group_send_sig_info(), so I >> think it is subject to the same problem. > > So sounds like a whole class of signals can't be POSIX compliant. > Well, the problem is that I don't where to find the POSIX spec that defines signal types and how they should be handled in multi-threaded programs. That would be a good starting point. > Likely others will run into the same problem. > > Perhaps should define a new sigaction flag for this to make > it all explicit. > That's probably a clean way of doing this. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads [not found] ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com> 2009-07-28 8:51 ` stephane eranian @ 2009-08-04 16:09 ` stephane eranian 1 sibling, 0 replies; 49+ messages in thread From: stephane eranian @ 2009-08-04 16:09 UTC (permalink / raw) To: Andi Kleen Cc: perfmon2-devel, Maynard Johnson, Robert Richter, LKML, Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford, eranian, Andrew Morton, Peter Zijlstra, Philip Mucci, Dan Terpstra, Ingo Molnar On Tue, Jul 28, 2009 at 7:13 AM, stephane eranian<eranian@googlemail.com> wrote: > Andi, > > Looks like SIGPROF is calling _group_send_sig_info(), so I think it is > subject to the same problem. > I did not look into SIGPROF a bit more. First, SIGPROF is generated from ITIMER_PROF. My understanding is that this is a global timer for the process. It may therefore fire in any thread. Then SIGPROF is pended to the shared signal queue as per the group_send_sig_info() code path. That means the thread receiving the signal may not be the one in which the timer expired. But typically things even out. But things change if the monitored process is using signal, and in particular signals pended to the private signal queue which is what happens with pthread_kill() I would think. Then, yes there may be an imbalance. But in the case of SIGPROF, it is not clear to me if you want to change this behavior as well. It all depends on the definition for ITIMER_PROF. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian 2009-07-27 16:56 ` Peter Zijlstra 2009-07-27 21:25 ` Andi Kleen @ 2009-07-29 12:19 ` Peter Zijlstra 2009-07-29 12:37 ` stephane eranian 2009-07-29 22:17 ` Oleg Nesterov 2 siblings, 2 replies; 49+ messages in thread From: Peter Zijlstra @ 2009-07-29 12:19 UTC (permalink / raw) To: eranian Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, oleg On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote: > I believe there is a problem with the current perf_counters (PCL) > code for self-sampling threads. The problem is related to sample > notifications via signal. > > PCL (just like perfmon) is using SIGIO, an asynchronous signal, > to notify user applications of the availability of data in the event > buffer. > > POSIX does not mandate that asynchronous signals be delivered > to the thread in which they originated. Any thread in the process > may process the signal, assuming it does not have the signal > blocked. This signal stuff makes my head spin a little, however: fcntl(2) for F_SETOWN says: If a non-zero value is given to F_SETSIG in a multi‐ threaded process running with a threading library that supports thread groups (e.g., NPTL), then a positive value given to F_SETOWN has a different meaning: instead of being a process ID identifying a whole pro‐ cess, it is a thread ID identifying a specific thread within a process. Consequently, it may be necessary to pass F_SETOWN the result of gettid(2) instead of get‐ pid(2) to get sensible results when F_SETSIG is used. (In current Linux threading implementations, a main thread’s thread ID is the same as its process ID. This means that a single-threaded program can equally use gettid(2) or getpid(2) in this scenario.) Note, how‐ ever, that the statements in this paragraph do not apply to the SIGURG signal generated for out-of-band data on a socket: this signal is always sent to either a process or a process group, depending on the value given to F_SETOWN. Note also that Linux imposes a limit on the number of real-time signals that may be queued to a process (see getrlimit(2) and signal(7)) and if this limit is reached, then the kernel reverts to delivering SIGIO, and this signal is delivered to the entire process rather than to a specific thread. Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of a PID it should deliver SIGIO to the thread instead of the whole process -- which, to me, seems a sane semantic. However, kill_fasync(SIGIO) __kill_fasync() send_sigio() /* if pid_type is a PIDTYPE_PID and pid a TID this should only iterate the one thread, I think */ do_each_pid_task() { send_sigio_to_task(); } while_each_pid_task(); where: send_sigio_to_task() group_send_sig_info() __group_send_sig_info() send_signal(.group = 1) /* uh-ow trouble */ __send_signal() if (group) pending = &t->signal->shared_pending which will result in the signal being send to the whole process anyway. Now I was considering teaching send_sigio_to_task() to use specific_send_sig_info() when fown->pid != fown->group_leader->pid or something, but I'm not sure that won't break anything. Alternatively, I've missed a detail and I either read the manpage wrong, or the code, or both of them. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-29 12:19 ` Peter Zijlstra @ 2009-07-29 12:37 ` stephane eranian 2009-07-29 12:46 ` Peter Zijlstra 2009-07-29 22:17 ` Oleg Nesterov 1 sibling, 1 reply; 49+ messages in thread From: stephane eranian @ 2009-07-29 12:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, oleg Peter, On Wed, Jul 29, 2009 at 2:19 PM, Peter Zijlstra<a.p.zijlstra@chello.nl> wrote: > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote: >> I believe there is a problem with the current perf_counters (PCL) >> code for self-sampling threads. The problem is related to sample >> notifications via signal. >> >> PCL (just like perfmon) is using SIGIO, an asynchronous signal, >> to notify user applications of the availability of data in the event >> buffer. >> >> POSIX does not mandate that asynchronous signals be delivered >> to the thread in which they originated. Any thread in the process >> may process the signal, assuming it does not have the signal >> blocked. > > This signal stuff makes my head spin a little, however: > > fcntl(2) for F_SETOWN says: > > If a non-zero value is given to F_SETSIG in a multi‐ threaded > process running with a threading library that supports thread groups > (e.g., NPTL), then a positive value given to F_SETOWN has a > different meaning: instead of being a process ID identifying a whole > pro‐ cess, it is a thread ID identifying a specific thread within a > process. Consequently, it may be necessary to pass F_SETOWN the > result of gettid(2) instead of get‐ pid(2) to get sensible results > when F_SETSIG is used. (In current Linux threading > implementations, a main thread’s thread ID is the same as its process > ID. This means that a single-threaded program can equally use > gettid(2) or getpid(2) in this scenario.) Note, how‐ ever, that > the statements in this paragraph do not apply to the SIGURG signal > generated for out-of-band data on a socket: this signal is always > sent to either a process or a process group, depending on the value > given to F_SETOWN. Note also that Linux imposes a limit on the > number of real-time signals that may be queued to a process (see > getrlimit(2) and signal(7)) and if this limit is reached, then the > kernel reverts to delivering SIGIO, and this signal is delivered > to the entire process rather than to a specific thread. > > > Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of > a PID it should deliver SIGIO to the thread instead of the whole process > -- which, to me, seems a sane semantic. > Yes, I remember that manpage. I got the same impression and in fact that is what I document in some of my test programs. So you read this right. > However, > > kill_fasync(SIGIO) > __kill_fasync() > send_sigio() > /* if pid_type is a PIDTYPE_PID and pid a TID this should > only iterate the one thread, I think */ > do_each_pid_task() { > send_sigio_to_task(); > } while_each_pid_task(); > > where: > > send_sigio_to_task() > group_send_sig_info() > __group_send_sig_info() > send_signal(.group = 1) /* uh-ow trouble */ > __send_signal() > if (group) > pending = &t->signal->shared_pending > > which will result in the signal being send to the whole process anyway. > Exactly! That is the code path and this is why this does not work as expected. Nowhere along that path is there special casing for that F_SETOWN of tid vs. pid. kill_fasync() implies group. > > Now I was considering teaching send_sigio_to_task() to use > specific_send_sig_info() when fown->pid != fown->group_leader->pid or > something, but I'm not sure that won't break anything. > Yes, that's the problem with touching this. I don't know if this will break things. That's why I was suggested creating a parallel code path which does what we want without modifying the existing path. Unless you know some signal expert at redhat or elsewhere. > Alternatively, I've missed a detail and I either read the manpage wrong, > or the code, or both of them. > The code does not correspond to the manpage. Not clear which one is correct though. This F_SETOWN trick looks very Linux specific. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-29 12:37 ` stephane eranian @ 2009-07-29 12:46 ` Peter Zijlstra 0 siblings, 0 replies; 49+ messages in thread From: Peter Zijlstra @ 2009-07-29 12:46 UTC (permalink / raw) To: eranian Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, oleg On Wed, 2009-07-29 at 14:37 +0200, stephane eranian wrote: > > > > > Now I was considering teaching send_sigio_to_task() to use > > specific_send_sig_info() when fown->pid != fown->group_leader->pid or > > something, but I'm not sure that won't break anything. > > > Yes, that's the problem with touching this. I don't know if this will break > things. That's why I was suggested creating a parallel code path which > does what we want without modifying the existing path. Unless you know > some signal expert at redhat or elsewhere. His name is Oleg, and he's on CC ;-) > > Alternatively, I've missed a detail and I either read the manpage wrong, > > or the code, or both of them. > > > The code does not correspond to the manpage. Not clear which one > is correct though. This F_SETOWN trick looks very Linux specific. Linus specific sounds good enough to me. Michael might have something so say on this though... ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-29 12:19 ` Peter Zijlstra 2009-07-29 12:37 ` stephane eranian @ 2009-07-29 22:17 ` Oleg Nesterov 2009-07-30 11:31 ` Peter Zijlstra 1 sibling, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-07-29 22:17 UTC (permalink / raw) To: Peter Zijlstra Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk (add Roland) On 07/29, Peter Zijlstra wrote: > > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote: > > > > POSIX does not mandate that asynchronous signals be delivered > > to the thread in which they originated. Any thread in the process > > may process the signal, assuming it does not have the signal > > blocked. Yes. I now nothing about POSIX, but this is what Linux does at least. I don't think we can/should change this behaviour. > fcntl(2) for F_SETOWN says: > > If a non-zero value is given to F_SETSIG in a multi‐ threaded > process running with a threading library that supports thread groups > (e.g., NPTL), then a positive value given to F_SETOWN has a > different meaning: instead of being a process ID identifying a whole > pro‐ cess, it is a thread ID identifying a specific thread within a > process. Heh. Definitely this is not what Linux does ;) > Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of > a PID it should deliver SIGIO to the thread instead of the whole process > -- which, to me, seems a sane semantic. I am not sure I understand the man above... But to me it looks like we should always send a private signal when fown->signum != 0 ? The change should be simple, but as you pointed out we can break things. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-29 22:17 ` Oleg Nesterov @ 2009-07-30 11:31 ` Peter Zijlstra 2009-07-30 19:20 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-07-30 11:31 UTC (permalink / raw) To: Oleg Nesterov Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland On Thu, 2009-07-30 at 00:17 +0200, Oleg Nesterov wrote: > (add Roland) but you seem to have forgotten to actually edit the CC line, fixed that ;-) > On 07/29, Peter Zijlstra wrote: > > > > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote: > > > > > > POSIX does not mandate that asynchronous signals be delivered > > > to the thread in which they originated. Any thread in the process > > > may process the signal, assuming it does not have the signal > > > blocked. > > Yes. I now nothing about POSIX, but this is what Linux does at least. > I don't think we can/should change this behaviour. Well, we have plenty exceptions to that rule already, we have itimer extentions, tkill sys_rt_tgsigqueueinfo and plenty more.. > > fcntl(2) for F_SETOWN says: > > > > If a non-zero value is given to F_SETSIG in a multi‐ threaded > > process running with a threading library that supports thread groups > > (e.g., NPTL), then a positive value given to F_SETOWN has a > > different meaning: instead of being a process ID identifying a whole > > pro‐ cess, it is a thread ID identifying a specific thread within a > > process. > > Heh. Definitely this is not what Linux does ;) Right, so the question is, did we ever? Why does the man page say this. Looking at the .12 source (git start) we did: 440 if (!send_sig_info(fown->signum, &si, p)) 441 break; 442 /* fall-through: fall back on the old plain SIGIO signal */ 443 case 0: 444 send_group_sig_info(SIGIO, SEND_SIG_PRIV, p); Which was 'corrected' in: commit fc9c9ab22d5650977c417ef2032d02f455011b23 Author: Bharath Ramesh <bramesh@vt.edu> Date: Sat Apr 16 15:25:41 2005 -0700 [PATCH] AYSNC IO using singals other than SIGIO A question on sigwaitinfo based IO mechanism in multithreaded applications. I am trying to use RT signals to notify me of IO events using RT signals instead of SIGIO in a multithreaded applications. I noticed that there was some discussion on lkml during november 1999 with the subject of the discussion as "Signal driven IO". In the thread I noticed that RT signals were being delivered to the worker thread. I am running 2.6.10 kernel and I am trying to use the very same mechanism and I find that only SIGIO being propogated to the worker threads and RT signals only being propogated to the main thread and not the worker threads where I actually want them to be propogated too. On further inspection I found that the following patch which I have attached solves the problem. I am not sure if this is a bug or feature in the kernel. Roland McGrath <roland@redhat.com> said: This relates only to fcntl F_SETSIG, which is a Linux extension. So there is no POSIX issue. When changing various things like the normal SIGIO signalling to do group signals, I was concerned strictly with the POSIX semantics and generally avoided touching things in the domain of Linux inventions. That's why I didn't change this when I changed the call right next to it. There is no reason I can see that F_SETSIG-requested signals shouldn't use a group signal like normal SIGIO does. I'm happy to ACK this patch, there is nothing wrong with its change to the semantics in my book. But neither POSIX nor I care a whit what F_SETSIG does. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of > > a PID it should deliver SIGIO to the thread instead of the whole process > > -- which, to me, seems a sane semantic. > > I am not sure I understand the man above... But to me it looks like we > should always send a private signal when fown->signum != 0 ? > > The change should be simple, but as you pointed out we can break things. Right, so the change I had in mind is like the below (except I don't know if we can compare struct pid things by pointer value or if we should look at the content). In any case, we should either do something like the below (yay!), or amend the manpage (Michael?) and introduce something like F_SETOWN2 which does have the below semantics :-(. --- Index: linux-2.6/fs/fcntl.c =================================================================== --- linux-2.6.orig/fs/fcntl.c +++ linux-2.6/fs/fcntl.c @@ -431,6 +431,16 @@ static void send_sigio_to_task(struct ta int fd, int reason) { + int (*send_sig)(int, struct siginfo *, struct task_struct *); + + send_sig = group_send_sig_info; + /* + * If the fown points to a specific TID instead of to a PID + * we'll send the signal to the thread only. + */ + if (fown->pid_type == PIDTYPE_PID && fown->pid != task_tgid(p)) + send_sig = send_sig_info; + /* * F_SETSIG can change ->signum lockless in parallel, make * sure we read it once and use the same value throughout. @@ -461,11 +472,11 @@ static void send_sigio_to_task(struct ta else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!group_send_sig_info(signum, &si, p)) + if (!send_sig(signum, &si, p)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); + send_sig(SIGIO, SEND_SIG_PRIV, p); } } ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-30 11:31 ` Peter Zijlstra @ 2009-07-30 19:20 ` Oleg Nesterov 2009-07-30 20:00 ` Peter Zijlstra 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-07-30 19:20 UTC (permalink / raw) To: Peter Zijlstra Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland On 07/30, Peter Zijlstra wrote: > > On Thu, 2009-07-30 at 00:17 +0200, Oleg Nesterov wrote: > > (add Roland) > > but you seem to have forgotten to actually edit the CC line, fixed > that ;-) Yes, thanks ;) > > On 07/29, Peter Zijlstra wrote: > > > > > > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote: > > > > > > > > POSIX does not mandate that asynchronous signals be delivered > > > > to the thread in which they originated. Any thread in the process > > > > may process the signal, assuming it does not have the signal > > > > blocked. > > > > Yes. I now nothing about POSIX, but this is what Linux does at least. > > I don't think we can/should change this behaviour. > > Well, we have plenty exceptions to that rule already, we have itimer > extentions, tkill sys_rt_tgsigqueueinfo and plenty more.. Yes, yes, I meant the behaviour of kill(2), group_send_sig_info(), etc. > > > fcntl(2) for F_SETOWN says: > > > > > > If a non-zero value is given to F_SETSIG in a multi‐ threaded > > > process running with a threading library that supports thread groups > > > (e.g., NPTL), then a positive value given to F_SETOWN has a > > > different meaning: instead of being a process ID identifying a whole > > > pro‐ cess, it is a thread ID identifying a specific thread within a > > > process. > > > > Heh. Definitely this is not what Linux does ;) > > Right, so the question is, did we ever? Why does the man page say this. > > Looking at the .12 source (git start) we did: > > 440 if (!send_sig_info(fown->signum, &si, p)) > 441 break; > 442 /* fall-through: fall back on the old plain SIGIO signal */ > 443 case 0: > 444 send_group_sig_info(SIGIO, SEND_SIG_PRIV, p); Yes, the send_sig_info() above seems to match the manpage. Another thing I can't understand, group_send_sig_info() calls check_kill_permission(). But check_kill_permission() uses current, which can be a "random" task if kill_fasync() is called from interrupt. Even if not interrupt, I don't understand why (say) pipe_read() can't send a signal here. sigio_perm() has already checked permissions, and it correctly uses fown->cred. > Which was 'corrected' in: > > commit fc9c9ab22d5650977c417ef2032d02f455011b23 > Author: Bharath Ramesh <bramesh@vt.edu> > Date: Sat Apr 16 15:25:41 2005 -0700 > > [PATCH] AYSNC IO using singals other than SIGIO > > A question on sigwaitinfo based IO mechanism in multithreaded applications. > > I am trying to use RT signals to notify me of IO events using RT signals > instead of SIGIO in a multithreaded applications. I noticed that there was > some discussion on lkml during november 1999 with the subject of the > discussion as "Signal driven IO". In the thread I noticed that RT signals > were being delivered to the worker thread. I am running 2.6.10 kernel and > I am trying to use the very same mechanism and I find that only SIGIO being > propogated to the worker threads and RT signals only being propogated to > the main thread and not the worker threads where I actually want them to be > propogated too. On further inspection I found that the following patch > which I have attached solves the problem. So, some people want shared signals here. > > I am not sure I understand the man above... But to me it looks like we > > should always send a private signal when fown->signum != 0 ? > > > > The change should be simple, but as you pointed out we can break things. > > Right, so the change I had in mind is like the below (except I don't > know if we can compare struct pid things by pointer value or if we > should look at the content). (yes, we can compare the pointers) > In any case, we should either do something like the below (yay!), or > amend the manpage (Michael?) and introduce something like F_SETOWN2 > which does have the below semantics :-(. > > --- > Index: linux-2.6/fs/fcntl.c > =================================================================== > --- linux-2.6.orig/fs/fcntl.c > +++ linux-2.6/fs/fcntl.c > @@ -431,6 +431,16 @@ static void send_sigio_to_task(struct ta > int fd, > int reason) > { > + int (*send_sig)(int, struct siginfo *, struct task_struct *); > + > + send_sig = group_send_sig_info; > + /* > + * If the fown points to a specific TID instead of to a PID > + * we'll send the signal to the thread only. > + */ > + if (fown->pid_type == PIDTYPE_PID && fown->pid != task_tgid(p)) > + send_sig = send_sig_info; Yes, this allows to send a private signal to sub-thread. But this is a bit strange, because the user can't specify it wants a thread-specific signal to the main thread, its tid == pid. I don't know what should we do. Perhaps we can just add "bool is_group_signal" to fown_struct as another Linux extension. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-30 19:20 ` Oleg Nesterov @ 2009-07-30 20:00 ` Peter Zijlstra 2009-07-30 20:28 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-07-30 20:00 UTC (permalink / raw) To: Oleg Nesterov Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland On Thu, 2009-07-30 at 21:20 +0200, Oleg Nesterov wrote: > Yes, this allows to send a private signal to sub-thread. > > But this is a bit strange, because the user can't specify it wants > a thread-specific signal to the main thread, its tid == pid. Ah, indeed. I'll make a patch for F_SETOWN_TID then, unless someone comes up with a better name for the creature ;-) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-30 20:00 ` Peter Zijlstra @ 2009-07-30 20:28 ` Oleg Nesterov 2009-07-30 21:09 ` stephane eranian 2009-07-31 8:35 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 0 siblings, 2 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-07-30 20:28 UTC (permalink / raw) To: Peter Zijlstra Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland On 07/30, Peter Zijlstra wrote: > > I'll make a patch for F_SETOWN_TID then, unless someone > comes up with a better name for the creature ;-) I think you are right. It is not safe to change the current behaviour. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: perf_counters issue with self-sampling threads 2009-07-30 20:28 ` Oleg Nesterov @ 2009-07-30 21:09 ` stephane eranian 2009-07-31 8:35 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 1 sibling, 0 replies; 49+ messages in thread From: stephane eranian @ 2009-07-30 21:09 UTC (permalink / raw) To: Oleg Nesterov Cc: Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland On Thu, Jul 30, 2009 at 10:28 PM, Oleg Nesterov<oleg@redhat.com> wrote: > On 07/30, Peter Zijlstra wrote: >> >> I'll make a patch for F_SETOWN_TID then, unless someone >> comes up with a better name for the creature ;-) > > I think you are right. It is not safe to change the current > behaviour. > I agree. As I said earlier, it is better to just add a new code path. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [RFC][PATCH] fcntl: F_[SG]ETOWN_TID 2009-07-30 20:28 ` Oleg Nesterov 2009-07-30 21:09 ` stephane eranian @ 2009-07-31 8:35 ` Peter Zijlstra 2009-07-31 14:01 ` stephane eranian ` (2 more replies) 1 sibling, 3 replies; 49+ messages in thread From: Peter Zijlstra @ 2009-07-31 8:35 UTC (permalink / raw) To: Oleg Nesterov Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_TID, which functions similarly to F_SETOWN, except positive arguments are interpreted as TIDs and negative arguments are interpreted as PIDs. This extension is fully bug compatible with the old F_GETOWN implementation in that F_GETOWN_TID will be troubled by the negative return value for PIDs similarly to F_GETOWN's trouble with process groups. [ compile tested only so far ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- arch/parisc/include/asm/fcntl.h | 2 + fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++----- include/asm-generic/fcntl.h | 4 ++ include/linux/fs.h | 11 +++++- net/socket.c | 2 +- 5 files changed, 71 insertions(+), 12 deletions(-) diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h index 1e1c824..5d5235a 100644 --- a/arch/parisc/include/asm/fcntl.h +++ b/arch/parisc/include/asm/fcntl.h @@ -28,6 +28,8 @@ #define F_SETOWN 12 /* for sockets. */ #define F_SETSIG 13 /* for sockets. */ #define F_GETSIG 14 /* for sockets. */ +#define F_GETOWN_TID 15 +#define F_SETOWN_TID 16 /* for posix fcntl() and lockf() */ #define F_RDLCK 01 diff --git a/fs/fcntl.c b/fs/fcntl.c index ae41308..8d0b7f9 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * filp, unsigned long arg) } static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, - int force) + int flags) { write_lock_irq(&filp->f_owner.lock); - if (force || !filp->f_owner.pid) { + if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) { put_pid(filp->f_owner.pid); filp->f_owner.pid = get_pid(pid); filp->f_owner.pid_type = type; + filp->f_owner.task_only = + (type == PIDTYPE_PID && (flags & FF_SETOWN_TID)); if (pid) { const struct cred *cred = current_cred(); @@ -215,7 +217,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, } int __f_setown(struct file *filp, struct pid *pid, enum pid_type type, - int force) + int flags) { int err; @@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type, if (err) return err; - f_modown(filp, pid, type, force); + f_modown(filp, pid, type, flags); return 0; } EXPORT_SYMBOL(__f_setown); -int f_setown(struct file *filp, unsigned long arg, int force) +int f_setown(struct file *filp, unsigned long arg, int flags) { enum pid_type type; struct pid *pid; @@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned long arg, int force) } rcu_read_lock(); pid = find_vpid(who); - result = __f_setown(filp, pid, type, force); + result = __f_setown(filp, pid, type, flags); rcu_read_unlock(); return result; } @@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp) return pid; } +static int f_setown_tid(struct file *filp, unsigned long arg) +{ + int flags = FF_SETOWN_FORCE; + struct pid *pid; + int who = arg; + int ret = 0; + + if (who < 0) + who = -who; + else + flags |= FF_SETOWN_TID; + + rcu_read_lock(); + pid = find_vpid(who); + ret = __f_setown(filp, pid, PIDTYPE_PID, flags); + rcu_read_unlock(); + + return ret; +} + +static pid_t f_getown_tid(struct file *filp) +{ + pid_t tid; + + read_lock(&filp->f_owner.lock); + tid = pid_vnr(filp->f_owner.pid); + if (filp->f_owner.pid_type == PIDTYPE_PGID) + tid = 0; + if (!filp->f_owner.task_only) + tid = -tid; + read_unlock(&filp->f_owner.lock); + return tid; +} + static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, struct file *filp) { @@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, force_successful_syscall_return(); break; case F_SETOWN: - err = f_setown(filp, arg, 1); + err = f_setown(filp, arg, FF_SETOWN_FORCE); + break; + case F_GETOWN_TID: + err = f_getown_tid(filp); + force_successful_syscall_return(); + break; + case F_SETOWN_TID: + err = f_setown_tid(filp, arg); break; case F_GETSIG: err = filp->f_owner.signum; @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p, int fd, int reason) { + int (*send_sig)(int, struct siginfo *, struct task_struct *); /* * F_SETSIG can change ->signum lockless in parallel, make * sure we read it once and use the same value throughout. @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p, if (!sigio_perm(p, fown, signum)) return; + send_sig = fown->task_only ? send_sig_info : group_send_sig_info; + switch (signum) { siginfo_t si; default: @@ -461,11 +507,11 @@ static void send_sigio_to_task(struct task_struct *p, else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!group_send_sig_info(signum, &si, p)) + if (!send_sig(signum, &si, p)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); + send_sig(SIGIO, SEND_SIG_PRIV, p); } } diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h index 4d3e483..d7906b8 100644 --- a/include/asm-generic/fcntl.h +++ b/include/asm-generic/fcntl.h @@ -73,6 +73,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ #endif +#ifndef F_SETOWN_TID +#define F_SETOWN_TID 12 +#define F_GETOWN_TID 13 +#endif /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 0872372..42697e7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -872,6 +872,7 @@ struct fown_struct { rwlock_t lock; /* protects pid, uid, euid fields */ struct pid *pid; /* pid or -pgrp where SIGIO should be sent */ enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */ + bool task_only; /* task or group signal */ uid_t uid, euid; /* uid/euid of process setting the owner */ int signum; /* posix.1b rt signal to be delivered on IO */ }; @@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_struct **, int, int); /* only for net: no internal synchronization */ extern void __kill_fasync(struct fasync_struct *, int, int); -extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force); -extern int f_setown(struct file *filp, unsigned long arg, int force); +/* + * setown flags + */ +#define FF_SETOWN_FORCE 1 +#define FF_SETOWN_TID 2 + +extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags); +extern int f_setown(struct file *filp, unsigned long arg, int flags); extern void f_delown(struct file *filp); extern pid_t f_getown(struct file *filp); extern int send_sigurg(struct fown_struct *fown); diff --git a/net/socket.c b/net/socket.c index 791d71a..ac57f8e 100644 --- a/net/socket.c +++ b/net/socket.c @@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) err = -EFAULT; if (get_user(pid, (int __user *)argp)) break; - err = f_setown(sock->file, pid, 1); + err = f_setown(sock->file, pid, FF_SETOWN_FORCE); break; case FIOGETOWN: case SIOCGPGRP: ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID 2009-07-31 8:35 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra @ 2009-07-31 14:01 ` stephane eranian 2009-07-31 20:52 ` Oleg Nesterov 2009-07-31 21:11 ` Andrew Morton 2 siblings, 0 replies; 49+ messages in thread From: stephane eranian @ 2009-07-31 14:01 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland Hi, I have tried this patch in 2.6.30 and perfmon given I already had a stress test program for this problem. I can report that the patch seems to work for me. This is a self-sampling multi-threaded program in which each thread also uses setitimer(ITIMER_REAL) and thus is getting SIGALRM. First run with stock F_SETOWN: $ self_smpl_multi 8 20000000 29 0 program_time = 8, threshold = 20000000, signum = 29 fcntl(F_SETOWN) launch main: fd: 3, tid: 5836, self: 0x7f75f43206e0 launch side: fd: 4, tid: 5837, self: 0x40b59950 1: fd = 3, count = 30, iter = 651, rate = 46/Kiter 1: fd = 4, count = 29, iter = 623, rate = 46/Kiter (bad thread) ser = 287, fd = 4, tid = 5836, self = 0x7f75f43206e0 2: fd = 4, count = 119, iter = 2540, rate = 46/Kiter 2: fd = 3, count = 118, iter = 2510, rate = 47/Kiter (bad thread) ser = 330, fd = 4, tid = 5836, self = 0x7f75f43206e0 (bad thread) ser = 395, fd = 4, tid = 5836, self = 0x7f75f43206e0 The "(bad thread)" message shows up when the signal goes to the wrong thread. Second run with F_SETOWN_TID: $ ./self_smpl_multi 8 20000000 29 1 program_time = 8, threshold = 20000000, signum = 29 fcntl(F_SETOWN_TID) launch main: fd: 3, tid: 5838, self: 0x7fa1800af6e0 launch side: fd: 4, tid: 5839, self: 0x4253a950 1: fd = 4, count = 52, iter = 1115, rate = 46/Kiter 1: fd = 3, count = 52, iter = 1115, rate = 46/Kiter 2: fd = 4, count = 119, iter = 2541, rate = 46/Kiter 2: fd = 3, count = 117, iter = 2490, rate = 46/Kiter 3: fd = 3, count = 114, iter = 2427, rate = 46/Kiter 3: fd = 4, count = 119, iter = 2541, rate = 46/Kiter 4: fd = 4, count = 119, iter = 2541, rate = 46/Kiter 4: fd = 3, count = 114, iter = 2428, rate = 46/Kiter 5: fd = 4, count = 119, iter = 2541, rate = 46/Kiter No more error messages. Thanks. On Fri, Jul 31, 2009 at 10:35 AM, Peter Zijlstra<a.p.zijlstra@chello.nl> wrote: > In order to direct the SIGIO signal to a particular thread of a > multi-threaded application we cannot, like suggested by the manpage, put > a TID into the regular fcntl(F_SETOWN) call. It will still be send to > the whole process of which that thread is part. > > Since people do want to properly direct SIGIO we introduce F_SETOWN_TID, > which functions similarly to F_SETOWN, except positive arguments are > interpreted as TIDs and negative arguments are interpreted as PIDs. > > This extension is fully bug compatible with the old F_GETOWN > implementation in that F_GETOWN_TID will be troubled by the negative > return value for PIDs similarly to F_GETOWN's trouble with process > groups. > > [ compile tested only so far ] > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > arch/parisc/include/asm/fcntl.h | 2 + > fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++----- > include/asm-generic/fcntl.h | 4 ++ > include/linux/fs.h | 11 +++++- > net/socket.c | 2 +- > 5 files changed, 71 insertions(+), 12 deletions(-) > > diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h > index 1e1c824..5d5235a 100644 > --- a/arch/parisc/include/asm/fcntl.h > +++ b/arch/parisc/include/asm/fcntl.h > @@ -28,6 +28,8 @@ > #define F_SETOWN 12 /* for sockets. */ > #define F_SETSIG 13 /* for sockets. */ > #define F_GETSIG 14 /* for sockets. */ > +#define F_GETOWN_TID 15 > +#define F_SETOWN_TID 16 > > /* for posix fcntl() and lockf() */ > #define F_RDLCK 01 > diff --git a/fs/fcntl.c b/fs/fcntl.c > index ae41308..8d0b7f9 100644 > --- a/fs/fcntl.c > +++ b/fs/fcntl.c > @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * filp, unsigned long arg) > } > > static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, > - int force) > + int flags) > { > write_lock_irq(&filp->f_owner.lock); > - if (force || !filp->f_owner.pid) { > + if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) { > put_pid(filp->f_owner.pid); > filp->f_owner.pid = get_pid(pid); > filp->f_owner.pid_type = type; > + filp->f_owner.task_only = > + (type == PIDTYPE_PID && (flags & FF_SETOWN_TID)); > > if (pid) { > const struct cred *cred = current_cred(); > @@ -215,7 +217,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, > } > > int __f_setown(struct file *filp, struct pid *pid, enum pid_type type, > - int force) > + int flags) > { > int err; > > @@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type, > if (err) > return err; > > - f_modown(filp, pid, type, force); > + f_modown(filp, pid, type, flags); > return 0; > } > EXPORT_SYMBOL(__f_setown); > > -int f_setown(struct file *filp, unsigned long arg, int force) > +int f_setown(struct file *filp, unsigned long arg, int flags) > { > enum pid_type type; > struct pid *pid; > @@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned long arg, int force) > } > rcu_read_lock(); > pid = find_vpid(who); > - result = __f_setown(filp, pid, type, force); > + result = __f_setown(filp, pid, type, flags); > rcu_read_unlock(); > return result; > } > @@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp) > return pid; > } > > +static int f_setown_tid(struct file *filp, unsigned long arg) > +{ > + int flags = FF_SETOWN_FORCE; > + struct pid *pid; > + int who = arg; > + int ret = 0; > + > + if (who < 0) > + who = -who; > + else > + flags |= FF_SETOWN_TID; > + > + rcu_read_lock(); > + pid = find_vpid(who); > + ret = __f_setown(filp, pid, PIDTYPE_PID, flags); > + rcu_read_unlock(); > + > + return ret; > +} > + > +static pid_t f_getown_tid(struct file *filp) > +{ > + pid_t tid; > + > + read_lock(&filp->f_owner.lock); > + tid = pid_vnr(filp->f_owner.pid); > + if (filp->f_owner.pid_type == PIDTYPE_PGID) > + tid = 0; > + if (!filp->f_owner.task_only) > + tid = -tid; > + read_unlock(&filp->f_owner.lock); > + return tid; > +} > + > static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > struct file *filp) > { > @@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > force_successful_syscall_return(); > break; > case F_SETOWN: > - err = f_setown(filp, arg, 1); > + err = f_setown(filp, arg, FF_SETOWN_FORCE); > + break; > + case F_GETOWN_TID: > + err = f_getown_tid(filp); > + force_successful_syscall_return(); > + break; > + case F_SETOWN_TID: > + err = f_setown_tid(filp, arg); > break; > case F_GETSIG: > err = filp->f_owner.signum; > @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p, > int fd, > int reason) > { > + int (*send_sig)(int, struct siginfo *, struct task_struct *); > /* > * F_SETSIG can change ->signum lockless in parallel, make > * sure we read it once and use the same value throughout. > @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p, > if (!sigio_perm(p, fown, signum)) > return; > > + send_sig = fown->task_only ? send_sig_info : group_send_sig_info; > + > switch (signum) { > siginfo_t si; > default: > @@ -461,11 +507,11 @@ static void send_sigio_to_task(struct task_struct *p, > else > si.si_band = band_table[reason - POLL_IN]; > si.si_fd = fd; > - if (!group_send_sig_info(signum, &si, p)) > + if (!send_sig(signum, &si, p)) > break; > /* fall-through: fall back on the old plain SIGIO signal */ > case 0: > - group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); > + send_sig(SIGIO, SEND_SIG_PRIV, p); > } > } > > diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h > index 4d3e483..d7906b8 100644 > --- a/include/asm-generic/fcntl.h > +++ b/include/asm-generic/fcntl.h > @@ -73,6 +73,10 @@ > #define F_SETSIG 10 /* for sockets. */ > #define F_GETSIG 11 /* for sockets. */ > #endif > +#ifndef F_SETOWN_TID > +#define F_SETOWN_TID 12 > +#define F_GETOWN_TID 13 > +#endif > > /* for F_[GET|SET]FL */ > #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 0872372..42697e7 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -872,6 +872,7 @@ struct fown_struct { > rwlock_t lock; /* protects pid, uid, euid fields */ > struct pid *pid; /* pid or -pgrp where SIGIO should be sent */ > enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */ > + bool task_only; /* task or group signal */ > uid_t uid, euid; /* uid/euid of process setting the owner */ > int signum; /* posix.1b rt signal to be delivered on IO */ > }; > @@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_struct **, int, int); > /* only for net: no internal synchronization */ > extern void __kill_fasync(struct fasync_struct *, int, int); > > -extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force); > -extern int f_setown(struct file *filp, unsigned long arg, int force); > +/* > + * setown flags > + */ > +#define FF_SETOWN_FORCE 1 > +#define FF_SETOWN_TID 2 > + > +extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags); > +extern int f_setown(struct file *filp, unsigned long arg, int flags); > extern void f_delown(struct file *filp); > extern pid_t f_getown(struct file *filp); > extern int send_sigurg(struct fown_struct *fown); > diff --git a/net/socket.c b/net/socket.c > index 791d71a..ac57f8e 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) > err = -EFAULT; > if (get_user(pid, (int __user *)argp)) > break; > - err = f_setown(sock->file, pid, 1); > + err = f_setown(sock->file, pid, FF_SETOWN_FORCE); > break; > case FIOGETOWN: > case SIOCGPGRP: > > > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID 2009-07-31 8:35 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 2009-07-31 14:01 ` stephane eranian @ 2009-07-31 20:52 ` Oleg Nesterov 2009-07-31 21:11 ` Andrew Morton 2 siblings, 0 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-07-31 20:52 UTC (permalink / raw) To: Peter Zijlstra Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland On 07/31, Peter Zijlstra wrote: > > In order to direct the SIGIO signal to a particular thread of a > multi-threaded application we cannot, like suggested by the manpage, put > a TID into the regular fcntl(F_SETOWN) call. It will still be send to > the whole process of which that thread is part. > > Since people do want to properly direct SIGIO we introduce F_SETOWN_TID, > which functions similarly to F_SETOWN, except positive arguments are > interpreted as TIDs and negative arguments are interpreted as PIDs. I think this is correct. But, > @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p, > int fd, > int reason) > { > + int (*send_sig)(int, struct siginfo *, struct task_struct *); > /* > * F_SETSIG can change ->signum lockless in parallel, make > * sure we read it once and use the same value throughout. > @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p, > if (!sigio_perm(p, fown, signum)) > return; > > + send_sig = fown->task_only ? send_sig_info : group_send_sig_info; this is ugly. I do not blame your patch, I blame signal.c which has a lot of helpers to send a signal but it is never possible to find the right one. I think we need a new trivial helper, int xxx(int sig, struct siginfo *info, struct task_struct *p, bool group) { unsigned long flags; int ret = -ESRCH; if (lock_task_sighand(p, &flags)) { ret = send_signal(sig, info, p, group); unlock_task_sighand(p, &flags); } return ret; } send_sigio_to_task() can just do: send_signal(..., !fown->task_only). group_send_sig_info(), send_sig_info() should use this helper too. Also. without the new helper, F_SETOWN does check_kill_permission() while F_SETOWN_TID does not. This doesn't really matter, but this looks a bit odd. Note that send_sigio_to_task() does not need check_kill_permission(). We use either SEND_SIG_PRIV or SI_FROMKERNEL() signals, so we never actually check permissions. Even if we did, it would be just wrong to deny the signal here using current_cred(). Peter, may I suggest to to re-diff your patch on top of the trivial patch I am going to send a bit later today? Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID 2009-07-31 8:35 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 2009-07-31 14:01 ` stephane eranian 2009-07-31 20:52 ` Oleg Nesterov @ 2009-07-31 21:11 ` Andrew Morton 2009-08-01 1:27 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov ` (4 more replies) 2 siblings, 5 replies; 49+ messages in thread From: Andrew Morton @ 2009-07-31 21:11 UTC (permalink / raw) To: Peter Zijlstra Cc: oleg, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Fri, 31 Jul 2009 10:35:20 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > In order to direct the SIGIO signal to a particular thread of a > multi-threaded application we cannot, like suggested by the manpage, put > a TID into the regular fcntl(F_SETOWN) call. It will still be send to > the whole process of which that thread is part. > > Since people do want to properly direct SIGIO we introduce F_SETOWN_TID, > which functions similarly to F_SETOWN, except positive arguments are > interpreted as TIDs and negative arguments are interpreted as PIDs. > > This extension is fully bug compatible with the old F_GETOWN > implementation in that F_GETOWN_TID will be troubled by the negative > return value for PIDs similarly to F_GETOWN's trouble with process > groups. I'd be interested in seeing a bit more explanation about the "people do want to properly direct SIGIO" thing - use cases, how the current code causes them problems, etc. As it stands, it's a bit of a mystery-patch. > [ compile tested only so far ] I will continue to lurk :) > arch/parisc/include/asm/fcntl.h | 2 + > fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++----- > include/asm-generic/fcntl.h | 4 ++ > include/linux/fs.h | 11 +++++- > net/socket.c | 2 +- OK. Alpha has private definitions of F_SETSIG and F_GETSIG which are identical to the generic ones. That's somewhat logical, given that alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones. Alpha appears to have made the decision to spell out _all_ the F_* flags, given that some of them are different. That has some merit, I guess. But your patch broke that. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) 2009-07-31 21:11 ` Andrew Morton @ 2009-08-01 1:27 ` Oleg Nesterov 2009-08-03 15:48 ` [PATCH 3/2] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 2009-08-01 1:28 ` [PATCH 1/2] signals: introduce do_send_sig_info() helper Oleg Nesterov ` (3 subsequent siblings) 4 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-01 1:27 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Preparation for Peter's changes, but imho makes sense anyway. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 3/2] fcntl: F_[SG]ETOWN_TID 2009-08-01 1:27 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov @ 2009-08-03 15:48 ` Peter Zijlstra 2009-08-03 17:16 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-08-03 15:48 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_TID, which functions similarly to F_SETOWN, except positive arguments are interpreted as TIDs and negative arguments are interpreted as PIDs. The need to direct SIGIO comes from self-monitoring profiling such as with perf-counters. Perf-counters uses SIGIO to notify that new sample data is available. If the signal is delivered to the same task that generated the new sample it can augment that data by inspecting the task's user-space state right after it returns from the kernel. This is esp. convenient for interpreted or virtual machine driven environments. This extension is fully bug compatible with the old F_GETOWN implementation in that F_GETOWN_TID will be troubled by the negative return value for PIDs similarly to F_GETOWN's trouble with process groups. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: stephane eranian <eranian@googlemail.com> --- arch/alpha/include/asm/fcntl.h | 2 + arch/parisc/include/asm/fcntl.h | 2 + fs/fcntl.c | 62 ++++++++++++++++++++++++++++++++++------ include/asm-generic/fcntl.h | 4 ++ include/linux/fs.h | 11 +++++-- net/socket.c | 2 - 6 files changed, 71 insertions(+), 12 deletions(-) Index: linux-2.6/arch/parisc/include/asm/fcntl.h =================================================================== --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h +++ linux-2.6/arch/parisc/include/asm/fcntl.h @@ -28,6 +28,8 @@ #define F_SETOWN 12 /* for sockets. */ #define F_SETSIG 13 /* for sockets. */ #define F_GETSIG 14 /* for sockets. */ +#define F_GETOWN_TID 15 +#define F_SETOWN_TID 16 /* for posix fcntl() and lockf() */ #define F_RDLCK 01 Index: linux-2.6/fs/fcntl.c =================================================================== --- linux-2.6.orig/fs/fcntl.c +++ linux-2.6/fs/fcntl.c @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * f } static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, - int force) + int flags) { write_lock_irq(&filp->f_owner.lock); - if (force || !filp->f_owner.pid) { + if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) { put_pid(filp->f_owner.pid); filp->f_owner.pid = get_pid(pid); filp->f_owner.pid_type = type; + filp->f_owner.task_only = + (type == PIDTYPE_PID && (flags & FF_SETOWN_TID)); if (pid) { const struct cred *cred = current_cred(); @@ -215,7 +217,7 @@ static void f_modown(struct file *filp, } int __f_setown(struct file *filp, struct pid *pid, enum pid_type type, - int force) + int flags) { int err; @@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct if (err) return err; - f_modown(filp, pid, type, force); + f_modown(filp, pid, type, flags); return 0; } EXPORT_SYMBOL(__f_setown); -int f_setown(struct file *filp, unsigned long arg, int force) +int f_setown(struct file *filp, unsigned long arg, int flags) { enum pid_type type; struct pid *pid; @@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned } rcu_read_lock(); pid = find_vpid(who); - result = __f_setown(filp, pid, type, force); + result = __f_setown(filp, pid, type, flags); rcu_read_unlock(); return result; } @@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp) return pid; } +static int f_setown_tid(struct file *filp, unsigned long arg) +{ + int flags = FF_SETOWN_FORCE; + struct pid *pid; + int who = arg; + int ret = 0; + + if (who < 0) + who = -who; + else + flags |= FF_SETOWN_TID; + + rcu_read_lock(); + pid = find_vpid(who); + ret = __f_setown(filp, pid, PIDTYPE_PID, flags); + rcu_read_unlock(); + + return ret; +} + +static pid_t f_getown_tid(struct file *filp) +{ + pid_t tid; + + read_lock(&filp->f_owner.lock); + tid = pid_vnr(filp->f_owner.pid); + if (filp->f_owner.pid_type == PIDTYPE_PGID) + tid = 0; + if (!filp->f_owner.task_only) + tid = -tid; + read_unlock(&filp->f_owner.lock); + return tid; +} + static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, struct file *filp) { @@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned in force_successful_syscall_return(); break; case F_SETOWN: - err = f_setown(filp, arg, 1); + err = f_setown(filp, arg, FF_SETOWN_FORCE); + break; + case F_GETOWN_TID: + err = f_getown_tid(filp); + force_successful_syscall_return(); + break; + case F_SETOWN_TID: + err = f_setown_tid(filp, arg); break; case F_GETSIG: err = filp->f_owner.signum; @@ -436,6 +479,7 @@ static void send_sigio_to_task(struct ta * sure we read it once and use the same value throughout. */ int signum = ACCESS_ONCE(fown->signum); + int group = !fown->task_only; if (!sigio_perm(p, fown, signum)) return; @@ -461,11 +505,11 @@ static void send_sigio_to_task(struct ta else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!do_send_sig_info(signum, &si, p, true)) + if (!do_send_sig_info(signum, &si, p, group)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true); + do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group); } } Index: linux-2.6/include/asm-generic/fcntl.h =================================================================== --- linux-2.6.orig/include/asm-generic/fcntl.h +++ linux-2.6/include/asm-generic/fcntl.h @@ -73,6 +73,10 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ #endif +#ifndef F_SETOWN_TID +#define F_SETOWN_TID 12 +#define F_GETOWN_TID 13 +#endif /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ Index: linux-2.6/include/linux/fs.h =================================================================== --- linux-2.6.orig/include/linux/fs.h +++ linux-2.6/include/linux/fs.h @@ -872,6 +872,7 @@ struct fown_struct { rwlock_t lock; /* protects pid, uid, euid fields */ struct pid *pid; /* pid or -pgrp where SIGIO should be sent */ enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */ + bool task_only; /* task or group signal */ uid_t uid, euid; /* uid/euid of process setting the owner */ int signum; /* posix.1b rt signal to be delivered on IO */ }; @@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_st /* only for net: no internal synchronization */ extern void __kill_fasync(struct fasync_struct *, int, int); -extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force); -extern int f_setown(struct file *filp, unsigned long arg, int force); +/* + * setown flags + */ +#define FF_SETOWN_FORCE 1 +#define FF_SETOWN_TID 2 + +extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags); +extern int f_setown(struct file *filp, unsigned long arg, int flags); extern void f_delown(struct file *filp); extern pid_t f_getown(struct file *filp); extern int send_sigurg(struct fown_struct *fown); Index: linux-2.6/net/socket.c =================================================================== --- linux-2.6.orig/net/socket.c +++ linux-2.6/net/socket.c @@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file err = -EFAULT; if (get_user(pid, (int __user *)argp)) break; - err = f_setown(sock->file, pid, 1); + err = f_setown(sock->file, pid, FF_SETOWN_FORCE); break; case FIOGETOWN: case SIOCGPGRP: Index: linux-2.6/arch/alpha/include/asm/fcntl.h =================================================================== --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h +++ linux-2.6/arch/alpha/include/asm/fcntl.h @@ -26,6 +26,8 @@ #define F_GETOWN 6 /* for sockets. */ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ +#define F_SETOWN_TID 12 +#define F_GETOWN_TID 13 /* for posix fcntl() and lockf() */ #define F_RDLCK 1 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID 2009-08-03 15:48 ` [PATCH 3/2] fcntl: F_[SG]ETOWN_TID Peter Zijlstra @ 2009-08-03 17:16 ` Oleg Nesterov 2009-08-03 17:47 ` Peter Zijlstra 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-03 17:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/03, Peter Zijlstra wrote: > > --- linux-2.6.orig/fs/fcntl.c > +++ linux-2.6/fs/fcntl.c > @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * f > } > > static void f_modown(struct file *filp, struct pid *pid, enum pid_type type, > - int force) > + int flags) > { > write_lock_irq(&filp->f_owner.lock); > - if (force || !filp->f_owner.pid) { > + if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) { > put_pid(filp->f_owner.pid); > filp->f_owner.pid = get_pid(pid); > filp->f_owner.pid_type = type; > + filp->f_owner.task_only = > + (type == PIDTYPE_PID && (flags & FF_SETOWN_TID)); Do we need type == PIDTYPE_PID check? FF_SETOWN_TID must imply PIDTYPE_PID, it is only used by f_setown_tid(). Hmm. Actually I am not sure we should change f_modown() at all. I was going to suggest this as a subsequent cleanup, but now I am starting to think it is better to do from the very beginning. Please see below. > +static int f_setown_tid(struct file *filp, unsigned long arg) > +{ > + int flags = FF_SETOWN_FORCE; > + struct pid *pid; > + int who = arg; > + int ret = 0; > + > + if (who < 0) > + who = -who; > + else > + flags |= FF_SETOWN_TID; Hmm, OK. so fcntl(F_SETOWN_TID, -666) <=> fcntl(F_SETOWN, +666). Not that I disagree, but I think this should be discussed. Perhaps F_SETOWN_TID can just reject who < 0. > +static pid_t f_getown_tid(struct file *filp) > +{ > + pid_t tid; > + > + read_lock(&filp->f_owner.lock); > + tid = pid_vnr(filp->f_owner.pid); > + if (filp->f_owner.pid_type == PIDTYPE_PGID) > + tid = 0; > + if (!filp->f_owner.task_only) > + tid = -tid; I didn't think about this before... What should F_GETOWN_TID return if we did F_GETOWN ? (and vice versa). f_getown_tid() returns < 0 if !task_only and ->piD != 0, this helps. but the caller of F_GETOWN can't know what the returned value actually means if F_GETOWN_TID was used. Do we really need fown->task_only? Not only this enlarges fown_struct, we have to modify f_modown() and f_setown(). Perhaps we can just add #define F_PIDTYPE_THREAD PIDTYPE_MAX into fcntl.c ? Then, static int f_setown_xxx(struct file *filp, unsigned long arg, int force, bool group) { enum pid_type type; struct pid *pid; int who = arg; int result; type = PIDTYPE_PID; if (!group) type = F_PIDTYPE_THREAD else if (who < 0) { type = PIDTYPE_PGID; who = -who; } rcu_read_lock(); pid = find_vpid(who); result = __f_setown(filp, pid, type, force); rcu_read_unlock(); return result; } int f_setown(struct file *filp, unsigned long arg, int force) { return f_setown_xxx(..., true); } Now we should also change send_sigio/send_sigurg, but this is trivial type = fown->pid_type; + if (type == F_PIDTYPE_THREAD) type = PIDTYPE_PID; send_sigio_to_task() is trivial too. What do you think? I agree, this is a bit hackish, but otoh this lessens the changes outside of fcntl.h. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID 2009-08-03 17:16 ` Oleg Nesterov @ 2009-08-03 17:47 ` Peter Zijlstra 2009-08-03 18:06 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-08-03 17:47 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Mon, 2009-08-03 at 19:16 +0200, Oleg Nesterov wrote: > > + filp->f_owner.task_only = > > + (type == PIDTYPE_PID && (flags & FF_SETOWN_TID)); > > Do we need type == PIDTYPE_PID check? FF_SETOWN_TID must imply > PIDTYPE_PID, it is only used by f_setown_tid(). Paranoia I guess. > > +static int f_setown_tid(struct file *filp, unsigned long arg) > > +{ > > + int flags = FF_SETOWN_FORCE; > > + struct pid *pid; > > + int who = arg; > > + int ret = 0; > > + > > + if (who < 0) > > + who = -who; > > + else > > + flags |= FF_SETOWN_TID; > > Hmm, OK. so fcntl(F_SETOWN_TID, -666) <=> fcntl(F_SETOWN, +666). > > Not that I disagree, but I think this should be discussed. Perhaps > F_SETOWN_TID can just reject who < 0. Yeah, I considered that. Opinions? > > +static pid_t f_getown_tid(struct file *filp) > > +{ > > + pid_t tid; > > + > > + read_lock(&filp->f_owner.lock); > > + tid = pid_vnr(filp->f_owner.pid); > > + if (filp->f_owner.pid_type == PIDTYPE_PGID) > > + tid = 0; > > + if (!filp->f_owner.task_only) > > + tid = -tid; > > I didn't think about this before... What should F_GETOWN_TID return > if we did F_GETOWN ? (and vice versa). f_getown_tid() returns < 0 > if !task_only and ->piD != 0, this helps. > > but the caller of F_GETOWN can't know what the returned value actually > means if F_GETOWN_TID was used. Ah, I made GETOWN_TID deal with !PID but forgot the TID case in GETOWN. Yeah, icky, esp since there is no room for errors in the return value :/ I guess I could make it return 0. > Do we really need fown->task_only? Not only this enlarges fown_struct, > we have to modify f_modown() and f_setown(). > > Perhaps we can just add > > #define F_PIDTYPE_THREAD PIDTYPE_MAX > > into fcntl.c ? Then, > > static int f_setown_xxx(struct file *filp, unsigned long arg, int force, bool group) > { > enum pid_type type; > struct pid *pid; > int who = arg; > int result; > > type = PIDTYPE_PID; > if (!group) > type = F_PIDTYPE_THREAD > else if (who < 0) { > type = PIDTYPE_PGID; > who = -who; > } > > rcu_read_lock(); > pid = find_vpid(who); > result = __f_setown(filp, pid, type, force); > rcu_read_unlock(); > return result; > } > > int f_setown(struct file *filp, unsigned long arg, int force) > { > return f_setown_xxx(..., true); > } > > Now we should also change send_sigio/send_sigurg, but this is trivial > > type = fown->pid_type; > + if (type == F_PIDTYPE_THREAD) > type = PIDTYPE_PID; > > send_sigio_to_task() is trivial too. > > What do you think? I agree, this is a bit hackish, but otoh this lessens > the changes outside of fcntl.h. Right, I considered adding PIDTYPE_TID, but then I'd have to go through the kernel and make everything consistent, which is where I gave up ;-) You hack above makes sense, dunno if people will go for it though.. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID 2009-08-03 17:47 ` Peter Zijlstra @ 2009-08-03 18:06 ` Oleg Nesterov 2009-08-03 18:36 ` Peter Zijlstra 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-03 18:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/03, Peter Zijlstra wrote: > > On Mon, 2009-08-03 at 19:16 +0200, Oleg Nesterov wrote: > > > Hmm, OK. so fcntl(F_SETOWN_TID, -666) <=> fcntl(F_SETOWN, +666). > > > > Not that I disagree, but I think this should be discussed. Perhaps > > F_SETOWN_TID can just reject who < 0. > > Yeah, I considered that. Opinions? Yes, please ;) > > but the caller of F_GETOWN can't know what the returned value actually > > means if F_GETOWN_TID was used. > > Ah, I made GETOWN_TID deal with !PID but forgot the TID case in GETOWN. > Yeah, icky, esp since there is no room for errors in the return value :/ > I guess I could make it return 0. Yes, this is confusing too, but probably better. > > Perhaps we can just add > > > > #define F_PIDTYPE_THREAD PIDTYPE_MAX > > > > into fcntl.c ? Then, > > > Right, I considered adding PIDTYPE_TID, but then I'd have to go through > the kernel and make everything consistent, which is where I gave up ;-) Note! we don't add the new PIDTYPE_TID actually. This F_PIDTYPE_THREAD is not visible outsie of fcntl.c, we just re-use ->pid_type. Instead we could add a bit, but using the impossible PIDTYPE_MAX is simpler. > dunno if people will go for it though.. Yes, I am not sure people will like it. As for F_XXXOWN/F_XXXWOWN_TID interaction. Another option, perhaps, is add F_{SET,GET}OWN_EX which accepts a use-visible struct f_setown_struct { int pid; // > 0 int type; // enumerates PIDTYPE_PID, PIDTYPE_PGID, F_PIDTYPE_THREAD } pointer via arg. Instead of F_XXXOWN_TID + int who. This way at least the users of new api can't be confused. I dunno. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID 2009-08-03 18:06 ` Oleg Nesterov @ 2009-08-03 18:36 ` Peter Zijlstra 2009-08-03 19:02 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-08-03 18:36 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Mon, 2009-08-03 at 20:06 +0200, Oleg Nesterov wrote: > > As for F_XXXOWN/F_XXXWOWN_TID interaction. Another option, perhaps, is add > F_{SET,GET}OWN_EX which accepts a use-visible > > struct f_setown_struct { > int pid; // > 0 > int type; // enumerates PIDTYPE_PID, PIDTYPE_PGID, F_PIDTYPE_THREAD > } > > pointer via arg. Instead of F_XXXOWN_TID + int who. > > This way at least the users of new api can't be confused. This would expose PIDTYPE* to userspace, and also fixes the value of F_PIDTYPE_THREAD. I'm not sure we want to go there (unless of course, PIDTYPE_* is already exposed somewhere). ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID 2009-08-03 18:36 ` Peter Zijlstra @ 2009-08-03 19:02 ` Oleg Nesterov 2009-08-04 11:39 ` [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX Peter Zijlstra 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-03 19:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/03, Peter Zijlstra wrote: > > On Mon, 2009-08-03 at 20:06 +0200, Oleg Nesterov wrote: > > > > As for F_XXXOWN/F_XXXWOWN_TID interaction. Another option, perhaps, is add > > F_{SET,GET}OWN_EX which accepts a use-visible > > > > struct f_setown_struct { > > int pid; // > 0 > > int type; // enumerates PIDTYPE_PID, PIDTYPE_PGID, F_PIDTYPE_THREAD > > } > > > > pointer via arg. Instead of F_XXXOWN_TID + int who. > > > > This way at least the users of new api can't be confused. > > This would expose PIDTYPE* to userspace, No, no, we should not export them of course. f_setown_struct->type could be 0,1,2 (or whatever), fcntl() maps them to fown_struct->enum_type or ->enum_type + ->task_only. To clarify, I am not really sure this interface is better. Just for discussion. Because it will be very painful to change this api later. It is better to at least consider all options now. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX 2009-08-03 19:02 ` Oleg Nesterov @ 2009-08-04 11:39 ` Peter Zijlstra 2009-08-04 16:20 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-08-04 11:39 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Mon, 2009-08-03 at 21:02 +0200, Oleg Nesterov wrote: > To clarify, I am not really sure this interface is better. Just for > discussion. Because it will be very painful to change this api later. > It is better to at least consider all options now. You're completely right, how about the below? --- Subject: fcntl: F_[SG]ETOWN_EX From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Fri, 31 Jul 2009 10:35:30 +0200 In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. The need to direct SIGIO comes from self-monitoring profiling such as with perf-counters. Perf-counters uses SIGIO to notify that new sample data is available. If the signal is delivered to the same task that generated the new sample it can augment that data by inspecting the task's user-space state right after it returns from the kernel. This is esp. convenient for interpreted or virtual machine driven environments. Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex as argument: struct f_owner_ex { int type; pid_t pid; }; Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- arch/alpha/include/asm/fcntl.h | 2 arch/parisc/include/asm/fcntl.h | 2 fs/fcntl.c | 83 +++++++++++++++++++++++++++++++++++++--- include/asm-generic/fcntl.h | 13 ++++++ 4 files changed, 95 insertions(+), 5 deletions(-) Index: linux-2.6/arch/parisc/include/asm/fcntl.h =================================================================== --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h +++ linux-2.6/arch/parisc/include/asm/fcntl.h @@ -28,6 +28,8 @@ #define F_SETOWN 12 /* for sockets. */ #define F_SETSIG 13 /* for sockets. */ #define F_GETSIG 14 /* for sockets. */ +#define F_GETOWN_EX 15 +#define F_SETOWN_EX 16 /* for posix fcntl() and lockf() */ #define F_RDLCK 01 Index: linux-2.6/fs/fcntl.c =================================================================== --- linux-2.6.orig/fs/fcntl.c +++ linux-2.6/fs/fcntl.c @@ -263,6 +263,67 @@ pid_t f_getown(struct file *filp) return pid; } +static int f_setown_ex(struct file *filp, unsigned long arg) +{ + struct f_owner_ex * __user owner_p = (void * __user)arg; + struct f_owner_ex owner; + struct pid *pid; + int type; + int ret; + + ret = copy_from_user(&owner, owner_p, sizeof(owner)); + if (ret) + return ret; + + switch (owner.type) { + case F_OWNER_TID: + type = PIDTYPE_MAX; + break; + + case F_OWNER_PID: + type = PIDTYPE_PID; + break; + + case F_OWNER_GID: + type = PIDTYPE_PGID; + break; + } + + rcu_read_lock(); + pid = find_vpid(owner.pid); + ret = __f_setown(filp, pid, type, 1); + rcu_read_unlock(); + + return ret; +} + +static int f_getown_ex(struct file *filp, unsigned long arg) +{ + struct f_owner_ex * __user owner_p = (void * __user)arg; + struct f_owner_ex owner; + int ret; + + read_lock(&filp->f_owner.lock); + owner.pid = pid_vnr(filp->f_owner.pid); + switch ((int)filp->f_owner.pid_type) { + case PIDTYPE_MAX: + owner.type = F_OWNER_TID; + break; + + case PIDTYPE_PID: + owner.type = F_OWNER_PID; + break; + + case PIDTYPE_PGID: + owner.type = F_OWNER_GID; + break; + } + read_unlock(&filp->f_owner.lock); + + ret = copy_to_user(owner_p, &owner, sizeof(owner)); + return ret; +} + static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, struct file *filp) { @@ -313,6 +374,12 @@ static long do_fcntl(int fd, unsigned in case F_SETOWN: err = f_setown(filp, arg, 1); break; + case F_GETOWN_EX: + err = f_getown_ex(filp, arg); + break; + case F_SETOWN_EX: + err = f_setown_ex(filp, arg); + break; case F_GETSIG: err = filp->f_owner.signum; break; @@ -428,8 +495,7 @@ static inline int sigio_perm(struct task static void send_sigio_to_task(struct task_struct *p, struct fown_struct *fown, - int fd, - int reason) + int fd, int reason, int group) { /* * F_SETSIG can change ->signum lockless in parallel, make @@ -461,11 +527,11 @@ static void send_sigio_to_task(struct ta else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!do_send_sig_info(signum, &si, p, true)) + if (!do_send_sig_info(signum, &si, p, group)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true); + do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group); } } @@ -474,16 +540,23 @@ void send_sigio(struct fown_struct *fown struct task_struct *p; enum pid_type type; struct pid *pid; + int group = 1; read_lock(&fown->lock); + type = fown->pid_type; + if (type == PIDTYPE_MAX) { + group = 0; + type = PIDTYPE_PID; + } + pid = fown->pid; if (!pid) goto out_unlock_fown; read_lock(&tasklist_lock); do_each_pid_task(pid, type, p) { - send_sigio_to_task(p, fown, fd, band); + send_sigio_to_task(p, fown, fd, band, group); } while_each_pid_task(pid, type, p); read_unlock(&tasklist_lock); out_unlock_fown: Index: linux-2.6/include/asm-generic/fcntl.h =================================================================== --- linux-2.6.orig/include/asm-generic/fcntl.h +++ linux-2.6/include/asm-generic/fcntl.h @@ -73,6 +73,19 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ #endif +#ifndef F_SETOWN_EX +#define F_SETOWN_EX 12 +#define F_GETOWN_EX 13 +#endif + +#define F_OWNER_TID 0 +#define F_OWNER_PID 1 +#define F_OWNER_GID 2 + +struct f_owner_ex { + int type; + pid_t pid; +}; /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ Index: linux-2.6/arch/alpha/include/asm/fcntl.h =================================================================== --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h +++ linux-2.6/arch/alpha/include/asm/fcntl.h @@ -26,6 +26,8 @@ #define F_GETOWN 6 /* for sockets. */ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ +#define F_SETOWN_EX 12 +#define F_GETOWN_EX 13 /* for posix fcntl() and lockf() */ #define F_RDLCK 1 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX 2009-08-04 11:39 ` [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX Peter Zijlstra @ 2009-08-04 16:20 ` Oleg Nesterov 2009-08-04 16:52 ` Peter Zijlstra 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-04 16:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/04, Peter Zijlstra wrote: > > +static int f_setown_ex(struct file *filp, unsigned long arg) > +{ > + struct f_owner_ex * __user owner_p = (void * __user)arg; > + struct f_owner_ex owner; > + struct pid *pid; > + int type; > + int ret; > + > + ret = copy_from_user(&owner, owner_p, sizeof(owner)); > + if (ret) > + return ret; > + > + switch (owner.type) { > + case F_OWNER_TID: > + type = PIDTYPE_MAX; > + break; > + > + case F_OWNER_PID: > + type = PIDTYPE_PID; > + break; > + > + case F_OWNER_GID: > + type = PIDTYPE_PGID; > + break; > + } Note that send_sigio()->do_each_pid_task(type) must use the valid type < PIDTYPE_MAX, or we can crash/etc. This means f_setown_ex() should be careful with the wrong owner->type, the switch() above needs default: return -EINVAL; > + rcu_read_lock(); > + pid = find_vpid(owner.pid); > + ret = __f_setown(filp, pid, type, 1); > + rcu_read_unlock(); > + > + return ret; Perhaps it makes sense to return -ESRCH if owner.pid && !pid, not sure. > @@ -474,16 +540,23 @@ void send_sigio(struct fown_struct *fown > struct task_struct *p; > enum pid_type type; > struct pid *pid; > + int group = 1; > > read_lock(&fown->lock); > + > type = fown->pid_type; > + if (type == PIDTYPE_MAX) { > + group = 0; > + type = PIDTYPE_PID; > + } And send_sigurg() needs the same change. I am not sure we should teach send_sigurg_to_task() to handle the F_OWNER_TID, but we must ensure send_sigurg()->do_each_pid_task() won't be called with PIDTYPE_MAX. Otherwise, personally I think this is what we need to solve the problem. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX 2009-08-04 16:20 ` Oleg Nesterov @ 2009-08-04 16:52 ` Peter Zijlstra 2009-08-04 17:19 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-08-04 16:52 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Tue, 2009-08-04 at 18:20 +0200, Oleg Nesterov wrote: > On 08/04, Peter Zijlstra wrote: > > > > +static int f_setown_ex(struct file *filp, unsigned long arg) > > +{ > > + struct f_owner_ex * __user owner_p = (void * __user)arg; > > + struct f_owner_ex owner; > > + struct pid *pid; > > + int type; > > + int ret; > > + > > + ret = copy_from_user(&owner, owner_p, sizeof(owner)); > > + if (ret) > > + return ret; > > + > > + switch (owner.type) { > > + case F_OWNER_TID: > > + type = PIDTYPE_MAX; > > + break; > > + > > + case F_OWNER_PID: > > + type = PIDTYPE_PID; > > + break; > > + > > + case F_OWNER_GID: > > + type = PIDTYPE_PGID; > > + break; > > + } > > Note that send_sigio()->do_each_pid_task(type) must use the valid > type < PIDTYPE_MAX, or we can crash/etc. > > This means f_setown_ex() should be careful with the wrong owner->type, > the switch() above needs > > default: > return -EINVAL; D'0h very true. > > + rcu_read_lock(); > > + pid = find_vpid(owner.pid); > > + ret = __f_setown(filp, pid, type, 1); > > + rcu_read_unlock(); > > + > > + return ret; > > Perhaps it makes sense to return -ESRCH if owner.pid && !pid, not > sure. We'd need that case to unset/clear the owner, so returning -ESRCH might confuse users I think. > > @@ -474,16 +540,23 @@ void send_sigio(struct fown_struct *fown > > struct task_struct *p; > > enum pid_type type; > > struct pid *pid; > > + int group = 1; > > > > read_lock(&fown->lock); > > + > > type = fown->pid_type; > > + if (type == PIDTYPE_MAX) { > > + group = 0; > > + type = PIDTYPE_PID; > > + } > > And send_sigurg() needs the same change. I am not sure we should teach > send_sigurg_to_task() to handle the F_OWNER_TID, but we must ensure > send_sigurg()->do_each_pid_task() won't be called with PIDTYPE_MAX. How about the below delta, it changes send_sigurg_to_task() to also use do_send_sig_info() which looses the check_kill_permission() check, but your previous changes lost that same thing from SIGIO -- so I'm hoping that's ok. > Otherwise, personally I think this is what we need to solve the problem. yay! I'll look over the bits again and send out a -v4 later today. ---- Index: linux-2.6/fs/fcntl.c =================================================================== --- linux-2.6.orig/fs/fcntl.c +++ linux-2.6/fs/fcntl.c @@ -287,6 +287,9 @@ static int f_setown_ex(struct file *filp case F_OWNER_GID: type = PIDTYPE_PGID; break; + + default: + return -EINVAL; } rcu_read_lock(); @@ -564,10 +567,10 @@ void send_sigio(struct fown_struct *fown } static void send_sigurg_to_task(struct task_struct *p, - struct fown_struct *fown) + struct fown_struct *fown, int group) { if (sigio_perm(p, fown, SIGURG)) - group_send_sig_info(SIGURG, SEND_SIG_PRIV, p); + do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group); } int send_sigurg(struct fown_struct *fown) @@ -575,10 +578,17 @@ int send_sigurg(struct fown_struct *fown struct task_struct *p; enum pid_type type; struct pid *pid; + int group = 1; int ret = 0; read_lock(&fown->lock); + type = fown->pid_type; + if (type == PIDTYPE_MAX) { + group = 0; + type = PIDTYPE_PID; + } + pid = fown->pid; if (!pid) goto out_unlock_fown; @@ -587,7 +597,7 @@ int send_sigurg(struct fown_struct *fown read_lock(&tasklist_lock); do_each_pid_task(pid, type, p) { - send_sigurg_to_task(p, fown); + send_sigurg_to_task(p, fown, group); } while_each_pid_task(pid, type, p); read_unlock(&tasklist_lock); out_unlock_fown: ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX 2009-08-04 16:52 ` Peter Zijlstra @ 2009-08-04 17:19 ` Oleg Nesterov 2009-08-06 13:14 ` [PATCH 3/2 -v4] " Peter Zijlstra 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-04 17:19 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/04, Peter Zijlstra wrote: > > On Tue, 2009-08-04 at 18:20 +0200, Oleg Nesterov wrote: > > > > + rcu_read_lock(); > > > + pid = find_vpid(owner.pid); > > > + ret = __f_setown(filp, pid, type, 1); > > > + rcu_read_unlock(); > > > + > > > + return ret; > > > > Perhaps it makes sense to return -ESRCH if owner.pid && !pid, not > > sure. > > We'd need that case to unset/clear the owner, so returning -ESRCH might > confuse users I think. Agreed. Perhaps we should do nothing but return -ESRCH if user passes owner->pid != 0 and it is not valid. But this is minor and can be tweaked later. (and to clarify again, not that I really think we should do this, just a random thought). > How about the below delta, it changes send_sigurg_to_task() to also use > do_send_sig_info() which looses the check_kill_permission() check, but > your previous changes lost that same thing from SIGIO -- so I'm hoping > that's ok. Yes, I think this is fine! Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX 2009-08-04 17:19 ` Oleg Nesterov @ 2009-08-06 13:14 ` Peter Zijlstra 2009-08-06 19:05 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: Peter Zijlstra @ 2009-08-06 13:14 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Hopefully I got everything right this time ;-) --- Subject: fcntl: F_[SG]ETOWN_EX From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Fri, 31 Jul 2009 10:35:30 +0200 In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. The need to direct SIGIO comes from self-monitoring profiling such as with perf-counters. Perf-counters uses SIGIO to notify that new sample data is available. If the signal is delivered to the same task that generated the new sample it can augment that data by inspecting the task's user-space state right after it returns from the kernel. This is esp. convenient for interpreted or virtual machine driven environments. Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex as argument: struct f_owner_ex { int type; pid_t pid; }; Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- arch/alpha/include/asm/fcntl.h | 2 arch/parisc/include/asm/fcntl.h | 2 fs/fcntl.c | 108 +++++++++++++++++++++++++++++++++++++--- include/asm-generic/fcntl.h | 13 ++++ 4 files changed, 117 insertions(+), 8 deletions(-) Index: linux-2.6/arch/parisc/include/asm/fcntl.h =================================================================== --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h +++ linux-2.6/arch/parisc/include/asm/fcntl.h @@ -28,6 +28,8 @@ #define F_SETOWN 12 /* for sockets. */ #define F_SETSIG 13 /* for sockets. */ #define F_GETSIG 14 /* for sockets. */ +#define F_GETOWN_EX 15 +#define F_SETOWN_EX 16 /* for posix fcntl() and lockf() */ #define F_RDLCK 01 Index: linux-2.6/fs/fcntl.c =================================================================== --- linux-2.6.orig/fs/fcntl.c +++ linux-2.6/fs/fcntl.c @@ -263,6 +263,79 @@ pid_t f_getown(struct file *filp) return pid; } +static int f_setown_ex(struct file *filp, unsigned long arg) +{ + struct f_owner_ex * __user owner_p = (void * __user)arg; + struct f_owner_ex owner; + struct pid *pid; + int type; + int ret; + + ret = copy_from_user(&owner, owner_p, sizeof(owner)); + if (ret) + return ret; + + switch (owner.type) { + case F_OWNER_TID: + type = PIDTYPE_MAX; + break; + + case F_OWNER_PID: + type = PIDTYPE_PID; + break; + + case F_OWNER_GID: + type = PIDTYPE_PGID; + break; + + default: + return -EINVAL; + } + + rcu_read_lock(); + pid = find_vpid(owner.pid); + if (owner.pid && !pid) + ret = -ESRCH; + else + ret = __f_setown(filp, pid, type, 1); + rcu_read_unlock(); + + return ret; +} + +static int f_getown_ex(struct file *filp, unsigned long arg) +{ + struct f_owner_ex * __user owner_p = (void * __user)arg; + struct f_owner_ex owner; + int ret = 0; + + read_lock(&filp->f_owner.lock); + owner.pid = pid_vnr(filp->f_owner.pid); + switch (filp->f_owner.pid_type) { + case PIDTYPE_MAX: + owner.type = F_OWNER_TID; + break; + + case PIDTYPE_PID: + owner.type = F_OWNER_PID; + break; + + case PIDTYPE_PGID: + owner.type = F_OWNER_GID; + break; + + default: + WARN_ON(1); + ret = -EINVAL; + break; + } + read_unlock(&filp->f_owner.lock); + + if (!ret) + ret = copy_to_user(owner_p, &owner, sizeof(owner)); + return ret; +} + static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, struct file *filp) { @@ -313,6 +386,12 @@ static long do_fcntl(int fd, unsigned in case F_SETOWN: err = f_setown(filp, arg, 1); break; + case F_GETOWN_EX: + err = f_getown_ex(filp, arg); + break; + case F_SETOWN_EX: + err = f_setown_ex(filp, arg); + break; case F_GETSIG: err = filp->f_owner.signum; break; @@ -428,8 +507,7 @@ static inline int sigio_perm(struct task static void send_sigio_to_task(struct task_struct *p, struct fown_struct *fown, - int fd, - int reason) + int fd, int reason, int group) { /* * F_SETSIG can change ->signum lockless in parallel, make @@ -461,11 +539,11 @@ static void send_sigio_to_task(struct ta else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!do_send_sig_info(signum, &si, p, true)) + if (!do_send_sig_info(signum, &si, p, group)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true); + do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group); } } @@ -474,16 +552,23 @@ void send_sigio(struct fown_struct *fown struct task_struct *p; enum pid_type type; struct pid *pid; + int group = 1; read_lock(&fown->lock); + type = fown->pid_type; + if (type == PIDTYPE_MAX) { + group = 0; + type = PIDTYPE_PID; + } + pid = fown->pid; if (!pid) goto out_unlock_fown; read_lock(&tasklist_lock); do_each_pid_task(pid, type, p) { - send_sigio_to_task(p, fown, fd, band); + send_sigio_to_task(p, fown, fd, band, group); } while_each_pid_task(pid, type, p); read_unlock(&tasklist_lock); out_unlock_fown: @@ -491,10 +576,10 @@ void send_sigio(struct fown_struct *fown } static void send_sigurg_to_task(struct task_struct *p, - struct fown_struct *fown) + struct fown_struct *fown, int group) { if (sigio_perm(p, fown, SIGURG)) - group_send_sig_info(SIGURG, SEND_SIG_PRIV, p); + do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group); } int send_sigurg(struct fown_struct *fown) @@ -502,10 +587,17 @@ int send_sigurg(struct fown_struct *fown struct task_struct *p; enum pid_type type; struct pid *pid; + int group = 1; int ret = 0; read_lock(&fown->lock); + type = fown->pid_type; + if (type == PIDTYPE_MAX) { + group = 0; + type = PIDTYPE_PID; + } + pid = fown->pid; if (!pid) goto out_unlock_fown; @@ -514,7 +606,7 @@ int send_sigurg(struct fown_struct *fown read_lock(&tasklist_lock); do_each_pid_task(pid, type, p) { - send_sigurg_to_task(p, fown); + send_sigurg_to_task(p, fown, group); } while_each_pid_task(pid, type, p); read_unlock(&tasklist_lock); out_unlock_fown: Index: linux-2.6/include/asm-generic/fcntl.h =================================================================== --- linux-2.6.orig/include/asm-generic/fcntl.h +++ linux-2.6/include/asm-generic/fcntl.h @@ -73,6 +73,19 @@ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ #endif +#ifndef F_SETOWN_EX +#define F_SETOWN_EX 12 +#define F_GETOWN_EX 13 +#endif + +#define F_OWNER_TID 0 +#define F_OWNER_PID 1 +#define F_OWNER_GID 2 + +struct f_owner_ex { + int type; + pid_t pid; +}; /* for F_[GET|SET]FL */ #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ Index: linux-2.6/arch/alpha/include/asm/fcntl.h =================================================================== --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h +++ linux-2.6/arch/alpha/include/asm/fcntl.h @@ -26,6 +26,8 @@ #define F_GETOWN 6 /* for sockets. */ #define F_SETSIG 10 /* for sockets. */ #define F_GETSIG 11 /* for sockets. */ +#define F_SETOWN_EX 12 +#define F_GETOWN_EX 13 /* for posix fcntl() and lockf() */ #define F_RDLCK 1 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX 2009-08-06 13:14 ` [PATCH 3/2 -v4] " Peter Zijlstra @ 2009-08-06 19:05 ` Oleg Nesterov 2009-08-07 12:10 ` stephane eranian 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-06 19:05 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/06, Peter Zijlstra wrote: > > Subject: fcntl: F_[SG]ETOWN_EX > From: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Fri, 31 Jul 2009 10:35:30 +0200 > > In order to direct the SIGIO signal to a particular thread of a > multi-threaded application we cannot, like suggested by the manpage, put > a TID into the regular fcntl(F_SETOWN) call. It will still be send to > the whole process of which that thread is part. > > Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. > > The need to direct SIGIO comes from self-monitoring profiling such as > with perf-counters. Perf-counters uses SIGIO to notify that new sample > data is available. If the signal is delivered to the same task that > generated the new sample it can augment that data by inspecting the > task's user-space state right after it returns from the kernel. This > is esp. convenient for interpreted or virtual machine driven > environments. > > Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex > as argument: > > struct f_owner_ex { > int type; > pid_t pid; > }; > > Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. I think the patch is right. Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > arch/alpha/include/asm/fcntl.h | 2 > arch/parisc/include/asm/fcntl.h | 2 > fs/fcntl.c | 108 +++++++++++++++++++++++++++++++++++++--- > include/asm-generic/fcntl.h | 13 ++++ > 4 files changed, 117 insertions(+), 8 deletions(-) > > Index: linux-2.6/arch/parisc/include/asm/fcntl.h > =================================================================== > --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h > +++ linux-2.6/arch/parisc/include/asm/fcntl.h > @@ -28,6 +28,8 @@ > #define F_SETOWN 12 /* for sockets. */ > #define F_SETSIG 13 /* for sockets. */ > #define F_GETSIG 14 /* for sockets. */ > +#define F_GETOWN_EX 15 > +#define F_SETOWN_EX 16 > > /* for posix fcntl() and lockf() */ > #define F_RDLCK 01 > Index: linux-2.6/fs/fcntl.c > =================================================================== > --- linux-2.6.orig/fs/fcntl.c > +++ linux-2.6/fs/fcntl.c > @@ -263,6 +263,79 @@ pid_t f_getown(struct file *filp) > return pid; > } > > +static int f_setown_ex(struct file *filp, unsigned long arg) > +{ > + struct f_owner_ex * __user owner_p = (void * __user)arg; > + struct f_owner_ex owner; > + struct pid *pid; > + int type; > + int ret; > + > + ret = copy_from_user(&owner, owner_p, sizeof(owner)); > + if (ret) > + return ret; > + > + switch (owner.type) { > + case F_OWNER_TID: > + type = PIDTYPE_MAX; > + break; > + > + case F_OWNER_PID: > + type = PIDTYPE_PID; > + break; > + > + case F_OWNER_GID: > + type = PIDTYPE_PGID; > + break; > + > + default: > + return -EINVAL; > + } > + > + rcu_read_lock(); > + pid = find_vpid(owner.pid); > + if (owner.pid && !pid) > + ret = -ESRCH; > + else > + ret = __f_setown(filp, pid, type, 1); > + rcu_read_unlock(); > + > + return ret; > +} > + > +static int f_getown_ex(struct file *filp, unsigned long arg) > +{ > + struct f_owner_ex * __user owner_p = (void * __user)arg; > + struct f_owner_ex owner; > + int ret = 0; > + > + read_lock(&filp->f_owner.lock); > + owner.pid = pid_vnr(filp->f_owner.pid); > + switch (filp->f_owner.pid_type) { > + case PIDTYPE_MAX: > + owner.type = F_OWNER_TID; > + break; > + > + case PIDTYPE_PID: > + owner.type = F_OWNER_PID; > + break; > + > + case PIDTYPE_PGID: > + owner.type = F_OWNER_GID; > + break; > + > + default: > + WARN_ON(1); > + ret = -EINVAL; > + break; > + } > + read_unlock(&filp->f_owner.lock); > + > + if (!ret) > + ret = copy_to_user(owner_p, &owner, sizeof(owner)); > + return ret; > +} > + > static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, > struct file *filp) > { > @@ -313,6 +386,12 @@ static long do_fcntl(int fd, unsigned in > case F_SETOWN: > err = f_setown(filp, arg, 1); > break; > + case F_GETOWN_EX: > + err = f_getown_ex(filp, arg); > + break; > + case F_SETOWN_EX: > + err = f_setown_ex(filp, arg); > + break; > case F_GETSIG: > err = filp->f_owner.signum; > break; > @@ -428,8 +507,7 @@ static inline int sigio_perm(struct task > > static void send_sigio_to_task(struct task_struct *p, > struct fown_struct *fown, > - int fd, > - int reason) > + int fd, int reason, int group) > { > /* > * F_SETSIG can change ->signum lockless in parallel, make > @@ -461,11 +539,11 @@ static void send_sigio_to_task(struct ta > else > si.si_band = band_table[reason - POLL_IN]; > si.si_fd = fd; > - if (!do_send_sig_info(signum, &si, p, true)) > + if (!do_send_sig_info(signum, &si, p, group)) > break; > /* fall-through: fall back on the old plain SIGIO signal */ > case 0: > - do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true); > + do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group); > } > } > > @@ -474,16 +552,23 @@ void send_sigio(struct fown_struct *fown > struct task_struct *p; > enum pid_type type; > struct pid *pid; > + int group = 1; > > read_lock(&fown->lock); > + > type = fown->pid_type; > + if (type == PIDTYPE_MAX) { > + group = 0; > + type = PIDTYPE_PID; > + } > + > pid = fown->pid; > if (!pid) > goto out_unlock_fown; > > read_lock(&tasklist_lock); > do_each_pid_task(pid, type, p) { > - send_sigio_to_task(p, fown, fd, band); > + send_sigio_to_task(p, fown, fd, band, group); > } while_each_pid_task(pid, type, p); > read_unlock(&tasklist_lock); > out_unlock_fown: > @@ -491,10 +576,10 @@ void send_sigio(struct fown_struct *fown > } > > static void send_sigurg_to_task(struct task_struct *p, > - struct fown_struct *fown) > + struct fown_struct *fown, int group) > { > if (sigio_perm(p, fown, SIGURG)) > - group_send_sig_info(SIGURG, SEND_SIG_PRIV, p); > + do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group); > } > > int send_sigurg(struct fown_struct *fown) > @@ -502,10 +587,17 @@ int send_sigurg(struct fown_struct *fown > struct task_struct *p; > enum pid_type type; > struct pid *pid; > + int group = 1; > int ret = 0; > > read_lock(&fown->lock); > + > type = fown->pid_type; > + if (type == PIDTYPE_MAX) { > + group = 0; > + type = PIDTYPE_PID; > + } > + > pid = fown->pid; > if (!pid) > goto out_unlock_fown; > @@ -514,7 +606,7 @@ int send_sigurg(struct fown_struct *fown > > read_lock(&tasklist_lock); > do_each_pid_task(pid, type, p) { > - send_sigurg_to_task(p, fown); > + send_sigurg_to_task(p, fown, group); > } while_each_pid_task(pid, type, p); > read_unlock(&tasklist_lock); > out_unlock_fown: > Index: linux-2.6/include/asm-generic/fcntl.h > =================================================================== > --- linux-2.6.orig/include/asm-generic/fcntl.h > +++ linux-2.6/include/asm-generic/fcntl.h > @@ -73,6 +73,19 @@ > #define F_SETSIG 10 /* for sockets. */ > #define F_GETSIG 11 /* for sockets. */ > #endif > +#ifndef F_SETOWN_EX > +#define F_SETOWN_EX 12 > +#define F_GETOWN_EX 13 > +#endif > + > +#define F_OWNER_TID 0 > +#define F_OWNER_PID 1 > +#define F_OWNER_GID 2 > + > +struct f_owner_ex { > + int type; > + pid_t pid; > +}; > > /* for F_[GET|SET]FL */ > #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ > Index: linux-2.6/arch/alpha/include/asm/fcntl.h > =================================================================== > --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h > +++ linux-2.6/arch/alpha/include/asm/fcntl.h > @@ -26,6 +26,8 @@ > #define F_GETOWN 6 /* for sockets. */ > #define F_SETSIG 10 /* for sockets. */ > #define F_GETSIG 11 /* for sockets. */ > +#define F_SETOWN_EX 12 > +#define F_GETOWN_EX 13 > > /* for posix fcntl() and lockf() */ > #define F_RDLCK 1 > > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX 2009-08-06 19:05 ` Oleg Nesterov @ 2009-08-07 12:10 ` stephane eranian 0 siblings, 0 replies; 49+ messages in thread From: stephane eranian @ 2009-08-07 12:10 UTC (permalink / raw) To: Oleg Nesterov Cc: Peter Zijlstra, Andrew Morton, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland HI, On Thu, Aug 6, 2009 at 9:05 PM, Oleg Nesterov<oleg@redhat.com> wrote: > On 08/06, Peter Zijlstra wrote: >> >> Subject: fcntl: F_[SG]ETOWN_EX >> From: Peter Zijlstra <a.p.zijlstra@chello.nl> >> Date: Fri, 31 Jul 2009 10:35:30 +0200 >> >> In order to direct the SIGIO signal to a particular thread of a >> multi-threaded application we cannot, like suggested by the manpage, put >> a TID into the regular fcntl(F_SETOWN) call. It will still be send to >> the whole process of which that thread is part. >> >> Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. >> >> The need to direct SIGIO comes from self-monitoring profiling such as >> with perf-counters. Perf-counters uses SIGIO to notify that new sample >> data is available. If the signal is delivered to the same task that >> generated the new sample it can augment that data by inspecting the >> task's user-space state right after it returns from the kernel. This >> is esp. convenient for interpreted or virtual machine driven >> environments. >> >> Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex >> as argument: >> >> struct f_owner_ex { >> int type; >> pid_t pid; >> }; >> >> Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. > > I think the patch is right. > > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > I have tested the patch in 2.6.30 (backport) + perfmon and it seems to work in my test case. Have not tried with perfcounters + 2.6.31. I am glad there is finally a solution to this problem. Thanks. >> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> >> --- >> arch/alpha/include/asm/fcntl.h | 2 >> arch/parisc/include/asm/fcntl.h | 2 >> fs/fcntl.c | 108 +++++++++++++++++++++++++++++++++++++--- >> include/asm-generic/fcntl.h | 13 ++++ >> 4 files changed, 117 insertions(+), 8 deletions(-) >> >> Index: linux-2.6/arch/parisc/include/asm/fcntl.h >> =================================================================== >> --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h >> +++ linux-2.6/arch/parisc/include/asm/fcntl.h >> @@ -28,6 +28,8 @@ >> #define F_SETOWN 12 /* for sockets. */ >> #define F_SETSIG 13 /* for sockets. */ >> #define F_GETSIG 14 /* for sockets. */ >> +#define F_GETOWN_EX 15 >> +#define F_SETOWN_EX 16 >> >> /* for posix fcntl() and lockf() */ >> #define F_RDLCK 01 >> Index: linux-2.6/fs/fcntl.c >> =================================================================== >> --- linux-2.6.orig/fs/fcntl.c >> +++ linux-2.6/fs/fcntl.c >> @@ -263,6 +263,79 @@ pid_t f_getown(struct file *filp) >> return pid; >> } >> >> +static int f_setown_ex(struct file *filp, unsigned long arg) >> +{ >> + struct f_owner_ex * __user owner_p = (void * __user)arg; >> + struct f_owner_ex owner; >> + struct pid *pid; >> + int type; >> + int ret; >> + >> + ret = copy_from_user(&owner, owner_p, sizeof(owner)); >> + if (ret) >> + return ret; >> + >> + switch (owner.type) { >> + case F_OWNER_TID: >> + type = PIDTYPE_MAX; >> + break; >> + >> + case F_OWNER_PID: >> + type = PIDTYPE_PID; >> + break; >> + >> + case F_OWNER_GID: >> + type = PIDTYPE_PGID; >> + break; >> + >> + default: >> + return -EINVAL; >> + } >> + >> + rcu_read_lock(); >> + pid = find_vpid(owner.pid); >> + if (owner.pid && !pid) >> + ret = -ESRCH; >> + else >> + ret = __f_setown(filp, pid, type, 1); >> + rcu_read_unlock(); >> + >> + return ret; >> +} >> + >> +static int f_getown_ex(struct file *filp, unsigned long arg) >> +{ >> + struct f_owner_ex * __user owner_p = (void * __user)arg; >> + struct f_owner_ex owner; >> + int ret = 0; >> + >> + read_lock(&filp->f_owner.lock); >> + owner.pid = pid_vnr(filp->f_owner.pid); >> + switch (filp->f_owner.pid_type) { >> + case PIDTYPE_MAX: >> + owner.type = F_OWNER_TID; >> + break; >> + >> + case PIDTYPE_PID: >> + owner.type = F_OWNER_PID; >> + break; >> + >> + case PIDTYPE_PGID: >> + owner.type = F_OWNER_GID; >> + break; >> + >> + default: >> + WARN_ON(1); >> + ret = -EINVAL; >> + break; >> + } >> + read_unlock(&filp->f_owner.lock); >> + >> + if (!ret) >> + ret = copy_to_user(owner_p, &owner, sizeof(owner)); >> + return ret; >> +} >> + >> static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, >> struct file *filp) >> { >> @@ -313,6 +386,12 @@ static long do_fcntl(int fd, unsigned in >> case F_SETOWN: >> err = f_setown(filp, arg, 1); >> break; >> + case F_GETOWN_EX: >> + err = f_getown_ex(filp, arg); >> + break; >> + case F_SETOWN_EX: >> + err = f_setown_ex(filp, arg); >> + break; >> case F_GETSIG: >> err = filp->f_owner.signum; >> break; >> @@ -428,8 +507,7 @@ static inline int sigio_perm(struct task >> >> static void send_sigio_to_task(struct task_struct *p, >> struct fown_struct *fown, >> - int fd, >> - int reason) >> + int fd, int reason, int group) >> { >> /* >> * F_SETSIG can change ->signum lockless in parallel, make >> @@ -461,11 +539,11 @@ static void send_sigio_to_task(struct ta >> else >> si.si_band = band_table[reason - POLL_IN]; >> si.si_fd = fd; >> - if (!do_send_sig_info(signum, &si, p, true)) >> + if (!do_send_sig_info(signum, &si, p, group)) >> break; >> /* fall-through: fall back on the old plain SIGIO signal */ >> case 0: >> - do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true); >> + do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group); >> } >> } >> >> @@ -474,16 +552,23 @@ void send_sigio(struct fown_struct *fown >> struct task_struct *p; >> enum pid_type type; >> struct pid *pid; >> + int group = 1; >> >> read_lock(&fown->lock); >> + >> type = fown->pid_type; >> + if (type == PIDTYPE_MAX) { >> + group = 0; >> + type = PIDTYPE_PID; >> + } >> + >> pid = fown->pid; >> if (!pid) >> goto out_unlock_fown; >> >> read_lock(&tasklist_lock); >> do_each_pid_task(pid, type, p) { >> - send_sigio_to_task(p, fown, fd, band); >> + send_sigio_to_task(p, fown, fd, band, group); >> } while_each_pid_task(pid, type, p); >> read_unlock(&tasklist_lock); >> out_unlock_fown: >> @@ -491,10 +576,10 @@ void send_sigio(struct fown_struct *fown >> } >> >> static void send_sigurg_to_task(struct task_struct *p, >> - struct fown_struct *fown) >> + struct fown_struct *fown, int group) >> { >> if (sigio_perm(p, fown, SIGURG)) >> - group_send_sig_info(SIGURG, SEND_SIG_PRIV, p); >> + do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group); >> } >> >> int send_sigurg(struct fown_struct *fown) >> @@ -502,10 +587,17 @@ int send_sigurg(struct fown_struct *fown >> struct task_struct *p; >> enum pid_type type; >> struct pid *pid; >> + int group = 1; >> int ret = 0; >> >> read_lock(&fown->lock); >> + >> type = fown->pid_type; >> + if (type == PIDTYPE_MAX) { >> + group = 0; >> + type = PIDTYPE_PID; >> + } >> + >> pid = fown->pid; >> if (!pid) >> goto out_unlock_fown; >> @@ -514,7 +606,7 @@ int send_sigurg(struct fown_struct *fown >> >> read_lock(&tasklist_lock); >> do_each_pid_task(pid, type, p) { >> - send_sigurg_to_task(p, fown); >> + send_sigurg_to_task(p, fown, group); >> } while_each_pid_task(pid, type, p); >> read_unlock(&tasklist_lock); >> out_unlock_fown: >> Index: linux-2.6/include/asm-generic/fcntl.h >> =================================================================== >> --- linux-2.6.orig/include/asm-generic/fcntl.h >> +++ linux-2.6/include/asm-generic/fcntl.h >> @@ -73,6 +73,19 @@ >> #define F_SETSIG 10 /* for sockets. */ >> #define F_GETSIG 11 /* for sockets. */ >> #endif >> +#ifndef F_SETOWN_EX >> +#define F_SETOWN_EX 12 >> +#define F_GETOWN_EX 13 >> +#endif >> + >> +#define F_OWNER_TID 0 >> +#define F_OWNER_PID 1 >> +#define F_OWNER_GID 2 >> + >> +struct f_owner_ex { >> + int type; >> + pid_t pid; >> +}; >> >> /* for F_[GET|SET]FL */ >> #define FD_CLOEXEC 1 /* actually anything with low bit set goes */ >> Index: linux-2.6/arch/alpha/include/asm/fcntl.h >> =================================================================== >> --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h >> +++ linux-2.6/arch/alpha/include/asm/fcntl.h >> @@ -26,6 +26,8 @@ >> #define F_GETOWN 6 /* for sockets. */ >> #define F_SETSIG 10 /* for sockets. */ >> #define F_GETSIG 11 /* for sockets. */ >> +#define F_SETOWN_EX 12 >> +#define F_GETOWN_EX 13 >> >> /* for posix fcntl() and lockf() */ >> #define F_RDLCK 1 >> >> > > ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 1/2] signals: introduce do_send_sig_info() helper 2009-07-31 21:11 ` Andrew Morton 2009-08-01 1:27 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov @ 2009-08-01 1:28 ` Oleg Nesterov 2009-08-01 1:28 ` [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() Oleg Nesterov ` (2 subsequent siblings) 4 siblings, 0 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-08-01 1:28 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Introduce do_send_sig_info() and convert group_send_sig_info(), send_sig_info(), do_send_specific() to use this helper. Hopefully it will have more users soon, it allows to specify specific/group behaviour via "bool group" argument. Shaves 80 bytes from .text. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- include/linux/signal.h | 2 + kernel/signal.c | 56 +++++++++++++++++++++++-------------------------- 2 files changed, 29 insertions(+), 29 deletions(-) --- WAIT/include/linux/signal.h~1_HELPER 2009-06-11 14:16:46.000000000 +0200 +++ WAIT/include/linux/signal.h 2009-08-01 02:26:41.000000000 +0200 @@ -233,6 +233,8 @@ static inline int valid_signal(unsigned } extern int next_signal(struct sigpending *pending, sigset_t *mask); +extern int do_send_sig_info(int sig, struct siginfo *info, + struct task_struct *p, bool group); extern int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p); extern int __group_send_sig_info(int, struct siginfo *, struct task_struct *); extern long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, --- WAIT/kernel/signal.c~1_HELPER 2009-07-01 20:22:44.000000000 +0200 +++ WAIT/kernel/signal.c 2009-08-01 02:21:27.000000000 +0200 @@ -971,6 +971,20 @@ specific_send_sig_info(int sig, struct s return send_signal(sig, info, t, 0); } +int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p, + bool group) +{ + unsigned long flags; + int ret = -ESRCH; + + if (lock_task_sighand(p, &flags)) { + ret = send_signal(sig, info, p, group); + unlock_task_sighand(p, &flags); + } + + return ret; +} + /* * Force a signal that the process can't ignore: if necessary * we unblock the signal and change any SIG_IGN to SIG_DFL. @@ -1068,18 +1082,10 @@ struct sighand_struct *lock_task_sighand */ int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p) { - unsigned long flags; - int ret; - - ret = check_kill_permission(sig, info, p); + int ret = check_kill_permission(sig, info, p); - if (!ret && sig) { - ret = -ESRCH; - if (lock_task_sighand(p, &flags)) { - ret = __group_send_sig_info(sig, info, p); - unlock_task_sighand(p, &flags); - } - } + if (!ret && sig) + ret = do_send_sig_info(sig, info, p, true); return ret; } @@ -1224,15 +1230,9 @@ static int kill_something_info(int sig, * These are for backward compatibility with the rest of the kernel source. */ -/* - * The caller must ensure the task can't exit. - */ int send_sig_info(int sig, struct siginfo *info, struct task_struct *p) { - int ret; - unsigned long flags; - /* * Make sure legacy kernel users don't send in bad values * (normal paths check this in check_kill_permission). @@ -1240,10 +1240,7 @@ send_sig_info(int sig, struct siginfo *i if (!valid_signal(sig)) return -EINVAL; - spin_lock_irqsave(&p->sighand->siglock, flags); - ret = specific_send_sig_info(sig, info, p); - spin_unlock_irqrestore(&p->sighand->siglock, flags); - return ret; + return do_send_sig_info(sig, info, p, false); } #define __si_special(priv) \ @@ -2281,7 +2278,6 @@ static int do_send_specific(pid_t tgid, pid_t pid, int sig, struct siginfo *info) { struct task_struct *p; - unsigned long flags; int error = -ESRCH; rcu_read_lock(); @@ -2291,14 +2287,16 @@ do_send_specific(pid_t tgid, pid_t pid, /* * The null signal is a permissions and process existence * probe. No signal is actually delivered. - * - * If lock_task_sighand() fails we pretend the task dies - * after receiving the signal. The window is tiny, and the - * signal is private anyway. */ - if (!error && sig && lock_task_sighand(p, &flags)) { - error = specific_send_sig_info(sig, info, p); - unlock_task_sighand(p, &flags); + if (!error && sig) { + error = do_send_sig_info(sig, info, p, false); + /* + * If lock_task_sighand() failed we pretend the task + * dies after receiving the signal. The window is tiny, + * and the signal is private anyway. + */ + if (unlikely(error == -ESRCH)) + error = 0; } } rcu_read_unlock(); ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() 2009-07-31 21:11 ` Andrew Morton 2009-08-01 1:27 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov 2009-08-01 1:28 ` [PATCH 1/2] signals: introduce do_send_sig_info() helper Oleg Nesterov @ 2009-08-01 1:28 ` Oleg Nesterov 2009-08-03 12:53 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian 2009-08-03 15:21 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 4 siblings, 0 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-08-01 1:28 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland group_send_sig_info()->check_kill_permission() assumes that current is the sender and uses current_cred(). This is not true in send_sigio_to_task() case. From the security pov the sender is not current, but the task which did fcntl(F_SETOWN), that is why we have sigio_perm() which uses the right creds to check. Fortunately, send_sigio() always sends either SEND_SIG_PRIV or SI_FROMKERNEL() signal, so check_kill_permission() does nothing. But still it would be tidier to avoid this bogus security check and save a couple of cycles. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/fcntl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- WAIT/fs/fcntl.c~2_SIGIO 2009-06-17 14:11:26.000000000 +0200 +++ WAIT/fs/fcntl.c 2009-08-01 02:40:41.000000000 +0200 @@ -462,11 +462,11 @@ static void send_sigio_to_task(struct ta else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!group_send_sig_info(signum, &si, p)) + if (!do_send_sig_info(signum, &si, p, true)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); + do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true); } } ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID 2009-07-31 21:11 ` Andrew Morton ` (2 preceding siblings ...) 2009-08-01 1:28 ` [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() Oleg Nesterov @ 2009-08-03 12:53 ` stephane eranian 2009-08-09 5:46 ` F_SETOWN_TID: F_SETOWN was thread-specific for a while Jamie Lokier 2009-08-03 15:21 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra 4 siblings, 1 reply; 49+ messages in thread From: stephane eranian @ 2009-08-03 12:53 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, oleg, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Andrew, On Fri, Jul 31, 2009 at 11:11 PM, Andrew Morton<akpm@linux-foundation.org> wrote: > On Fri, 31 Jul 2009 10:35:20 +0200 > Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > >> In order to direct the SIGIO signal to a particular thread of a >> multi-threaded application we cannot, like suggested by the manpage, put >> a TID into the regular fcntl(F_SETOWN) call. It will still be send to >> the whole process of which that thread is part. >> >> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID, >> which functions similarly to F_SETOWN, except positive arguments are >> interpreted as TIDs and negative arguments are interpreted as PIDs. >> >> This extension is fully bug compatible with the old F_GETOWN >> implementation in that F_GETOWN_TID will be troubled by the negative >> return value for PIDs similarly to F_GETOWN's trouble with process >> groups. > > I'd be interested in seeing a bit more explanation about the "people do > want to properly direct SIGIO" thing - use cases, how the current code > causes them problems, etc. As it stands, it's a bit of a mystery-patch. > I have explained that in my initial post on LKML. Here is a link to it: http://lkml.org/lkml/2009/7/27/242 Basically, in the context of self-monitoring threads (which is very common), you want the sample notification to be asynchronous and yet delivered to a particular thread, i.e., the one in which the sample was recorded. The asynchronous notification is required because you are monitoring yourself. Asynchronous signals are normally delivered to any thread within the process. In my message, I described the algorithm currently implemented by the kernel. As Peter found out, the man page seems to indicate that if you specify a TID instead a PID with F_SETOWN, then the kernel should interpret this as meaning this particular thread, but this is not what is implemented right now. Self-monitoring has specific needs, you don't necessarily want SIGIO to always be delivered to the thread in which it originated. That is why, I have suggested a new code path. It is also important that backward compatibility be maintained. >> [ compile tested only so far ] > > I will continue to lurk :) > >> arch/parisc/include/asm/fcntl.h | 2 + >> fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++----- >> include/asm-generic/fcntl.h | 4 ++ >> include/linux/fs.h | 11 +++++- >> net/socket.c | 2 +- > > OK. > > > > Alpha has private definitions of F_SETSIG and F_GETSIG which are > identical to the generic ones. That's somewhat logical, given that > alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones. > > Alpha appears to have made the decision to spell out _all_ the F_* > flags, given that some of them are different. That has some merit, I > guess. > > But your patch broke that. > ^ permalink raw reply [flat|nested] 49+ messages in thread
* F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-03 12:53 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian @ 2009-08-09 5:46 ` Jamie Lokier 2009-08-10 12:22 ` stephane eranian 0 siblings, 1 reply; 49+ messages in thread From: Jamie Lokier @ 2009-08-09 5:46 UTC (permalink / raw) To: eranian Cc: Andrew Morton, Peter Zijlstra, oleg, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland stephane eranian wrote: > As Peter found out, the man page seems to indicate that if you specify a TID > instead a PID with F_SETOWN, then the kernel should interpret this as meaning > this particular thread, but this is not what is implemented right now. The behaviour was changed quietly in kernel 2.6.12. I wrote that part of the man page, and it was true at the time. SIGIO signals _were_ thread-specific. Starting in kernel 2.5.60, SIGIO signals were thread-specific when F_SETSIG was used to enable queued siginfo. Prompted by this F_SETOWN_TID patch, I checked through old kernel patches, and found that signals set by F_SIGSIG were changed to process-wide in 2.6.12: @@ -437,7 +438,7 @@ static void send_sigio_to_task(struct ta else si.si_band = band_table[reason - POLL_IN]; si.si_fd = fd; - if (!send_sig_info(fown->signum, &si, p)) + if (!send_group_sig_info(fown->signum, &si, p)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: That's a bit annoying, because it breaks a library which uses queued I/O signals as an I/O event mechanism when used with multiple threads. (Fortunatelly epoll is available nowadays; it does I/O events much better, but sometimes you want SIGIO from epoll's descriptor...) So the man page is now incorrect, but if F_SETOWN_TID is added and has the behaviour which F_SETOWN used to have, then the text can be shuffled around. If anyone is interested, I have a fairly detailed test program which checks queued "SIGIO" (F_SETSIG) signals due to I/O on a socket, including SIGURG and SIGIO on overflow, and checks whether each one is delivered properly and is thread-specific. I found quite a few Glibc and kernel bugs with it in the past. -- Jamie ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-09 5:46 ` F_SETOWN_TID: F_SETOWN was thread-specific for a while Jamie Lokier @ 2009-08-10 12:22 ` stephane eranian 2009-08-10 17:03 ` Oleg Nesterov 0 siblings, 1 reply; 49+ messages in thread From: stephane eranian @ 2009-08-10 12:22 UTC (permalink / raw) To: Jamie Lokier Cc: Andrew Morton, Peter Zijlstra, oleg, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Jamie, On Sun, Aug 9, 2009 at 7:46 AM, Jamie Lokier<jamie@shareable.org> wrote: > stephane eranian wrote: >> As Peter found out, the man page seems to indicate that if you specify a TID >> instead a PID with F_SETOWN, then the kernel should interpret this as meaning >> this particular thread, but this is not what is implemented right now. > > The behaviour was changed quietly in kernel 2.6.12. > > I wrote that part of the man page, and it was true at the time. SIGIO > signals _were_ thread-specific. > I knew at some point this worked because with perfmon we managed to get signals delivered to the right thread. But I guess it was just until 2.6.12 ;-< > Starting in kernel 2.5.60, SIGIO signals were thread-specific when > F_SETSIG was used to enable queued siginfo. > Thank you for bringing up F_SETSIG. I use it in some of my test programs. I thought it was related to the shared vs. private queue. But I looked at it again, and in fact it is related to another yet important issue when sampling. You must use F_SETSIG on SIGIO if you want your signal handler to receive the file descriptor in siginfo. This is useful if you want to perform some actions on the descriptor. That is the case in perfmon and this is the case in certain situations with perfcounters as well. Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set correctly without F_SETSIG. I have verified this with perfcounters, and this is indeed the case. This behavior seems kind of odd to me. I think it may be a "lucky" side effect of the intended behavior of F_SETSIG. It would be good to clarify this point more. > Prompted by this F_SETOWN_TID patch, I checked through old kernel > patches, and found that signals set by F_SIGSIG were changed to > process-wide in 2.6.12: > > @@ -437,7 +438,7 @@ static void send_sigio_to_task(struct ta > else > si.si_band = band_table[reason - POLL_IN]; > si.si_fd = fd; > - if (!send_sig_info(fown->signum, &si, p)) > + if (!send_group_sig_info(fown->signum, &si, p)) > break; > /* fall-through: fall back on the old plain SIGIO signal */ > case 0: > > That's a bit annoying, because it breaks a library which uses queued > I/O signals as an I/O event mechanism when used with multiple threads. > (Fortunatelly epoll is available nowadays; it does I/O events much > better, but sometimes you want SIGIO from epoll's descriptor...) > > So the man page is now incorrect, but if F_SETOWN_TID is added and has > the behaviour which F_SETOWN used to have, then the text can be > shuffled around. > > If anyone is interested, I have a fairly detailed test program which > checks queued "SIGIO" (F_SETSIG) signals due to I/O on a socket, > including SIGURG and SIGIO on overflow, and checks whether each one is > delivered properly and is thread-specific. I found quite a few Glibc > and kernel bugs with it in the past. > > -- Jamie > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-10 12:22 ` stephane eranian @ 2009-08-10 17:03 ` Oleg Nesterov 2009-08-10 21:01 ` stephane eranian 2009-08-11 13:10 ` Jamie Lokier 0 siblings, 2 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-08-10 17:03 UTC (permalink / raw) To: stephane eranian Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/10, stephane eranian wrote: > > You must use F_SETSIG on SIGIO if you want your signal handler to > receive the file descriptor in siginfo. This is useful if you want to perform > some actions on the descriptor. That is the case in perfmon and this is > the case in certain situations with perfcounters as well. > > Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set > correctly without F_SETSIG. I have verified this with perfcounters, and > this is indeed the case. > > This behavior seems kind of odd to me. Agreed, this looks a bit odd. But at least this is documented. From man 2 fcntl: By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the signal handler (see sigac- tion(2)), extra information about I/O events is passed to the handler in a siginfo_t structure. If the si_code field indicates the source is SI_SIGIO, the si_fd field gives the file descriptor associated with the event. Otherwise, there is no indication which file descriptors are pending, Not sure if it is safe to change the historical behaviour. (the manpage is not exactly right though, and the comment in send_sigio_to_task() is not right too: SI_SIGIO (and, btw, SI_QUEUE/SI_DETHREAD) is never used). Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-10 17:03 ` Oleg Nesterov @ 2009-08-10 21:01 ` stephane eranian 2009-08-17 17:16 ` Oleg Nesterov 2009-08-11 13:10 ` Jamie Lokier 1 sibling, 1 reply; 49+ messages in thread From: stephane eranian @ 2009-08-10 21:01 UTC (permalink / raw) To: Oleg Nesterov Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote: > On 08/10, stephane eranian wrote: >> >> You must use F_SETSIG on SIGIO if you want your signal handler to >> receive the file descriptor in siginfo. This is useful if you want to perform >> some actions on the descriptor. That is the case in perfmon and this is >> the case in certain situations with perfcounters as well. >> >> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set >> correctly without F_SETSIG. I have verified this with perfcounters, and >> this is indeed the case. >> >> This behavior seems kind of odd to me. > > Agreed, this looks a bit odd. But at least this is documented. From > man 2 fcntl: > > By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the > signal handler (see sigac- tion(2)), extra information about I/O events > is passed to the handler in a siginfo_t structure. If the si_code > field indicates the source is SI_SIGIO, the si_fd field gives the file > descriptor associated with the event. Otherwise, there is no indication > which file descriptors are pending, > > Not sure if it is safe to change the historical behaviour. > Don't need to change it. But for SIGIO, if you see SA_SIGINFO, then pass the si_fd. >From Jamie, it seems F_SETSIG was not added just for this file descriptor trick. The initial goal seems orthogonal to the file descriptor passing in si_fd. > (the manpage is not exactly right though, and the comment in send_sigio_to_task() > is not right too: SI_SIGIO (and, btw, SI_QUEUE/SI_DETHREAD) is never used). > > Oleg. > > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-10 21:01 ` stephane eranian @ 2009-08-17 17:16 ` Oleg Nesterov 2009-08-17 17:40 ` Oleg Nesterov 2009-08-17 22:26 ` stephane eranian 0 siblings, 2 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-08-17 17:16 UTC (permalink / raw) To: stephane eranian Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Sorry for late reply... On 08/10, stephane eranian wrote: > > On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote: > > On 08/10, stephane eranian wrote: > >> > >> You must use F_SETSIG on SIGIO if you want your signal handler to > >> receive the file descriptor in siginfo. This is useful if you want to perform > >> some actions on the descriptor. That is the case in perfmon and this is > >> the case in certain situations with perfcounters as well. > >> > >> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set > >> correctly without F_SETSIG. I have verified this with perfcounters, and > >> this is indeed the case. > >> > >> This behavior seems kind of odd to me. > > > > Agreed, this looks a bit odd. But at least this is documented. From > > man 2 fcntl: > > > > By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the > > signal handler (see sigac- tion(2)), extra information about I/O events > > is passed to the handler in a siginfo_t structure. If the si_code > > field indicates the source is SI_SIGIO, the si_fd field gives the file > > descriptor associated with the event. Otherwise, there is no indication > > which file descriptors are pending, > > > > Not sure if it is safe to change the historical behaviour. > > > Don't need to change it. Good, > But for SIGIO, if you see SA_SIGINFO, then pass the si_fd. But this means we do change the behaviour ;) Confused. In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or siginfo_t with the correct si_fd/etc. And again, this is even documented. The change is trivial but user-space visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO without F_SETSIG. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-17 17:16 ` Oleg Nesterov @ 2009-08-17 17:40 ` Oleg Nesterov 2009-08-17 22:26 ` stephane eranian 1 sibling, 0 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-08-17 17:40 UTC (permalink / raw) To: stephane eranian Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Forgot to show the patch, On 08/17, Oleg Nesterov wrote: > > And again, this is even documented. The change is trivial but user-space > visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO > without F_SETSIG. Oleg. Personally I do not really think this change is good idea. (and in any case it should be re-diffed on top of Peter's OWN_EX patch). Btw. _in theory_, "case 0" is not right wrt security_file_send_sigiotask(sig). I think we shouldn't worry. --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -431,6 +431,7 @@ static void send_sigio_to_task(struct ta int fd, int reason) { + siginfo_t si; /* * F_SETSIG can change ->signum lockless in parallel, make * sure we read it once and use the same value throughout. @@ -439,33 +440,33 @@ static void send_sigio_to_task(struct ta if (!sigio_perm(p, fown, signum)) return; + /* Queue a rt signal with the appropriate fd as its + value. We use SI_SIGIO as the source, not + SI_KERNEL, since kernel signals always get + delivered even if we can't queue. Failure to + queue in this case _should_ be reported; we fall + back to SIGIO in that case. --sct */ + si.si_errno = 0; + si.si_fd = fd; + si.si_code = reason; + /* Make sure we are called with one of the POLL_* + reasons, otherwise we could leak kernel stack into + userspace. */ + BUG_ON((reason & __SI_MASK) != __SI_POLL); + if (reason - POLL_IN >= NSIGPOLL) + si.si_band = ~0L; + else + si.si_band = band_table[reason - POLL_IN]; switch (signum) { - siginfo_t si; default: - /* Queue a rt signal with the appropriate fd as its - value. We use SI_SIGIO as the source, not - SI_KERNEL, since kernel signals always get - delivered even if we can't queue. Failure to - queue in this case _should_ be reported; we fall - back to SIGIO in that case. --sct */ si.si_signo = signum; - si.si_errno = 0; - si.si_code = reason; - /* Make sure we are called with one of the POLL_* - reasons, otherwise we could leak kernel stack into - userspace. */ - BUG_ON((reason & __SI_MASK) != __SI_POLL); - if (reason - POLL_IN >= NSIGPOLL) - si.si_band = ~0L; - else - si.si_band = band_table[reason - POLL_IN]; - si.si_fd = fd; if (!group_send_sig_info(signum, &si, p)) break; /* fall-through: fall back on the old plain SIGIO signal */ case 0: - group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); + si.si_signo = SIGIO; + group_send_sig_info(SIGIO, &si, p); } } ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-17 17:16 ` Oleg Nesterov 2009-08-17 17:40 ` Oleg Nesterov @ 2009-08-17 22:26 ` stephane eranian 2009-08-18 11:45 ` Oleg Nesterov 1 sibling, 1 reply; 49+ messages in thread From: stephane eranian @ 2009-08-17 22:26 UTC (permalink / raw) To: Oleg Nesterov Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Oleg, On Mon, Aug 17, 2009 at 7:16 PM, Oleg Nesterov<oleg@redhat.com> wrote: > Sorry for late reply... > > On 08/10, stephane eranian wrote: >> >> On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote: >> > On 08/10, stephane eranian wrote: >> >> >> >> You must use F_SETSIG on SIGIO if you want your signal handler to >> >> receive the file descriptor in siginfo. This is useful if you want to perform >> >> some actions on the descriptor. That is the case in perfmon and this is >> >> the case in certain situations with perfcounters as well. >> >> >> >> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set >> >> correctly without F_SETSIG. I have verified this with perfcounters, and >> >> this is indeed the case. >> >> >> >> This behavior seems kind of odd to me. >> > >> > Agreed, this looks a bit odd. But at least this is documented. From >> > man 2 fcntl: >> > >> > By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the >> > signal handler (see sigac- tion(2)), extra information about I/O events >> > is passed to the handler in a siginfo_t structure. If the si_code >> > field indicates the source is SI_SIGIO, the si_fd field gives the file >> > descriptor associated with the event. Otherwise, there is no indication >> > which file descriptors are pending, >> > >> > Not sure if it is safe to change the historical behaviour. >> > >> Don't need to change it. > > Good, > >> But for SIGIO, if you see SA_SIGINFO, then pass the si_fd. > > But this means we do change the behaviour ;) Confused. > I meant do not remove F_SETSIG and its side-effect on si_fd. > > In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was > called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or > siginfo_t with the correct si_fd/etc. > What's the official role of SA_SIGINFO? Pass a siginfo struct? Does POSIX describe the rules governing the content of si_fd? Or is si_fd a Linux-ony extension in which case it goes with F_SETSIG. > And again, this is even documented. The change is trivial but user-space > visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO > without F_SETSIG. > That would be an app that uses SIGINFO and fiddles with si_fd which has no defined content. What kind of app would that be? It would seem natural that in the siginfo passed to the handler of SIGIO, the file descriptor be passed by default. That is all I am trying to say here. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-17 22:26 ` stephane eranian @ 2009-08-18 11:45 ` Oleg Nesterov 2009-08-20 10:00 ` stephane eranian 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2009-08-18 11:45 UTC (permalink / raw) To: stephane eranian Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On 08/18, stephane eranian wrote: > > On Mon, Aug 17, 2009 at 7:16 PM, Oleg Nesterov<oleg@redhat.com> wrote: > > Sorry for late reply... > > > > On 08/10, stephane eranian wrote: > >> > >> On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote: > >> > > >> > Not sure if it is safe to change the historical behaviour. > >> > > >> Don't need to change it. > > > > Good, > > > >> But for SIGIO, if you see SA_SIGINFO, then pass the si_fd. > > > > But this means we do change the behaviour ;) Confused. > > > I meant do not remove F_SETSIG and its side-effect on si_fd. Ah, now I see what you meant. > > In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was > > called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or > > siginfo_t with the correct si_fd/etc. > > > What's the official role of SA_SIGINFO? Pass a siginfo struct? > > Does POSIX describe the rules governing the content of si_fd? > Or is si_fd a Linux-ony extension in which case it goes with F_SETSIG. Not sure I understand your concern... OK. You suggest to pass siginfo_t with .si_fd/etc when we detect SA_SIGINFO. But, in that case we can _always_ pass siginfo_t, regardless of SA_SIGINFO. If the task has a signal handler and sigaction() was called without SA_SIGINFO, then the handler must not look into *info (the second arg of sigaction->sa_sigaction). And in fact, __setup_rt_frame() doesn't even copy the info to the user-space: if (ka->sa.sa_flags & SA_SIGINFO) { if (copy_siginfo_to_user(&frame->info, info)) return -EFAULT; } OK? Or I missed something? > > And again, this is even documented. The change is trivial but user-space > > visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO > > without F_SETSIG. > > > That would be an app that uses SIGINFO and fiddles with si_fd which has no > defined content. What kind of app would that be? The stupid app. But it is always unsafe to make the user-visible changes without good reasons. Even when we fix the bug (and the current code is not buggy) sometimes we have "this patch breaks my app or test-case!" reports. If nothing else, we can break the test-case which simply does void sigio_handler(int sig, siginfo_t *info, void *u) { assert(info->si_code == 0 && info->si_code = 0x80); } Once again: this is _documented_ ! And we can't set .si_fd = fd whithout changing .si_code, this will break copy_siginfo_to_user(). Or. Suppose that some app does: void io_handler(int sig, siginfo_t *info, void *u) { if ((info->si_code & __SI_MASK) != SI_POLL) { // RT signal failed! sig MUST be == SIGIO recover(); } else { do_something(info->si_fd); } } int main(void) { sigaction(SIGRTMIN, { SA_SIGINFO, io_handler }); sigaction(SIGIO, { SA_SIGINFO, io_handler }); ... } This is correct. But if we change the current behaviour, this app won't be able to detect the overflow. > It would seem natural that in the siginfo passed to the handler of SIGIO, the > file descriptor be passed by default. That is all I am trying to say here. Completely agreed! I was always puzzled by send_sigio_to_task(). I was never able to understand why it looks so strange. So, I think it should be static void send_sigio_to_task(struct task_struct *p, struct fown_struct *fown, int fd, int reason) { siginfo_t si; /* * F_SETSIG can change ->signum lockless in parallel, make * sure we read it once and use the same value throughout. */ int signum = ACCESS_ONCE(fown->signum) ?: SIGIO; if (!sigio_perm(p, fown, signum)) return; si.si_signo = signum; si.si_errno = 0; si.si_code = reason; si.si_fd = fd; /* Make sure we are called with one of the POLL_* reasons, otherwise we could leak kernel stack into userspace. */ BUG_ON((reason & __SI_MASK) != __SI_POLL); if (reason - POLL_IN >= NSIGPOLL) si.si_band = ~0L; else si.si_band = band_table[reason - POLL_IN]; /* Failure to queue an rt signal must be reported as SIGIO */ if (!group_send_sig_info(signum, &si, p)) group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); } (except it should be on top of fcntl-add-f_etown_ex.patch). This way, at least we don't break the "detect RT signal failed" above. What do you think? But let me repeat: I just can't convince myself we have a good reason to change the strange, but carefully documented behaviour. In case you agree with the code above, I can send the patch. But only if I have a "good" changelog from you + your Signed-of-by in advance ;) Otherwise, please feel free to send this/similar change yourself. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-18 11:45 ` Oleg Nesterov @ 2009-08-20 10:00 ` stephane eranian 0 siblings, 0 replies; 49+ messages in thread From: stephane eranian @ 2009-08-20 10:00 UTC (permalink / raw) To: Oleg Nesterov Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Oleg, On Tue, Aug 18, 2009 at 1:45 PM, Oleg Nesterov<oleg@redhat.com> wrote: > On 08/18, stephane eranian wrote: >> > In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was >> > called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or >> > siginfo_t with the correct si_fd/etc. >> > >> What's the official role of SA_SIGINFO? Pass a siginfo struct? >> >> Does POSIX describe the rules governing the content of si_fd? >> Or is si_fd a Linux-ony extension in which case it goes with F_SETSIG. > > Not sure I understand your concern... > > OK. You suggest to pass siginfo_t with .si_fd/etc when we detect SA_SIGINFO. > The reason I refer to SA_SIGINFO is simply because of the excerpt from the sigaction man page: If SA_SIGINFO is specified in sa_flags, then sa_sigaction (instead of sa_handler) specifies the signal-handling function for signum. This function receives the signal number as its first argument, a pointer to a siginfo_t as its second argument and a pointer to a ucontext_t (cast to void *) as its third argument. In other words, I use this to emphasize the fact that to get a siginfo struct, you need to pass SA_SIGINFO and use sa_sigaction instead of sa_handler. That's all I am saying. > But, in that case we can _always_ pass siginfo_t, regardless of SA_SIGINFO. > If the task has a signal handler and sigaction() was called without > SA_SIGINFO, then the handler must not look into *info (the second arg of > sigaction->sa_sigaction). And in fact, __setup_rt_frame() doesn't even > copy the info to the user-space: > > if (ka->sa.sa_flags & SA_SIGINFO) { > if (copy_siginfo_to_user(&frame->info, info)) > return -EFAULT; > } > > OK? Or I missed something? > No, I think we are in agreement. To get meaningful siginfo use SA_SIGINFO. > Or. Suppose that some app does: > > void io_handler(int sig, siginfo_t *info, void *u) > { > if ((info->si_code & __SI_MASK) != SI_POLL) { > // RT signal failed! sig MUST be == SIGIO > recover(); > } else { > do_something(info->si_fd); > } > } > > int main(void) > { > sigaction(SIGRTMIN, { SA_SIGINFO, io_handler }); > sigaction(SIGIO, { SA_SIGINFO, io_handler }); > ... > } > I don't think you can check si_code without first checking which signal it is in si_signo. The values for si_code overlap between the different struct in siginfo->_sifields. >> It would seem natural that in the siginfo passed to the handler of SIGIO, the >> file descriptor be passed by default. That is all I am trying to say here. > > Completely agreed! I was always puzzled by send_sigio_to_task(). I was never > able to understand why it looks so strange. > > So, I think it should be > > static void send_sigio_to_task(struct task_struct *p, > struct fown_struct *fown, > int fd, > int reason) > { > siginfo_t si; > /* > * F_SETSIG can change ->signum lockless in parallel, make > * sure we read it once and use the same value throughout. > */ > int signum = ACCESS_ONCE(fown->signum) ?: SIGIO; > > if (!sigio_perm(p, fown, signum)) > return; > > si.si_signo = signum; > si.si_errno = 0; > si.si_code = reason; > si.si_fd = fd; > /* Make sure we are called with one of the POLL_* > reasons, otherwise we could leak kernel stack into > userspace. */ > BUG_ON((reason & __SI_MASK) != __SI_POLL); > if (reason - POLL_IN >= NSIGPOLL) > si.si_band = ~0L; > else > si.si_band = band_table[reason - POLL_IN]; > > /* Failure to queue an rt signal must be reported as SIGIO */ > if (!group_send_sig_info(signum, &si, p)) > group_send_sig_info(SIGIO, SEND_SIG_PRIV, p); > } > > (except it should be on top of fcntl-add-f_etown_ex.patch). > This way, at least we don't break the "detect RT signal failed" above. > > What do you think? > > But let me repeat: I just can't convince myself we have a good reason > to change the strange, but carefully documented behaviour. > I agree with you. Given the existing documentation in the man page of fcntl() and the various code examples. I think it is possible for developers to figure out how to get the si_fd in the handler. This problem is not specific to perf_counters nor perfmon. Any SIGIO-based program may be interested in getting si_fd, therefore I am assuming the solution is well understood at this point. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-10 17:03 ` Oleg Nesterov 2009-08-10 21:01 ` stephane eranian @ 2009-08-11 13:10 ` Jamie Lokier 2009-08-17 17:05 ` Oleg Nesterov 1 sibling, 1 reply; 49+ messages in thread From: Jamie Lokier @ 2009-08-11 13:10 UTC (permalink / raw) To: Oleg Nesterov Cc: stephane eranian, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Oleg Nesterov wrote: > Agreed, this looks a bit odd. But at least this is documented. From > man 2 fcntl: > > By using F_SETSIG with a nonzero value, and setting SA_SIGINFO > for the signal handler (see sigaction(2)), extra information > about I/O events is passed to the handler in a siginfo_t > structure. If the si_code field indicates the source is > SI_SIGIO, the si_fd field gives the file descriptor associated > with the event. Otherwise, there is no indication which file > descriptors are pending, > > Not sure if it is safe to change the historical behaviour. The change in 2.6.12 breaks some code of mine, which uses RT queued I/O signals on multiple threads but as far as I know it's not used anywhere now. In the <= 2.4 era, there were lots of web servers and benchmarks using queued I/O signals for scalable event-driven I/O, but I don't know of any implementation who dared do it with multiple threads, except mine. It was regarded as "beware ye who enter here" territory, which I can attest to from the long time it took to get it right and the multitude of kernel bugs and version changes needing to be worked around. Since 2.6, everyone uses epoll which is much better, except that occasionally SIGIO comes in handy when an async notification is required. So the change in 2.6.12 does break something that probably isn't much used, but it's too late now. Occasionally thread-specific SIGIO (or F_SETSIG) is useful; F_SETOWN_TID makes that nice and clear. I would drop the pseudo-"bug compatible" behaviour of using negative tid to mean pid; that's pointless. I'd also make F_GETOWN return an error when F_SETOWN_TID has been used, and F_GETOWN_TID return an error when F_SETOWN has been used. > (the manpage is not exactly right though, and the comment in > send_sigio_to_task() is not right too: SI_SIGIO (and, btw, > SI_QUEUE/SI_DETHREAD) is never used). Ah, there's another historical change you see. It was changed in 2.3.21 from SI_SIGIO to POLL_xxx, and si_band started being set at the same time. The man page could be updated to reflect that. (My portable-to-ancient-Linux code checks for si_code == SI_SIGIO, in which case it has the descriptor but doesn't know what type of event (pre 2.3.21) so adds it to a poll() set, or checks for si_code == POLL_xxx, in which case it ignores the si_code value completely and looks at si_band for the set of pending events because some patch that was never mainlined could result in multiple si_band bits set). -- Jamie ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while 2009-08-11 13:10 ` Jamie Lokier @ 2009-08-17 17:05 ` Oleg Nesterov 0 siblings, 0 replies; 49+ messages in thread From: Oleg Nesterov @ 2009-08-17 17:05 UTC (permalink / raw) To: Jamie Lokier Cc: stephane eranian, Andrew Morton, Peter Zijlstra, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland Sorry for late reply. And I am a bit confused. On 08/11, Jamie Lokier wrote: > > Oleg Nesterov wrote: > > Agreed, this looks a bit odd. But at least this is documented. From > > man 2 fcntl: > > > > By using F_SETSIG with a nonzero value, and setting SA_SIGINFO > > for the signal handler (see sigaction(2)), extra information > > about I/O events is passed to the handler in a siginfo_t > > structure. If the si_code field indicates the source is > > SI_SIGIO, the si_fd field gives the file descriptor associated > > with the event. Otherwise, there is no indication which file > > descriptors are pending, > > > > Not sure if it is safe to change the historical behaviour. > > The change in 2.6.12 breaks some code of mine, which uses RT queued > I/O signals on multiple threads but as far as I know it's not used > anywhere now. > > In the <= 2.4 era, there were lots of web servers and benchmarks using > queued I/O signals for scalable event-driven I/O, but I don't know of > any implementation who dared do it with multiple threads, except mine. > > It was regarded as "beware ye who enter here" territory, which I can > attest to from the long time it took to get it right and the multitude > of kernel bugs and version changes needing to be worked around. > > Since 2.6, everyone uses epoll which is much better, except that > occasionally SIGIO comes in handy when an async notification is > required. > > So the change in 2.6.12 does break something that probably isn't much > used, but it's too late now. So, you seem to agree we should not change this odd behaviour? > Occasionally thread-specific SIGIO (or > F_SETSIG) is useful; F_SETOWN_TID makes that nice and clear. Great. If you agree with F_SETOWN_TID, could you look at the next Peter's patch "[PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX" http://marc.info/?l=linux-kernel&m=124956452125468 and ack it? > I would drop the pseudo-"bug compatible" behaviour of using negative > tid to mean pid; that's pointless. done, > I'd also make F_GETOWN return an > error when F_SETOWN_TID has been used, This is not trivial, F_GETOWN can't return the error. A negative result means PIDTYPE_PGID. > and F_GETOWN_TID return an > error when F_SETOWN has been used. F_GETOWN_EX does this even better. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID 2009-07-31 21:11 ` Andrew Morton ` (3 preceding siblings ...) 2009-08-03 12:53 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian @ 2009-08-03 15:21 ` Peter Zijlstra 4 siblings, 0 replies; 49+ messages in thread From: Peter Zijlstra @ 2009-08-03 15:21 UTC (permalink / raw) To: Andrew Morton Cc: oleg, eranian, mingo, linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland On Fri, 2009-07-31 at 14:11 -0700, Andrew Morton wrote: > > > > arch/parisc/include/asm/fcntl.h | 2 + > > fs/fcntl.c | 64 +++++++++++++++++++++++++++++++++----- > > include/asm-generic/fcntl.h | 4 ++ > > include/linux/fs.h | 11 +++++- > > net/socket.c | 2 +- > > OK. > > > > Alpha has private definitions of F_SETSIG and F_GETSIG which are > identical to the generic ones. That's somewhat logical, given that > alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones. > > Alpha appears to have made the decision to spell out _all_ the F_* > flags, given that some of them are different. That has some merit, I > guess. > > But your patch broke that. Right, all I did was validate that I didn't overlap with any of the existing fcntl numbers. I can explicitly add them to alpha if that makes people happy.. ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2009-08-20 10:00 UTC | newest]
Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian
2009-07-27 16:56 ` Peter Zijlstra
2009-07-27 21:25 ` Andi Kleen
[not found] ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com>
2009-07-28 8:51 ` stephane eranian
2009-07-28 8:56 ` Andi Kleen
2009-07-28 9:13 ` stephane eranian
2009-08-04 16:09 ` stephane eranian
2009-07-29 12:19 ` Peter Zijlstra
2009-07-29 12:37 ` stephane eranian
2009-07-29 12:46 ` Peter Zijlstra
2009-07-29 22:17 ` Oleg Nesterov
2009-07-30 11:31 ` Peter Zijlstra
2009-07-30 19:20 ` Oleg Nesterov
2009-07-30 20:00 ` Peter Zijlstra
2009-07-30 20:28 ` Oleg Nesterov
2009-07-30 21:09 ` stephane eranian
2009-07-31 8:35 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
2009-07-31 14:01 ` stephane eranian
2009-07-31 20:52 ` Oleg Nesterov
2009-07-31 21:11 ` Andrew Morton
2009-08-01 1:27 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov
2009-08-03 15:48 ` [PATCH 3/2] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
2009-08-03 17:16 ` Oleg Nesterov
2009-08-03 17:47 ` Peter Zijlstra
2009-08-03 18:06 ` Oleg Nesterov
2009-08-03 18:36 ` Peter Zijlstra
2009-08-03 19:02 ` Oleg Nesterov
2009-08-04 11:39 ` [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX Peter Zijlstra
2009-08-04 16:20 ` Oleg Nesterov
2009-08-04 16:52 ` Peter Zijlstra
2009-08-04 17:19 ` Oleg Nesterov
2009-08-06 13:14 ` [PATCH 3/2 -v4] " Peter Zijlstra
2009-08-06 19:05 ` Oleg Nesterov
2009-08-07 12:10 ` stephane eranian
2009-08-01 1:28 ` [PATCH 1/2] signals: introduce do_send_sig_info() helper Oleg Nesterov
2009-08-01 1:28 ` [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() Oleg Nesterov
2009-08-03 12:53 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian
2009-08-09 5:46 ` F_SETOWN_TID: F_SETOWN was thread-specific for a while Jamie Lokier
2009-08-10 12:22 ` stephane eranian
2009-08-10 17:03 ` Oleg Nesterov
2009-08-10 21:01 ` stephane eranian
2009-08-17 17:16 ` Oleg Nesterov
2009-08-17 17:40 ` Oleg Nesterov
2009-08-17 22:26 ` stephane eranian
2009-08-18 11:45 ` Oleg Nesterov
2009-08-20 10:00 ` stephane eranian
2009-08-11 13:10 ` Jamie Lokier
2009-08-17 17:05 ` Oleg Nesterov
2009-08-03 15:21 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox