* drm_lock()->block_all_signals() is broken
@ 2011-07-12 18:15 Oleg Nesterov
2011-07-12 19:34 ` Linus Torvalds
2011-07-12 20:17 ` Dave Airlie
0 siblings, 2 replies; 4+ messages in thread
From: Oleg Nesterov @ 2011-07-12 18:15 UTC (permalink / raw)
To: Daniel Vetter, David Airlie, Linus Torvalds
Cc: Tejun Heo, linux-kernel, dri-devel
Hello.
I tried many times to ask about the supposed behaviour of
block_all_signals() in drm, but it seems nobody can answer.
So I am going to send the patch which simply removes
block_all_signals() and friends. There are numeruous problems
with this interace, I can't even enumerate them. But I think
that it is enough to mention that block_all_signals() simply
can not work. AT ALL. I am wondering, was it ever tested and
how.
So. ioctl()->drm_lock() "blocks" the stop signals. Probably to
ensure the task can't be stopped until it does DRM_IOCTL_UNLOCK.
And what does this mean? Yes, the task won't stop if it receives,
say, SIGTSTP. But! Instead it will loop forever in kernel mode
until it receives another unblocked/non-ignored signal which
should be numerically less than SIGSTOP.
Why do we need this? Once again. block_all_signals(SIGTSTP)
only means that the caller will burn cpu instead of sleeping
in TASK_STOPPED after ^Z. What is the point?
And once again, there are other problems. For example, even if
block_all_signals() actually blocked SIGSTOP/etc, this can not
help if the caller is multithreaded.
I strongly believe block_all_signals() should die. Given that
it doesn't work, could somebody please explain me what will
be broken?
Just in case... Please look at the debugging patch below. With
this patch,
$ perl -le 'syscall 157,666 and die $!; sleep 1, print while ++$_'
1
2
3
^Z
Hang. So it does react to ^Z anyway, just it is looping in the
endless loop in the kernel. It can only look as if ^Z is ignored,
because obviously bash doesn't see it stopped.
Now lets look at drm_notifier(). If it returns 0 it does:
/* Otherwise, set flag to force call to
drmUnlock */
drmUnlock? grep shows nothing...
do {
old = s->lock->lock;
new = old | _DRM_LOCK_CONT;
prev = cmpxchg(&s->lock->lock, old, new);
} while (prev != old);
return 0;
OK. So, if block_all_signals() makes any sense, it seems that this
is only because we add _DRM_LOCK_CONT.
Who checks _DRM_LOCK_CONT? _DRM_LOCK_IS_CONT(), but it has no users.
Hmm. Looks like via_release_futex() is the only user, but it doesn't
look as "force call to drmUnlock" and it is CONFIG_DRM_VIA only.
I am totally confused. But block_all_signals() should die anyway.
We can probably implement something like 'i-am-going-to-stop' or
even 'can-i-stop' per-thread notifiers, although this all looks
like the user-space problem to me (yes, I know absolutely nothing
about drm/etc).
If nothing else. We can change drm_lock/drm_unlock to literally
block/unblock SIGSTOP/etc (or perhaps we only should worry about
the signals from tty?). This is the awful hack and this can't work
with the multithreaded tasks too, but still it is better than what
we have now.
Oleg.
--- a/kernel/sys.c~ 2011-06-16 20:12:18.000000000 +0200
+++ b/kernel/sys.c 2011-07-12 16:24:50.000000000 +0200
@@ -1614,6 +1614,11 @@ SYSCALL_DEFINE1(umask, int, mask)
return mask;
}
+static int notifier(void *arg)
+{
+ return 0;
+}
+
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
@@ -1627,6 +1632,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
error = 0;
switch (option) {
+ case 666: {
+ sigset_t *pmask = kmalloc(sizeof(*pmask), GFP_KERNEL);
+ siginitset(pmask, sigmask(SIGTSTP));
+ block_all_signals(notifier, NULL, pmask);
+ break;
+ }
+
case PR_SET_PDEATHSIG:
if (!valid_signal(arg2)) {
error = -EINVAL;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: drm_lock()->block_all_signals() is broken
2011-07-12 18:15 drm_lock()->block_all_signals() is broken Oleg Nesterov
@ 2011-07-12 19:34 ` Linus Torvalds
2011-07-12 20:17 ` Dave Airlie
1 sibling, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2011-07-12 19:34 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Daniel Vetter, David Airlie, Tejun Heo, linux-kernel, dri-devel
On Tue, Jul 12, 2011 at 11:15 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> I tried many times to ask about the supposed behaviour of
> block_all_signals() in drm, but it seems nobody can answer.
It's always been broken, I think. We've had threads about it before
(years and years ago). I'd certainly be willing to just remove it
entirely, and see if anything pops up that ends up actually working
and being saner.
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: drm_lock()->block_all_signals() is broken
2011-07-12 18:15 drm_lock()->block_all_signals() is broken Oleg Nesterov
2011-07-12 19:34 ` Linus Torvalds
@ 2011-07-12 20:17 ` Dave Airlie
2011-07-13 17:40 ` Oleg Nesterov
1 sibling, 1 reply; 4+ messages in thread
From: Dave Airlie @ 2011-07-12 20:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Daniel Vetter, David Airlie, Linus Torvalds, Tejun Heo,
linux-kernel, dri-devel
On Tue, Jul 12, 2011 at 7:15 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> Hello.
>
> I tried many times to ask about the supposed behaviour of
> block_all_signals() in drm, but it seems nobody can answer.
Its not hard to explain, basically there exists hardware that you
program acceleration engines using MMIO, having multiple processes
writing to the MMIO area at once is a bad idea, so we use the DRM lock
which is like a pre-futex futex. Now the problem is if you stop or
kill a process in a critical section where it is writing to the
hardware directly with MMIO you could crash the hardware, so the idea
is that you block to the max until the critical section exits via the
drm unlock.
some links
http://lkml.indiana.edu/hypermail/linux/kernel/0202.2/0045.html
The only driver that might even use this and care in the tree is tdfx
as I think it might be MMIO programmed, though I'm not 100% sure. I
pretty much reckon we can deprecate block_all_signals as I forsee
fixing all the problem you pointed out with it would be worse than
just removing it.
>
> So I am going to send the patch which simply removes
> block_all_signals() and friends. There are numeruous problems
> with this interace, I can't even enumerate them. But I think
> that it is enough to mention that block_all_signals() simply
> can not work. AT ALL. I am wondering, was it ever tested and
> how.
I wasn't around, but it did work back in linux-2.4.0-test7 when it was
introduced,
http://www.kernel.org/pub/linux/kernel/v2.4/old-test-kernels/patch-2.4.0-test7.gz
is the initial patch. You should probably check that out and see if
you can work out how it originally worked, and the work out
if its really is broken now or if you haven't analysed it correctly.
> I strongly believe block_all_signals() should die. Given that
> it doesn't work, could somebody please explain me what will
> be broken?
> Just in case... Please look at the debugging patch below. With
> this patch,
>
> $ perl -le 'syscall 157,666 and die $!; sleep 1, print while ++$_'
> 1
> 2
> 3
> ^Z
>
> Hang. So it does react to ^Z anyway, just it is looping in the
> endless loop in the kernel. It can only look as if ^Z is ignored,
> because obviously bash doesn't see it stopped.
taking the drm lock usually means you will drop it without doing
anything stupid like sleeping for a long time, if the process dies,
the lock gets dropped automatically also.
I've only second hand knowledge of how this works though, and I'm on
holidays for another week or so, maybe once I get back I'll find some
time to figure out how it works vs what happens, but really I suspect
we can kill this with fire.
Dave.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: drm_lock()->block_all_signals() is broken
2011-07-12 20:17 ` Dave Airlie
@ 2011-07-13 17:40 ` Oleg Nesterov
0 siblings, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2011-07-13 17:40 UTC (permalink / raw)
To: Dave Airlie
Cc: Daniel Vetter, David Airlie, Linus Torvalds, Tejun Heo,
linux-kernel, dri-devel
On 07/12, Dave Airlie wrote:
>
> On Tue, Jul 12, 2011 at 7:15 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > Hello.
> >
> > I tried many times to ask about the supposed behaviour of
> > block_all_signals() in drm, but it seems nobody can answer.
>
> Its not hard to explain, basically there exists hardware that you
> program acceleration engines using MMIO, having multiple processes
> writing to the MMIO area at once is a bad idea, so we use the DRM lock
> which is like a pre-futex futex.
I guess LOCK_TEST_WITH_RETURN() does the check... This looks per-file,
not per process/thread but I guess this doesn't matter in practice.
And in any case, of course I do not understand this code.
> Now the problem is if you stop or
> kill a process in a critical section where it is writing to the
> hardware directly with MMIO you could crash the hardware, so the idea
> is that you block to the max until the critical section exits via the
> drm unlock.
OK, thanks.
But who can send SIGTSOP to the drm_lock() holder? I mean, can this
be a problem in practice?
As for SIGTSTP/SIGTTIN/SIGTTOU, the application can block them itself.
> I
> pretty much reckon we can deprecate block_all_signals as I forsee
> fixing all the problem you pointed out with it would be worse than
> just removing it.
Yes, I think we should kill it. May be we should implement something
else instead, I dunno.
> > So I am going to send the patch which simply removes
> > block_all_signals() and friends. There are numeruous problems
> > with this interace, I can't even enumerate them. But I think
> > that it is enough to mention that block_all_signals() simply
> > can not work. AT ALL. I am wondering, was it ever tested and
> > how.
>
> I wasn't around, but it did work back in linux-2.4.0-test7 when it was
> introduced,
>
> http://www.kernel.org/pub/linux/kernel/v2.4/old-test-kernels/patch-2.4.0-test7.gz
>
> is the initial patch. You should probably check that out and see if
> you can work out how it originally worked, and the work out
> if its really is broken now or if you haven't analysed it correctly.
Oh, 10 years ago ;) Of course I can't be sure if it ever worked or
not. Probably it mostly worked, although I _feel_ it was always buggy.
But it doesn't now, and I am pretty sure it was broken many years ago.
And no, I do not think we can fix it. It is not that the current
implementation is buggy (although it is buggy), this interface is
simply wrong, imho.
For example. block_all_signals(SIGINT) can not "block" SIGINT, it
can be transofmed into SIGKILL. Of course, I am not saying it is
not possible to do something which really works, but imho we should
not do this.
Currently it is only used by drm, and drm doesn't need the "generic"
interface anyway. If we decide that the kernel should help somehow
to the process holding drm_lock (I hope it shouldn't ;), then we
should implement something sane.
> > Just in case... Please look at the debugging patch below. With
> > this patch,
> >
> > $ perl -le 'syscall 157,666 and die $!; sleep 1, print while ++$_'
> > 1
> > 2
> > 3
> > ^Z
> >
> > Hang. So it does react to ^Z anyway, just it is looping in the
> > endless loop in the kernel. It can only look as if ^Z is ignored,
> > because obviously bash doesn't see it stopped.
>
> taking the drm lock usually means you will drop it without doing
> anything stupid like sleeping for a long time,
Yes, I understand. I added "sleep" to lessen the output. The point
is, it "stops" anyway even if block_all_signals() was called. It can't
return to the user-mode and release the lock.
> if the process dies,
> the lock gets dropped automatically also.
Just curious... do you mean drm_release() does this? Again, this is
per file. A thread can exit without drm_unlock(). OK, at least this
works if the process is killed and nobody else keeps this file opened.
> I've only second hand knowledge of how this works though, and I'm on
> holidays for another week or so, maybe once I get back I'll find some
> time to figure out how it works vs what happens, but really I suspect
> we can kill this with fire.
OK, thanks Dave. This is not urgent. I do not want to send the patch
which touches the code I do not understand and can't test. I'll wait
for your return.
To summarize: block_all_signals() do not work. Afaics, the _only_
effect it has is that we set _DRM_LOCK_CONT. If this is not that
important we can simply kill block_all_signals(), then we can think
if we need do something else.
Oleg.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-07-13 17:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-12 18:15 drm_lock()->block_all_signals() is broken Oleg Nesterov
2011-07-12 19:34 ` Linus Torvalds
2011-07-12 20:17 ` Dave Airlie
2011-07-13 17:40 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox