From: "Gábor Melis" <mega@retes.hu>
To: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Davide Libenzi <davidel@xmailserver.org>,
	Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Chris Friesen <cfriesen@nortel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: RT signal queue overflow (Was: Q: SEGSEGV && uc_mcontext->ip (Was: Signal delivery order))
Date: Wed, 18 Mar 2009 10:02:07 +0100
Message-ID: <200903181002.07584.mega@retes.hu>
In-Reply-To: <20090318075901.4FA19FC3AB@magilla.sf.frob.com>

On Wednesday 18 March 2009, Roland McGrath wrote:
> > First of all, perhaps I missed something and this is solvable
> > without kernel changes, but I can't see how.
>
> It depends what kind of "solved" you mean.
>
> Signals pending for the thread are always delivered before signals
> pending for the process.  POSIX does not guarantee this to the
> application, but it has always been so in Linux and it's fine enough
> to rely on that.  Truly externally-generated and asynchronous signals
> go to the process, so it's really only pthread_kill use within your
> own program that raises the issue.
>
> Among signals pending for the thread, signals < SIGRTMIN are always
> delivered before ones >= SIGRTMIN.  POSIX does not guarantee this to
> the application, but it has always been so in Linux and it's fine
> enough to rely on that.  The most sensible thing to use with
> pthread_kill is some SIGRTMIN+n signal anyway, since they are never
> confused with any other use. If your program is doing that, you don't
> have a problem.

It was just a month or so ago that I finally made the change to use a
non-real-time signal for signalling stop-for-gc. It was motivated by
the fact that even with rt signals there needs to be a fallback
mechanism for when the rt signal queue overflows. Another reason was
that _different processes_ could interfere with each other: if one
filled the queue, the other processes would hang too (there was no
fallback mechanism implemented). From this behaviour, it seemed that
the rt signal queue was global. Attached is a test program that
reproduces this.

$ gcc -pthread rt-signal-queue-overflow.c
$ (./a.out &); sleep 1; ./a.out
pthread_kill returned EAGAIN, errno=0, count=24566
pthread_kill returned EAGAIN, errno=0, count=0

There are two notable things here. The first is that pthread_kill
returns EAGAIN, which is not mentioned on the man page, but does not
set errno. The other is that the first process filled the rt signal
queue and the second one could not send a single signal successfully.
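
On the errno point: POSIX specifies that pthread_kill reports failure
through its return value (an error number) rather than through errno, so
a caller has to test the result directly. A minimal sketch of that
check follows; send_to_self() is a hypothetical helper introduced here
for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sends signo to the calling thread and reports a
 * failure using the returned error number. errno is deliberately not
 * consulted, because pthread_kill does not report errors through it. */
int send_to_self(int signo)
{
    int r = pthread_kill(pthread_self(), signo);

    if (r != 0)
        fprintf(stderr, "pthread_kill(%d) failed: %s\n",
                signo, strerror(r));
    return r;
}
```

Signal number 0 performs only the error checking, so send_to_self(0)
is a cheap way to exercise the success path, while an invalid signal
number such as -1 exercises the failure path.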

Granted, without a fallback mechanism my app deserved to lose. However, 
it seemed to me that there were other programs lacking in this regard 
on my desktop as I managed to hang a few of them.
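
One possible explanation for the cross-process effect, offered as an
assumption rather than something established in this message: on Linux
since 2.6.8, getrlimit(2) documents RLIMIT_SIGPENDING as a limit on the
number of queued signals for the real user ID, so processes of the same
user would share one budget. The sketch below only reads that limit;
pending_signal_limit() is a hypothetical helper.

```c
#include <sys/resource.h>

/* Hypothetical helper: stores the soft RLIMIT_SIGPENDING value in *soft.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit).
 * RLIMIT_SIGPENDING is Linux-specific. */
int pending_signal_limit(rlim_t *soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_SIGPENDING, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}
```

The same value can be inspected from bash with "ulimit -i".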

Even though within my app I could have guaranteed that the number of
pending rt signals stayed below a reasonable limit, there was no way to
defend against other processes filling up the queue, so I had to
implement a fallback mechanism that used non-rt signals (changing a few
other things as well). Once that was done, there was no reason to
keep the rt-signal-based one around.
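
For illustration only, a fallback of this general kind could look like
the sketch below. It is not the code from the application discussed
here; signal_with_fallback() and its parameters are hypothetical. It
tries the real-time signal first and retries with a standard signal
only when pthread_kill returns EAGAIN.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper, not taken from the application discussed above.
 * Tries the real-time signal first; if the queue is full (EAGAIN),
 * retries with a standard signal. Returns 0 on success or the error
 * number returned by pthread_kill. */
int signal_with_fallback(pthread_t thread, int rt_signo, int fallback_signo)
{
    int r = pthread_kill(thread, rt_signo);

    if (r == EAGAIN)
        r = pthread_kill(thread, fallback_signo);
    return r;
}
```

Standard signals are not queued per instance, so a receiver using such a
scheme has to treat the fallback signal as "at least one request is
pending" rather than count deliveries.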

Consider this another quality-of-implementation report.

> So on the one hand it seems pretty reasonable to say it's "solved" by
> accepting it when we say, "Welcome to Unix, these things should have
> stopped surprising you in the 1980s."  It's a strange pitfall of how
> everything fits together, granted.  But you do sort of have to make
> an effort to do things screwily before you can fall into it.
>
> All that said, it's actually probably a pretty easy hack to arrange
> that the signal posted by force_sig_info is the first one dequeued in
> all but the most utterly strange situations.
>
>
> Thanks,
> Roland

[-- Attachment #2: rt-signal-queue-overflow.c --]
[-- Type: text/x-csrc, Size: 1036 bytes --]

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int test_signal;

/* Empty handler. The signal stays blocked for the whole run, so this
 * handler is never invoked. */
void test_handler(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)info;
    (void)context;
}

void install_handlers(void)
{
    struct sigaction sa;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = test_handler;
    sigaction(test_signal, &sa, 0);
}

int main(void)
{
    sigset_t sigset;
    int r;
    int count = 0;

    test_signal = SIGRTMIN;
    install_handlers();

    /* Block the signal so that every instance sent stays queued. */
    sigemptyset(&sigset);
    sigaddset(&sigset, test_signal);
    sigprocmask(SIG_BLOCK, &sigset, 0);

    /* Queue signals to ourselves until pthread_kill fails. */
    do {
        r = pthread_kill(pthread_self(), test_signal);
        if (r == EAGAIN) {
            printf("pthread_kill returned EAGAIN, errno=%d, count=%d\n",
                   errno, count);
            /* Hold the queued signals for a while before exiting. */
            sleep(2);
            exit(27);
        }
        if (!r)
            count++;
    } while (!r);
    return 0;
}
