From: Oleg Nesterov <oleg@tv-sign.ru>
To: Mark McLoughlin <markmc@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Roland McGrath <roland@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal
Date: Sat, 19 Jul 2008 20:37:34 +0400
Message-ID: <20080719163734.GA389@tv-sign.ru>
In-Reply-To: <1216377558.12300.13.camel@muff>

On 07/18, Mark McLoughlin wrote:
>
> On Thu, 2008-07-17 at 17:55 +0400, Oleg Nesterov wrote:
>
> > I forgot (if ever knew ;) this code completely, but can't we make a simpler
> > fix? posix_timer_event() can check list_empty() lockless,
> >
> >         posix_timer_event()
> >         {
> >                 if (!list_empty(&sigq->list))
> >                         return 0;
> >
> >                 ... fill and send ->sigq...
> >         }
>
> Well, one issue with this is that we need to set the si_private supplied
> to posix_timer_event() on the queued siginfo. See updated version of the
> original patch below.
>
> So, for that reason, we can't currently do it lockless.
>
> Now, I've spent a while looking at the it_requeue_pending code and I
> can't fully satisfy myself that we need it to be a modification counter
> that we match up via si_sys_private. Do you know why this is needed? It
> seems to me that it could be seriously simplified.

No, I don't understand what si_sys_private means. In fact I don't even
understand what we should do with info.si_overrun in this corner case.

We have an active timer, and the app calls sys_timer_settime(), which changes
the timer. This looks like creating a new timer which "inherits" ->it_id and
->it_sigev_value. But the queued siginfo is connected to the "old" timer...
OK, I just don't understand all of this.

> Subject: [PATCH] posix-timers: Do not modify an already queued timer signal
>
> When a timer fires, posix_timer_event() zeroes out its
> pre-allocated siginfo structure, initialises it and then
> queues up the signal with send_sigqueue().
>
> However, we may have previously queued up this signal, in
> which case we only want to increment si_overrun and
> re-initialising the siginfo structure is incorrect.
>
> Also, since we are modifying an already queued signal
> without the protection of the sighand spinlock, we may also
> race with e.g. collect_signal() causing it to fail to find
> a signal on the pending list because it happens to look at
> the siginfo struct after it was zeroed and before it was
> re-initialised.
>
> The race was observed with a modified kvm-userspace when
> running a guest under heavy network load. When it occurs,
> KVM never sees another SIGALRM signal because although
> the signal is queued up the appropriate bit is never set
> in the pending mask. Manually sending the process a SIGALRM
> kicks it out of this state.

Please update the changelog to explain how it is possible to hit an
already queued siginfo.
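
For readers following the thread, here is a minimal user-space sketch of the
problem the quoted changelog describes. It is not kernel code: the names
model_siginfo, model_sigqueue, model_send(), fire_reinit() and
fire_overrun_only() are hypothetical, and the "queued" flag merely stands in
for !list_empty(&q->list). It only shows why wiping an already queued
siginfo loses the overrun count; it does not model the locking or the
collect_signal() race.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for siginfo_t and struct sigqueue. */
struct model_siginfo {
	int si_signo;
	int si_overrun;
	int si_sys_private;
};

struct model_sigqueue {
	bool queued;			/* models !list_empty(&q->list) */
	struct model_siginfo info;
};

/* Models send_sigqueue(): a queued entry only gets its overrun bumped. */
static void model_send(struct model_sigqueue *q)
{
	if (q->queued) {
		q->info.si_overrun++;
		return;
	}
	q->queued = true;
}

/* Behaviour under discussion: wipe the siginfo on every expiry. */
static void fire_reinit(struct model_sigqueue *q, int signo, int si_private)
{
	memset(&q->info, 0, sizeof(q->info));	/* also clears si_overrun */
	q->info.si_signo = signo;
	q->info.si_sys_private = si_private;
	model_send(q);
}

/* Alternative: leave a queued entry alone apart from the overrun count. */
static void fire_overrun_only(struct model_sigqueue *q, int signo,
			      int si_private)
{
	if (!q->queued) {
		memset(&q->info, 0, sizeof(q->info));
		q->info.si_signo = signo;
		q->info.si_sys_private = si_private;
	}
	model_send(q);
}
```

Calling each function three times on its own zeroed model_sigqueue, without
dequeueing in between, leaves si_overrun stuck at 1 in the re-initialising
variant, while the other variant counts both missed expirations.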

> -int posix_timer_event(struct k_itimer *timr,int si_private)
> +int posix_timer_event(struct k_itimer *timr, int si_private)
>  {
> -	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
> -	timr->sigq->info.si_sys_private = si_private;
> -	/* Send signal to the process that owns this timer.*/
> +	siginfo_t info;
>
> -	timr->sigq->info.si_signo = timr->it_sigev_signo;
> -	timr->sigq->info.si_errno = 0;
> -	timr->sigq->info.si_code = SI_TIMER;
> -	timr->sigq->info.si_tid = timr->it_id;
> -	timr->sigq->info.si_value = timr->it_sigev_value;
> +	memset(&info, 0, sizeof(siginfo_t));
> +
> +	info.si_sys_private = si_private;
> +	info.si_signo = timr->it_sigev_signo;
> +	info.si_errno = 0;
> +	info.si_code = SI_TIMER;
> +	info.si_tid = timr->it_id;
> +	info.si_value = timr->it_sigev_value;
>
>  	if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
>  		struct task_struct *leader;
> -		int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
> +		int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);

I think this is a bit of overkill. Note that (unless I missed something)
posix_timer_event() populates timr->sigq->info with the same values every
time, apart from si_sys_private, so afaics we can do

	--- kernel/posix-timers.c
	+++ kernel/posix-timers.c
	@@ -298,19 +298,14 @@ void do_schedule_next_timer(struct sigin
	 
	 int posix_timer_event(struct k_itimer *timr,int si_private)
	 {
	-	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
	-	timr->sigq->info.si_sys_private = si_private;
	-	/* Send signal to the process that owns this timer.*/
	-
		timr->sigq->info.si_signo = timr->it_sigev_signo;
	-	timr->sigq->info.si_errno = 0;
		timr->sigq->info.si_code = SI_TIMER;
		timr->sigq->info.si_tid = timr->it_id;
		timr->sigq->info.si_value = timr->it_sigev_value;
	 
		if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
			struct task_struct *leader;
	-		int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
	+		int ret = send_sigqueue(timr->sigq, si_private, timr->it_process, 0);
	 
			if (likely(ret >= 0))
				return ret;
	@@ -321,7 +316,7 @@ int posix_timer_event(struct k_itimer *t
			timr->it_process = leader;
		}
	 
	-	return send_sigqueue(timr->sigq, timr->it_process, 1);
	+	return send_sigqueue(timr->sigq, si_private, timr->it_process, 1);
	 }
	 EXPORT_SYMBOL_GPL(posix_timer_event);
	 
	@@ -435,6 +430,7 @@ static struct k_itimer * alloc_posix_tim
			kmem_cache_free(posix_timers_cache, tmr);
	-		tmr = NULL;
	+		return NULL;
		}
	+	memset(&tmr->sigq->info, 0, sizeof(siginfo_t));
		return tmr;
	 }
	 
	--- kernel/signal.c
	+++ kernel/signal.c
	@@ -1283,7 +1283,7 @@ void sigqueue_free(struct sigqueue *q)
			__sigqueue_free(q);
	 }
	 
	-int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
	+int send_sigqueue(struct sigqueue *q, int si_private, struct task_struct *t, int group)
	 {
		int sig = q->info.si_signo;
		struct sigpending *pending;
	@@ -1300,6 +1300,8 @@ int send_sigqueue(struct sigqueue *q, st
		if (!prepare_signal(sig, t))
			goto out;
	 
	+	q->info.si_sys_private = si_private;
	+
		ret = 0;
		if (unlikely(!list_empty(&q->list))) {
			/*

But can't we do a simpler change?

	--- kernel/posix-timers.c
	+++ kernel/posix-timers.c
	@@ -298,7 +298,6 @@ void do_schedule_next_timer(struct sigin
	 
	 int posix_timer_event(struct k_itimer *timr,int si_private)
	 {
	-	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
		timr->sigq->info.si_sys_private = si_private;
		/* Send signal to the process that owns this timer.*/
	 
	@@ -435,6 +434,7 @@ static struct k_itimer * alloc_posix_tim
			kmem_cache_free(posix_timers_cache, tmr);
	-		tmr = NULL;
	+		return NULL;
		}
	+	memset(&tmr->sigq->info, 0, sizeof(siginfo_t));
		return tmr;
	 }

Yes, if sigq->info is already queued, it can be dequeued right after
".si_sys_private = si_private" and before we send the signal. As I said,
I don't know what si_sys_private means at the user level; is this bad?
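
The window in question can be stepped through by hand in a small user-space
model. This is only a sketch under stated assumptions: all names below
(model_alloc(), event_store_private(), event_send(), model_dequeue()) are
hypothetical, the interleaving is forced sequentially instead of being
raced, and overrun accounting, the timer lock and do_schedule_next_timer()
are left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for siginfo_t and struct sigqueue. */
struct model_siginfo {
	int si_signo;
	int si_sys_private;
};

struct model_sigqueue {
	bool queued;			/* models !list_empty(&q->list) */
	struct model_siginfo info;
};

/* Models the memset() moved to allocation time: zero the entry once. */
static void model_alloc(struct model_sigqueue *q)
{
	memset(q, 0, sizeof(*q));
}

/* Step 1 of the timer event: unlocked store into the siginfo. */
static void event_store_private(struct model_sigqueue *q, int si_private)
{
	q->info.si_sys_private = si_private;
}

/*
 * Step 2 of the timer event: models send_sigqueue().
 * Returns true if the entry was newly queued, false if it was
 * already pending (where the kernel would bump si_overrun).
 */
static bool event_send(struct model_sigqueue *q)
{
	if (q->queued)
		return false;
	q->queued = true;
	return true;
}

/* Models the receiver collecting the signal: copy out and unlink. */
static bool model_dequeue(struct model_sigqueue *q, struct model_siginfo *out)
{
	if (!q->queued)
		return false;
	*out = q->info;
	q->queued = false;
	return true;
}
```

Queue an entry with si_sys_private 1, call event_store_private(&q, 2), then
model_dequeue() before event_send(): the receiver already sees the value 2
for the first expiry, and because the entry is no longer queued,
event_send() queues it again with the same value. Whether that matters to
user space is exactly the open question.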

Note that we can't race with do_schedule_next_timer(), the timer is
locked.

Thoughts?

Oleg.

