From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933949AbcA0QmA (ORCPT ); Wed, 27 Jan 2016 11:42:00 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49580 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933115AbcA0Ql5 (ORCPT ); Wed, 27 Jan 2016 11:41:57 -0500 Date: Wed, 27 Jan 2016 17:41:54 +0100 From: Oleg Nesterov To: Peter Zijlstra Cc: Andrew Morton , Sasha Levin , linux-kernel@vger.kernel.org, mingo@kernel.org Subject: Re: [PATCH] signals: work around random wakeups in sigsuspend() Message-ID: <20160127164154.GB14320@redhat.com> References: <1453735306-13519-1-git-send-email-sasha.levin@oracle.com> <20160125133205.36542c86ada93761d8a9ff06@linux-foundation.org> <20160126211009.GA4695@redhat.com> <20160127084443.GL6357@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160127084443.GL6357@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/27, Peter Zijlstra wrote: > > On Tue, Jan 26, 2016 at 10:10:09PM +0100, Oleg Nesterov wrote: > > > And, ironically, there is another more serious "reverse" problem ;) sigsuspend() > > orany other user of -ERESTARTNOHAND can "miss" the signal, in a sense that the > > kernel can wrongly restart this syscall after return from signal handler. This > > is not trivial to fix.. > > So I'm not entirely sure I get what you mean there. Yes, sorry. When I re-read my email it really looks like "I know the secret but I won't tell you" ;) The problem is simple, the fix is not. -ERESTARTNOHAND means that we should restart if do_signal()->get_signal() returns zero. This can happen when, say, another thread has already dequeued the group-wide signal which was the reason for TIF_SIGPENDING. Or signal_pending() was true because of debuger, or another reason, doesn't matter. In this case do_signal() does regs->ax = regs->orig_ax; regs->ip -= 2; and we return to user space. Note that after that the kernel can't know that the task is going to restart the syscall which should be interrupted by signals. Now suppose that another signal (with the handler) comes before this task executes the "syscall" insn. In this case sigsuspend() will restart after the task runs the handler. > But it did get me to > look at the patch again: > > + while (!signal_pending(current)) { > + __set_current_state(TASK_INTERRUPTIBLE); > + schedule(); > + } > > That should very much be: > > for (;;) { > set_current_state(TASK_INTERRUPTIBLE); > if (signal_pending(current)) > break; > schedule(); > } > __set_current_state(TASK_RUNNING); Why? It should work either way. Yes, signal_wakeup() can come right before __set_current_state(TASK_INTERRUPTIBLE) but this is fine, __schedule() must not sleep if signal_pending() == T, that is why it checks signal_pending_state(). See also the comment above smp_mb__before_spinlock() in schedule(). IOW, signal_pending() is the "special" condition, you do not need to serialize this check with task->state setting, exactly because schedule() knows about the signals. Oleg.