public inbox for linux-kernel@vger.kernel.org
* 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
       [not found] <1144797072.59663.1331142646789.JavaMail.root@storentr1.softathome.com>
@ 2012-03-07 17:51 ` Dmitry ADAMUSHKA (EXT)
  2012-03-07 18:46   ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry ADAMUSHKA (EXT) @ 2012-03-07 17:51 UTC (permalink / raw)
  To: Oleg Nesterov, Ingo Molnar, Ralf Baechle
  Cc: wouter.cloetens, dmitry adamushko, linux-kernel


Hi All,

The issue described below has been observed on a MIPS board running 2.6.30, but, according to my analysis (no need to panic, I may well be wrong :-)),
recent kernels and other architectures (at least x86) are also affected.

Problem:

A CPU ends up looping endlessly with interrupts disabled. ftrace's function tracer (triggered via SysRq; luckily it's an SMP system) shows:

khelper-1818    0d... 285882000us : do_notify_resume <-work_notifysig
khelper-1818    0d... 285882000us : do_notify_resume <-work_notifysig
khelper-1818    0d... 285882000us : do_notify_resume <-work_notifysig
[...]

At this moment, there are 2 'khelper' tasks on the system [1], the original (parent) 'khelper' is ok.

Now, the assumptions (the question is whether these are true for the recent kernels):

1) TIF_SIGPENDING can be set for 'khelper' while it's running in ____call_usermodehelper()
   between (a) flush_signal_handlers() and (b) kernel_execve() => so TIF_SIGPENDING is set;

2) kernel_execve() can fail in ____call_usermodehelper().

The latter one is less of an assumption; let's say, it fails due to a shortage of memory (or whatever).

If (1) is true, then

the pre-conditions:

- a kernel space task;  

'khelper' running ____call_usermodehelper() in our case.

- TIF_SIGPENDING is set.

A signal has been delivered, say, as a result of kill(-1, SIGKILL).

The endless loop is as follows:

* syscall_exit_work:
 - work_pending:            // start_of_the_loop
 - work_notifysig:
   - do_notify_resume()
     - do_signal()          ==> if (!user_mode(regs)) return; so signals are not handled
 - resume_userspace         // TIF_SIGPENDING is still set
 - work_pending             // so we call work_pending => goto start_of_the_loop


And we enter this loop when both assumptions above are true. That is, kernel_execve() fails in ____call_usermodehelper() and there is a pending signal for 'khelper'.
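
The loop described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct task_model, do_signal_model() and exit_work_model() are hypothetical names, not kernel code, and the pass limit exists only so the model terminates (the real exit path has no such bound).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state discussed above. */
struct task_model {
	bool sigpending;	/* models TIF_SIGPENDING */
	bool user_mode;		/* models user_mode(regs) for the saved frame */
};

/* Models do_signal(): a frame saved in kernel mode is left untouched. */
static void do_signal_model(struct task_model *t)
{
	if (!t->user_mode)
		return;			/* nothing dequeued, flag stays set */
	t->sigpending = false;		/* user frame: the signal is handled */
}

/*
 * Models the work_pending -> work_notifysig -> resume_userspace cycle.
 * Returns the number of passes taken, or -1 if the flag was still set
 * after max_passes passes.
 */
static int exit_work_model(struct task_model *t, int max_passes)
{
	int passes = 0;

	assert(max_passes >= 0);
	while (t->sigpending) {
		if (passes == max_passes)
			return -1;
		do_signal_model(t);
		passes++;
	}
	return passes;
}
```

With user_mode set, the model leaves the loop after one pass. With user_mode clear and sigpending set, it only stops because of the artificial limit, which corresponds to the repeated do_notify_resume entries in the trace.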

I'm actually able to trigger the loop (with 2.6.30 on MIPS) by deliberately setting up a pending signal in ____call_usermodehelper() and then letting kernel_execve() fail. In real life, the issue is triggered sporadically when a board reboots (busybox's init calls kill(-1, SIGKILL)).

Have I overlooked something in the recent kernel that makes it immune to this problem?

Thanks for comments,

--Dmitry


[1] SysRq list-all-tasks output:

khelper       D 7fffffff     0    26      2
[...]
Call Trace:
[<80440fc4>] __schedule+0x3c4/0xa60
[<80441690>] schedule+0x30/0x60
[<80441d1c>] schedule_timeout+0x19c/0x1d0
[<804409f8>] wait_for_common+0xc4/0x184
[<80440bec>] wait_for_completion+0x2c/0x40
[<8003f774>] do_fork+0x1d0/0x3cc
[<80015cd8>] kernel_thread+0x90/0xb4
[<80058754>] __call_usermodehelper+0x64/0xc4
[<80059cf4>] worker_thread+0x15c/0x2b0

khelper       R running      0  1818     26
[...]
Call Trace:
[<80440fc4>] __schedule+0x3c4/0xa60


This message and any attachments herein are confidential, intended solely for the addressees and are SoftAtHome's ownership. Any unauthorized use or dissemination is prohibited. If you are not the intended addressee of this message, please cancel it immediately and inform the sender.


* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-07 17:51 ` Dmitry ADAMUSHKA (EXT)
@ 2012-03-07 18:46   ` Oleg Nesterov
  2012-03-07 20:05     ` Dmitry Adamushko
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2012-03-07 18:46 UTC (permalink / raw)
  To: Dmitry ADAMUSHKA (EXT)
  Cc: Ingo Molnar, Ralf Baechle, wouter.cloetens, dmitry adamushko,
	linux-kernel

Hi Dmitry,

I can't read this email carefully now, will do tomorrow.

But,

On 03/07, Dmitry ADAMUSHKA (EXT) wrote:
>
> Now, the assumptions (the question is whether these are true for the recent kernels):
>
> 1) TIF_SIGPENDING can be set for 'khelper' while it's running in ____call_usermodehelper()
>    between (a) flush_signal_handlers() and (b) kernel_execve() => so TIF_SIGPENDING is set;

Yes, but it is not khelper. It is another kernel thread. Yes, its
->comm[] was copied from parent, so ps/etc can show it as khelper.

> 2) kernel_execve() can fail in ____call_usermodehelper().
>
> The latter one is less of an assumption; let's say, it fails due to a shortage of memory (or whatever).
>
> If (1) is true, then
>
> the pre-conditions:
>
> - a kernel space task;
>
> 'khelper' running ____call_usermodehelper() in our case.
>
> - TIF_SIGPENDING is set.
>
> A signal has been delivered, say, as a result of kill(-1, SIGKILL).
>
> The endless loop is as follows:
>
> * syscall_exit_work:
>  - work_pending:            // start_of_the_loop

We shouldn't be here. This is the kernel thread.

And if start_thread() was already called, then

>  - work_notify_sig:
>    - do_notify_resume()
>      - do_signal()          ==> if (!user_mode(regs)) return; so signals are not handled

user_mode() is no longer true.

Once again, I can be wrong, I'll read this email tomorrow.

Oleg.



* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-07 18:46   ` Oleg Nesterov
@ 2012-03-07 20:05     ` Dmitry Adamushko
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Adamushko @ 2012-03-07 20:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry ADAMUSHKA (EXT), Ingo Molnar, Ralf Baechle,
	wouter.cloetens, linux-kernel

Hi Oleg,

> On 03/07, Dmitry ADAMUSHKA (EXT) wrote:
>>
>> Now, the assumptions (the question is whether these are true for the recent kernels):
>>
>> 1) TIF_SIGPENDING can be set for 'khelper' while it's running in ____call_usermodehelper()
>>    between (a) flush_signal_handlers() and (b) kernel_execve() => so TIF_SIGPENDING is set;
>
> Yes, but it is not khelper. It is another kernel thread. Yes, its
> ->comm[] was copied from parent, so ps/etc can show it as khelper.

Sure, that's why I indicated 'khelper' (child).

>
>> 2) kernel_execve() can fail in ____call_usermodehelper().
>>
>> The latter one is less of an assumption; let's say, it fails due to a shortage of memory (or whatever).
>>
>> If (1) is true, then
>>
>> the pre-conditions:
>>
>> - a kernel space task;
>>
>> 'khelper' running ____call_usermodehelper() in our case.
>>
>> - TIF_SIGPENDING is set.
>>
>> A signal has been delivered, say, as a result of kill(-1, SIGKILL).
>>
>> The endless loop is as follows:
>>
>> * syscall_exit_work:
>>  - work_pending:            // start_of_the_loop
>
> We shouldn't be here. This is the kernel thread.

Note that kernel_execve() is backed by a full-fledged syscall (not
just a function call, at least on MIPS and x86), so I assume that all
the usual syscall-related stuff applies here as well.

>
> And if start_thread() was already called, then
>
>>  - work_notify_sig:
>>    - do_notify_resume()
>>      - do_signal()          ==> if (!user_mode(regs)) return; so signals are not handled
>
> user_mode() is no longer true.

!user_mode() is true. Note, the failure of kernel_execve() is one of
the pre-conditions. So we have a kernel thread returning from a real
syscall (hence, syscall_exit and co) with TIF_SIGPENDING.

>
> Once again, I can be wrong, I'll read this email tomorrow.
>

Great, thanks!


-- Dmitry


* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
       [not found] <139779962.60750.1331202718116.JavaMail.root@storentr1.softathome.com>
@ 2012-03-08 10:37 ` Dmitry ADAMUSHKA (EXT)
  2012-03-08 15:46   ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry ADAMUSHKA (EXT) @ 2012-03-08 10:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ralf Baechle, wouter cloetens, linux-kernel,
	Dmitry Adamushko

[-- Attachment #1: Type: text/plain, Size: 6296 bytes --]


Oleg,

I'm able to reproduce this problem on x86 (32 bits) with the following patches that try to simulate the real-life situation (see the comments in the patches).

It happens only when CONFIG_VM86 is disabled (I tried both settings), presumably because the following VM86-specific code lets us break out of the endless loop.

#ifdef CONFIG_VM86
#define resume_userspace_sig    check_userspace
#else
[...]

there is the specific are-we-a-kernel-task? check here

check_userspace:
        movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
        movb PT_CS(%esp), %al
        andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
        cmpl $USER_RPL, %eax
        jb resume_kernel                # not returning to v8086 or userspace

ENTRY(resume_userspace)
        LOCKDEP_SYS_EXIT
[...]
        jne work_pending
        jmp restore_all

which is available neither in case of !CONFIG_VM86, nor in case of MIPS. Hence, the loop.
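
As a rough C rendering of what the check_userspace sequence computes (a sketch only; returns_to_user_or_vm86() is a hypothetical helper, and the constants are the usual 32-bit x86 values, repeated here so the example stands alone):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Usual 32-bit x86 values; treat them as illustrative here. */
#define X86_EFLAGS_VM		0x00020000u	/* virtual-8086 mode flag */
#define SEGMENT_RPL_MASK	0x3u		/* privilege bits of a selector */
#define USER_RPL		0x3u

/*
 * Mirrors the assembly: take the saved EFLAGS, overwrite its low byte
 * with the low byte of the saved CS, mask, and compare with USER_RPL.
 * true  -> fall through to resume_userspace
 * false -> the "jb resume_kernel" branch is taken
 */
static bool returns_to_user_or_vm86(uint32_t eflags, uint32_t cs)
{
	uint32_t mixed;

	assert((cs & ~0xffffu) == 0);	/* selectors are 16 bits wide */
	mixed = (eflags & ~0xffu) | (cs & 0xffu);
	mixed &= X86_EFLAGS_VM | SEGMENT_RPL_MASK;
	return mixed >= USER_RPL;
}
```

A saved frame with a ring-0 code selector and the VM flag clear yields false, so a kernel thread coming back from a failed syscall would take the resume_kernel branch instead of re-entering work_pending.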

So here are the patches to simulate the problem. Is this approach invalid for one reason or another?

Thanks in advance.


=== copy-pasted ===

--- kernel/kmod.c.orig  2012-03-08 10:26:05.504752023 +0100
+++ kernel/kmod.c       2012-03-08 11:25:05.028661835 +0100
@@ -154,6 +154,15 @@ static int ____call_usermodehelper(void
        /* We can run anywhere, unlike our parent keventd(). */
        set_cpus_allowed_ptr(current, cpu_all_mask);

+       printk(KERN_EMERG "Unleash the signal...\n");
+
+       /*
+        * (1) here we emulate receiving a signal.
+        *     In the original case, a signal should be delivered from outside,
+        *     say, by "kill(-1, SIGKILL)" in busybox.
+        */
+       send_sig(SIGUSR1, current, 0);
+
        /*
         * Our parent is keventd, which runs with elevated scheduling priority.
         * Avoid propagating that into the userspace child.
@@ -181,6 +190,19 @@ static int ____call_usermodehelper(void

        commit_creds(new);

+       /* (2) here we emulate the failure of kernel_execve().
+        *     In real life, the failure can be due to a memory shortage,
+        *     or something else.
+         *     In our case, it happens when a board reboots - same as (1) above.
+        */
+       retval = kernel_execve(NULL,
+                              (const char *const *)sub_info->argv,
+                              (const char *const *)sub_info->envp);
+
+       printk(KERN_EMERG "x86 is rock-solid!");
+       flush_signals(current);
+
+       /* If we survived the test, let's continue so the user should not notice. */
        retval = kernel_execve(sub_info->path,
                               (const char *const *)sub_info->argv,
                               (const char *const *)sub_info->envp);

and another one

--- arch/x86/kernel/signal.c.orig       2012-03-08 11:18:19.702651943 +0100
+++ arch/x86/kernel/signal.c    2012-03-08 10:31:18.682304346 +0100
@@ -765,8 +765,11 @@ static void do_signal(struct pt_regs *re
         * X86_32: vm86 regs switched out by assembly code before reaching
         * here, so testing against kernel CS suffices.
         */
-       if (!user_mode(regs))
+       if (!user_mode(regs)) {
+               printk(KERN_EMERG "* endless loop\n");
+               dump_stack();
                return;
+       }

        signr = get_signal_to_deliver(&info, &ka, regs, NULL);
        if (signr > 0) {






[-- Attachment #2: kmod.c.diff --]
[-- Type: text/x-patch, Size: 1424 bytes --]

--- kernel/kmod.c.orig	2012-03-08 10:26:05.504752023 +0100
+++ kernel/kmod.c	2012-03-08 11:25:05.028661835 +0100
@@ -154,6 +154,15 @@ static int ____call_usermodehelper(void
 	/* We can run anywhere, unlike our parent keventd(). */
 	set_cpus_allowed_ptr(current, cpu_all_mask);
 
+	printk(KERN_EMERG "Unleash the signal...\n");
+
+	/*
+ 	 * (1) here we emulate receiving a signal.
+	 *     In the original case, a signal should be delivered from outside,
+	 *     say, by "kill(-1, SIGKILL)" in busybox.
+	 */
+	send_sig(SIGUSR1, current, 0);
+
 	/*
 	 * Our parent is keventd, which runs with elevated scheduling priority.
 	 * Avoid propagating that into the userspace child.
@@ -181,6 +190,19 @@ static int ____call_usermodehelper(void
 
 	commit_creds(new);
 
+	/* (2) here we emulate the failure of kernel_execve().
+	 *     In real life, the failure can be due to a memory shortage,
+	 *     or something else.
+         *     In our case, it happens when a board reboots - same as (1) above.
+	 */
+	retval = kernel_execve(NULL,
+			       (const char *const *)sub_info->argv,
+			       (const char *const *)sub_info->envp);
+
+	printk(KERN_EMERG "x86 is rock-solid!");
+	flush_signals(current);
+
+	/* If we survived the test, let's continue so the user should not notice. */
 	retval = kernel_execve(sub_info->path,
 			       (const char *const *)sub_info->argv,
 			       (const char *const *)sub_info->envp);

[-- Attachment #3: x86-signal.c.diff --]
[-- Type: text/x-patch, Size: 524 bytes --]

--- arch/x86/kernel/signal.c.orig	2012-03-08 11:18:19.702651943 +0100
+++ arch/x86/kernel/signal.c	2012-03-08 10:31:18.682304346 +0100
@@ -765,8 +765,11 @@ static void do_signal(struct pt_regs *re
 	 * X86_32: vm86 regs switched out by assembly code before reaching
 	 * here, so testing against kernel CS suffices.
 	 */
-	if (!user_mode(regs))
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "* endless loop\n");
+		dump_stack();
 		return;
+	}
 
 	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
 	if (signr > 0) {


* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
       [not found] <CAO6Zf6C+SDZ-TV12wr9oiO6HB-itQ6fLPHFugXk0osEiAxW22w@mail.gmail.com>
@ 2012-03-08 15:12 ` Dmitry ADAMUSHKA (EXT)
  2012-03-08 15:55   ` Dmitry ADAMUSHKA (EXT)
  2012-03-08 16:29   ` Oleg Nesterov
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry ADAMUSHKA (EXT) @ 2012-03-08 15:12 UTC (permalink / raw)
  To: Ingo Molnar, Oleg Nesterov
  Cc: Ralf Baechle, wouter cloetens, linux-kernel, Dmitry Adamushko


The following quick hack "fixes" it for x86. The output is below (contrary to the endless "* endless loop" messages seen before) [1].

--- arch/x86/kernel/entry_32.S.orig     2012-03-08 15:42:25.041296595 +0100
+++ arch/x86/kernel/entry_32.S  2012-03-08 15:58:29.926081131 +0100
@@ -98,12 +98,6 @@
 #endif
 .endm

-#ifdef CONFIG_VM86
-#define resume_userspace_sig   check_userspace
-#else
-#define resume_userspace_sig   resume_userspace
-#endif
-
 /*
  * User gs save/restore
  *
@@ -327,10 +321,19 @@ ret_from_exception:
        preempt_stop(CLBR_ANY)
 ret_from_intr:
        GET_THREAD_INFO(%ebp)
-check_userspace:
+resume_userspace_sig:
+#ifdef CONFIG_VM86
        movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
        movb PT_CS(%esp), %al
        andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
+#else
+/*
+ * We can be coming here from a syscall done in the kernel space,
+ * e.g. a failed kernel_execve().
+ */
+       movl PT_CS(%esp), %eax
+       andl $SEGMENT_RPL_MASK, %eax
+#endif
        cmpl $USER_RPL, %eax
        jb resume_kernel                # not returning to v8086 or userspace
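
The effect of the !CONFIG_VM86 branch in the hack above can be modelled in user space as follows. This is a sketch under assumptions: struct frame_model, frame_is_user() and patched_exit_model() are hypothetical names, and the call limit is only there to keep the model bounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK	0x3u
#define USER_RPL		0x3u

struct frame_model {
	uint32_t cs;		/* saved code segment selector (PT_CS) */
	bool sigpending;	/* models TIF_SIGPENDING */
};

/* The !CONFIG_VM86 check: only the RPL bits of the saved CS matter. */
static bool frame_is_user(const struct frame_model *f)
{
	return (f->cs & SEGMENT_RPL_MASK) >= USER_RPL;
}

/*
 * Models the patched path: after each do_signal() call the privilege
 * check runs, so a kernel frame leaves via resume_kernel.
 * Returns how many times do_signal() was entered.
 */
static int patched_exit_model(struct frame_model *f, int max_calls)
{
	int calls = 0;

	assert(max_calls > 0);
	while (f->sigpending && calls < max_calls) {
		calls++;			/* work_notifysig -> do_signal() */
		if (frame_is_user(f))
			f->sigpending = false;	/* user frame: signal handled */
		else
			break;			/* jb resume_kernel */
	}
	return calls;
}
```

For a kernel frame with a signal pending the model enters do_signal() exactly once and then leaves, which is consistent with the single "* endless loop" message in the output below.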

[1]

[...]
[   10.220496] input: HDA Intel Line-Out as /devices/pci0000:00/0000:00:1b.0/sound/card0/input8
[   10.448021] Unleash the signal...
[   10.448028] * endless loop
[   10.448030] Pid: 906, comm: kworker/u:6 Not tainted 3.3.0-rc4-crush-custom #4
[   10.448032] Call Trace:
[   10.448038]  [<c151cfe4>] ? printk+0x30/0x34
[   10.448042]  [<c1002abf>] do_signal+0x7ff/0x890
[   10.448045]  [<c151f6ed>] ? _raw_spin_trylock+0xd/0x20
[   10.448048]  [<c1324da8>] ? vt_console_print+0x288/0x360
[   10.448052]  [<c10264c8>] ? default_spin_lock_flags+0x8/0x10
[   10.448054]  [<c1522a90>] ? spurious_fault+0xd0/0xd0
[   10.448057]  [<c1520183>] ? error_code+0x67/0x6c
[   10.448060]  [<c111007b>] ? read_swap_cache_async+0x7b/0xf0
[   10.448063]  [<c1133c3d>] ? getname_flags+0x5d/0x160
[   10.448065]  [<c1133c3d>] ? getname_flags+0x5d/0x160
[   10.448067]  [<c1133d51>] ? getname+0x11/0x20
[   10.448069]  [<c1002dc5>] do_notify_resume+0x65/0x80
[   10.448071]  [<c151fbb7>] work_notifysig+0x16/0x1b
[   10.448074]  [<c10b00d8>] ? unmask_irq+0x8/0x30
[   10.448076]  [<c1006661>] ? kernel_execve+0x21/0x30
[   10.448080]  [<c1048f20>] ? ____call_usermodehelper+0x100/0x130
[   10.448082]  [<c1048e20>] ? proc_cap_handler+0x180/0x180
[   10.448085]  [<c1526b3e>] ? kernel_thread_helper+0x6/0x10
[   10.448086] x86 is rock-solid!
[   10.448842] ppdev: user-space parallel port driver
[...]


----- Original Message -----
> From: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
> To: "Oleg Nesterov" <oleg@redhat.com>
> Cc: "Ingo Molnar" <mingo@elte.hu>, "Ralf Baechle" <ralf@linux-mips.org>, "wouter cloetens"
> <wouter.cloetens@softathome.com>, linux-kernel@vger.kernel.org, "Dmitry ADAMUSHKA (EXT)"
> <dmitry.adamushka_ext@softathome.com>
> Sent: Thursday, March 8, 2012 11:46:41 AM
> Subject: Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)

> See the enclosed picture.
> 
> For some reason, I can only see the "* endless loop" messages
> (KERN_EMERG) on my terminal. Perhaps, it's due to the setting of
> syslog (or whatever is used here) for this terminal (the primary
> graphical one is just stuck).
> 
> 


* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-08 10:37 ` 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs) Dmitry ADAMUSHKA (EXT)
@ 2012-03-08 15:46   ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2012-03-08 15:46 UTC (permalink / raw)
  To: Dmitry ADAMUSHKA (EXT), H. Peter Anvin
  Cc: Ingo Molnar, Ralf Baechle, wouter cloetens, linux-kernel,
	Dmitry Adamushko

Hi Dmitry,

I think you are right, but I am not an expert. Adding Peter.

On 03/08, Dmitry ADAMUSHKA (EXT) wrote:
>
> Oleg,
>
> I'm able to reproduce this problem on x86 (32 bits)

And I guess "32 bits" is important.
arch/x86/kernel/sys_i386_32.c:kernel_execve() does "int 0x80".

If do_execve() fails before start_thread() and TIF_SIGPENDING
is set, entry_32.S calls do_notify_resume() and we are lost.

I guess this is what you meant from the very beginning ;)
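
A small user-space model of that point, as a sketch only: struct regs_model, user_mode_model() and kernel_execve_model() are hypothetical names, the selector values are the usual 32-bit x86 ones, and -1 is a placeholder error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Selector values as used by 32-bit x86 Linux; illustrative here. */
#define KERNEL_CS_MODEL	0x60u
#define USER_CS_MODEL	0x73u

struct regs_model {
	uint32_t cs;	/* saved code segment of the syscall frame */
};

/* Models user_mode(regs): does the saved frame belong to ring 3? */
static bool user_mode_model(const struct regs_model *regs)
{
	return (regs->cs & 0x3u) == 0x3u;
}

/*
 * Models kernel_execve() issued by a kernel thread. On success the
 * frame is rewritten for user space (the job of start_thread()); on
 * failure the frame still describes the kernel caller.
 * Returns 0 on success, -1 (placeholder) on failure.
 */
static int kernel_execve_model(struct regs_model *regs, bool exec_ok)
{
	assert(regs->cs == KERNEL_CS_MODEL);	/* entered from kernel mode */
	if (!exec_ok)
		return -1;
	regs->cs = USER_CS_MODEL;
	return 0;
}
```

In the failing case user_mode_model() stays false on the way out of the syscall, which is exactly the situation in which do_signal() returns without touching the pending signal.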

> It happens only when CONFIG_VM86 is disabled (I tried both). Supposedly,
> due to the following bits of the VM86-specific code that let us break out
> of the endless-loop.
>
> #ifdef CONFIG_VM86
> #define resume_userspace_sig    check_userspace
> #else
> [...]
>
> there is the specific are-we-a-kernel-task? check here
>
> check_userspace:
>         movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
>         movb PT_CS(%esp), %al
>         andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
>         cmpl $USER_RPL, %eax
>         jb resume_kernel                # not returning to v8086 or userspace

Agreed, we need the USER_RPL check.

> So here are the patches to simulate the problem. Is this approach not
> valid for one or another reason?
>
> Thanks in advance.
>
>
> === copy-pasted ===
>
> --- kernel/kmod.c.orig  2012-03-08 10:26:05.504752023 +0100
> +++ kernel/kmod.c       2012-03-08 11:25:05.028661835 +0100
> @@ -154,6 +154,15 @@ static int ____call_usermodehelper(void
>         /* We can run anywhere, unlike our parent keventd(). */
>         set_cpus_allowed_ptr(current, cpu_all_mask);
>
> +       printk(KERN_EMERG "Unleash the signal...\n");
> +
> +       /*
> +        * (1) here we emulate receiving a signal.
> +        *     In the original case, a signal should be delivered from outside,
> +        *     say, by "kill(-1, SIGKILL)" in busybox.
> +        */
> +       send_sig(SIGUSR1, current, 0);

Yes, this kills the task, kernel_execve() can't succeed,

>         /*
>          * Our parent is keventd, which runs with elevated scheduling priority.
>          * Avoid propagating that into the userspace child.
> @@ -181,6 +190,19 @@ static int ____call_usermodehelper(void
>
>         commit_creds(new);
>
> +       /* (2) here we emulate the failure of kernel_execve().
> +        *     In real life, the failure can be due to a memory shortage,
> +        *     or something else.
> +         *     In our case, it happens when a board reboots - same as (1) above.
> +        */
> +       retval = kernel_execve(NULL,
> +                              (const char *const *)sub_info->argv,
> +                              (const char *const *)sub_info->envp);

and I guess it can't even return.

> +       printk(KERN_EMERG "x86 is rock-solid!");
> +       flush_signals(current);
> +
> +       /* If we survived the test, let's continue so the user should not notice. */
>         retval = kernel_execve(sub_info->path,
>                                (const char *const *)sub_info->argv,
>                                (const char *const *)sub_info->envp);
>
> and another one
>
> --- arch/x86/kernel/signal.c.orig       2012-03-08 11:18:19.702651943 +0100
> +++ arch/x86/kernel/signal.c    2012-03-08 10:31:18.682304346 +0100
> @@ -765,8 +765,11 @@ static void do_signal(struct pt_regs *re
>          * X86_32: vm86 regs switched out by assembly code before reaching
>          * here, so testing against kernel CS suffices.
>          */
> -       if (!user_mode(regs))
> +       if (!user_mode(regs)) {
> +               printk(KERN_EMERG "* endless loop\n");
> +               dump_stack();
>                 return;
> +       }

so yes, it enters the endless loop.

I do not know what should be fixed, kernel_execve() or system_call paths.

Oleg.



* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-08 15:12 ` Dmitry ADAMUSHKA (EXT)
@ 2012-03-08 15:55   ` Dmitry ADAMUSHKA (EXT)
  2012-03-08 16:08     ` Oleg Nesterov
  2012-03-08 16:29   ` Oleg Nesterov
  1 sibling, 1 reply; 12+ messages in thread
From: Dmitry ADAMUSHKA (EXT) @ 2012-03-08 15:55 UTC (permalink / raw)
  To: Ingo Molnar, Oleg Nesterov
  Cc: Ralf Baechle, wouter cloetens, linux-kernel, Dmitry Adamushko


And to simplify a real-life test case: it's enough for khelper's child task, while it's running in ____call_usermodehelper(), to receive SIGKILL. In this case, do_execve_common() will fail - there are a number of fatal_signal_pending(current) checks in there.

--Dmitry

----- Original Message -----
> From: "Dmitry ADAMUSHKA (EXT)" <dmitry.adamushka_ext@softathome.com>
> To: "Ingo Molnar" <mingo@elte.hu>, "Oleg Nesterov" <oleg@redhat.com>
> Cc: "Ralf Baechle" <ralf@linux-mips.org>, "wouter cloetens" <wouter.cloetens@softathome.com>,
> linux-kernel@vger.kernel.org, "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
> Sent: Thursday, March 8, 2012 4:12:46 PM
> Subject: Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)

> The following quick hack "fixes" it for x86. The output is below
> (contrary to the endless "* endless loop" messages seen before) [1].
> 
> --- arch/x86/kernel/entry_32.S.orig     2012-03-08 15:42:25.041296595 +0100
> +++ arch/x86/kernel/entry_32.S  2012-03-08 15:58:29.926081131 +0100
> @@ -98,12 +98,6 @@
>  #endif
>  .endm
>
> -#ifdef CONFIG_VM86
> -#define resume_userspace_sig   check_userspace
> -#else
> -#define resume_userspace_sig   resume_userspace
> -#endif
> -
>  /*
>   * User gs save/restore
>   *
> @@ -327,10 +321,19 @@ ret_from_exception:
>         preempt_stop(CLBR_ANY)
>  ret_from_intr:
>         GET_THREAD_INFO(%ebp)
> -check_userspace:
> +resume_userspace_sig:
> +#ifdef CONFIG_VM86
>         movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
>         movb PT_CS(%esp), %al
>         andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
> +#else
> +/*
> + * We can be coming here from a syscall done in the kernel space,
> + * e.g. a failed kernel_execve().
> + */
> +       movl PT_CS(%esp), %eax
> +       andl $SEGMENT_RPL_MASK, %eax
> +#endif
>         cmpl $USER_RPL, %eax
>         jb resume_kernel                # not returning to v8086 or userspace
> 
> [1]
> 
> [...]
> [ 10.220496] input: HDA Intel Line-Out as /devices/pci0000:00/0000:00:1b.0/sound/card0/input8
> [ 10.448021] Unleash the signal...
> [ 10.448028] * endless loop
> [ 10.448030] Pid: 906, comm: kworker/u:6 Not tainted 3.3.0-rc4-crush-custom #4
> [ 10.448032] Call Trace:
> [ 10.448038] [<c151cfe4>] ? printk+0x30/0x34
> [ 10.448042] [<c1002abf>] do_signal+0x7ff/0x890
> [ 10.448045] [<c151f6ed>] ? _raw_spin_trylock+0xd/0x20
> [ 10.448048] [<c1324da8>] ? vt_console_print+0x288/0x360
> [ 10.448052] [<c10264c8>] ? default_spin_lock_flags+0x8/0x10
> [ 10.448054] [<c1522a90>] ? spurious_fault+0xd0/0xd0
> [ 10.448057] [<c1520183>] ? error_code+0x67/0x6c
> [ 10.448060] [<c111007b>] ? read_swap_cache_async+0x7b/0xf0
> [ 10.448063] [<c1133c3d>] ? getname_flags+0x5d/0x160
> [ 10.448065] [<c1133c3d>] ? getname_flags+0x5d/0x160
> [ 10.448067] [<c1133d51>] ? getname+0x11/0x20
> [ 10.448069] [<c1002dc5>] do_notify_resume+0x65/0x80
> [ 10.448071] [<c151fbb7>] work_notifysig+0x16/0x1b
> [ 10.448074] [<c10b00d8>] ? unmask_irq+0x8/0x30
> [ 10.448076] [<c1006661>] ? kernel_execve+0x21/0x30
> [ 10.448080] [<c1048f20>] ? ____call_usermodehelper+0x100/0x130
> [ 10.448082] [<c1048e20>] ? proc_cap_handler+0x180/0x180
> [ 10.448085] [<c1526b3e>] ? kernel_thread_helper+0x6/0x10
> [ 10.448086] x86 is rock-solid!
> [ 10.448842] ppdev: user-space parallel port driver
> [...]
> 
> 
> ----- Original Message -----
> > From: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
> > To: "Oleg Nesterov" <oleg@redhat.com>
> > Cc: "Ingo Molnar" <mingo@elte.hu>, "Ralf Baechle"
> > <ralf@linux-mips.org>, "wouter cloetens"
> > <wouter.cloetens@softathome.com>, linux-kernel@vger.kernel.org,
> > "Dmitry ADAMUSHKA (EXT)"
> > <dmitry.adamushka_ext@softathome.com>
> > Sent: Thursday, March 8, 2012 11:46:41 AM
> > Subject: Re: 'khelper' (child) is stuck in endless loop: do_signal()
> > and !user_mode(regs)
> 
> > See the enclosed picture.
> >
> > For some reason, I can only see the "* endless loop" messages
> > (KERN_EMERG) on my terminal. Perhaps, it's due to the setting of
> > syslog (or whatever is used here) for this terminal (the primary
> > graphical one is just stuck).
> >
> >
> > On 8 March 2012 11:37, Dmitry ADAMUSHKA (EXT)
> > <dmitry.adamushka_ext@softathome.com> wrote:
> > >
> > > Oleg,
> > >
> > > I'm able to reproduce this problem on x86 (32 bits) with the
> > > following patches that try to simulate the real-life situation
> > > (see the comments in the patches).
> > >
> > > It happens only when CONFIG_VM86 is disabled (I tried both).
> > > Supposedly, due to the following bits of the VM86-specific code
> > > that let us break out of the endless-loop.
> > >
> > > #ifdef CONFIG_VM86
> > > #define resume_userspace_sig check_userspace
> > > #else [...]
> > >
> > > there is the specific are-we-a-kernel-task? check here
> > >
> > > check_userspace:
> > >        movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS
> > >        movb PT_CS(%esp), %al
> > >        andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
> > >        cmpl $USER_RPL, %eax
> > >        jb resume_kernel # not returning to v8086 or userspace
> > >
> > > ENTRY(resume_userspace)
> > >        LOCKDEP_SYS_EXIT
> > > [...]
> > >        jne work_pending
> > >        jmp restore_all
> > >
> > > which is available neither in case of !CONFIG_VM86, nor in case of
> > > MIPS. Hence, the loop.
> > >
> > > So here are the patches to simulate the problem. Is this approach
> > > not valid for one or another reason?
> > >
> > > Thanks in advance.
> > >
> > >
> > > === copy-pasted ===
> > >
> > > --- kernel/kmod.c.orig 2012-03-08 10:26:05.504752023 +0100
> > > +++ kernel/kmod.c 2012-03-08 11:25:05.028661835 +0100
> > > @@ -154,6 +154,15 @@ static int ____call_usermodehelper(void
> > >         /* We can run anywhere, unlike our parent keventd(). */
> > >         set_cpus_allowed_ptr(current, cpu_all_mask);
> > >
> > > +       printk(KERN_EMERG "Unleash the signal...\n");
> > > +
> > > +       /*
> > > +        * (1) here we emulate receiving a signal.
> > > +        * In the original case, a signal should be delivered from outside,
> > > +        * say, by "kill(-1, SIGKILL)" in busybox.
> > > +        */
> > > +       send_sig(SIGUSR1, current, 0);
> > > +
> > >         /*
> > >          * Our parent is keventd, which runs with elevated scheduling priority.
> > >          * Avoid propagating that into the userspace child.
> > > @@ -181,6 +190,19 @@ static int ____call_usermodehelper(void
> > >
> > >         commit_creds(new);
> > >
> > > +       /* (2) here we emulate the failure of kernel_execve().
> > > +        *     In real life, the failure can be due to a memory shortage,
> > > +        *     or something else.
> > > +        *     In our case, it happens when a board reboots - same as (1) above.
> > > +        */
> > > +       retval = kernel_execve(NULL,
> > > +                              (const char *const *)sub_info->argv,
> > > +                              (const char *const *)sub_info->envp);
> > > +
> > > +       printk(KERN_EMERG "x86 is rock-solid!");
> > > +       flush_signals(current);
> > > +
> > > +       /* If we survived the test, let's continue so the user should not notice. */
> > >         retval = kernel_execve(sub_info->path,
> > >                                (const char *const *)sub_info->argv,
> > >                                (const char *const *)sub_info->envp);
> > >
> > > and another one
> > >
> > > --- arch/x86/kernel/signal.c.orig 2012-03-08 11:18:19.702651943 +0100
> > > +++ arch/x86/kernel/signal.c 2012-03-08 10:31:18.682304346 +0100
> > > @@ -765,8 +765,11 @@ static void do_signal(struct pt_regs *re
> > >          * X86_32: vm86 regs switched out by assembly code before reaching
> > >          * here, so testing against kernel CS suffices.
> > >          */
> > > -       if (!user_mode(regs))
> > > +       if (!user_mode(regs)) {
> > > +               printk(KERN_EMERG "* endless loop\n");
> > > +               dump_stack();
> > >                 return;
> > > +       }
> > >
> > >         signr = get_signal_to_deliver(&info, &ka, regs, NULL);
> > >         if (signr > 0) {
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > >> From: "Dmitry Adamushko" <dmitry.adamushko@gmail.com>
> > >> To: "Oleg Nesterov" <oleg@redhat.com>
> > >> Cc: "Dmitry ADAMUSHKA (EXT)"
> > >> <dmitry.adamushka_ext@softathome.com>, "Ingo Molnar"
> > >> <mingo@elte.hu>, "Ralf Baechle"
> > >> <ralf@linux-mips.org>, "wouter cloetens"
> > >> <wouter.cloetens@softathome.com>, linux-kernel@vger.kernel.org
> > >> Sent: Wednesday, March 7, 2012 9:05:43 PM
> > >> Subject: Re: 'khelper' (child) is stuck in endless loop:
> > >> do_signal() and !user_mode(regs)
> > >
> > >> Hi Oleg,
> > >>
> > >> > On 03/07, Dmitry ADAMUSHKA (EXT) wrote:
> > >> >>
> > >> >> Now, the assumptions (the question is whether these are true
> > >> >> for
> > >> >> the recent kernels):
> > >> >>
> > >> >> 1) TIF_SIGPENDING can be set for 'khelper' while it's running
> > >> >> in ____call_usermodehelper()
> > >> >>    between (a) flush_signal_handlers() and (b) kernel_execve()
> > >> >>    => so TIF_SIGPENDING is set;
> > >> >
> > >> > Yes, but it is not khelper. It is another kernel thread. Yes,
> > >> > its ->comm[] was copied from parent, so ps/etc can show it as
> > >> > khelper.
> > >>
> > >> Sure, that's why I indicated 'khelper' (child).
> > >>
> > >> >
> > >> >> 2) kernel_execve() can fail in ____call_usermodehelper().
> > >> >>
> > >> >> The later one is less of an assumption; let's say, it fails
> > >> >> due to a shortage of memory (or whatever).
> > >> >>
> > >> >> If (1) is true, then
> > >> >>
> > >> >> the pre-conditions:
> > >> >>
> > >> >> - a kernel space task;
> > >> >>
> > >> >> 'khelper' running ____call_usermodehelper() in our case.
> > >> >>
> > >> >> - TIF_SIGPENDING is set.
> > >> >>
> > >> >> A signal has been delivered, say, as a result of kill(-1,
> > >> >> SIGKILL).
> > >> >>
> > >> >> The endless loop is as follows:
> > >> >>
> > >> >> * syscall_exit_work:
> > >> >>  - work_pending: // start_of_the_loop
> > >> >
> > >> > We shouldn't be here. This is the kernel thread.
> > >>
> > >> Note that kernel_execve() is backed up by a full fledged syscall
> > >> (not just a function call, at least on MIPS and x86), so I assume
> > >> that all
> > >> the usual syscall-related stuff applies here as well.
> > >>
> > >> >
> > >> > And if start_thread() was already called, then
> > >> >
> > >> >>  - work_notify_sig:
> > >> >>    - do_notify_resume()
> > >> >>      - do_signal() ==> if (!user_mode(regs)) return; so
> > >> >>      signals are not handled
> > >> >
> > >> > user_mode() is no longer true.
> > >>
> > >> !user_mode() is true. Note, the failure of kernel_execve() is one
> > >> of the pre-conditions. So we have a kernel thread returning from
> > >> a real syscall (hence, syscall_exit and co) with TIF_SIGPENDING.
> > >>
> > >> >
> > >> > Once again, I can be wrong, I'll read this email tomorrow.
> > >> >
> > >>
> > >> Great, thanks!
> > >>
> > >>
> > >> -- Dmitry
> > >
> > > This message and any attachments herein are confidential, intended
> > > solely for the addressees and are SoftAtHome's ownership. Any
> > > unauthorized use or dissemination is prohibited. If you are not
> > > the intended addressee of this message, please cancel it
> > > immediately and
> > > inform the sender.
> >
> >
> >
> > --
> >
> > -- Dmitry
This message and any attachments herein are confidential, intended solely for the addressees and are SoftAtHome's ownership. Any unauthorized use or dissemination is prohibited. If you are not the intended addressee of this message, please cancel it immediately and inform the sender.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-08 15:55   ` Dmitry ADAMUSHKA (EXT)
@ 2012-03-08 16:08     ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2012-03-08 16:08 UTC (permalink / raw)
  To: Dmitry ADAMUSHKA (EXT)
  Cc: Ingo Molnar, Ralf Baechle, wouter cloetens, linux-kernel,
	Dmitry Adamushko

On 03/08, Dmitry ADAMUSHKA (EXT) wrote:
>
> And to simplify a real-life test case: it's enough for khelper's child task,
> while it's running in ____call_usermodehelper(), to receive SIGKILL.
> In this case, do_execve_common() will fail - there are a number of
> fatal_signal_pending(current) checks in there.

Actually there is no difference, SIGUSR1 equally kills the task and
makes fatal_signal_pending() true. The handler is SIG_DFL after
flush_signal_handlers(), complete_signal() adds SIGKILL implicitly.

Not that this actually matters.

Oleg.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-08 15:12 ` Dmitry ADAMUSHKA (EXT)
  2012-03-08 15:55   ` Dmitry ADAMUSHKA (EXT)
@ 2012-03-08 16:29   ` Oleg Nesterov
  2012-03-08 16:58     ` Dmitry ADAMUSHKA (EXT)
  1 sibling, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2012-03-08 16:29 UTC (permalink / raw)
  To: Dmitry ADAMUSHKA (EXT), H. Peter Anvin
  Cc: Ingo Molnar, Ralf Baechle, wouter cloetens, linux-kernel,
	Dmitry Adamushko

On 03/08, Dmitry ADAMUSHKA (EXT) wrote:
>
> The following quick hack "fixes" it for x86.

First of all let me repeat, I do not understand this asm ;)
Fortunately Ingo and Peter do.

But,

> --- arch/x86/kernel/entry_32.S.orig     2012-03-08 15:42:25.041296595 +0100
> +++ arch/x86/kernel/entry_32.S  2012-03-08 15:58:29.926081131 +0100
> @@ -98,12 +98,6 @@
>  #endif
>  .endm
>
> -#ifdef CONFIG_VM86
> -#define resume_userspace_sig   check_userspace
> -#else
> -#define resume_userspace_sig   resume_userspace
> -#endif
> -
>  /*
>   * User gs save/restore
>   *
> @@ -327,10 +321,19 @@ ret_from_exception:
>         preempt_stop(CLBR_ANY)
>  ret_from_intr:
>         GET_THREAD_INFO(%ebp)
> -check_userspace:
> +resume_userspace_sig:
> +#ifdef CONFIG_VM86
>         movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
>         movb PT_CS(%esp), %al
>         andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
> +#else
> +/*
> + * We can be coming here from a syscall done in the kernel space,
> + * e.g. a failed kernel_execve().
> + */
> +       movl PT_CS(%esp), %eax
> +       andl $SEGMENT_RPL_MASK, %eax
> +#endif
>         cmpl $USER_RPL, %eax
>         jb resume_kernel                # not returning to v8086 or userspace

IIUC (I can be easily wrong) this breaks the endless loop, but
only after do_notify_resume() was already called.

_perhaps_ it would be better to avoid do_notify_resume() in this
case altogether. Say, fire_user_return_notifiers() doesn't look
right in this case, we are not going to return to the usermode.

Not that I think this is really wrong though.

Oleg.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-08 16:29   ` Oleg Nesterov
@ 2012-03-08 16:58     ` Dmitry ADAMUSHKA (EXT)
  2012-03-12 16:35       ` Dmitry ADAMUSHKA (EXT)
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry ADAMUSHKA (EXT) @ 2012-03-08 16:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ralf Baechle, wouter cloetens, linux-kernel,
	Dmitry Adamushko, H. Peter Anvin


> On 03/08, Dmitry ADAMUSHKA (EXT) wrote:
> >
> > The following quick hack "fixes" it for x86.
> 
> First of all let me repeat, I do not understand this asm ;)
> Fortunately Ingo and Peter do.
> 
> But,
> 
> > --- arch/x86/kernel/entry_32.S.orig     2012-03-08 15:42:25.041296595 +0100
> > +++ arch/x86/kernel/entry_32.S  2012-03-08 15:58:29.926081131 +0100
> > @@ -98,12 +98,6 @@
> >  #endif
> >  .endm
> >
> > -#ifdef CONFIG_VM86
> > -#define resume_userspace_sig   check_userspace
> > -#else
> > -#define resume_userspace_sig   resume_userspace
> > -#endif
> > -
> >  /*
> >   * User gs save/restore
> >   *
> > @@ -327,10 +321,19 @@ ret_from_exception:
> >         preempt_stop(CLBR_ANY)
> >  ret_from_intr:
> >         GET_THREAD_INFO(%ebp)
> > -check_userspace:
> > +resume_userspace_sig:
> > +#ifdef CONFIG_VM86
> >         movl PT_EFLAGS(%esp), %eax      # mix EFLAGS and CS
> >         movb PT_CS(%esp), %al
> >         andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
> > +#else
> > +/*
> > + * We can be coming here from a syscall done in the kernel space,
> > + * e.g. a failed kernel_execve().
> > + */
> > +       movl PT_CS(%esp), %eax
> > +       andl $SEGMENT_RPL_MASK, %eax
> > +#endif
> >         cmpl $USER_RPL, %eax
> >         jb resume_kernel                # not returning to v8086 or userspace
> 
> IIUC (I can be easily wrong) this breaks the endless loop, but
> only after do_notify_resume() was already called.

yeah, basically I'm simulating the approach of CONFIG_VM86 here, i.e. doing the check at the same place in the call chain. Well, and giving a possibility for that "if (!user_mode(regs))" code in do_signal() to execute :-))

btw., what are the legitimate cases/code-paths for this part of do_signal()? I see that other archs just copy-paste this approach. My initial thought was that it has something to do with handling (or rather preserving) signals that get delivered to not-yet-fully-created user-space tasks, so that they get handled once these new tasks are up and running.

> 
> _perhaps_ it would be better to avoid do_notify_resume() in this
> case altogether. Say, fire_user_return_notifiers() doesn't look
> right in this case, we are not going to return to the usermode.
>

yeah, there are some other corner cases I'm not sure about (like with syscall tracing). Also, there can be other scenarios of entering this loop... so one way or another, the loop should be broken.
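
To make the loop (and the effect of the hack) easier to reason about, here is
a toy user-space model in C. It is a sketch under assumptions, not the real
entry code: struct model_task, model_do_signal() and model_exit_work() are
invented names, the sigpending field stands in for TIF_SIGPENDING, and the
MODEL_*_CS values are the usual x86_32 selectors as far as I know but should
be treated as illustrative.

```c
#include <stdbool.h>

/* Illustrative values; the real definitions live in the x86 headers. */
#define SEGMENT_RPL_MASK 0x3u
#define USER_RPL         0x3u
#define MODEL_KERNEL_CS  0x60u	/* assumed kernel code selector, RPL 0 */
#define MODEL_USER_CS    0x73u	/* assumed user code selector, RPL 3 */

struct model_task {
	unsigned int cs;	/* saved CS of the frame being returned to */
	bool sigpending;	/* stands in for TIF_SIGPENDING */
};

/* Mirrors the early return in do_signal() for kernel-mode frames. */
static void model_do_signal(struct model_task *t)
{
	if ((t->cs & SEGMENT_RPL_MASK) != USER_RPL)
		return;			/* nothing dequeued, flag stays set */
	t->sigpending = false;		/* signal handled */
}

/*
 * Models work_pending -> work_notifysig -> do_notify_resume and back.
 * Returns the number of passes through model_do_signal(), or -1 if the
 * loop is still spinning after max_iters passes.
 */
static int model_exit_work(struct model_task *t, bool check_rpl, int max_iters)
{
	int passes = 0;

	while (t->sigpending) {
		if (passes == max_iters)
			return -1;
		model_do_signal(t);
		passes++;
		/* The check added by the hack: leave via resume_kernel. */
		if (check_rpl && (t->cs & SEGMENT_RPL_MASK) != USER_RPL)
			break;
	}
	return passes;
}
```

With a kernel CS and no RPL check the model never leaves the loop; with the
check it leaves after a single pass, i.e. only after the do_notify_resume()
step has already run once, which is the point Oleg raised.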

--Dmitry

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-08 16:58     ` Dmitry ADAMUSHKA (EXT)
@ 2012-03-12 16:35       ` Dmitry ADAMUSHKA (EXT)
  2012-03-12 18:00         ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry ADAMUSHKA (EXT) @ 2012-03-12 16:35 UTC (permalink / raw)
  To: Oleg Nesterov, Ingo Molnar, H. Peter Anvin
  Cc: linux-kernel, Dmitry Adamushko, Ralf Baechle


Just wondering whether this problem/scenario is considered completely unrealistic, or whether I should just resend the x86 patch on its own to get more attention? :-)

--Dmitry


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs)
  2012-03-12 16:35       ` Dmitry ADAMUSHKA (EXT)
@ 2012-03-12 18:00         ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2012-03-12 18:00 UTC (permalink / raw)
  To: Dmitry ADAMUSHKA (EXT)
  Cc: Ingo Molnar, H. Peter Anvin, linux-kernel, Dmitry Adamushko,
	Ralf Baechle, Roland McGrath

On 03/12, Dmitry ADAMUSHKA (EXT) wrote:
>
> just wondering, whether this problem/scenario considered
> to be completely unrealistic

I think you are absolutely right.

Either x86_32 shouldn't use int80 for syscalls from kernel-mode, or
system_call paths should take SEGMENT_RPL_MASK into account.
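
For reference, the check in question can be written out in C roughly as
below. This is a sketch, not a proposed patch: the function names are
invented, and the constant values are what I believe the x86 headers define
but are repeated here only so the snippet is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; see the x86 headers for the real definitions. */
#define X86_EFLAGS_VM    0x00020000u
#define SEGMENT_RPL_MASK 0x3u
#define USER_RPL         0x3u

/*
 * CONFIG_VM86 flavour: mirrors the "mix EFLAGS and CS" sequence
 * (movl EFLAGS, movb CS into %al, andl, cmpl $USER_RPL).
 */
static bool returns_to_user_or_vm86(uint32_t eflags, uint32_t cs)
{
	uint32_t mixed = (eflags & ~0xffu) | (cs & 0xffu);

	mixed &= X86_EFLAGS_VM | SEGMENT_RPL_MASK;
	return mixed >= USER_RPL;	/* "jb resume_kernel" when below */
}

/* !CONFIG_VM86 flavour: only the RPL bits of the saved CS matter. */
static bool returns_to_user(uint32_t cs)
{
	return (cs & SEGMENT_RPL_MASK) >= USER_RPL;
}
```

A frame saved by a syscall issued from kernel mode has RPL 0 in CS, so both
helpers return false for it and the signal-delivery work would be skipped.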

> or I just should resend the x86 patch alone to get more attention? :-)

Probably ;) As I said, I can't help when it comes to the low-level asm
details. I'd also suggest cc'ing Roland McGrath <roland@hack.frob.com>

Ingo, Peter, I think we need your time/help.



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-03-12 18:07 UTC | newest]

Thread overview: 12+ messages
     [not found] <139779962.60750.1331202718116.JavaMail.root@storentr1.softathome.com>
2012-03-08 10:37 ` 'khelper' (child) is stuck in endless loop: do_signal() and !user_mode(regs) Dmitry ADAMUSHKA (EXT)
2012-03-08 15:46   ` Oleg Nesterov
     [not found] <CAO6Zf6C+SDZ-TV12wr9oiO6HB-itQ6fLPHFugXk0osEiAxW22w@mail.gmail.com>
2012-03-08 15:12 ` Dmitry ADAMUSHKA (EXT)
2012-03-08 15:55   ` Dmitry ADAMUSHKA (EXT)
2012-03-08 16:08     ` Oleg Nesterov
2012-03-08 16:29   ` Oleg Nesterov
2012-03-08 16:58     ` Dmitry ADAMUSHKA (EXT)
2012-03-12 16:35       ` Dmitry ADAMUSHKA (EXT)
2012-03-12 18:00         ` Oleg Nesterov
     [not found] <1144797072.59663.1331142646789.JavaMail.root@storentr1.softathome.com>
2012-03-07 17:51 ` Dmitry ADAMUSHKA (EXT)
2012-03-07 18:46   ` Oleg Nesterov
2012-03-07 20:05     ` Dmitry Adamushko
