public inbox for linux-kernel@vger.kernel.org
* Re: 2.6.25.3: su gets stuck for root
@ 2008-06-02  1:31 Joe Peterson
  2008-06-02  5:12 ` Harald Dunkel
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-02  1:31 UTC (permalink / raw)
  To: harald.dunkel, linux-kernel; +Cc: Alan Cox

Hi Harald,

I also just discovered this problem independently, and when I tracked it
down to stty and googled for it, I found your post.  In my test case, it
seems to get stuck in stty as run from the user's .bashrc (i.e., "su
user", where the user's .bashrc has the stty command).  In my case, the
arguments to stty do not seem to matter (well, I've tried "-ixany" and
"echoctl" - same results).  Also, the hang reproduces more reliably if a
sleep is done before the stty.  E.g., here's my test .bashrc:

sleep 2
stty -ixany

Note that if run from the console or a tty, having the user logged in
already seems to avoid the hang, but doing it within an xterm shows the
hang.  Strange, since with my original [more complex] test case, it
seemed to require *not* running X (tty/console only).

Most recent kernels show the issue - the only one that doesn't is
2.6.25-git17.  I am running Gentoo.  It does happen in a recent 2.6.26
git (an rc4 git from a couple of days ago).

Doing "ps" while hung shows stty in the "T" state.  "killall -9 stty"
releases it.
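
For anyone who wants to script the state check rather than read it off
ps, a minimal sketch follows.  It assumes the Linux /proc/<pid>/stat
layout, and proc_state() is a hypothetical helper name, not something
from procps or the kernel.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Return the one-letter state of a process ('R', 'S', 'T', ...) as
 * reported in /proc/<pid>/stat, or '?' if it cannot be read.
 * Linux-specific; proc_state() is a hypothetical helper.
 */
char proc_state(pid_t pid)
{
	char path[64];
	char buf[512];
	char *p;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
	f = fopen(path, "r");
	if (f == NULL)
		return '?';
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return '?';
	}
	fclose(f);

	/*
	 * The command name is parenthesised and may itself contain ')',
	 * so search from the end; the state letter follows ") ".
	 */
	p = strrchr(buf, ')');
	if (p == NULL || p[1] != ' ' || p[2] == '\0')
		return '?';
	return p[2];
}
```

Called with the pid of the hung stty, this would be expected to return
'T' in the situation described above.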

					-Joe

P.S.  Please cc my address on reply.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  1:31 2.6.25.3: su gets stuck for root Joe Peterson
@ 2008-06-02  5:12 ` Harald Dunkel
  2008-06-02  5:32   ` Willy Tarreau
  2008-06-02  5:42   ` 2.6.25.3: su gets stuck for root Joe Peterson
  0 siblings, 2 replies; 38+ messages in thread
From: Harald Dunkel @ 2008-06-02  5:12 UTC (permalink / raw)
  To: Joe Peterson; +Cc: linux-kernel, Alan Cox

Hi Joe,

Joe Peterson wrote:
> Hi Harold,
> 
> Doing "ps" while hung shows stty in the "T" state.  "killall -9 stty"
> releases it.
> 

Does strace give you the same output if you attach it to the blocking
stty (strace -p $pid)?

I got


:
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
:


Regards

Harri



* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  5:12 ` Harald Dunkel
@ 2008-06-02  5:32   ` Willy Tarreau
  2008-06-02  5:55     ` Joe Peterson
  2008-06-02  8:10     ` Alan Cox
  2008-06-02  5:42   ` 2.6.25.3: su gets stuck for root Joe Peterson
  1 sibling, 2 replies; 38+ messages in thread
From: Willy Tarreau @ 2008-06-02  5:32 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: Joe Peterson, linux-kernel, Alan Cox

On Mon, Jun 02, 2008 at 07:12:06AM +0200, Harald Dunkel wrote:
> Hi Joe,
> 
> Joe Peterson wrote:
> >Hi Harold,
> >
> >Doing "ps" while hung shows stty in the "T" state.  "killall -9 stty"
> >releases it.
> >
> 
> Does strace give you the same output if you attach it to the blocking
> stty (strace -p $pid)?
> 
> I got
> 
> 
> :
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> :

Guys, you should test if "kill -CONT $pid" wakes the process up.
It might be possible that some obscure bug appeared in the tty
code resulting in SIGTTOU sometimes being sent to the caller,
although that seems rather strange :-/

Willy



* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  5:12 ` Harald Dunkel
  2008-06-02  5:32   ` Willy Tarreau
@ 2008-06-02  5:42   ` Joe Peterson
  1 sibling, 0 replies; 38+ messages in thread
From: Joe Peterson @ 2008-06-02  5:42 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: linux-kernel, Alan Cox

Harald Dunkel wrote:
> Joe Peterson wrote:
>> Hi Harold,
>>
>> Doing "ps" while hung shows stty in the "T" state.  "killall -9 stty"
>> releases it.
>>
> 
> Does strace give you the same output if you attach it to the blocking
> stty (strace -p $pid)?
> 
> I got
> 
> 
> :
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---

Yep, almost the same.  I get (repeating):

ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---


			-Joe


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  5:32   ` Willy Tarreau
@ 2008-06-02  5:55     ` Joe Peterson
  2008-06-02  8:10     ` Alan Cox
  1 sibling, 0 replies; 38+ messages in thread
From: Joe Peterson @ 2008-06-02  5:55 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Harald Dunkel, linux-kernel, Alan Cox

Willy Tarreau wrote:
> Guys, you should test if "kill -CONT $pid" wakes the process up.
> It might be possible that some obscure bug appeared in the tty
> code resulting in SIGTTOU sometimes being sent to the caller,
> although that seems rather strange :-/

Just tried this ("kill -CONT <pid>") - no luck.

BTW, it should be possible, I would think, for others to duplicate this
fairly easily.  Just:

1) make a user, "foo", with login shell set to /bin/bash
2) create a .bashrc in foo's home dir with contents:

sleep 2
stty -ixany

3) cp .bashrc .bash_profile  (only needed to test "su - foo" too)
4) become root
5) type "su foo" (or "su - foo")

Sometimes it takes a second try to get it to happen.  If the su hangs,
check to see if the stty process is in state "T".  Also, it may make a
difference if you are logged in already as foo or are using X.  I first
noticed this with no users logged in (except root) and no X running (but
I can reproduce with X/xterm as well using this simple test case).  It
seems timing is a factor, so it's worth trying various things.

					-Joe


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  5:32   ` Willy Tarreau
  2008-06-02  5:55     ` Joe Peterson
@ 2008-06-02  8:10     ` Alan Cox
  2008-06-02  9:01       ` David Newall
  1 sibling, 1 reply; 38+ messages in thread
From: Alan Cox @ 2008-06-02  8:10 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Harald Dunkel, Joe Peterson, linux-kernel, Alan Cox

> Guys, you should test if "kill -CONT $pid" wakes the process up.
> It might be possible that some obscure bug appeared in the tty
> code resulting in SIGTTOU sometimes being sent to the caller,
> although that seems rather strange :-/

Not really. The task would get suspended if it attempted to change the
tty settings while not being session leader. This is part of the POSIX
and BSD job control. A race (either kernel or in something like
sshd/bash) would do that and could have been caused by any of the timing
changes recently.

That would also explain why I can't duplicate it, and the sleep
observation.


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  8:10     ` Alan Cox
@ 2008-06-02  9:01       ` David Newall
  2008-06-02  9:20         ` Alan Cox
  0 siblings, 1 reply; 38+ messages in thread
From: David Newall @ 2008-06-02  9:01 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy Tarreau, Harald Dunkel, Joe Peterson, linux-kernel,
	Alan Cox

Alan Cox wrote:
> Not really. The task would get suspended if it attempted to change the
> tty settings while not being session leader. This is part of the POSIX
> and BSD job control.

I haven't heard about this new restriction, but it begs the observation
that stty, when forked from a shell (the usual case), is never a session
leader.


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  9:01       ` David Newall
@ 2008-06-02  9:20         ` Alan Cox
  2008-06-02 10:16           ` Vegard Nossum
  2008-06-02 15:26           ` Joe Peterson
  0 siblings, 2 replies; 38+ messages in thread
From: Alan Cox @ 2008-06-02  9:20 UTC (permalink / raw)
  To: David Newall
  Cc: Willy Tarreau, Harald Dunkel, Joe Peterson, linux-kernel,
	Alan Cox

On Mon, 02 Jun 2008 18:31:34 +0930
David Newall <davidn@davidnewall.com> wrote:

> Alan Cox wrote:
> > Not really. The task would get suspended if it attempted to change the
> > tty settings while not being session leader. This is part of the POSIX
> > and BSD job control.
> 
> I haven't heard about this new restriction, but it begs the observation
> that stty, when forked from a shell (the usual case), is never a session
> leader.

Sorry, I meant part of the current session. I was thinking about the
specific case of bash or the ssh->bash setup where the question would be
whether the shell was session leader.

Someone who can dup this needs to instrument it in tty_ioctl really.

Alan


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  9:20         ` Alan Cox
@ 2008-06-02 10:16           ` Vegard Nossum
  2008-06-02 10:39             ` Vegard Nossum
  2008-06-02 10:50             ` Alan Cox
  2008-06-02 15:26           ` Joe Peterson
  1 sibling, 2 replies; 38+ messages in thread
From: Vegard Nossum @ 2008-06-02 10:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Newall, Willy Tarreau, Harald Dunkel, Joe Peterson,
	linux-kernel, Alan Cox


On Mon, Jun 2, 2008 at 11:20 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Mon, 02 Jun 2008 18:31:34 +0930
> David Newall <davidn@davidnewall.com> wrote:
>
>> Alan Cox wrote:
>> > Not really. The task would get suspended if it attempted to change the
>> > tty settings while not being session leader. This is part of the POSIX
>> > and BSD job control.
>>
>> I haven't heard about this new restriction, but it begs the observation
>> that stty, when forked from a shell (the usual case), is never a session
>> leader.
>
> Sorry I mean part of the current session. I was thinking about the
> specific case of bash or the ssh->bash setup where the question would be
> whether the shell was session leader.
>
> Someone who can dup this needs to instrument it in tty_ioctl really.

Hi,

I have written a short test program that seems to reproduce it for me
(see attachment), even though the original su/stty stuff wouldn't.

Basically, the strace shows this:
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
... (repeating)

The exact code path triggering this seems to be:

tcsetattr() -> ioctl(TCSETS) -> set_termios() -> tty_check_change()

This is on a 2.6.24.5-85.fc8 kernel.

I don't know what's wrong, but I hope this helps.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

[-- Attachment #2: reproduce.c --]
[-- Type: text/x-csrc; name=reproduce.c, Size: 1077 bytes --]

#include <sys/types.h>
#include <sys/wait.h>

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	pid_t child;

	(void) argc;
	(void) argv;

	printf("pgid = %d\n", (int) getpgrp());

	/* Flush so the buffered line is not duplicated in the child. */
	fflush(stdout);

	child = fork();
	if (child == -1) {
		printf("error: fork: %s\n", strerror(errno));
		exit(EXIT_FAILURE);
	}

	if (child == 0) {
		struct termios termios_p;

		printf("forked, pgid = %d\n", (int) getpgrp());

		/*
		 * Move into a new process group, which is not the
		 * terminal's foreground group.
		 */
		if (setpgid(0, 0) == -1) {
			printf("error: setpgid: %s\n", strerror(errno));
			exit(EXIT_FAILURE);
		}

		printf("new pgid = %d\n", (int) getpgrp());

		if (tcgetattr(STDIN_FILENO, &termios_p) == -1) {
			printf("error: tcgetattr: %s\n", strerror(errno));
			exit(EXIT_FAILURE);
		}

		/* Changing settings from the background raises SIGTTOU. */
		if (tcsetattr(STDIN_FILENO, TCSANOW, &termios_p) == -1) {
			printf("error: tcsetattr: %s\n", strerror(errno));
			exit(EXIT_FAILURE);
		}

		exit(EXIT_SUCCESS);
	}

	printf("forked, child = %d\n", (int) child);

	while (1) {
		pid_t pid;
		int status;

		/*
		 * wait() does not report stopped children, so this
		 * blocks for as long as the child stays stopped.
		 */
		pid = wait(&status);
		if (pid == -1) {
			printf("error: wait: %s\n", strerror(errno));
			exit(EXIT_FAILURE);
		}

		printf("pid %d status %d\n", (int) pid, status);
	}

	return EXIT_SUCCESS;
}


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 10:16           ` Vegard Nossum
@ 2008-06-02 10:39             ` Vegard Nossum
  2008-06-02 10:52               ` Alan Cox
  2008-06-02 10:50             ` Alan Cox
  1 sibling, 1 reply; 38+ messages in thread
From: Vegard Nossum @ 2008-06-02 10:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Newall, Willy Tarreau, Harald Dunkel, Joe Peterson,
	linux-kernel, Alan Cox

On Mon, Jun 2, 2008 at 12:16 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> On Mon, Jun 2, 2008 at 11:20 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> On Mon, 02 Jun 2008 18:31:34 +0930
>> David Newall <davidn@davidnewall.com> wrote:
>>
>>> Alan Cox wrote:
>>> > Not really. The task would get suspended if it attempted to change the
>>> > tty settings while not being session leader. This is part of the POSIX
>>> > and BSD job control.
>>>
>>> I haven't heard about this new restriction, but it begs the observation
>>> that stty, when forked from a shell (the usual case), is never a session
>>> leader.
>>
>> Sorry I mean part of the current session. I was thinking about the
>> specific case of bash or the ssh->bash setup where the question would be
>> whether the shell was session leader.
>>
>> Someone who can dup this needs to instrument it in tty_ioctl really.
>
> Hi,
>
> I have written a short test program that seems to reproduce it for me
> (see attachment), even though the original su/stty stuff wouldn't.
>
> Basically, the strace shows this:
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo
> ...}) = ? ERESTARTSYS (To be restarted)
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo
> ...}) = ? ERESTARTSYS (To be restarted)
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> ... (repeating)
>
> The exact code path triggering this seems to be:
>
> tcsetattr() -> ioctl(TCSETS) -> set_termios() -> tty_check_change()
>
> This is on a 2.6.24.5-85.fc8 kernel.
>
> I don't know what's wrong, but I hope this helps.

The error seems to be that tty_check_change() returns -ERESTARTSYS.
Shouldn't it be -EINTR, to allow the signal to be processed and let the
process decide whether to retry the tcsetattr()?

Vegard



* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 10:16           ` Vegard Nossum
  2008-06-02 10:39             ` Vegard Nossum
@ 2008-06-02 10:50             ` Alan Cox
  2008-06-17 15:32               ` Joe Peterson
  1 sibling, 1 reply; 38+ messages in thread
From: Alan Cox @ 2008-06-02 10:50 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	Joe Peterson, linux-kernel, Alan Cox

On Mon, Jun 02, 2008 at 12:16:56PM +0200, Vegard Nossum wrote:
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo
> ...}) = ? ERESTARTSYS (To be restarted)
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo
> ...}) = ? ERESTARTSYS (To be restarted)
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> --- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
> ... (repeating)
> 
> The exact code path triggering this seems to be:
> 
> tcsetattr() -> ioctl(TCSETS) -> set_termios() -> tty_check_change()

This looks correct to me and in fact I see the behaviour you report on 2.6.23
when running it. If I tell it to ignore SIGTTOU that also then behaves as
expected.

If
	your pgrp is not the pgrp of the tty
	and you are not ignoring TTOU
	and you are not orphaned (as a group)

	Then we are *supposed* to send you SIGTTOU and kick you back
	into touch.


This is so that if you do

someapp
^Z
bg
otherapp

And someapp wants to change the tty settings it blocks back to the shell.

This is correct behaviour and behaviour we've had for years.

Alan



* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 10:39             ` Vegard Nossum
@ 2008-06-02 10:52               ` Alan Cox
  2008-06-02 10:57                 ` Vegard Nossum
  0 siblings, 1 reply; 38+ messages in thread
From: Alan Cox @ 2008-06-02 10:52 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	Joe Peterson, linux-kernel, Alan Cox

On Mon, Jun 02, 2008 at 12:39:29PM +0200, Vegard Nossum wrote:
> Shouldn't it be EINTR to allow the signal to be processed and let the
> process decide whether to retry the tcsetattr()?

The signal is processed, and then the application retries the tcsetattr and
gets another one. The default TTOU behaviour is to block and then fg
continues the call so RESTARTSYS is both correct and has been used for
years.


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 10:52               ` Alan Cox
@ 2008-06-02 10:57                 ` Vegard Nossum
  2008-06-02 12:28                   ` Alan Cox
  0 siblings, 1 reply; 38+ messages in thread
From: Vegard Nossum @ 2008-06-02 10:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	Joe Peterson, linux-kernel

On Mon, Jun 2, 2008 at 12:52 PM, Alan Cox <alan@redhat.com> wrote:
> On Mon, Jun 02, 2008 at 12:39:29PM +0200, Vegard Nossum wrote:
>> Shouldn't it be EINTR to allow the signal to be processed and let the
>> process decide whether to retry the tcsetattr()?
>
> The signal is processed, and then application retries the tcsetattr and
> gets another one. The default TTOU behaviour is to block and then fg
> continues the call so RESTARTSYS is both correct and has been used for
> years
>

Hm, yes, that seems correct. I'm sorry for the wrong suggestions.

I guess this still doesn't explain why TTOU doesn't block (IOW, stop
the process, right?) in this case, because my test program does not
touch it.


Vegard



* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 10:57                 ` Vegard Nossum
@ 2008-06-02 12:28                   ` Alan Cox
  2008-06-02 14:31                     ` Vegard Nossum
  0 siblings, 1 reply; 38+ messages in thread
From: Alan Cox @ 2008-06-02 12:28 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	Joe Peterson, linux-kernel

On Mon, Jun 02, 2008 at 12:57:07PM +0200, Vegard Nossum wrote:
> I guess this still doesn't explain why TTOU doesn't block (IOW, stop
> the process, right?) in this case, because my test program does not
> touch it.

I see the parent process sleeping and the child taking TTOU and going to
state T. That again is correct.

alan      3219  0.0  0.0   3652   384 pts/5    S    13:11   0:00 ./repro
alan      3220  0.0  0.0   3652   204 pts/5    T    13:11   0:00 ./repro

If you run it without any straces etc do you see it blocked in T or sitting
in R ?

Alan




* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 12:28                   ` Alan Cox
@ 2008-06-02 14:31                     ` Vegard Nossum
  0 siblings, 0 replies; 38+ messages in thread
From: Vegard Nossum @ 2008-06-02 14:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	Joe Peterson, linux-kernel

On Mon, Jun 2, 2008 at 2:28 PM, Alan Cox <alan@redhat.com> wrote:
> On Mon, Jun 02, 2008 at 12:57:07PM +0200, Vegard Nossum wrote:
>> I guess this still doesn't explain why TTOU doesn't block (IOW, stop
>> the process, right?) in this case, because my test program does not
>> touch it.
>
> I see the parent process sleeping and the child taking TTOU and going to
> state T. That again is correct.
>
> alan      3219  0.0  0.0   3652   384 pts/5    S    13:11   0:00 ./repro
> alan      3220  0.0  0.0   3652   204 pts/5    T    13:11   0:00 ./repro
>
> If you run it without any straces etc do you see it blocked in T or sitting
> in R ?

Without any straces, it is blocked in T. Like Joe's report.

With strace, it's in R.

Exactly as you said, correct and expected behaviour.

So this is not a kernel problem at all.

I'm sorry for having wasted your time :-(


Vegard



* Re: 2.6.25.3: su gets stuck for root
  2008-06-02  9:20         ` Alan Cox
  2008-06-02 10:16           ` Vegard Nossum
@ 2008-06-02 15:26           ` Joe Peterson
  2008-06-02 15:51             ` Alan Cox
  1 sibling, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-02 15:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Newall, Willy Tarreau, Harald Dunkel, linux-kernel,
	Alan Cox

Alan Cox wrote:
> Someone who can dup this needs to instrument it in tty_ioctl really.

Alan, since I can get it to happen reliably, I can try this - any
suggestions on where to instrument?

	Thanks, Joe

P.S.  My stty process sits in "T" - did you say that it would be in "R"
if straced and that is correct?


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 15:26           ` Joe Peterson
@ 2008-06-02 15:51             ` Alan Cox
  2008-06-02 16:03               ` Joe Peterson
  2008-06-04 14:43               ` Joe Peterson
  0 siblings, 2 replies; 38+ messages in thread
From: Alan Cox @ 2008-06-02 15:51 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel, Alan Cox

On Mon, Jun 02, 2008 at 09:26:48AM -0600, Joe Peterson wrote:
> P.S.  My stty process sits in "T" - did you say that it would be in "R"
> if straced and that is correct?

T would be correct. I'll put together a small diff to printk useful stuff
when it happens and send it to you tonight/tomorrow.


-- 
--
                Take control of enterprise infrastructure
                   Sign up for starfleet academy today


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 15:51             ` Alan Cox
@ 2008-06-02 16:03               ` Joe Peterson
  2008-06-04 14:43               ` Joe Peterson
  1 sibling, 0 replies; 38+ messages in thread
From: Joe Peterson @ 2008-06-02 16:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Alan Cox wrote:
> On Mon, Jun 02, 2008 at 09:26:48AM -0600, Joe Peterson wrote:
>> P.S.  My stty process sits in "T" - did you say that it would be in "R"
>> if straced and that is correct?
> 
> T would be correct. I'll put together a small diff to printk useful stuff
> when it happens and sent it you tonight/tomorrow

Awesome; that would be great - thanks!

			-Joe


* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 15:51             ` Alan Cox
  2008-06-02 16:03               ` Joe Peterson
@ 2008-06-04 14:43               ` Joe Peterson
  2008-06-04 15:16                 ` Alan Cox
  1 sibling, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-04 14:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Alan Cox wrote:
> On Mon, Jun 02, 2008 at 09:26:48AM -0600, Joe Peterson wrote:
>> P.S.  My stty process sits in "T" - did you say that it would be in "R"
>> if straced and that is correct?
> 
> T would be correct. I'll put together a small diff to printk useful stuff
> when it happens and sent it you tonight/tomorrow

[Alan, thanks for the tips on where to instrument this]

What I have verified so far is that when the problem occurs, it gets to
this point in [tty_io.c] tty_check_change():

        kill_pgrp(task_pgrp(current), SIGTTOU, 1);
        set_thread_flag(TIF_SIGPENDING);
        ret = -ERESTARTSYS;
out:
        return ret;

So the error that gets returned to set_termios() is -512 (-ERESTARTSYS).

Also, none of the early-exit checks before this point fired (of course):
current->signal->tty != tty, !tty->pgrp, task_pgrp(current) ==
tty->pgrp, is_ignored(SIGTTOU), is_current_pgrp_orphaned().  I have not
printed out the various values from these - let me know if this would be
helpful.  I wanted to pass this info along now in case it is of help.

						-Joe
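
The sequence of checks listed above can be modelled in plain userspace
C, as a rough sketch.  This is not the kernel code: the struct, enum,
and function names are hypothetical, the inputs are passed in rather
than read from current and tty, and the EIO result for the orphaned
case is taken from the POSIX description of tcsetattr() rather than
from this thread.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Possible outcomes of the modelled check (hypothetical names). */
enum tty_change_verdict {
	TTY_CHANGE_ALLOWED,	/* the settings change proceeds */
	TTY_CHANGE_EIO,		/* orphaned background group: fail */
	TTY_CHANGE_SIGTTOU	/* caller's group gets SIGTTOU */
};

/* Inputs to the model, mirroring the checks named in the thread. */
struct tty_change_query {
	bool tty_is_controlling;	/* current->signal->tty == tty */
	pid_t tty_pgrp;			/* foreground group, 0 if none */
	pid_t caller_pgrp;		/* process group of the caller */
	bool sigttou_ignored;		/* SIGTTOU ignored or blocked */
	bool caller_pgrp_orphaned;	/* caller's group is orphaned */
};

/* Userspace model only; assumes the checks run in the listed order. */
enum tty_change_verdict
model_tty_check_change(const struct tty_change_query *q)
{
	if (!q->tty_is_controlling)
		return TTY_CHANGE_ALLOWED;
	if (q->tty_pgrp == 0)
		return TTY_CHANGE_ALLOWED;
	if (q->caller_pgrp == q->tty_pgrp)
		return TTY_CHANGE_ALLOWED;
	if (q->sigttou_ignored)
		return TTY_CHANGE_ALLOWED;
	if (q->caller_pgrp_orphaned)
		return TTY_CHANGE_EIO;
	return TTY_CHANGE_SIGTTOU;
}
```

A caller whose process group differs from the terminal's foreground
group, with SIGTTOU at its default and the group not orphaned, falls
through to the last return, which corresponds to the kill_pgrp() path
shown above.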


* Re: 2.6.25.3: su gets stuck for root
  2008-06-04 14:43               ` Joe Peterson
@ 2008-06-04 15:16                 ` Alan Cox
  2008-06-04 16:52                   ` Joe Peterson
  0 siblings, 1 reply; 38+ messages in thread
From: Alan Cox @ 2008-06-04 15:16 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

On Wed, Jun 04, 2008 at 08:43:00AM -0600, Joe Peterson wrote:
> 
> So the error that gets returned to set_termios() is -512.
> 
> Also, the various checks before this point (of course) did not pass
> (current->signal->tty != tty, !tty->pgrp, task_pgrp(current) ==
> tty->pgrp, is_ignored(SIGTTOU), is_current_pgrp_orphaned()).  I have not
> printed out the various values from these - let me know if this would be
> helpful.  I wanted to pass this info along now in case it is of help.

See what tty->pgrp is at that point when it hangs and that might identify
who is owning the tty and tty setup


* Re: 2.6.25.3: su gets stuck for root
  2008-06-04 15:16                 ` Alan Cox
@ 2008-06-04 16:52                   ` Joe Peterson
  2008-06-04 17:10                     ` Alan Cox
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-04 16:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Alan Cox wrote:
> See what tty->pgrp is at that point when it hangs and that might identify
> who is owning the tty and tty setup

tty = current->signal->tty = -142080000 or 0xf7880800
task->pgrp                 = -142405824 or 0xf7830f40

		-Joe



* Re: 2.6.25.3: su gets stuck for root
  2008-06-04 16:52                   ` Joe Peterson
@ 2008-06-04 17:10                     ` Alan Cox
  2008-06-04 20:32                       ` Joe Peterson
  0 siblings, 1 reply; 38+ messages in thread
From: Alan Cox @ 2008-06-04 17:10 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

> tty = current->signal->tty = -142080000 or 0xf7880800
> task->pgrg                 = -142405824 or 0xf7830f40

task->pgrp is a struct pid - you need the value it holds


* Re: 2.6.25.3: su gets stuck for root
  2008-06-04 17:10                     ` Alan Cox
@ 2008-06-04 20:32                       ` Joe Peterson
  2008-06-11 14:04                         ` Joe Peterson
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-04 20:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Alan Cox wrote:
>> tty = current->signal->tty = -142080000 or 0xf7880800
>> task->pgrg                 = -142405824 or 0xf7830f40
> 
> task->pgrp is a struct pid - you need the value it holds

Yeah, I figured later that giving you the addresses was rather useless.  :)

Anyway, here is more info:

tty_check_change: current->signal->tty = f7880800
tty_check_change: tty = f7880800
tty_check_change: tty->pgrp = f7b99e40
  tty->pgrp->count = 5
  tty->pgrp->level = 0
  tty->pgrp->numbers[0].nr = 6951
tty_check_change: task_pgrp(current) = f7b99d40
  task_pgrp(current)->count = 1
  task_pgrp(current)->level = 0
  task_pgrp(current)->numbers[0].nr = 6952
tty_check_change: kill_pgrp called; returning -ERESTARTSYS
set_termios: error return value (-512) from tty_check_change
foo       6951  0.0  0.1   2332  1096 tty1     S+   14:18   0:00 su foo
foo       6952  0.0  0.1   2988  1464 tty1     S    14:18   0:00 bash


So, looks like the tty->pgrp's process is the "su" command itself, and
the task_pgrp(current)'s process is "bash" - the shell started by the su.

						-Joe


* Re: 2.6.25.3: su gets stuck for root
  2008-06-04 20:32                       ` Joe Peterson
@ 2008-06-11 14:04                         ` Joe Peterson
  2008-06-12 11:52                           ` Vegard Nossum
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-11 14:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Joe Peterson wrote:
> Anyway, here is more info:
> 
> tty_check_change: current->signal->tty = f7880800
> tty_check_change: tty = f7880800
> tty_check_change: tty->pgrp = f7b99e40
>   tty->pgrp->count = 5
>   tty->pgrp->level = 0
>   tty->pgrp->numbers[0].nr = 6951
> tty_check_change: task_pgrp(current) = f7b99d40
>   task_pgrp(current)->count = 1
>   task_pgrp(current)->level = 0
>   task_pgrp(current)->numbers[0].nr = 6952
> tty_check_change: kill_pgrp called; returning -ERESTARTSYS
> set_termios: error return value (-512) from tty_check_change
> foo       6951  0.0  0.1   2332  1096 tty1     S+   14:18   0:00 su foo
> foo       6952  0.0  0.1   2988  1464 tty1     S    14:18   0:00 bash
> 
> 
> So, looks like the tty->pgrp's process is the "su" command itself, and
> the task_pgrp(current)'s process is "bash" - the shell started by the su.

If anyone has any tips for my further debugging of this, given the
above, let me know.  I'd like to help resolve this.

					Thanks!  Joe


* Re: 2.6.25.3: su gets stuck for root
  2008-06-11 14:04                         ` Joe Peterson
@ 2008-06-12 11:52                           ` Vegard Nossum
  2008-06-14  1:49                             ` Joe Peterson
  0 siblings, 1 reply; 38+ messages in thread
From: Vegard Nossum @ 2008-06-12 11:52 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

On Wed, Jun 11, 2008 at 4:04 PM, Joe Peterson <joe@skyrush.com> wrote:
> Joe Peterson wrote:
>> Anyway, here is more info:
>>
>> tty_check_change: current->signal->tty = f7880800
>> tty_check_change: tty = f7880800
>> tty_check_change: tty->pgrp = f7b99e40
>>   tty->pgrp->count = 5
>>   tty->pgrp->level = 0
>>   tty->pgrp->numbers[0].nr = 6951
>> tty_check_change: task_pgrp(current) = f7b99d40
>>   task_pgrp(current)->count = 1
>>   task_pgrp(current)->level = 0
>>   task_pgrp(current)->numbers[0].nr = 6952
>> tty_check_change: kill_pgrp called; returning -ERESTARTSYS
>> set_termios: error return value (-512) from tty_check_change
>> foo       6951  0.0  0.1   2332  1096 tty1     S+   14:18   0:00 su foo
>> foo       6952  0.0  0.1   2988  1464 tty1     S    14:18   0:00 bash
>>
>>
>> So, looks like the tty->pgrp's process is the "su" command itself, and
>> the task_pgrp(current)'s process is "bash" - the shell started by the su.
>
> If anyone has any tips for my further debugging of this, given the
> above, let me know.  I'd like to help resolve this.

I think knowing the pgrps of the above processes (there is possibly
one more involved, stty?) would be useful; try:

    $ ps -eo pid,pgrp,tpgid,user,args

..as this problem occurs because a process tries to change the
terminal settings (and subsequently gets suspended because of that)
while it's not the owner of the terminal.

This can happen if you fork something off to the background, e.g. like

    $ stty 9600 &

(which should immediately give you [1]+ Stopped stty 9600),

so can you please look for anything like that in your login scripts or
shell rc files?

I don't know any other way to debug this further, sorry :-(

Thanks.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-12 11:52                           ` Vegard Nossum
@ 2008-06-14  1:49                             ` Joe Peterson
  2008-06-14  7:45                               ` Vegard Nossum
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-14  1:49 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Vegard Nossum wrote:
> I think knowing the pgrps of the above processes (there is possibly
> one more involved, stty?) would be useful; try:
> 
>     $ ps -eo pid,pgrp,tpgid,user,args

OK, I performed this test again (getting the su to hang), and here is
the info:

tty_check_change: current->signal->tty = f7879800
tty_check_change: tty = f7879800
tty_check_change: tty->pgrp = f78639c0
  tty->pgrp->count = 5
  tty->pgrp->level = 0
  tty->pgrp->numbers[0].nr = 7036
tty_check_change: task_pgrp(current) = f7863f00
  task_pgrp(current)->count = 1
  task_pgrp(current)->level = 0
  task_pgrp(current)->numbers[0].nr = 7037
tty_check_change: kill_pgrp called; returning -ERESTARTSYS
set_termios: error return value (-512) from tty_check_change

scorpius ~ # ps aux | grep 7036
foo  7036  0.0  0.1   2336  1100 tty1     S+   19:30   0:00 su foo

scorpius ~ # ps aux | grep 7037
foo  7037  0.0  0.1   2988  1460 tty1     S    19:30   0:00 bash

scorpius ~ # ps -eo pid,pgrp,tpgid,user,args | grep 7036
 6902  6902  7036 root     /bin/login --
 6922  6922  7036 root     -bash
 7036  7036  7036 foo      su foo
 7037  7037  7036 foo      bash
 7042  7037  7036 foo      stty -ixany

scorpius ~ # ps -eo pid,pgrp,tpgid,user,args | grep 7037
 7037  7037  7036 foo      bash
 7042  7037  7036 foo      stty -ixany

scorpius ~ # ps aux | grep 7042
foo  7042  0.0  0.0   1608   376 tty1     T    19:30   0:00 stty -ixany

scorpius ~ # ps -eo pid,pgrp,tpgid,user,args | grep 7042
 7042  7037  7036 foo      stty -ixany

(I omitted, of course, when grep found itself, and I compressed some
white space to allow lines to fit nicely in the email)

> ..as this problem occurs because a process tries to change the
> terminal settings (and subsequently gets suspended because of that)
> while it's not the owner of the terminal.
> 
> This can happen if you fork something off to the background, e.g. like
> 
>     $ stty 9600 &
> 
> (which should immediately give you [1]+ Stopped stty 9600),
> 
> so can you please look for anything like that in your login scripts or
> shell rc files?

I do use stty in my .bashrc (that's why this happens), but I do not put
it in the background.

Anyway, hope the additional info above is of help...

					Thanks, Joe

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-14  1:49                             ` Joe Peterson
@ 2008-06-14  7:45                               ` Vegard Nossum
  2008-06-14 17:43                                 ` Joe Peterson
  0 siblings, 1 reply; 38+ messages in thread
From: Vegard Nossum @ 2008-06-14  7:45 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

On Sat, Jun 14, 2008 at 3:49 AM, Joe Peterson <joe@skyrush.com> wrote:
> Vegard Nossum wrote:
>> I think knowing the pgrps of the above processes (there is possibly
>> one more involved, stty?) would be useful; try:
>>
>>     $ ps -eo pid,pgrp,tpgid,user,args
>
> OK, I performed this test again (getting the su to hang), and here is
> the info:

[snip]

> scorpius ~ # ps -eo pid,pgrp,tpgid,user,args | grep 7036
>  6902  6902  7036 root     /bin/login --
>  6922  6922  7036 root     -bash
>  7036  7036  7036 foo      su foo
>  7037  7037  7036 foo      bash
>  7042  7037  7036 foo      stty -ixany

So this clearly shows what's wrong; 7036 is the "controlling process"
group id. But only "su foo" is in this group, the bash and stty
processes have their own group, 7037.
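
The same comparison can be done mechanically. Below is a minimal Python
sketch (the function name is made up for illustration) that reads rows in
the "pid,pgrp,tpgid,user,args" layout shown above and lists every process
whose own process group differs from the terminal's foreground group:

```python
def background_processes(ps_lines):
    """Return (pid, pgrp, tpgid, args) for rows whose pgrp differs from tpgid."""
    rows = []
    for line in ps_lines:
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # not a full pid/pgrp/tpgid/user/args row
        try:
            pid, pgrp, tpgid = (int(f) for f in fields[:3])
        except ValueError:
            continue  # header line or other non-numeric row
        # tpgid is -1 for processes without a controlling terminal
        if tpgid > 0 and pgrp != tpgid:
            rows.append((pid, pgrp, tpgid, fields[4]))
    return rows


# Rows copied from the ps output quoted above.
sample = """\
 6902  6902  7036 root     /bin/login --
 6922  6922  7036 root     -bash
 7036  7036  7036 foo      su foo
 7037  7037  7036 foo      bash
 7042  7037  7036 foo      stty -ixany
""".splitlines()

for row in background_processes(sample):
    print(row)
```

login and the root shell show up as well, which is normal since they are
just waiting; the interesting rows are bash (7037) and stty (7042), which
would not be listed if bash's group were the foreground one.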

On my own system, when I do "su", I get this:
 2891  2891  2892 root     su temp
 2892  2892  2892 temp     bash

...and here the "bash" process is in the right group, 2892, while "su"
is the one in the background!

Can you try to run strace on the su to see where things go wrong, i.e.

    $ strace -f -e trace=process su foo

...and we're only interested in what happens up to the point where it
hangs. That should hopefully tell us which process is doing the wrong
thing. In either case, as Alan pointed out, this seems unlikely to be
a kernel problem.

[snip]

>> so can you please look for anything like that in your login scripts or
>> shell rc files?
>
> I do use stty in my .bashrc (that's why this happens), but I do not put
> it in the background.

Yeah, most likely the process that calls stty is first put in the
background itself (or never brought to the foreground?). But I don't
know why... when you get the trace, we can compare and find out where
it deviates.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-14  7:45                               ` Vegard Nossum
@ 2008-06-14 17:43                                 ` Joe Peterson
  2008-06-14 20:34                                   ` Vegard Nossum
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-14 17:43 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1468 bytes --]

Vegard Nossum wrote:
> So this clearly shows what's wrong; 7036 is the "controlling process"
> group id. But only "su foo" is in this group, the bash and stty
> processes have their own group, 7037.
> 
> On my own system, when I do "su", I get this:
>  2891  2891  2892 root     su temp
>  2892  2892  2892 temp     bash
> 
> ...and here the "bash" process is in the right group, 2892, while "su"
> is the one in the background!

Hmm.

> Can you try to run strace on the su to see where things go wrong, i.e.
> 
>     $ strace -f -e trace=process su foo
> 
> ...and we're only interested in what happens up to the point where it
> hangs. That should hopefully tell us which process is doing the wrong
> thing. In either case, as Alan pointed out, this seems unlikely to be
> a kernel problem.

OK, I attached this as a text file at the end.  But (*bummer*), using
strace makes it impossible to reproduce the hang (figures, and I believe
someone earlier in the thread also had this problem).

As for whether the kernel is at fault, not sure (i.e. does this hang
behavior implicate the kernel automatically or can a user-space process
cause itself such an issue?).  But I *do* see different behavior
depending on the kernel version.  There were a couple of git kernels in
which I could not reproduce it.  Still, if it is a race or something, it
might be that the conditions were just slightly perturbed.

I attached the strace log just in case it is of help.

					-Joe

[-- Attachment #2: su_strace.log --]
[-- Type: text/x-log, Size: 2501 bytes --]

7009  execve("/bin/su", ["su", "foo"], [/* 32 vars */]) = 0
7009  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7e3d708) = 7010
7010  execve("/bin/bash", ["bash"], [/* 31 vars */]) = 0
7010  clone( <unfinished ...>
7009  waitpid(-1,  <unfinished ...>
7010  <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7011
7011  exit_group(0)                     = ?
7010  --- SIGCHLD (Child exited) @ 0 (0) ---
7010  waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 7011
7010  waitpid(-1, 0xbff58cec, WNOHANG)  = -1 ECHILD (No child processes)
7010  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7012
7012  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7013
7013  execve("/usr/bin/dircolors", ["dircolors", "-b", "/etc/DIR_COLORS"], [/* 31 vars */]) = 0
7013  exit_group(0)                     = ?
7012  --- SIGCHLD (Child exited) @ 0 (0) ---
7012  waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 7013
7012  waitpid(-1, 0xbff585ec, WNOHANG)  = -1 ECHILD (No child processes)
7012  exit_group(0)                     = ?
7010  waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7012
7010  --- SIGCHLD (Child exited) @ 0 (0) ---
7010  waitpid(-1, 0xbff5873c, WNOHANG)  = -1 ECHILD (No child processes)
7010  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7014
7014  execve("/bin/sleep", ["sleep", "2"], [/* 31 vars */]) = 0
7010  waitpid(-1,  <unfinished ...>
7014  exit_group(0)                     = ?
7010  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7014
7010  --- SIGCHLD (Child exited) @ 0 (0) ---
7010  waitpid(-1, 0xbff593dc, WNOHANG)  = -1 ECHILD (No child processes)
7010  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db0708) = 7015
7015  execve("/bin/stty", ["stty", "-ixany"], [/* 31 vars */]) = 0
7015  exit_group(0)                     = ?
7010  --- SIGCHLD (Child exited) @ 0 (0) ---
7010  waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 7015
7010  waitpid(-1, 0xbff5936c, WNOHANG)  = -1 ECHILD (No child processes)
7010  exit_group(0)                     = ?
7009  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED) = 7010
7009  exit_group(0)                     = ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-14 17:43                                 ` Joe Peterson
@ 2008-06-14 20:34                                   ` Vegard Nossum
  2008-06-14 20:52                                     ` Joe Peterson
  0 siblings, 1 reply; 38+ messages in thread
From: Vegard Nossum @ 2008-06-14 20:34 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

On Sat, Jun 14, 2008 at 7:43 PM, Joe Peterson <joe@skyrush.com> wrote:
>> Can you try to run strace on the su to see where things go wrong, i.e.
>>
>>     $ strace -f -e trace=process su foo
>>
>> ...and we're only interested in what happens up to the point where it
>> hangs. That should hopefully tell us which process is doing the wrong
>> thing. In either case, as Alan pointed out, this seems unlikely to be
>> a kernel problem.
>
> OK, I attached this as a text file at the end.  But (*bummer*), using
> strace makes it impossible to reproduce the hang (figures, and I believe
> someone earlier in the thread also had this problem).

Yeah, but doesn't it loop indefinitely calling ioctl() and getting a
SIGTTOU? Tracing up till this point is okay (and what I had in mind).

>
> As for whether the kernel is at fault, not sure (i.e. does this hang
> behavior implicate the kernel automatically or can a user-space process
> cause itself such an issue?).  But I *do* see different behavior
> depending on the kernel version.  There were a couple of git kernels in
> which I could not reproduce it.  Still, if it is a race or something, it
> might be that the conditions were just slightly perturbed.

Yeah, a user-space process can do this, and it's the right behaviour
for the kernel. I did post a program that would "reproduce" what
you're seeing. I do now believe that it's something timing-related, as
Alan suggested initially. (But timing-related with your scripts, that
is. I must say, that "sleep 2" does look a bit suspicious; I have no
idea what that is supposed to do :-))

I suppose it would be more useful to see a trace where you include a
few more system calls, can you try:

    # strace -e trace=process,ioctl,setpgid -f su foo

instead?

Just for the record, I'm probably not the best person to debug this,
so I'm just trying to figure it out as we go. On the other hand, I
don't see better suggestions from anybody else. Thank you for
persisting, though! :-)

(And the fact that the results differ with the kernel versions does
make this relevant for LKML still.)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-14 20:34                                   ` Vegard Nossum
@ 2008-06-14 20:52                                     ` Joe Peterson
  2008-06-14 21:26                                       ` Vegard Nossum
  0 siblings, 1 reply; 38+ messages in thread
From: Joe Peterson @ 2008-06-14 20:52 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2153 bytes --]

Vegard Nossum wrote:
> Yeah, a user-space process can do this, and it's the right behaviour
> for the kernel. I did post a program that would "reproduce" what
> you're seeing. I do now believe that it's something timing-related, as
> Alan suggested initially. (But timing-related with your scripts, that
> is. I must say, that "sleep 2" does look a bit suspicious; I have no
> idea what that is supposed to do :-))

Ah, that is something I put in there to artificially make it more
reproducible.  Here's the reason: when I first encountered the problem,
it was happening if the home dir of the user was on the "btrfs"
filesystem (the new checksumming one from Oracle).  This made me suspect
btrfs initially.  But I reproduced the problem [more sporadically] when
the home was on ext3 as well.  Since btrfs has a different performance
profile, especially when first accessed after a mount (and it is a
filesystem still under development, so some optimizations are yet to
come), I figured it might be timing-related, and sure enough, adding the
"sleep 2" proved that.

So without the sleep 2 and with a home of ext3, it rarely happens, since
it takes very little time to read the homedir files (.bashrc, etc.).
Putting in the sleep makes it almost always happen.  It seems like the
delay invoked by the sleep causes that subsequent stty call to hang.

> I suppose it would be more useful to see a trace where you include a
> few more system calls, can you try:
> 
>     # strace -e trace=process,ioctl,setpgid -f su foo
> 
> instead?

OK, attached.
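
A trace like the attached one can be scanned for the calls that change the
terminal's foreground process group.  The following is only a rough Python
sketch: the function name and the regular expression are assumptions based
on the line format in this particular log, not anything strace provides.

```python
import re

# Matches lines such as: 9740  ioctl(255, TIOCSPGRP, [9740])     = 0
_TIOCSPGRP = re.compile(
    r"^(\d+)\s+ioctl\((\d+), TIOCSPGRP, \[(\d+)\]\)\s+= (-?\d+)"
)


def foreground_changes(strace_lines):
    """Yield (caller_pid, new_pgrp) for each successful TIOCSPGRP ioctl."""
    for line in strace_lines:
        m = _TIOCSPGRP.match(line)
        if m and int(m.group(4)) == 0:
            yield int(m.group(1)), int(m.group(3))


# Excerpt of four lines taken from the attached strace_su.log.
log = [
    "9740  ioctl(255, TIOCGPGRP, [9737])     = 0",
    "9740  setpgid(0, 9740)                  = 0",
    "9740  ioctl(255, TIOCSPGRP, [9740])     = 0",
    "9740  ioctl(255, TIOCSPGRP, [9737])     = 0",
]
changes = list(foreground_changes(log))
print(changes)
```

In a run that hangs, any TIOCSPGRP issued by a pid other than the new bash
between its first call and the stty would be the thing to look for.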

> Just for the record, I'm probably not the best person to debug this,
> so I'm just trying to figure it out as we go. On the other hand, I
> don't see better suggestions from anybody else. Thank you for
> persisting, though! :-)
> 
> (And the fact that the results differ with the kernel versions does
> make this relevant for LKML still.)

Thanks for helping.  Yes, this is the kind of nagging issue that really
bugs me, since it is intermittent and makes things feel unstable.  If we
determine the problem is in something else (like stty or bash), then at
least I can file a bug with them.

						-Joe

[-- Attachment #2: strace_su.log --]
[-- Type: text/x-log, Size: 5681 bytes --]

9738  execve("/bin/su", ["su", "foo"], [/* 50 vars */]) = 0
9738  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9738  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9738  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9738  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7dfc708) = 9740
9738  waitpid(-1,  <unfinished ...>
9740  execve("/bin/bash", ["bash"], [/* 50 vars */]) = 0
9740  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(255, TIOCGPGRP, [9737])     = 0
9740  setpgid(0, 9740)                  = 0
9740  ioctl(255, TIOCSPGRP, [9740])     = 0
9740  ioctl(255, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7df1708) = 9741
9741  exit_group(0)                     = ?
9740  --- SIGCHLD (Child exited) @ 0 (0) ---
9740  waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 9741
9740  waitpid(-1, 0xbfb187fc, WNOHANG)  = -1 ECHILD (No child processes)
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfb18774) = -1 ENOTTY (Inappropriate ioctl for device)
9740  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7df1708) = 9742
9742  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7df1708) = 9743
9742  waitpid(-1,  <unfinished ...>
9743  execve("/usr/bin/dircolors", ["dircolors", "-b", "/etc/DIR_COLORS"], [/* 49 vars */]) = 0
9743  exit_group(0)                     = ?
9742  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 9743
9742  --- SIGCHLD (Child exited) @ 0 (0) ---
9742  waitpid(-1, 0xbfb17fbc, WNOHANG)  = -1 ECHILD (No child processes)
9742  exit_group(0)                     = ?
9740  --- SIGCHLD (Child exited) @ 0 (0) ---
9740  waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 9742
9740  waitpid(-1, 0xbfb1824c, WNOHANG)  = -1 ECHILD (No child processes)
9740  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7df1708) = 9744
9740  waitpid(-1,  <unfinished ...>
9744  execve("/bin/sleep", ["sleep", "2"], [/* 49 vars */]) = 0
9744  exit_group(0)                     = ?
9740  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 9744
9740  ioctl(255, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(255, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(1, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(0, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  --- SIGCHLD (Child exited) @ 0 (0) ---
9740  waitpid(-1, 0xbfb18d3c, WNOHANG)  = -1 ECHILD (No child processes)
9740  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7df1708) = 9745
9740  waitpid(-1,  <unfinished ...>
9745  execve("/bin/stty", ["stty", "-ixany"], [/* 49 vars */]) = 0
9745  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9745  ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
9745  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9745  exit_group(0)                     = ?
9740  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 9745
9740  ioctl(255, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(255, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  --- SIGCHLD (Child exited) @ 0 (0) ---
9740  waitpid(-1, 0xbfb18d3c, WNOHANG)  = -1 ECHILD (No child processes)
9740  ioctl(255, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(255, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(1, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(0, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(0, TIOCSWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(255, TIOCSPGRP, [9740])     = 0
9740  ioctl(0, TIOCGWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(0, TIOCSWINSZ, {ws_row=30, ws_col=80, ws_xpixel=724, ws_ypixel=454}) = 0
9740  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
9740  ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
9740  ioctl(255, TIOCSPGRP, [9737])     = 0
9740  setpgid(0, 9737)                  = 0
9740  exit_group(0)                     = ?
9738  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED) = 9740
9738  exit_group(0)                     = ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-14 20:52                                     ` Joe Peterson
@ 2008-06-14 21:26                                       ` Vegard Nossum
  2008-06-14 21:34                                         ` Joe Peterson
  2008-07-02 18:03                                         ` tty session leader issue (was Re: 2.6.25.3: su gets stuck for root) Joe Peterson
  0 siblings, 2 replies; 38+ messages in thread
From: Vegard Nossum @ 2008-06-14 21:26 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

On Sat, Jun 14, 2008 at 10:52 PM, Joe Peterson <joe@skyrush.com> wrote:
> Vegard Nossum wrote:
>> Yeah, a user-space process can do this, and it's the right behaviour
>> for the kernel. I did post a program that would "reproduce" what
>> you're seeing. I do now believe that it's something timing-related, as
>> Alan suggested initially. (But timing-related with your scripts, that
>> is. I must say, that "sleep 2" does look a bit suspicious; I have no
>> idea what that is supposed to do :-))
>
> Ah, that is something I put in there to artificially make it more
> reproducible.  Here's the reason: when I first encountered the problem,
> it was happening if the home dir of the user was on the "btrfs"
> filesystem (the new checksumming one from Oracle).  This made me suspect
> btrfs initially.  But I reproduced the problem [more sporadically] when
> the home was on ext3 as well.  Since btrfs has a different performance
> profile, especially when first accessed after a mount (and it is a
> filesystem still under development, so some optimizations are yet to
> come), I figured it might be timing-related, and sure enough, adding the
> "sleep 2" proved that.

I'm not sure it is. Try adding sleep 3 instead. Because I have the
"sleep 2" when I run "su foo" as well, and I _didn't_ put it there:

[pid  6298] execve("/bin/sleep", ["sleep", "2"], [/* 47 vars */] <unfinished ...>


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-14 21:26                                       ` Vegard Nossum
@ 2008-06-14 21:34                                         ` Joe Peterson
  2008-07-02 18:03                                         ` tty session leader issue (was Re: 2.6.25.3: su gets stuck for root) Joe Peterson
  1 sibling, 0 replies; 38+ messages in thread
From: Joe Peterson @ 2008-06-14 21:34 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel

Vegard Nossum wrote:
> I'm not sure it is. Try adding sleep 3 instead. Because I have the
> "sleep 2" when I run "su foo" as well, and I _didn't_ put it there:
> 
> [pid  6298] execve("/bin/sleep", ["sleep", "2"], [/* 47 vars */] <unfinished ...>

Weird!  OK, I tried it with "sleep 3" in .bashrc, and it says
"...execve("/usr/bin/sleep", ["sleep", "3"], [/* 30 vars */]) = 0".
This sounds like what I'd expect.  I don't understand why you see a
sleep 2 when you did not have one in your config.....

	-Joe

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: 2.6.25.3: su gets stuck for root
  2008-06-02 10:50             ` Alan Cox
@ 2008-06-17 15:32               ` Joe Peterson
  0 siblings, 0 replies; 38+ messages in thread
From: Joe Peterson @ 2008-06-17 15:32 UTC (permalink / raw)
  To: Alan Cox
  Cc: Vegard Nossum, Alan Cox, David Newall, Willy Tarreau,
	Harald Dunkel, linux-kernel

Alan Cox wrote:
> This looks correct to me and in fact I see the behaviour you report on 2.6.23
> when running it. If I tell it to ignore SIGTTOU that also then behaves as
> expected.
> 
> If
> 	your pgrp is not the pgrp of the tty
> 	and you are not ignoring TTOU
> 	and you are not orphaned (as a group)
> 
> 	Then we are *supposed* to send you SIGTTOU and kick you back
> 	into touch.

OK, I am still baffled.  I've thought of several different theories,
wondering if bash does not have the right parent process, how there
could be a race in the kernel or elsewhere, but as far as I can tell,
things are in order.  Here's the ps -ax --forest output while hung:

 6435 tty3     Ss     0:00 /bin/login --
 7954 tty3     S      0:00  \_ -bash
 7958 tty3     S+     0:00      \_ su foo
 7959 tty3     S      0:00          \_ bash
 7964 tty3     T      0:00              \_ stty -ixany

I had logged into the tty as root (with shell set to bash), then su'd to
foo (with shell set to bash), so this tree makes sense.  During the
sleep before the stty, sleep is under the final bash similar to the way
stty is while it is hung.

Note that the stty is a child of bash (which, BTW, sometimes appears as
"-su" instead - I am not clear on that), and they all lead back to the
original tty, which I gather is the session leader (or is it the "su"?).

Now, the debugging I did shows that the reason that tty_check_change()
returns an error is that the tty->pgrp != task_pgrp(current).  The
former is the "su foo" process, and the latter is the bash child process.
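
To make the rule Alan describes concrete, here is a simplified Python model
of the decision, not the kernel code itself.  The function and parameter
names are invented for illustration, and the -EIO result for an orphaned
group is an assumption taken from my reading of POSIX and the kernel source
rather than from anything in this thread.  The real check also treats a
blocked SIGTTOU like an ignored one, which this sketch folds into a single
flag.

```python
import errno

ERESTARTSYS = 512  # kernel-internal code; matches the -512 in the debug output


def tty_check_change_model(is_ctty, tty_pgrp, task_pgrp,
                           sigttou_ignored, pgrp_orphaned):
    """Return (result, sends_sigttou) for a simplified tty_check_change()."""
    if not is_ctty:
        return 0, False           # tty is not the caller's controlling tty
    if task_pgrp == tty_pgrp:
        return 0, False           # caller is in the foreground group
    if sigttou_ignored:
        return 0, False           # ignoring SIGTTOU lets the change through
    if pgrp_orphaned:
        return -errno.EIO, False  # assumed: orphaned group gets an error
    return -ERESTARTSYS, True     # background caller: SIGTTOU, retry later


# Values from the debug output: tty->pgrp is 7036 (su), caller's pgrp is 7037.
print(tty_check_change_model(True, 7036, 7037, False, False))
```

With the numbers from the hang this takes the last branch, which is the
"kill_pgrp called; returning -ERESTARTSYS" line in the debug output.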

So I guess that when it does work, they are the same process, but why
would they be the same (or not, as it were)?  Does something happen
during bash startup that causes bash to become the session leader?

Please, please, someone who understands the mechanics better than I let
me know how I can explore this more deeply.

						Thanks, Joe

^ permalink raw reply	[flat|nested] 38+ messages in thread

* tty session leader issue (was Re: 2.6.25.3: su gets stuck for root)
  2008-06-14 21:26                                       ` Vegard Nossum
  2008-06-14 21:34                                         ` Joe Peterson
@ 2008-07-02 18:03                                         ` Joe Peterson
  2008-07-02 19:21                                           ` markus reichelt
  2008-07-06 14:08                                           ` Tim Connors
  1 sibling, 2 replies; 38+ messages in thread
From: Joe Peterson @ 2008-07-02 18:03 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Alan Cox, Alan Cox, David Newall, Willy Tarreau, Harald Dunkel,
	linux-kernel, tconnors

I have done some more investigation on this problem, and I am posting
here my results in hope that someone can point me in the right direction
for further investigation...

Summary: during the initialization of a new bash shell, the terminal
foreground process group often reverts back to that of the parent of the
bash shell (after being set *to* the bash shell pgrp by bash),
prohibiting commands like stty from being run by the init scripts.  The
result is that the execution of these commands will hang until killed,
causing the bash prompt to not appear.  Adding a delay in the script
(using sleep) increases the chance of this having time to happen.

For example, putting the following in a user's .bashrc:

sleep 2
stty -ixany

is a good way to reproduce this.  Doing "su <user>" from root (note that
the fact that no password is required helps the timing) will then often
hang.  Killing -9 stty will allow the bash prompt to appear.

I have instrumented the bash source code in an attempt to see why this
is happening, partly because I suspected a bug in bash.  What I have
found is this:

1) bash calls tcsetpgrp() with the pgrp of the bash process (two times)
before starting to execute init scripts.  This makes sense, since bash
needs to be the session leader.  It is never called again until just
before the bash shell exits normally (at which time it returns control
to the parent).

2) During the processing of the init scripts (sometimes .bashrc, but
sometimes a system script that is processed first), calling tcgetpgrp()
shows that the pgrp has reverted back to the "su <user>" process.  It
does not appear that bash reverted it in my testing so far.  Running
stty while in the reverted state causes a hang, since bash is not the
session leader.
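
The same check can be made from outside bash.  This is a minimal Python
sketch (the helper name is hypothetical) that could be run from an init
script just before the stty; it compares the caller's process group with
the terminal's foreground group using os.tcgetpgrp() and os.getpgrp(),
neither of which triggers SIGTTOU because they only read state:

```python
import os


def foreground_state(fd=0):
    """Return (own_pgrp, tty_fg_pgrp), or None if fd is not a usable terminal."""
    try:
        fg = os.tcgetpgrp(fd)
    except OSError:
        # e.g. ENOTTY: fd is a pipe, a file, or not our controlling terminal
        return None
    return os.getpgrp(), fg


if __name__ == "__main__":
    state = foreground_state(0)
    if state is None:
        print("stdin is not a controlling terminal")
    else:
        own, fg = state
        where = "foreground" if own == fg else "background"
        print("pgrp=%d tty foreground pgrp=%d (%s)" % (own, fg, where))
```

If this reports "background" at a point where bash has already called
tcsetpgrp(), the foreground group was changed by something after that call.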

So here is the question: is there a way/reason the kernel would revert
the pgrp of the session leader after bash sets it?  Is there some more
instrumenting in the kernel or in bash that might reveal what is going
on?  I have heard yet another report of this happening since I added to
the thread, and I can get it to happen easily on two different machines
(a desktop and a laptop).

						Thanks, Joe


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: tty session leader issue (was Re: 2.6.25.3: su gets stuck for root)
  2008-07-02 18:03                                         ` tty session leader issue (was Re: 2.6.25.3: su gets stuck for root) Joe Peterson
@ 2008-07-02 19:21                                           ` markus reichelt
  2008-07-06 14:08                                           ` Tim Connors
  1 sibling, 0 replies; 38+ messages in thread
From: markus reichelt @ 2008-07-02 19:21 UTC (permalink / raw)
  To: linux-kernel


* Joe Peterson <joe@skyrush.com> wrote:

> I have done some more investigation on this problem, and I am
> posting here my results in hope that someone can point me in the
> right direction for further investigation...

I cannot reproduce this with 2.6.25.9 (on Slackware 12.0)

-- 
left blank, right bald



* Re: tty session leader issue (was Re: 2.6.25.3: su gets stuck for root)
  2008-07-02 18:03                                         ` tty session leader issue (was Re: 2.6.25.3: su gets stuck for root) Joe Peterson
  2008-07-02 19:21                                           ` markus reichelt
@ 2008-07-06 14:08                                           ` Tim Connors
  2008-07-06 16:44                                             ` Alan Cox
  2008-07-06 18:49                                             ` tty session leader issue [cause now known!] " Joe Peterson
  1 sibling, 2 replies; 38+ messages in thread
From: Tim Connors @ 2008-07-06 14:08 UTC (permalink / raw)
  To: Joe Peterson
  Cc: Vegard Nossum, Alan Cox, Alan Cox, David Newall, Willy Tarreau,
	Harald Dunkel, linux-kernel

On Wed, 2 Jul 2008, Joe Peterson wrote:

> I have done some more investigation on this problem, and I am posting
> here my results in hope that someone can point me in the right direction
> for further investigation...
>
> Summary: during the initialization of a new bash shell, the terminal
> foreground process group often reverts back to that of the parent of the
> bash shell (after being set *to* the bash shell pgrp by bash),
> prohibiting commands like stty from being run by the init scripts.  The
> result is that the execution of these commands will hang until killed,
> causing the bash prompt to not appear.  Adding a delay in the script
> (using sleep) increases the chance of this having time to happen.
...
> So here is the question: is there a way/reason the kernel would revert
> the pgrp of the session leader after bash sets it?  Is there some more
> instrumenting in the kernel or in bash that might reveal what is going
> on?  I have heard yet another report of this happening since I added to
> the thread, and I can get it to happen easily on two different machines
> (a desktop and a laptop).

In fact, on various laptops (Eee PC, Dell Inspiron 1520, Dell Inspiron
4000), I've seen various tty screwups that have been introduced since
circa 2.6.19.

The 6-year-old Inspiron 4000 gets stuck at stty erase ^? .  Randomly, but
most of the time.

All of my machines exhibit the ctrl-C being slower than ctrl-Z discussed
elsewhere (I've almost developed a habit of typing ctrl-Z kill %1 <RET>).
Although even ctrl-Z recently has been reluctant to always work.  I wonder
if this is the cause of dpkg recently not responding to ctrl-Z's? (Debian
bug #486222).  dpkg does respond to kill -STOP.

ctrl-s doesn't always work anymore.  Again, what prompted me to write this
email was that I couldn't pause dpkg.  It's particularly unreliable at
stopping scrolling messages at bootup, and if I press it at the wrong time
at bootup (not a specific place - it can be starting up any number of
scripts), something deadlocks and won't resume upon a ctrl-q.
alt-sysrq-k is enough to kill whatever has deadlocked.  I have a feeling,
but don't want to test on this system right now, that pressing scroll-lock
as opposed to ctrl-q once unlocked such a stuck display.

In summary, something in tty is certainly screwed.  Does anyone see a
connection between all of these?

-- 
TimC
> cat ~/.signature
Electromagnetic pulse received (core dumped)


* Re: tty session leader issue (was Re: 2.6.25.3: su gets stuck for root)
  2008-07-06 14:08                                           ` Tim Connors
@ 2008-07-06 16:44                                             ` Alan Cox
  2008-07-06 18:49                                             ` tty session leader issue [cause now known!] " Joe Peterson
  1 sibling, 0 replies; 38+ messages in thread
From: Alan Cox @ 2008-07-06 16:44 UTC (permalink / raw)
  To: Tim Connors
  Cc: Joe Peterson, Vegard Nossum, Alan Cox, Alan Cox, David Newall,
	Willy Tarreau, Harald Dunkel, linux-kernel

On Mon, Jul 07, 2008 at 12:08:58AM +1000, Tim Connors wrote:
> In summary, something in tty is certainly screwed.  Does anyone see a
> connection between all of these?

That they don't happen for me at all is the only one I can suggest.  Most
of your comments are also not ones I've seen reported before.

Unfortunately 'works for me' doesn't tell me whether that is luck, distribution
specific, user configuration choices, gcc version, bugs in code, or whatever,
and someone who sees the ^C problem is going to have to track it down.

Alan



* Re: tty session leader issue [cause now known!] (was Re: 2.6.25.3: su gets stuck for root)
  2008-07-06 14:08                                           ` Tim Connors
  2008-07-06 16:44                                             ` Alan Cox
@ 2008-07-06 18:49                                             ` Joe Peterson
  1 sibling, 0 replies; 38+ messages in thread
From: Joe Peterson @ 2008-07-06 18:49 UTC (permalink / raw)
  To: Tim Connors
  Cc: Vegard Nossum, Alan Cox, Alan Cox, David Newall, Willy Tarreau,
	Harald Dunkel, linux-kernel, Ingo Molnar

Tim Connors wrote:
> On Wed, 2 Jul 2008, Joe Peterson wrote:
> 
>> I have done some more investigation on this problem, and I am posting
>> here my results in hope that someone can point me in the right direction
>> for further investigation...
>>
>> Summary: during the initialization of a new bash shell, the terminal
>> foreground process group often reverts back to that of the parent of the
>> bash shell (after being set *to* the bash shell pgrp by bash),
>> prohibiting commands like stty from being run by the init scripts.  The
>> result is that the execution of these commands will hang until killed,
>> causing the bash prompt to not appear.  Adding a delay in the script
>> (using sleep) increases the chance of this having time to happen.

I have done more investigation, and I now know the cause of the
bash/stty problem.  It appears to be a race condition in bash (well,
between two different bash shells, actually).  I saw a post from a while
back about something similar by Ingo Molnar, so I have copied him here too.

Here is the ps tree of the test case where stty has hung:

 4704 ?        S      0:00  \_ xterm
 4706 pts/3    Ss     0:00  |   \_ -bash
 4739 pts/3    S      0:00  |       \_ su
 4742 pts/3    S      0:00  |           \_ bash
 4746 pts/3    S+     0:00  |               \_ su foo
 4747 pts/3    S      0:00  |                   \_ bash
 4752 pts/3    T      0:00  |                       \_ stty -ixany
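
The STAT column already shows the problem: "+" marks the foreground
process group and "T" marks a stopped process.  A small Python sketch
that pulls those two facts out of the listing follows.  The listing is
copied from this message; the names PS_LISTING and parse_ps are made up
for the illustration, and the parsing assumes exactly this column
layout.

```python
# Copy of the ps listing above (a raw string keeps the "\_" tree art intact).
PS_LISTING = r"""
 4704 ?        S      0:00  \_ xterm
 4706 pts/3    Ss     0:00  |   \_ -bash
 4739 pts/3    S      0:00  |       \_ su
 4742 pts/3    S      0:00  |           \_ bash
 4746 pts/3    S+     0:00  |               \_ su foo
 4747 pts/3    S      0:00  |                   \_ bash
 4752 pts/3    T      0:00  |                       \_ stty -ixany
"""


def parse_ps(listing):
    """Yield (pid, stat, command) for each non-empty line of the listing."""
    for line in listing.splitlines():
        if not line.strip():
            continue
        pid, _tty, stat, _time, rest = line.split(None, 4)
        # Drop the tree-drawing prefix ("|   \_ ") in front of the command.
        command = rest.split("\\_", 1)[-1].strip()
        yield int(pid), stat, command


rows = list(parse_ps(PS_LISTING))
foreground = [(pid, cmd) for pid, stat, cmd in rows if "+" in stat]
stopped = [(pid, cmd) for pid, stat, cmd in rows if stat.startswith("T")]

print("foreground group:", foreground)
print("stopped:", stopped)
```

Only "su foo" (4746) carries the "+", while stty (4752) is stopped and
its parent bash (4747) is not in the foreground group.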

What should happen is: when "su foo" (4746) is run, it spawns a bash
shell (4747) that then makes itself the session leader when it
initializes its job control.  The stty command (in the child bash's
.bashrc) will then be able to work (and not hang).

However, the hang happens when the parent bash (4742) interferes by
reverting the tty session leader back to its child (the "su foo"
process: 4746) shortly after the child bash (4747) becomes the leader.
The parent does this when it calls
execute_command_internal()->stop_pipeline()->give_terminal_to().  This
seems to happen at a slightly random time, making the issue intermittent
- it depends which one wins the race.

In summary, when the bug does *not* occur, here is the approximate
sequence:

1) parent bash (4742) runs 'su foo' (4746)
2) parent bash sets tty leader to 'su' (4746)
3) child bash (4747) initializes and sets itself to be the leader
4) stty command in .bashrc runs successfully

When the bug occurs, here is the sequence:

1) parent bash (4742) runs 'su foo' (4746)
2) child bash (4747) initializes and sets itself to be the leader
3) parent bash sets tty leader *back* to 'su' (4746)
4) stty command runs and fails/hangs because its parent is not leader

The various calls to tcsetpgrp() that do this are interleaved from the
two bash processes, and sometimes the parent does it slightly *after*
the child bash initializes job control - that's when the problem happens.
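
The two orderings can be replayed in a toy model.  This is only a
sketch of the race as described above, not of the bash source: the
terminal is reduced to a single "foreground pgrp" value, the last
tcsetpgrp()-style call wins, and stty is assumed to run in the child
bash's process group.  The pgrp numbers are simply the PIDs from the ps
listing, and the function names are made up for the illustration.

```python
# Illustrative process-group ids, borrowed from the PIDs in the ps listing.
SU_PGRP = 4746     # the "su foo" job, as set by the parent bash (4742)
CHILD_PGRP = 4747  # the child bash after it initializes job control


def final_foreground(events):
    """Replay (actor, pgrp) tcsetpgrp()-style calls; the last call wins."""
    foreground = None
    for _actor, pgrp in events:
        foreground = pgrp
    return foreground


def stty_stops(events, stty_pgrp=CHILD_PGRP):
    """True if stty would be in a background group when it touches the tty."""
    return final_foreground(events) != stty_pgrp


# Sequence without the bug: the parent hands over first, then the child.
good_order = [("parent bash", SU_PGRP), ("child bash", CHILD_PGRP)]
# Sequence with the bug: the parent's call lands after the child's.
bad_order = [("child bash", CHILD_PGRP), ("parent bash", SU_PGRP)]

print("good order, stty stops:", stty_stops(good_order))
print("bad order, stty stops:", stty_stops(bad_order))
```

In the second ordering stty ends up in a background process group, and a
background process that tries to change terminal settings is normally
stopped with SIGTTOU, which matches the "T" state seen in ps.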

I have not looked further to find a solution (but it's a great start to
know the cause...!).  Any further help is welcome.

> The 6-year-old Inspiron 4000 gets stuck at stty erase ^? .  Randomly, but
> most of the time.
> 
> All of my machines exhibit the ctrl-C being slower than ctrl-Z discussed
> elsewhere (I've almost developed a habit of typing ctrl-Z kill %1 <RET>).
> Although even ctrl-Z recently has been reluctant to always work.  I wonder
> if this is the cause of dpkg recently not responding to ctrl-Z's? (Debian
> bug #486222).  dpkg does respond to kill -STOP.

I doubt that this is related.  See the following thread for more info on
this:

	http://marc.info/?l=linux-kernel&m=121528829718840&w=2

> ctrl-s doesn't always work anymore.  Again, what prompted me to write this
> email was that I couldn't pause dpkg.  It's particularly unreliable at
> stopping scrolling messages at bootup, and if I press it at the wrong time
> at bootup (not a specific place - it can be starting up any number of
> scripts), something deadlocks and won't resume upon a ctrl-q.
> alt-sysrq-k is enough to kill whatever has deadlocked.  I have a feeling,
> but don't want to test on this system right now, that pressing scroll-lock
> as opposed to ctrl-q once unlocked such a stuck display.

Hmm, not sure; I have not seen that behavior.

> In summary, something in tty is certainly screwed.  Does anyone see a
> connection between all of these?

I doubt there is a connection between the bash issue and what you are
seeing with ctrl-C/ctrl-S, etc.

					-Joe


Thread overview: 38+ messages
2008-06-02  1:31 2.6.25.3: su gets stuck for root Joe Peterson
2008-06-02  5:12 ` Harald Dunkel
2008-06-02  5:32   ` Willy Tarreau
2008-06-02  5:55     ` Joe Peterson
2008-06-02  8:10     ` Alan Cox
2008-06-02  9:01       ` David Newall
2008-06-02  9:20         ` Alan Cox
2008-06-02 10:16           ` Vegard Nossum
2008-06-02 10:39             ` Vegard Nossum
2008-06-02 10:52               ` Alan Cox
2008-06-02 10:57                 ` Vegard Nossum
2008-06-02 12:28                   ` Alan Cox
2008-06-02 14:31                     ` Vegard Nossum
2008-06-02 10:50             ` Alan Cox
2008-06-17 15:32               ` Joe Peterson
2008-06-02 15:26           ` Joe Peterson
2008-06-02 15:51             ` Alan Cox
2008-06-02 16:03               ` Joe Peterson
2008-06-04 14:43               ` Joe Peterson
2008-06-04 15:16                 ` Alan Cox
2008-06-04 16:52                   ` Joe Peterson
2008-06-04 17:10                     ` Alan Cox
2008-06-04 20:32                       ` Joe Peterson
2008-06-11 14:04                         ` Joe Peterson
2008-06-12 11:52                           ` Vegard Nossum
2008-06-14  1:49                             ` Joe Peterson
2008-06-14  7:45                               ` Vegard Nossum
2008-06-14 17:43                                 ` Joe Peterson
2008-06-14 20:34                                   ` Vegard Nossum
2008-06-14 20:52                                     ` Joe Peterson
2008-06-14 21:26                                       ` Vegard Nossum
2008-06-14 21:34                                         ` Joe Peterson
2008-07-02 18:03                                         ` tty session leader issue (was Re: 2.6.25.3: su gets stuck for root) Joe Peterson
2008-07-02 19:21                                           ` markus reichelt
2008-07-06 14:08                                           ` Tim Connors
2008-07-06 16:44                                             ` Alan Cox
2008-07-06 18:49                                             ` tty session leader issue [cause now known!] " Joe Peterson
2008-06-02  5:42   ` 2.6.25.3: su gets stuck for root Joe Peterson
