* 2.6.31 regression: system hang after pptp connection established
@ 2009-09-17 19:59 Peter Volkov
  2009-09-17 20:42 ` Linus Torvalds
  2009-09-18 15:37 ` Andrey Rahmatullin
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Volkov @ 2009-09-17 19:59 UTC
  To: gregkh, torvalds; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]

Hi. 

After a pptp connection is established, my 2.6.31 system freezes, while
2.6.30 works as expected. Bisecting gave me the following result:

commit ac89a9174decf343de049a06fad75681f71890eb
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat Sep 5 13:27:10 2009 -0700

    pty: don't limit the writes to 'pty_space()' inside 'pty_write()'

and it looks like reverting this patch from 2.6.31 fixes the problem.


Other observations: It's hard to say exactly when the system hangs - it
does not hang immediately; sometimes it hangs when I try to send some
traffic, sometimes when I switch the ppp connection off. The hang is not
complete: the mouse cursor keeps moving in Xorg, but clicks get no
response, I'm unable to start new programs, and in open xconsoles it's
possible to type something, but after I press enter the consoles hang
too. Also there is no way out of X (the ctrl+alt+FN combo does not work),
and ssh consoles connected to this computer hang too (again, it's
possible to type ls there, but after that it hangs). No new ssh
connections are possible; they time out.

In the hope of getting an oops I started netconsole, but at the hang no
new output appeared there. I've managed to gather some information with
SysRq (it's gzipped in the attachment) but I'm not sure how useful it is.

I've tried to establish the pptp connection over both wireless and wired
connections and the system hung with both, so it looks like the
networking drivers are not the cause here. BTW, I'm using networkmanager
to establish the connection.

The gzipped kernel config is attached.

Is this problem known? Does anybody else experience the same problem? Do
you have a fix? :)

-- 
Peter.

[-- Attachment #2: sysrq.txt.gz --]
[-- Type: application/x-gzip, Size: 10123 bytes --]

[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 14771 bytes --]


* Re: 2.6.31 regression: system hang after pptp connection established
  2009-09-17 19:59 2.6.31 regression: system hang after pptp connection established Peter Volkov
@ 2009-09-17 20:42 ` Linus Torvalds
  2009-09-17 21:12   ` Linus Torvalds
  2009-09-18 15:37 ` Andrey Rahmatullin
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-09-17 20:42 UTC
  To: Peter Volkov; +Cc: gregkh, linux-kernel



On Thu, 17 Sep 2009, Peter Volkov wrote:
> 
> After a pptp connection is established, my 2.6.31 system freezes, while
> 2.6.30 works as expected. Bisecting gave me the following result:
> 
> commit ac89a9174decf343de049a06fad75681f71890eb
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sat Sep 5 13:27:10 2009 -0700
> 
>     pty: don't limit the writes to 'pty_space()' inside 'pty_write()'
> 
> and it looks like reverting this patch from 2.6.31 fixes the problem.

Hmm. The only thing it should cause is that pty_write() will effectively 
allow a larger buffer for writes (limited to ~64kB rather than 8kB).

But considering how fragile ppp has been, I guess I shouldn't be surprised 
that this can cause a hang in itself.

> In the hope of getting an oops I started netconsole, but at the hang no
> new output appeared there. I've managed to gather some information with
> SysRq (it's gzipped in the attachment) but I'm not sure how useful it is.

It's interesting, but I don't know how _useful_ it is.

What's interesting about it is that it shows a problem, but the problem it 
shows would seem to have nothing at all to do with ppp or networking or 
pty's. The problem seems to be processes stuck in disk-wait:

	events/0      D ffff88007d0c7b50     0     7      2 0x00000000
	kacpi_notify  D ffff88007d2ffbe8     0   170      2 0x00000000
	khubd         D ffff88007d211ae8     0   260      2 0x00000000
	pdflush       D ffff88007d26bd40     0   326      2 0x00000000
	kjournald     D ffff88007b2f3df8     0  3361      2 0x00000000
	kjournald     D ffff88007c65ddf8     0  3362      2 0x00000000
	reiserfs/0    D [<ffffffff810725f7>] ? delayacct_end+0x81/0x8c

which explains your symptoms - hung X (with just cursor moving) and ssh's 
hanging.

It's just that while it all explains your symptoms, none of the above 
should have anything what-so-ever to do with pty's!

pdflush, for example, seems to be stuck waiting for &jl->j_commit_mutex in 
reiserfs. Odd. It really looks like you have something stuck waiting for 
IO.

But your CPU 1 backtrace looks relevant, and seems hung on a spinlock in 
tty_buffer_request_room() and has that pty_write() thing there. I'm not 
seeing why the 'D' states above happen, though.
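
As an aside, picking the blocked tasks out of a long SysRq task dump can
be automated. The following is a minimal sketch in C, assuming each task
line has the layout shown in the list above (task name, then a one-letter
state column). The helper name is_d_state() is hypothetical, the dump
layout differs between kernel versions, and task names containing spaces
are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if a task line from a SysRq task dump reports state 'D'
 * (uninterruptible sleep). Assumes the layout "<name> <state> ...".
 */
static bool is_d_state(const char *line)
{
	char name[64];
	char state[8];

	assert(line != NULL);
	/* %s skips leading whitespace, so indented lines work too. */
	if (sscanf(line, "%63s %7s", name, state) != 2)
		return false;
	return strcmp(state, "D") == 0;
}
```

Calling it on each line of the dump (read with fgets(), for example) and
printing the lines for which it returns true gives a list like the one
above.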

> I've tried to establish the pptp connection over both wireless and wired
> connections and the system hung with both, so it looks like the
> networking drivers are not the cause here. BTW, I'm using networkmanager
> to establish the connection.
> 
> The gzipped kernel config is attached.
> 
> Is this problem known? Does anybody else experience the same problem? Do
> you have a fix? :)

Not a known problem, but it's entirely possible that there is some bug in 
the "tty buffer out of memory" handling - that nobody has ever seen 
because in practice everybody always hit other limits first.

Let me look at it a bit, and see if I can come up with test patches for 
you.

		Linus


* Re: 2.6.31 regression: system hang after pptp connection established
  2009-09-17 20:42 ` Linus Torvalds
@ 2009-09-17 21:12   ` Linus Torvalds
  2009-09-18 11:20     ` Peter Volkov
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-09-17 21:12 UTC
  To: Peter Volkov; +Cc: gregkh, linux-kernel



On Thu, 17 Sep 2009, Linus Torvalds wrote:
> 
> What's interesting about it is that it shows a problem, but the problem it 
> shows would seem to have nothing at all to do with ppp or networking or 
> pty's. The problem seems to be processes stuck in disk-wait:

Ahh. I think I see what may be going on.

Somebody got a filesystem mutex, and then went to sleep due to IO. Then 
pptp comes in, and seems to be stuck in a loop in kernel space, and 
it seems to be stuck with preemption off.

So one CPU is stuck, and the thing that we want to run is on the same 
run-queue, and not preempting. And looking at your CPU#1 trace, it's likely 
looping in ppp_async_push().

And that whole loop is insane (and very prone to infinite loops), but it 
also depends on that tty wakeup() thing.

Does this patch make a difference? Make sure to _not_ try to do the whole 
wakeup thing if we couldn't actually insert anything into the tty buffers.

		Linus
---
 drivers/char/pty.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index b33d668..53761ce 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -120,8 +120,10 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
 		/* Stuff the data into the input queue of the other end */
 		c = tty_insert_flip_string(to, buf, c);
 		/* And shovel */
-		tty_flip_buffer_push(to);
-		tty_wakeup(tty);
+		if (c) {
+			tty_flip_buffer_push(to);
+			tty_wakeup(tty);
+		}
 	}
 	return c;
 }
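
Why an unconditional wakeup can turn into an endless loop may be easier
to see in a small user-space model. This is a sketch under simplifying
assumptions, not the kernel's ppp or tty code: struct sink, sink_write()
and push() are hypothetical names, a "wakeup" is reduced to a flag, and
an iteration cap stands in for the hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receiving end's buffer. */
struct sink {
	size_t room;          /* bytes the sink can still accept */
	bool wakeup_pending;  /* "writer, please try again" signal */
};

/*
 * Accept up to len bytes. With wake_on_progress_only == false the
 * wakeup is raised even when nothing was accepted (the pre-patch
 * behaviour in this model); with true it is raised only if n > 0.
 */
static size_t sink_write(struct sink *s, size_t len, bool wake_on_progress_only)
{
	size_t n = len < s->room ? len : s->room;

	s->room -= n;
	if (n > 0 || !wake_on_progress_only)
		s->wakeup_pending = true;
	return n;
}

/*
 * Writer loop: a pending wakeup clears the "stuffed" state and causes
 * another write attempt. Returns the number of write attempts made,
 * giving up after max_iters so the model itself cannot hang.
 */
static unsigned push(struct sink *s, size_t pending,
		     bool wake_on_progress_only, unsigned max_iters)
{
	unsigned iters = 0;
	bool stuffed = false;

	assert(max_iters > 0);
	while (iters < max_iters) {
		if (s->wakeup_pending) {
			s->wakeup_pending = false;
			stuffed = false;
		}
		if (stuffed || pending == 0)
			break;

		size_t avail = pending;
		size_t sent = sink_write(s, avail, wake_on_progress_only);

		iters++;
		pending -= sent;
		if (sent < avail)
			stuffed = true;
	}
	return iters;
}
```

With a sink that has 4 bytes of room and 10 bytes to send, push() keeps
retrying until it hits max_iters when the wakeup is unconditional, and
stops after two write attempts when the wakeup is raised only on
progress.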


* Re: 2.6.31 regression: system hang after pptp connection established
  2009-09-17 21:12   ` Linus Torvalds
@ 2009-09-18 11:20     ` Peter Volkov
  2009-09-18 14:16       ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Volkov @ 2009-09-18 11:20 UTC
  To: Linus Torvalds; +Cc: gregkh, linux-kernel

The patch fixes the problem here. Thank you very much.

--
Peter.

On Thu, 17/09/2009 at 14:12 -0700, Linus Torvalds wrote:
> 
> On Thu, 17 Sep 2009, Linus Torvalds wrote:
> > 
> > What's interesting about it is that it shows a problem, but the problem it 
> > shows would seem to have nothing at all to do with ppp or networking or 
> > pty's. The problem seems to be processes stuck in disk-wait:
> 
> Ahh. I think I see what may be going on.
> 
> Somebody got a filesystem mutex, and then went to sleep due to IO. Then 
> pptp comes in, and seems to be stuck in a loop in kernel space, and 
> it seems to be stuck with preemption off.
> 
> So one CPU is stuck, and the thing that we want to run is on the same 
> run-queue, and not preempting. And looking at your CPU#1 trace, it's likely 
> looping in ppp_async_push().
> 
> And that whole loop is insane (and very prone to infinite loops), but it 
> also depends on that tty wakeup() thing.
> 
> Does this patch make a difference? Make sure to _not_ try to do the whole 
> wakeup thing if we couldn't actually insert anything into the tty buffers.
> 
> 		Linus
> ---
>  drivers/char/pty.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/pty.c b/drivers/char/pty.c
> index b33d668..53761ce 100644
> --- a/drivers/char/pty.c
> +++ b/drivers/char/pty.c
> @@ -120,8 +120,10 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
>  		/* Stuff the data into the input queue of the other end */
>  		c = tty_insert_flip_string(to, buf, c);
>  		/* And shovel */
> -		tty_flip_buffer_push(to);
> -		tty_wakeup(tty);
> +		if (c) {
> +			tty_flip_buffer_push(to);
> +			tty_wakeup(tty);
> +		}
>  	}
>  	return c;
>  }



* Re: 2.6.31 regression: system hang after pptp connection established
  2009-09-18 11:20     ` Peter Volkov
@ 2009-09-18 14:16       ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2009-09-18 14:16 UTC
  To: Peter Volkov; +Cc: gregkh, linux-kernel


On Fri, 18 Sep 2009, Peter Volkov wrote:
>
> The patch fixes the problem here. Thank you very much.

Hey, thank _you_ for the sysrq output, that made it quite debuggable.

Committed as 202c4675c, and I cc'd stable.

		Linus


* Re: 2.6.31 regression: system hang after pptp connection established
  2009-09-17 19:59 2.6.31 regression: system hang after pptp connection established Peter Volkov
  2009-09-17 20:42 ` Linus Torvalds
@ 2009-09-18 15:37 ` Andrey Rahmatullin
  2009-09-18 15:48   ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Andrey Rahmatullin @ 2009-09-18 15:37 UTC
  To: Peter Volkov, Linus Torvalds; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 630 bytes --]

On Thu, Sep 17, 2009 at 11:59:13PM +0400, Peter Volkov wrote:
> Is this problem known? 
Yes, it's described at http://bugzilla.kernel.org/show_bug.cgi?id=14179
since Tuesday.

On Fri, Sep 18, 2009 at 07:16:39AM -0700, Linus Torvalds wrote:
> > The patch fixes the problem here. Thank you very much.
> Hey, thank _you_ for the sysrq output, that made it quite debuggable.
> Committed as 202c4675c, and I cc'd stable.
Thanks for the fix, but should I send bug reports directly here next time
instead of filing a bug in bugzilla.kernel.org and waiting for a response
that will never come?

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]


* Re: 2.6.31 regression: system hang after pptp connection established
  2009-09-18 15:37 ` Andrey Rahmatullin
@ 2009-09-18 15:48   ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2009-09-18 15:48 UTC
  To: Andrey Rahmatullin; +Cc: Peter Volkov, linux-kernel



On Fri, 18 Sep 2009, Andrey Rahmatullin wrote:
>
> On Thu, Sep 17, 2009 at 11:59:13PM +0400, Peter Volkov wrote:
> > Is this problem known? 
> Yes, it's described at http://bugzilla.kernel.org/show_bug.cgi?id=14179
> since Tuesday.
> 
> On Fri, Sep 18, 2009 at 07:16:39AM -0700, Linus Torvalds wrote:
> > > The patch fixes the problem here. Thank you very much.
> > Hey, thank _you_ for the sysrq output, that made it quite debuggable.
> > Committed as 202c4675c, and I cc'd stable.
> Thanks for the fix, but should I send bug reports directly here next time
> instead of filing a bug in bugzilla.kernel.org and waiting for a response
> that will never come?

Bugzilla is great, but you should _also_ target the maintainers directly 
and let them know.  And especially if you have bisected things, always cc 
everybody that is listed in the commit.

Otherwise, what happens is that other people not directly involved will 
eventually look at the regression list, and see it - but that generally 
happens much later. So things will get fixed from just the bugzilla report 
too, but you'll have a much longer latency than required.

In fact, if you can bisect it to a single commit (especially a small one 
like this), then bugzilla is the secondary, rather than the primary place. 
Bugzilla is great for keeping track of things and trying to avoid losing 
reports, but that comes at the expense of not being very convenient for 
short-term stuff.

So if you have a very targeted bugreport, and know who to send a report 
to, try the direct route first. Then, if nothing happens immediately, open 
a bugzilla (or open the bugzilla immediately, just in case, but see it 
as a "fallback" thing).

			Linus

