* 2.6.31 regression: system hang after pptp connection established
@ 2009-09-17 19:59 Peter Volkov
2009-09-17 20:42 ` Linus Torvalds
2009-09-18 15:37 ` Andrey Rahmatullin
0 siblings, 2 replies; 7+ messages in thread
From: Peter Volkov @ 2009-09-17 19:59 UTC (permalink / raw)
To: gregkh, torvalds; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]
Hi.
After a pptp connection is established my 2.6.31 system freezes, while
2.6.30 works as expected. Bisecting gave me the following result:
commit ac89a9174decf343de049a06fad75681f71890eb
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat Sep 5 13:27:10 2009 -0700
pty: don't limit the writes to 'pty_space()' inside 'pty_write()'
and it looks like reverting this patch from 2.6.31 fixes the problem.
Other observations: It's hard to say exactly when the system hangs. It
does not hang immediately: sometimes it happens when I try to send some
traffic, sometimes when I switch the ppp connection off. The hang is not
complete: the mouse cursor keeps moving in Xorg, but clicks get no
response, I'm unable to start new programs, and in open xconsoles it's
possible to type something but after I press enter the consoles hang
too. Also there is no way out of X (the ctrl+alt+FN combo does not work)
and ssh consoles connected to this computer hang too (again, it's
possible to type ls there but after that it hangs). No new ssh
connections are possible; they time out.
In the hope of getting an oops I started netconsole, but at the hang no
new output appeared there. I've managed to gather some information with
SysRq (it's gzipped in the attachment) but I'm not sure how useful it is.
I've tried to establish the pptp connection over both wireless and wired
connections and the system hung with both, so it looks like the
networking drivers are not the cause here. BTW, I'm using NetworkManager
to establish the connection.
The gzipped kernel config is in the attachment.
Is this problem known? Does anybody else experience the same problem? Do
you have a fix? :)
--
Peter.
[-- Attachment #2: sysrq.txt.gz --]
[-- Type: application/x-gzip, Size: 10123 bytes --]
[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 14771 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.31 regression: system hang after pptp connection established
2009-09-17 19:59 2.6.31 regression: system hang after pptp connection established Peter Volkov
@ 2009-09-17 20:42 ` Linus Torvalds
2009-09-17 21:12 ` Linus Torvalds
2009-09-18 15:37 ` Andrey Rahmatullin
1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-09-17 20:42 UTC (permalink / raw)
To: Peter Volkov; +Cc: gregkh, linux-kernel
On Thu, 17 Sep 2009, Peter Volkov wrote:
>
> After a pptp connection is established my 2.6.31 system freezes, while
> 2.6.30 works as expected. Bisecting gave me the following result:
>
> commit ac89a9174decf343de049a06fad75681f71890eb
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Sat Sep 5 13:27:10 2009 -0700
>
> pty: don't limit the writes to 'pty_space()' inside 'pty_write()'
>
> and it looks like reverting this patch from 2.6.31 fixes the problem.

Hmm. The only thing it should cause is that pty_write() will effectively
allow a larger buffer for writes (limited to ~64kB rather than 8kB). But
considering how fragile ppp has been, I guess I shouldn't be surprised
that this can cause a hang in itself.

> In the hope of getting an oops I started netconsole, but at the hang no
> new output appeared there. I've managed to gather some information with
> SysRq (it's gzipped in the attachment) but I'm not sure how useful it is.

It's interesting, but I don't know how _useful_ it is.

What's interesting about it is that it shows a problem, but the problem it
shows would seem to have nothing at all to do with ppp or networking or
pty's. The problem seems to be processes stuck in disk-wait:

    events/0      D ffff88007d0c7b50     0     7      2 0x00000000
    kacpi_notify  D ffff88007d2ffbe8     0   170      2 0x00000000
    khubd         D ffff88007d211ae8     0   260      2 0x00000000
    pdflush       D ffff88007d26bd40     0   326      2 0x00000000
    kjournald     D ffff88007b2f3df8     0  3361      2 0x00000000
    kjournald     D ffff88007c65ddf8     0  3362      2 0x00000000
    reiserfs/0    D [<ffffffff810725f7>] ? delayacct_end+0x81/0x8c

which explains your symptoms - hung X (with just cursor moving) and ssh's
hanging. It's just that while it all explains your symptoms, none of the
above should have anything what-so-ever to do with pty's!

pdflush, for example, seems to be stuck waiting for &jl->j_commit_mutex in
reiserfs. Odd. It really looks like you have something stuck waiting for
IO.

But your CPU 1 backtrace looks relevant, and seems hung on a spinlock in
tty_buffer_request_room() and has that pty_write() thing there. I'm not
seeing why the 'D' states above happen, though.

> I've tried to establish the pptp connection over both wireless and wired
> connections and the system hung with both, so it looks like the
> networking drivers are not the cause here. BTW, I'm using NetworkManager
> to establish the connection.
>
> The gzipped kernel config is in the attachment.
>
> Is this problem known? Does anybody else experience the same problem? Do
> you have a fix? :)

Not a known problem, but it's entirely possible that there is some bug in
the "tty buffer out of memory" handling - that nobody has ever seen
because in practice everybody always hit other limits first.

Let me look at it a bit, and see if I can come up with test patches for
you.

Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
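The disk-wait list above was picked out of the SysRq task dump by looking at the state column. As a rough illustration only, that filtering step could be sketched in user-space C as below. The helper names `is_uninterruptible()` and `count_uninterruptible()` are hypothetical and not part of any kernel or tool API, and the sketch assumes each line starts with `<name> <state>` and that the task name contains no spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel or procps API): returns true when a
 * task-dump line of the form "<name> <state> ..." reports state 'D'
 * (uninterruptible sleep). Assumes the task name contains no spaces.
 */
static bool is_uninterruptible(const char *line)
{
	char name[64];
	char state;

	assert(line != NULL);

	/* "%63s" reads the name; " %c" skips blanks and reads one state letter. */
	if (sscanf(line, "%63s %c", name, &state) != 2)
		return false;
	return state == 'D';
}

/* Count how many of the n lines have 'D' in the state column. */
static size_t count_uninterruptible(const char *const *lines, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (is_uninterruptible(lines[i]))
			hits++;
	return hits;
}
```

Fed the lines of a task dump, `count_uninterruptible()` gives a quick idea of how many tasks are blocked; continuation lines of a backtrace do not match because their second field is not a state letter.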
* Re: 2.6.31 regression: system hang after pptp connection established
2009-09-17 20:42 ` Linus Torvalds
@ 2009-09-17 21:12 ` Linus Torvalds
2009-09-18 11:20 ` Peter Volkov
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-09-17 21:12 UTC (permalink / raw)
To: Peter Volkov; +Cc: gregkh, linux-kernel
On Thu, 17 Sep 2009, Linus Torvalds wrote:
>
> What's interesting about it is that it shows a problem, but the problem it
> shows would seem to have nothing at all to do with ppp or networking or
> pty's. The problem seems to be processes stuck in disk-wait:

Ahh. I think I see what may be going on.

Somebody got a filesystem mutex, and then went to sleep due to IO. Then
pptp comes in, and seems to be stuck in a loop in kernel space, and
it seems to be stuck with preemption off.

So one CPU is stuck, and the thing that we want to run is on the same
run-queue, and not preempting. And looking at your CPU#1 trace, it's likely
looping in ppp_async_push().

And that whole loop is insane (and very prone to infinite loops), but it
also depends on that tty wakeup() thing.

Does this patch make a difference? Make sure to _not_ try to do the whole
wakeup thing if we couldn't actually insert anything into the tty buffers.

Linus
---
 drivers/char/pty.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index b33d668..53761ce 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -120,8 +120,10 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
 		/* Stuff the data into the input queue of the other end */
 		c = tty_insert_flip_string(to, buf, c);
 		/* And shovel */
-		tty_flip_buffer_push(to);
-		tty_wakeup(tty);
+		if (c) {
+			tty_flip_buffer_push(to);
+			tty_wakeup(tty);
+		}
 	}
 	return c;
 }
^ permalink raw reply related [flat|nested] 7+ messages in thread
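To make the reasoning behind the patch easier to follow, here is a small user-space model of the failure mode as this thread describes it: a writer that retries whenever it is woken, and a write path that wakes it even when no bytes were accepted. This is only a sketch under stated assumptions, not the kernel code. All names (`struct model_pty`, `model_insert()`, `model_pty_write()`, `model_push()`) are hypothetical, and the iteration cap exists only so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_pty {
	size_t room;           /* free space left in the receiving buffer */
	bool   wakeup_pending; /* set when the writer is told to try again */
	bool   guard_wakeup;   /* true: wake the writer only if bytes were accepted */
};

/* Stand-in for the buffer insert: accept at most `room` bytes. */
static size_t model_insert(struct model_pty *p, size_t len)
{
	size_t n = len < p->room ? len : p->room;

	p->room -= n;
	return n;
}

/* Stand-in for the write path: insert, then (conditionally) wake the writer. */
static size_t model_pty_write(struct model_pty *p, size_t len)
{
	size_t c = model_insert(p, len);

	/* Without the guard, a wakeup is issued even when c == 0. */
	if (c > 0 || !p->guard_wakeup)
		p->wakeup_pending = true;
	return c;
}

/*
 * Stand-in for a writer loop that retries for as long as wakeups keep
 * arriving. Returns the number of passes made; max_iters caps the loop
 * so the unguarded case ends instead of spinning forever.
 */
static unsigned model_push(struct model_pty *p, size_t len, unsigned max_iters)
{
	unsigned iters = 0;
	size_t left = len;

	assert(p != NULL);
	p->wakeup_pending = true;
	while (p->wakeup_pending && left > 0 && iters < max_iters) {
		p->wakeup_pending = false;
		left -= model_pty_write(p, left);
		iters++;
	}
	return iters;
}
```

With `room = 4`, 10 bytes to send and a cap of 1000 passes, `model_push()` returns 1000 when `guard_wakeup` is false and 2 when it is true. In the guarded case the writer simply stops once the buffer is full; in a real system it would then wait until the reader drains the buffer and a genuine wakeup arrives.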
* Re: 2.6.31 regression: system hang after pptp connection established
2009-09-17 21:12 ` Linus Torvalds
@ 2009-09-18 11:20 ` Peter Volkov
2009-09-18 14:16 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Peter Volkov @ 2009-09-18 11:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: gregkh, linux-kernel
The patch fixes the problem here. Thank you very much.
--
Peter.
On Thu, 17/09/2009 at 14:12 -0700, Linus Torvalds wrote:
>
> On Thu, 17 Sep 2009, Linus Torvalds wrote:
> >
> > What's interesting about it is that it shows a problem, but the problem it
> > shows would seem to have nothing at all to do with ppp or networking or
> > pty's. The problem seems to be processes stuck in disk-wait:
>
> Ahh. I think I see what may be going on.
>
> Somebody got a filesystem mutex, and then went to sleep due to IO. Then
> pptp comes in, and seems to be stuck in a loop in kernel space, and
> it seems to be stuck with preemption off.
>
> So one CPU is stuck, and the thing that we want to run is on the same
> run-queue, and not preempting. And looking at your CPU#1 trace, it's likely
> looping in ppp_async_push().
>
> And that whole loop is insane (and very prone to infinite loops), but it
> also depends on that tty wakeup() thing.
>
> Does this patch make a difference? Make sure to _not_ try to do the whole
> wakeup thing if we couldn't actually insert anything into the tty buffers.
>
> Linus
> ---
>  drivers/char/pty.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/pty.c b/drivers/char/pty.c
> index b33d668..53761ce 100644
> --- a/drivers/char/pty.c
> +++ b/drivers/char/pty.c
> @@ -120,8 +120,10 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
>  		/* Stuff the data into the input queue of the other end */
>  		c = tty_insert_flip_string(to, buf, c);
>  		/* And shovel */
> -		tty_flip_buffer_push(to);
> -		tty_wakeup(tty);
> +		if (c) {
> +			tty_flip_buffer_push(to);
> +			tty_wakeup(tty);
> +		}
>  	}
>  	return c;
>  }
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.31 regression: system hang after pptp connection established
2009-09-18 11:20 ` Peter Volkov
@ 2009-09-18 14:16 ` Linus Torvalds
0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2009-09-18 14:16 UTC (permalink / raw)
To: Peter Volkov; +Cc: gregkh, linux-kernel
On Fri, 18 Sep 2009, Peter Volkov wrote:
>
> The patch fixes the problem here. Thank you very much.

Hey, thank _you_ for the sysrq output, that made it quite debuggable.

Committed as 202c4675c, and I cc'd stable.

Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.31 regression: system hang after pptp connection established
2009-09-17 19:59 2.6.31 regression: system hang after pptp connection established Peter Volkov
2009-09-17 20:42 ` Linus Torvalds
@ 2009-09-18 15:37 ` Andrey Rahmatullin
2009-09-18 15:48 ` Linus Torvalds
1 sibling, 1 reply; 7+ messages in thread
From: Andrey Rahmatullin @ 2009-09-18 15:37 UTC (permalink / raw)
To: Peter Volkov, Linus Torvalds; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 630 bytes --]
On Thu, Sep 17, 2009 at 11:59:13PM +0400, Peter Volkov wrote:
> Is this problem known?
Yes, it's described at http://bugzilla.kernel.org/show_bug.cgi?id=14179
since Tuesday.

On Fri, Sep 18, 2009 at 07:16:39AM -0700, Linus Torvalds wrote:
> > The patch fixes the problem here. Thank you very much.
> Hey, thank _you_ for the sysrq output, that made it quite debuggable.
> Committed as 202c4675c, and I cc'd stable.
Thanks for the fix, but should I send bugreports directly here next time
instead of filing a bug in bugzilla.kernel.org and waiting for response
that will never come?
--
WBR, wRAR (ALT Linux Team)
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.31 regression: system hang after pptp connection established
2009-09-18 15:37 ` Andrey Rahmatullin
@ 2009-09-18 15:48 ` Linus Torvalds
0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2009-09-18 15:48 UTC (permalink / raw)
To: Andrey Rahmatullin; +Cc: Peter Volkov, linux-kernel
On Fri, 18 Sep 2009, Andrey Rahmatullin wrote:
>
> On Thu, Sep 17, 2009 at 11:59:13PM +0400, Peter Volkov wrote:
> > Is this problem known?
> Yes, it's described at http://bugzilla.kernel.org/show_bug.cgi?id=14179
> since Tuesday.
>
> On Fri, Sep 18, 2009 at 07:16:39AM -0700, Linus Torvalds wrote:
> > > The patch fixes the problem here. Thank you very much.
> > Hey, thank _you_ for the sysrq output, that made it quite debuggable.
> > Committed as 202c4675c, and I cc'd stable.
> Thanks for the fix, but should I send bugreports directly here next time
> instead of filing a bug in bugzilla.kernel.org and waiting for response
> that will never come?

Bugzilla is great, but you should _also_ target the maintainers directly
and let them know. And especially if you have bisected things, always cc
everybody that is listed in the commit.

Otherwise, what happens is that other people not directly involved will
eventually look at the regression list, and see it - but that generally
happens much later. So things will get fixed from just the bugzilla report
too, but you'll have a much longer latency than required.

In fact, if you can bisect it to a single commit (especially a small one
like this), then bugzilla is the secondary, rather than the primary place.
Bugzilla is great for keeping track of things and trying to avoid losing
reports, but that comes at the expense of not being very convenient for
short-term stuff.

So if you have a very targeted bugreport, and know who to send a report
to, try the direct route first. Then, if nothing happens immediately, open
a bugzilla (or open the bugzilla immediately, just in case, but see it as
a "fallback" thing).

Linus
^ permalink raw reply [flat|nested] 7+ messages in thread