* Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31)
@ 2009-10-14 6:21 Michael Ellerman
2009-10-14 14:53 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2009-10-14 6:21 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Rafael J. Wysocki, Boyan, Kernel Testers List,
fredlwm-Re5JQEeQqe8AvxtiuMwx3w, Linus Torvalds, Ed Tomlinson,
OGAWA Hirofumi, Dmitry Torokhov
[-- Attachment #1: Type: text/plain, Size: 2486 bytes --]
Hi all,
I'm seeing a bug in an internal tree, based on 32-rc4, where characters
that are given to the tty code seem to disappear. It sounds different,
but perhaps somehow related to the problems in the "keyboard under X
with 2.6.31" thread.
I'm using hvc_console.c, and I see all my input read into there. Adding
debug in tty_read() though, I see characters go missing. It's only 50 or
so characters out of 4K or more. The bug is 100% reproducible, but only
on an internal simulator (sorry). I'm running one cpu and one thread,
with no preempt.
I'm using 80f5069 or thereabouts, plus some powerpc patches. We also see
the bug in 31, but not in 27.
I've tried two of Linus' patches from the X keyboard thread, the one
that changes flush_to_ldisc() and the one that adds
locked_tty_buffer_request_room() - neither seems to help. I've also
changed hvc_console.c to use tty_insert_flip_string(tty, c, 1) (which is
locked after Linus' patch) instead of tty_insert_flip_char() - also
didn't help.
Some example debug output:
137352697: (137187792): hvc_fss_read_console: '.out
137353604: (137188699): cat strace'
137361976: (137197049): hvc_fss_read_console: '.out | callthru '
137370882: (137205933): hvc_fss_read_console: 'sink /home/micha'
137379788: (137214817): hvc_fss_read_console: 'el/kbt/strace.ou'
137388696: (137223703): hvc_fss_read_console: 't
137389606: (137224613): callthru eval'
2396468919: (2340876339): (none) / # tty_read: read 1 chars 'c'
2396579463: (2340986479): ctty_read: read 1 chars 'a'
2396690167: (2341096780): atty_read: read 1 chars 't'
2396800851: (2341207061): ttty_read: read 1 chars ' '
2396911580: (2341317387): tty_read: read 1 chars 's'
2397022603: (2341428007): stty_read: read 1 chars 't'
2397133524: (2341538525): ttty_read: read 1 chars 'r'
2397244739: (2341649337): rtty_read: read 1 chars 'a'
2397355852: (2341760047): atty_read: read 1 chars 'c'
2397467259: (2341871051): ctty_read: read 1 chars 'e'
2397578564: (2341981953): etty_read: read 1 chars '.'
2397690618: (2342093604): .tty_read: read 1 chars, newline
2397800089: (2342202671):
2399849231: (2344246188): cat: strace.: No such file or directory
So we seem to have lost everything after the dot in the second ".out"
until the end of the line.
It's entirely possible this is a bug in hvc_console.c, or our arch
patches, or the simulator .. or the tty code :)
Anyway, if anyone has ideas or patches I'm all ears.
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31)
2009-10-14 6:21 Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31) Michael Ellerman
@ 2009-10-14 14:53 ` Linus Torvalds
2009-10-19 6:38 ` Michael Ellerman
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2009-10-14 14:53 UTC (permalink / raw)
To: Michael Ellerman
Cc: Linux Kernel Mailing List, Rafael J. Wysocki, Boyan,
Kernel Testers List, fredlwm-Re5JQEeQqe8AvxtiuMwx3w, Ed Tomlinson,
OGAWA Hirofumi, Dmitry Torokhov
On Wed, 14 Oct 2009, Michael Ellerman wrote:
>
> I'm using 80f5069 or thereabouts, plus some powerpc patches. We also see
> the bug in 31, but not in 27.
Could you bisect it? Even if you don't bisect it _all_ the way, 27..31 is
a big range. Doing just a handful of bisections (say six or seven) would
cut down the number of commits from 47 thousand to somewhat more
manageable "hundreds of commits"... But obviously going all the way would
be better.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31)
2009-10-14 14:53 ` Linus Torvalds
@ 2009-10-19 6:38 ` Michael Ellerman
2009-10-19 18:09 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2009-10-19 6:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Kernel Testers List, Michael Neuling,
Benjamin Herrenschmidt
[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]
On Wed, 2009-10-14 at 07:53 -0700, Linus Torvalds wrote:
>
> On Wed, 14 Oct 2009, Michael Ellerman wrote:
> >
> > I'm using 80f5069 or thereabouts, plus some powerpc patches. We also see
> > the bug in 31, but not in 27.
>
> Could you bisect it? Even if you don't bisect it _all_ the way, 27..31 is
> a big range. Doing just a handful of bisections (say six or seven) would
> cut down the number of commits from 47 thousand to somewhat more
> manageable "hundreds of commits"... But obviously going all the way would
> be better.
As Benh said it's not really bisectable on our kernel. But I got Mikey
to bisect it on upstream using a different simulator model, and he
couldn't tie it down. It becomes easier to hit in more recent kernels
(since 27), but he could hit in 25 too.
Looking at the code, which I should have done first, it looks like this
is expected behaviour. The sequence is roughly:
* hvc_console reads all our input and passes it to the tty code
via tty_insert_flip_char()
* flush_to_ldisc() runs calling n_tty_receive_buf(), which fills
4K of tty->read_buf
* Once read_buf is full, tty->receive_room becomes 0 and
flush_to_ldisc() reschedules itself to run again in 1 jiffy.
* Bash reads 1 character, causing receive_room to become 1.
* flush_to_ldisc() runs again and inserts 1 more char because
receive_room is now 1.
* (repeat the last two steps a few times)
* Bash sets tty->icanon = 1 via n_tty_set_termios(), which calls
n_tty_set_room(). Because icanon is enabled, n_tty_set_room()
lies and says we have space for 1 character even though we
don't.
* flush_to_ldisc() runs, sees that receive_room is 1 and calls
n_tty_receive_buf()
* n_tty_receive_buf() calls n_tty_receive_char() which drops the
character because there's no room (~ line 1132).
* We keep dropping characters until we see a newline, which
increments tty->canon_data, causing n_tty_set_room() to report 0
space left, and so flush_to_ldisc() reschedules again.
It's a bit unfortunate that n_tty_set_room() lies about the available
space when icanon = 1, but it makes sense in order to handle erase. It
would be nice if n_tty_receive_buf() returned a status to
flush_to_disc() to say "actually I could only fit X chars after all,
please take them back" - but I don't grok how that would play with all
the other logic in there.
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31)
2009-10-19 6:38 ` Michael Ellerman
@ 2009-10-19 18:09 ` Linus Torvalds
[not found] ` <alpine.LFD.2.00.0910200245330.13080-OZUqEyPC5NRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2009-10-19 18:09 UTC (permalink / raw)
To: Michael Ellerman
Cc: Linux Kernel Mailing List, Kernel Testers List, Michael Neuling,
Benjamin Herrenschmidt
On Mon, 19 Oct 2009, Michael Ellerman wrote:
>
> As Benh said it's not really bisectable on our kernel. But I got Mikey
> to bisect it on upstream using a different simulator model, and he
> couldn't tie it down. It becomes easier to hit in more recent kernels
> (since 27), but he could hit in 25 too.
Ok, thanks to the verification.
And I think I see why it got easier to hit lately, and to some degree I
think we can at least partially avoid it:
> * hvc_console reads all our input and passes it to the tty code
> via tty_insert_flip_char()
> * flush_to_ldisc() runs calling n_tty_receive_buf(), which fills
> 4K of tty->read_buf
> * Once read_buf is full, tty->receive_room becomes 0 and
> flush_to_ldisc() reschedules itself to run again in 1 jiffy.
> * Bash reads 1 character, causing receive_room to become 1.
> * flush_to_ldisc() runs again and inserts 1 more char because
> receive_room is now 1.
.. ok, I agree that our behavior in the "buffer full" case is likely not
wonderful. And that's especially true in the 'icanon' case..
> * (repeat the last two steps a few times)
> * Bash sets tty->icanon = 1 via n_tty_set_termios(), which calls
> n_tty_set_room(). Because icanon is enabled, n_tty_set_room()
> lies and says we have space for 1 character even though we
> don't.
So just to clarify - icanon wasn't set before?
> * flush_to_ldisc() runs, sees that receive_room is 1 and calls
> n_tty_receive_buf()
> * n_tty_receive_buf() calls n_tty_receive_char() which drops the
> character because there's no room (~ line 1132).
> * We keep dropping characters until we see a newline, which
> increments tty->canon_data, causing n_tty_set_room() to report 0
> space left, and so flush_to_ldisc() reschedules again.
Did you have newlines in the big bulk dump?
If there really aren't any newlines, I don't think we can do a lot. icanon
handling kind of fundamentally doesn't work if there is no newline at all,
since it is all about line buffering, and we obviously have to limit the
lines _somewhere_.
But I also thihk that we only update that whole 'canon_data' thing if we
_received_ the newlines while we were in icanon mode (but I didn't really
check very closely), so if we actually switch from -icanon to +icanon, I
think canon_data can be confused, and we thus handle the buffer-full case
worse than we _could_ have.
> It's a bit unfortunate that n_tty_set_room() lies about the available
> space when icanon = 1, but it makes sense in order to handle erase. It
> would be nice if n_tty_receive_buf() returned a status to
> flush_to_disc() to say "actually I could only fit X chars after all,
> please take them back" - but I don't grok how that would play with all
> the other logic in there.
Yeah, I don't think that is even worth it. The thing is, we do have to
start dropping characters at some point, so trying to extend the non-drop
case just moves it somewhere else. If you are in canon mode, and the line
is longer than <some-number-that-happens-to-be-4kB>, you're basically
screwed.
But what we _might_ do is to handle the "turn on canon mode" case better,
in case the old buffer had newlines. IOW, instead of always setting
'canon_data' to 0 when icanon changes (and setting canon_head to the tail
of the data), maybe we should simply _count_ the number of newlines (and
set canon_head to the last one in the buffer).
IOW, if you do have newlines in your bulk data before icanon got set, we
could probably make n_tty handle that case better.
That said, as far as I know, the tty layer behavior in this area has
basically always been the way it is. The fact that you can see it more
easily is almost certainly just related to just how the buffers that lead
up to flush_to_ldisc have grown, and are now allowed to fill up further.
So it probably got much easier to trigger, but it is likely not in any way
a really new case, and goes back not just to 2.6.25, but probably
forever...
I wonder a bit what in your particular environment makes this problem show
up, but I assume that your simulation environment ends up just blindly
stuffing the tty buffers through the virtual console, so you basically
have "simulated typing" that was (a) started long before the reader was
interested in accepting it and (b) was a huge dump rather than what any
real load would be.
But if you change n_tty_set_termios() to count newlines when it sets
icanon, you might get the behaviour you want, and it would seem to me to
be an improvement over what we have now, so I wouldn't object to it, even
if I suspect nobody else has ever cared, and would ever care in the
future.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31)
[not found] ` <alpine.LFD.2.00.0910200245330.13080-OZUqEyPC5NRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
@ 2009-10-20 0:31 ` Benjamin Herrenschmidt
2009-10-20 5:31 ` Michael Ellerman
1 sibling, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-20 0:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Michael Ellerman, Linux Kernel Mailing List, Kernel Testers List,
Michael Neuling
On Tue, 2009-10-20 at 03:09 +0900, Linus Torvalds wrote:
>
>
> > It's a bit unfortunate that n_tty_set_room() lies about the available
> > space when icanon = 1, but it makes sense in order to handle erase. It
> > would be nice if n_tty_receive_buf() returned a status to
> > flush_to_disc() to say "actually I could only fit X chars after all,
> > please take them back" - but I don't grok how that would play with all
> > the other logic in there.
>
> Yeah, I don't think that is even worth it. The thing is, we do have to
> start dropping characters at some point,
Do we ? IE, if we could push the flow control back up to hvc_console,
telling it how many chars we -really- could put in instead of dropping
anything, the hvc_console back end can then stop fetching from the
hypervisor, which would for most cases of virtual consoles push the flow
control all the way to the client terminal and thus not lose anything.
I haven't looked at the code in details and I may well be missing
something important, of course, and I agree that if there's no new line
we do have to impose a limit to our line buffer.
> so trying to extend the non-drop
> case just moves it somewhere else. If you are in canon mode, and the line
> is longer than <some-number-that-happens-to-be-4kB>, you're basically
> screwed.
Right.
.../...
> I wonder a bit what in your particular environment makes this problem show
> up, but I assume that your simulation environment ends up just blindly
> stuffing the tty buffers through the virtual console, so you basically
> have "simulated typing" that was (a) started long before the reader was
> interested in accepting it and (b) was a huge dump rather than what any
> real load would be.
Yes pretty much
> But if you change n_tty_set_termios() to count newlines when it sets
> icanon, you might get the behaviour you want, and it would seem to me to
> be an improvement over what we have now, so I wouldn't object to it, even
> if I suspect nobody else has ever cared, and would ever care in the
> future.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31)
[not found] ` <alpine.LFD.2.00.0910200245330.13080-OZUqEyPC5NRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
2009-10-20 0:31 ` Benjamin Herrenschmidt
@ 2009-10-20 5:31 ` Michael Ellerman
1 sibling, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2009-10-20 5:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Kernel Testers List, Michael Neuling,
Benjamin Herrenschmidt
[-- Attachment #1: Type: text/plain, Size: 6117 bytes --]
On Tue, 2009-10-20 at 03:09 +0900, Linus Torvalds wrote:
>
> On Mon, 19 Oct 2009, Michael Ellerman wrote:
> >
> > As Benh said it's not really bisectable on our kernel. But I got Mikey
> > to bisect it on upstream using a different simulator model, and he
> > couldn't tie it down. It becomes easier to hit in more recent kernels
> > (since 27), but he could hit in 25 too.
>
> Ok, thanks to the verification.
>
> And I think I see why it got easier to hit lately, and to some degree I
> think we can at least partially avoid it:
>
> > * hvc_console reads all our input and passes it to the tty code
> > via tty_insert_flip_char()
> > * flush_to_ldisc() runs calling n_tty_receive_buf(), which fills
> > 4K of tty->read_buf
> > * Once read_buf is full, tty->receive_room becomes 0 and
> > flush_to_ldisc() reschedules itself to run again in 1 jiffy.
> > * Bash reads 1 character, causing receive_room to become 1.
> > * flush_to_ldisc() runs again and inserts 1 more char because
> > receive_room is now 1.
>
> .. ok, I agree that our behavior in the "buffer full" case is likely not
> wonderful. And that's especially true in the 'icanon' case..
>
> > * (repeat the last two steps a few times)
> > * Bash sets tty->icanon = 1 via n_tty_set_termios(), which calls
> > n_tty_set_room(). Because icanon is enabled, n_tty_set_room()
> > lies and says we have space for 1 character even though we
> > don't.
>
> So just to clarify - icanon wasn't set before?
It wasn't set before.
> > * flush_to_ldisc() runs, sees that receive_room is 1 and calls
> > n_tty_receive_buf()
> > * n_tty_receive_buf() calls n_tty_receive_char() which drops the
> > character because there's no room (~ line 1132).
> > * We keep dropping characters until we see a newline, which
> > increments tty->canon_data, causing n_tty_set_room() to report 0
> > space left, and so flush_to_ldisc() reschedules again.
>
> Did you have newlines in the big bulk dump?
Yes. It's not really a bulk dump, it's just a bit over 4K of shell
script that we're writing into the console to run a test. So lots of
newlines.
> If there really aren't any newlines, I don't think we can do a lot. icanon
> handling kind of fundamentally doesn't work if there is no newline at all,
> since it is all about line buffering, and we obviously have to limit the
> lines _somewhere_.
Right that makes sense. But shouldn't be the case here.
> But I also thihk that we only update that whole 'canon_data' thing if we
> _received_ the newlines while we were in icanon mode (but I didn't really
> check very closely), so if we actually switch from -icanon to +icanon, I
> think canon_data can be confused, and we thus handle the buffer-full case
> worse than we _could_ have.
Yeah I think that's right. n_tty_set_termios() always sets canon_data to
0 when there is a canon change, in either direction. And I don't see
anything else will fix it.
> > It's a bit unfortunate that n_tty_set_room() lies about the available
> > space when icanon = 1, but it makes sense in order to handle erase. It
> > would be nice if n_tty_receive_buf() returned a status to
> > flush_to_disc() to say "actually I could only fit X chars after all,
> > please take them back" - but I don't grok how that would play with all
> > the other logic in there.
>
> Yeah, I don't think that is even worth it. The thing is, we do have to
> start dropping characters at some point, so trying to extend the non-drop
> case just moves it somewhere else. If you are in canon mode, and the line
> is longer than <some-number-that-happens-to-be-4kB>, you're basically
> screwed.
Agreed.
> But what we _might_ do is to handle the "turn on canon mode" case better,
> in case the old buffer had newlines. IOW, instead of always setting
> 'canon_data' to 0 when icanon changes (and setting canon_head to the tail
> of the data), maybe we should simply _count_ the number of newlines (and
> set canon_head to the last one in the buffer).
>
> IOW, if you do have newlines in your bulk data before icanon got set, we
> could probably make n_tty handle that case better.
Yeah I think that is the root cause of my bug. There are plenty of lines
in my buffer, so n_tty_set_room() should not be accepting more data.
> That said, as far as I know, the tty layer behavior in this area has
> basically always been the way it is. The fact that you can see it more
> easily is almost certainly just related to just how the buffers that lead
> up to flush_to_ldisc have grown, and are now allowed to fill up further.
> So it probably got much easier to trigger, but it is likely not in any way
> a really new case, and goes back not just to 2.6.25, but probably
> forever...
Yeah, we only stopped at 25 because our toolchain was too new :}
> I wonder a bit what in your particular environment makes this problem show
> up, but I assume that your simulation environment ends up just blindly
> stuffing the tty buffers through the virtual console, so you basically
> have "simulated typing" that was (a) started long before the reader was
> interested in accepting it and (b) was a huge dump rather than what any
> real load would be.
Yes and no. (a) doesn't seem to apply, no amount of delay before
starting the input makes a difference. And (b) only sort of. I don't
think a bit more than 4K is a ridiculous amount to stuff down a console.
Though I guess we're "typing" at a ridiculous speed.
> But if you change n_tty_set_termios() to count newlines when it sets
> icanon, you might get the behaviour you want, and it would seem to me to
> be an improvement over what we have now, so I wouldn't object to it, even
> if I suspect nobody else has ever cared, and would ever care in the
> future.
Clearly no one has ever cared before :) I'll have a look if I can come
up with a neat fix.
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-10-20 5:31 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-14 6:21 Characters vanishing in the tty layer? (maybe related to [Bug #14388] keyboard under X with 2.6.31) Michael Ellerman
2009-10-14 14:53 ` Linus Torvalds
2009-10-19 6:38 ` Michael Ellerman
2009-10-19 18:09 ` Linus Torvalds
[not found] ` <alpine.LFD.2.00.0910200245330.13080-OZUqEyPC5NRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
2009-10-20 0:31 ` Benjamin Herrenschmidt
2009-10-20 5:31 ` Michael Ellerman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).