* PROBLEM: Data corruption when pasting large data to terminal
       [not found] <CAGWcZkJs2uQHM=7wmf1JOLmUeS3Mxo5L4arMuMQSez1mvJLKQA@mail.gmail.com>
@ 2012-02-15 18:50 ` Egmont Koblinger
  2012-02-15 23:30   ` Greg KH
  2012-02-15 23:58   ` Parag Warudkar
  0 siblings, 2 replies; 18+ messages in thread
From: Egmont Koblinger @ 2012-02-15 18:50 UTC (permalink / raw)
  To: gregkh, linux-kernel

Hi,

Short summary: When pasting a large amount of data (>4kB) to terminals,
the data often gets mangled.

How to reproduce:
Create a text file that contains this line about 100 times:
a=(123456789123456789123456789123456789123456789123456789123456789)
(also available at http://pastebin.com/LAH2bmaw for a while)
and then copy-paste its entire contents in one step into a "bash" or
"python" running in a graphical terminal.

Expected result: The interpreter correctly parses these lines and
produces no visible result.
Actual result: They complain about a syntax error.
Reproducibility: About 10% on my computer (2.6.38.8), reportedly 100%
on friends' computers running 2.6.37 and 3.1.1.

Why I believe this is a kernel bug:
- Reproducible with any source of copy-pasting (e.g. various
terminals, graphical editors, browsers).
- Reproducible with at least five different popular graphical terminal
emulators that you paste into (xterm, gnome, kde, urxvt, putty).
- Reproducible with at least two applications (bash, python).
- stracing the terminal shows that it does indeed write the correct
copy-paste buffer into /dev/ptmx, and all its writes return the full
amount of bytes requested, i.e. no short write.
- stracing the application clearly shows that it does not receive all
the desired characters from its stdin, some are simply missing, i.e. a
read(0, "3", 1) = 1 is followed by a read(0, "\n", 1) = 1 (with a
write() and some rt_sigprocmask()s in between), although the char '3'
shouldn't be followed by a newline.
- Not reproducible on MacOS.
Additional information:
- On friends' computers the bug always happens from offset 4163, which
is exactly the length of the first line (data immediately processed by
the application) plus the magic 4095. The rest of that line, up to the
next newline, is cut off.

- On my computer the bug, if it happens, always happens at an offset
beyond this one; moreover, there's a lone digit '3' appearing on the
display on its own line exactly 4095 bytes before the syntax error.
Here's a "screenshot" with "$ " being the bash prompt, and with my
comments after "#":

$ a=(123456789123456789123456789123456789123456789123456789123456789)
# repeated a few, varying number of times
3
# <- notice this lone '3' on the display
$ a=(123456789123456789123456789123456789123456789123456789123456789)
# 60 times, that's 4080 bytes incl. newlines
$ a=(123456789123
> a=(123456789123456789123456789123456789123456789123456789123456789)
bash: syntax error near unexpected token `('
$ a=(123456789123456789123456789123456789123456789123456789123456789)
# a few more times

- I couldn't reproduce with cat-like applications; I have a feeling the
bug perhaps only occurs in raw terminal mode, but I'm really not sure
about this.

I'd be glad if you could find the time to look at this problem; it's
quite unfortunate that I cannot safely copy-paste large amounts of data
into terminals.

Thanks a lot,
egmont

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-15 18:50 ` PROBLEM: Data corruption when pasting large data to terminal Egmont Koblinger
@ 2012-02-15 23:30   ` Greg KH
  2012-02-16  0:39     ` Egmont Koblinger
  2012-02-15 23:58   ` Parag Warudkar
  1 sibling, 1 reply; 18+ messages in thread
From: Greg KH @ 2012-02-15 23:30 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: linux-kernel

On Wed, Feb 15, 2012 at 07:50:58PM +0100, Egmont Koblinger wrote:
> Hi,
>
> Short summary: When pasting a large amount of data (>4kB) to terminals,
> the data often gets mangled.
>
> How to reproduce:
> Create a text file that contains this line about 100 times:
> a=(123456789123456789123456789123456789123456789123456789123456789)
> (also available at http://pastebin.com/LAH2bmaw for a while)
> and then copy-paste its entire contents in one step into a "bash" or
> "python" running in a graphical terminal.
>
> Expected result: The interpreter correctly parses these lines and
> produces no visible result.
> Actual result: They complain about a syntax error.
> Reproducibility: About 10% on my computer (2.6.38.8), reportedly 100%
> on friends' computers running 2.6.37 and 3.1.1.

Has this ever worked properly for you on older kernels?  How about 3.2?
3.3-rc3?  Having a "known good" point to work from here would be nice to
have.

I can reproduce this using bash, BUT, I can not reproduce it using vim
running in the same window bash was running in.

So, that implies that this is a userspace bug, not a kernel one,
otherwise the results would be the same both times, right?

> Why I believe this is a kernel bug:
> - Reproducible with any source of copy-pasting (e.g. various
> terminals, graphical editors, browsers).

Bugs are common when people start with the same original codebase :)

> - Reproducible with at least five different popular graphical terminal
> emulators that you paste into (xterm, gnome, kde, urxvt, putty).
> - Reproducible with at least two applications (bash, python).
Again, I can't duplicate this with vim in a terminal window, which rules
out the terminal, and points at bash, right?

> - stracing the terminal shows that it does indeed write the correct
> copy-paste buffer into /dev/ptmx, and all its writes return the full
> amount of bytes requested, i.e. no short write.

Short writes are legal, but so many userspace programs don't handle them
properly.

> - stracing the application clearly shows that it does not receive all
> the desired characters from its stdin, some are simply missing, i.e. a
> read(0, "3", 1) = 1 is followed by a read(0, "\n", 1) = 1 (with a
> write() and some rt_sigprocmask()s in between), although the char '3'
> shouldn't be followed by a newline.

Perhaps the buffer is overflowing as the program isn't able to keep up
properly?  It's not an "endless" buffer, it can overflow if reads don't
keep up.

> - Not reproducible on MacOS.

That means nothing :)

> Additional information:
> - On friends' computers the bug always happens from offset 4163, which
> is exactly the length of the first line (data immediately processed by
> the application) plus the magic 4095. The rest of that line, up to the
> next newline, is cut off.
>
> - On my computer the bug, if it happens, always happens at an offset
> beyond this one; moreover, there's a lone digit '3' appearing on the
> display on its own line exactly 4095 bytes before the syntax error.
> Here's a "screenshot" with "$ " being the bash prompt, and with my
> comments after "#":
>
> $ a=(123456789123456789123456789123456789123456789123456789123456789)
> # repeated a few, varying number of times
> 3
> # <- notice this lone '3' on the display
> $ a=(123456789123456789123456789123456789123456789123456789123456789)
> # 60 times, that's 4080 bytes incl. newlines
> $ a=(123456789123
> > a=(123456789123456789123456789123456789123456789123456789123456789)
> bash: syntax error near unexpected token `('
> $ a=(123456789123456789123456789123456789123456789123456789123456789)
> # a few more times
>
> - I couldn't reproduce with cat-like applications; I have a feeling the
> bug perhaps only occurs in raw terminal mode, but I'm really not sure
> about this.

That kind of proves the "there's a problem in the application you are
testing" theory, right?

> I'd be glad if you could find the time to look at this problem; it's
> quite unfortunate that I cannot safely copy-paste large amounts of data
> into terminals.

Works for me, just use an editor to do that...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-15 23:30   ` Greg KH
@ 2012-02-16  0:39     ` Egmont Koblinger
  2012-02-16  0:54       ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Egmont Koblinger @ 2012-02-16 0:39 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

Hi Greg,

Sorry, I didn't emphasize the point that makes me suspect it's a kernel
issue:

- strace reveals that the terminal emulator writes the correct data
into /dev/ptmx, and the kernel reports no short writes(!), all the
write(..., ..., 68) calls actually return 68 (the length of the
example file's lines incl. newline; I'm naively assuming I can trust
strace here.)
- strace reveals that the receiving application (bash) doesn't receive
all the data from /dev/pts/N.
- so: the data gets lost after writing to /dev/ptmx, but before
reading it out from /dev/pts/N.

First I was also hoping for a bug in the terminal emulators not
handling short writes correctly, but it's not the case.

Could you please verify that stracing the terminal and the app shows
the same behavior to you?  If it's the same, and if strace correctly
reports the actual number of bytes written, then can it still be an
application bug?

Not being able to reproduce it in vim/whatever doesn't mean too much, as
it seems to be some kind of race condition (behaves differently on
different machines, buggy only ~10% of the time for me), the actual
circumstances that trigger the bug might depend on timing or the way
the applications read the buffer (byte by byte, or larger chunks) or
the number of processors or I don't know what.

Unfortunately I have no information about a "known good" reference
point, but I recall seeing a similar bug a year or two ago, I just
didn't pay attention to it.  So it's probably not a new one.
thanks a lot,
egmont

On Thu, Feb 16, 2012 at 00:30, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Feb 15, 2012 at 07:50:58PM +0100, Egmont Koblinger wrote:
> > Hi,
> >
> > Short summary: When pasting a large amount of data (>4kB) to terminals,
> > the data often gets mangled.
> >
> > How to reproduce:
> > Create a text file that contains this line about 100 times:
> > a=(123456789123456789123456789123456789123456789123456789123456789)
> > (also available at http://pastebin.com/LAH2bmaw for a while)
> > and then copy-paste its entire contents in one step into a "bash" or
> > "python" running in a graphical terminal.
> >
> > Expected result: The interpreter correctly parses these lines and
> > produces no visible result.
> > Actual result: They complain about a syntax error.
> > Reproducibility: About 10% on my computer (2.6.38.8), reportedly 100%
> > on friends' computers running 2.6.37 and 3.1.1.
>
> Has this ever worked properly for you on older kernels?  How about 3.2?
> 3.3-rc3?  Having a "known good" point to work from here would be nice to
> have.
>
> I can reproduce this using bash, BUT, I can not reproduce it using vim
> running in the same window bash was running in.
>
> So, that implies that this is a userspace bug, not a kernel one,
> otherwise the results would be the same both times, right?
>
> > Why I believe this is a kernel bug:
> > - Reproducible with any source of copy-pasting (e.g. various
> > terminals, graphical editors, browsers).
>
> Bugs are common when people start with the same original codebase :)
>
> > - Reproducible with at least five different popular graphical terminal
> > emulators that you paste into (xterm, gnome, kde, urxvt, putty).
> > - Reproducible with at least two applications (bash, python).
>
> Again, I can't duplicate this with vim in a terminal window, which rules
> out the terminal, and points at bash, right?
> > > - stracing the terminal shows that it does indeed write the correct
> > > copy-paste buffer into /dev/ptmx, and all its writes return the full
> > > amount of bytes requested, i.e. no short write.
> >
> > Short writes are legal, but so many userspace programs don't handle
> > them properly.
> >
> > > - stracing the application clearly shows that it does not receive all
> > > the desired characters from its stdin, some are simply missing, i.e. a
> > > read(0, "3", 1) = 1 is followed by a read(0, "\n", 1) = 1 (with a
> > > write() and some rt_sigprocmask()s in between), although the char '3'
> > > shouldn't be followed by a newline.
> >
> > Perhaps the buffer is overflowing as the program isn't able to keep up
> > properly?  It's not an "endless" buffer, it can overflow if reads don't
> > keep up.
> >
> > > - Not reproducible on MacOS.
> >
> > That means nothing :)
> >
> > > Additional information:
> > > - On friends' computers the bug always happens from offset 4163, which
> > > is exactly the length of the first line (data immediately processed by
> > > the application) plus the magic 4095. The rest of that line, up to the
> > > next newline, is cut off.
> > >
> > > - On my computer the bug, if it happens, always happens at an offset
> > > beyond this one; moreover, there's a lone digit '3' appearing on the
> > > display on its own line exactly 4095 bytes before the syntax error.
> > > Here's a "screenshot" with "$ " being the bash prompt, and with my
> > > comments after "#":
> > >
> > > $ a=(123456789123456789123456789123456789123456789123456789123456789)
> > > # repeated a few, varying number of times
> > > 3
> > > # <- notice this lone '3' on the display
> > > $ a=(123456789123456789123456789123456789123456789123456789123456789)
> > > # 60 times, that's 4080 bytes incl. newlines
> > > $ a=(123456789123
> > > > a=(123456789123456789123456789123456789123456789123456789123456789)
> > > bash: syntax error near unexpected token `('
> > > $ a=(123456789123456789123456789123456789123456789123456789123456789)
> > > # a few more times
> > >
> > > - I couldn't reproduce with cat-like applications; I have a feeling the
> > > bug perhaps only occurs in raw terminal mode, but I'm really not sure
> > > about this.
> >
> > That kind of proves the "there's a problem in the application you are
> > testing" theory, right?
> >
> > > I'd be glad if you could find the time to look at this problem; it's
> > > quite unfortunate that I cannot safely copy-paste large amounts of data
> > > into terminals.
> >
> > Works for me, just use an editor to do that...
> >
> > thanks,
> >
> > greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-16  0:39     ` Egmont Koblinger
@ 2012-02-16  0:54       ` Greg KH
  2012-02-16  1:12         ` Egmont Koblinger
  2012-02-17 19:28         ` Pavel Machek
  0 siblings, 2 replies; 18+ messages in thread
From: Greg KH @ 2012-02-16 0:54 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: linux-kernel

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Thu, Feb 16, 2012 at 01:39:59AM +0100, Egmont Koblinger wrote:
> Hi Greg,
>
> Sorry, I didn't emphasize the point that makes me suspect it's a kernel
> issue:
>
> - strace reveals that the terminal emulator writes the correct data
> into /dev/ptmx, and the kernel reports no short writes(!), all the
> write(..., ..., 68) calls actually return 68 (the length of the
> example file's lines incl. newline; I'm naively assuming I can trust
> strace here.)
> - strace reveals that the receiving application (bash) doesn't receive
> all the data from /dev/pts/N.
> - so: the data gets lost after writing to /dev/ptmx, but before
> reading it out from /dev/pts/N.

Which it will, if the reader doesn't read fast enough, right?  Is the
data somewhere guaranteed to never "overrun" the buffer?  If so, how do
we handle not just running out of memory?

> First I was also hoping for a bug in the terminal emulators not
> handling short writes correctly, but it's not the case.

Yes, that would make things easier.

> Could you please verify that stracing the terminal and the app shows
> the same behavior to you?  If it's the same, and if strace correctly
> reports the actual number of bytes written, then can it still be an
> application bug?
You can do that stracing of vim if you want to; I'm on the road at the
moment, and have to give a presentation in a few minutes, so my spare
time for this is a bit limited :)

> Not being able to reproduce it in vim/whatever doesn't mean too much, as
> it seems to be some kind of race condition (behaves differently on
> different machines, buggy only ~10% of the time for me), the actual
> circumstances that trigger the bug might depend on timing or the way
> the applications read the buffer (byte by byte, or larger chunks) or
> the number of processors or I don't know what.

Not being able to reproduce it with a different userspace program is
important, in that there is at least one "known good" userspace program
here that does things correctly.

I bet you can write a simple userspace program that also does this
correctly, have you tried that?  That might be best to provide a "tiny"
reproducer.

Odds are bash and python don't do things properly, as they aren't
accustomed to such large buffers coming in at this rate of speed.  They
are designed for this type of thing, while vim is used to it.

> Unfortunately I have no information about a "known good" reference
> point, but I recall seeing a similar bug a year or two ago, I just
> didn't pay attention to it.  So it's probably not a new one.

If you can trace something down in the kernel to point to where we are
doing something wrong, I would be glad to look at it.  But without that,
there's not much I can do here, sorry.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-16  0:54       ` Greg KH
@ 2012-02-16  1:12         ` Egmont Koblinger
  2012-02-17 19:28         ` Pavel Machek
  1 sibling, 0 replies; 18+ messages in thread
From: Egmont Koblinger @ 2012-02-16 1:12 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On Thu, Feb 16, 2012 at 01:54, Greg KH <gregkh@linuxfoundation.org> wrote:
>> - so: the data gets lost after writing to /dev/ptmx, but before
>> reading it out from /dev/pts/N.
>
> Which it will, if the reader doesn't read fast enough, right?  Is the
> data somewhere guaranteed to never "overrun" the buffer?  If so, how do
> we handle not just running out of memory?

If the buffer is full, the write() into /dev/ptmx should signal it by
returning a smaller-than-requested number (a short write), or 0, or it
should block (perhaps depending on whether the fd is in blocking mode)
-- am I correct?  If the write() returns 68 whereas actually it stored
only 15 bytes and threw out the rest, then there's no way applications
could handle this.  As I said, stracing the terminal tells me that the
terminal wrote 68 bytes 100 times, and all hundred times the write()
call returned 68, not less.

> I bet you can write a simple userspace program that also does this
> correctly, have you tried that?  That might be best to provide a "tiny"
> reproducer.
>
> Odds are bash and python don't do things properly, as they aren't
> accustomed to such large buffers coming in at this rate of speed.  They
> are designed for this type of thing, while vim is used to it.

Again: stracing bash reveals that it reads from its stdin byte by byte,
and at one point it read()s a '3' character followed by a '\n' with the
next read() - something that never occurs in the input.  I wish I had a
simpler test case, but I don't have one yet.
Anyway, I'm not looking at the internals of the terminal or the
application; I'm observing the interface where they talk to the pty, and
what I see is that the terminal writes the correct stuff while the
application receives a garbled version.  Whatever bash does with that
data later on is irrelevant, isn't it?

> If you can trace something down in the kernel to point to where we are
> doing something wrong, I would be glad to look at it.  But without that,
> there's not much I can do here, sorry.

I still can't understand why the two strace outputs are not convincing
enough.  I'm also uncertain what the desired behavior would be if one
tries to write to ptmx while the kernel's buffer is full.

thanks,
e.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-16  0:54       ` Greg KH
  2012-02-16  1:12         ` Egmont Koblinger
@ 2012-02-17 19:28         ` Pavel Machek
  2012-02-17 21:57           ` Bruno Prémont
  1 sibling, 1 reply; 18+ messages in thread
From: Pavel Machek @ 2012-02-17 19:28 UTC (permalink / raw)
  To: Greg KH; +Cc: Egmont Koblinger, linux-kernel

Hi!

> > Sorry, I didn't emphasize the point that makes me suspect it's a
> > kernel issue:
> >
> > - strace reveals that the terminal emulator writes the correct data
> > into /dev/ptmx, and the kernel reports no short writes(!), all the
> > write(..., ..., 68) calls actually return 68 (the length of the
> > example file's lines incl. newline; I'm naively assuming I can trust
> > strace here.)
> > - strace reveals that the receiving application (bash) doesn't receive
> > all the data from /dev/pts/N.
> > - so: the data gets lost after writing to /dev/ptmx, but before
> > reading it out from /dev/pts/N.
>
> Which it will, if the reader doesn't read fast enough, right?  Is the
> data somewhere guaranteed to never "overrun" the buffer?  If so, how do
> we handle not just running out of memory?

Start blocking the writer?

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-17 19:28         ` Pavel Machek
@ 2012-02-17 21:57           ` Bruno Prémont
  2012-02-19 20:55             ` Egmont Koblinger
  0 siblings, 1 reply; 18+ messages in thread
From: Bruno Prémont @ 2012-02-17 21:57 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Greg KH, Egmont Koblinger, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

Hi,

On Fri, 17 February 2012 Pavel Machek <pavel@ucw.cz> wrote:
> > > Sorry, I didn't emphasize the point that makes me suspect it's a
> > > kernel issue:
> > >
> > > - strace reveals that the terminal emulator writes the correct data
> > > into /dev/ptmx, and the kernel reports no short writes(!), all the
> > > write(..., ..., 68) calls actually return 68 (the length of the
> > > example file's lines incl. newline; I'm naively assuming I can trust
> > > strace here.)
> > > - strace reveals that the receiving application (bash) doesn't receive
> > > all the data from /dev/pts/N.
> > > - so: the data gets lost after writing to /dev/ptmx, but before
> > > reading it out from /dev/pts/N.
> >
> > Which it will, if the reader doesn't read fast enough, right?  Is the
> > data somewhere guaranteed to never "overrun" the buffer?  If so, how do
> > we handle not just running out of memory?
>
> Start blocking the writer?

I did quickly write a small test program (attached).  It forks a reader
child and sends data over to it; at the end both write down their copy
of the buffer to a /tmp/ptmx_{in,out}.txt file for manual comparison of
the results (in addition to basic output of the mismatch start line).

From the time it took the writer to write larger buffers (as seen using
strace) it seems there *is* some kind of blocking, but it's not blocking
long enough, or it unblocks too early if the reader does not keep up.

For quick and dirty testing of the effects of buffer sizes, tune "rsz",
"wsz" and "line" in main() as well as the total size with the BUFF_SZ
define.
The effects for me are that the writer writes all data but the reader
never sees the tail of the written data (how much is seen varies,
probably a matter of scheduling, frequency scaling and similar racing
factors).

My test system is a single-core uniprocessor Centrino laptop (32-bit
x86) with a 3.2.5 kernel.

Bruno

[-- Attachment #2: ptmx.c --]
[-- Type: text/x-csrc, Size: 4853 bytes --]

#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>

#define BUFF_SZ (4096*64)

void write_buffer(const char *buff, size_t buff_sz, const char *fname)
{
	int fd = open(fname, O_CREAT | O_WRONLY | O_TRUNC, 0664);
	size_t n = 0;
	ssize_t r;

	if (fd == -1) {
		fprintf(stderr, "Failed to open(3) %s: %s\n", fname, strerror(errno));
		return;
	}
	do {
		r = write(fd, buff + n, buff_sz - n);
		if (r == -1) {
			if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
				continue;
			fprintf(stderr, "Failed to write(2): %s\n", strerror(errno));
			return;
		} else if (r == 0) {
			break;
		} else {
			n += r;
		}
	} while (n < buff_sz);
	close(fd);
}

void ptmx_slave_test(int pty, const char *line, size_t rsz)
{
	char *buff = malloc(BUFF_SZ);
	size_t n = 0, nn;
	ssize_t r;
	int l, bad;
	struct timespec slen;

	if (!buff) {
		fprintf(stderr, "Failed to malloc(3): %s\n", strerror(errno));
		return;
	}
	do {
		r = read(pty, buff + n, rsz + n > BUFF_SZ ? BUFF_SZ - n : rsz);
		if (r == -1) {
			if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
				continue;
			fprintf(stderr, "Failed to read(2): %s\n", strerror(errno));
			return;
		} else if (r == 0) {
			if (n < BUFF_SZ)
				fprintf(stderr, "Read %zu bytes, expected %zu!\n", n, (size_t)BUFF_SZ);
			break;
		} else {
			n += r;
		}
		memset(&slen, 0, sizeof(slen));
		nanosleep(&slen, NULL);
	} while (n < BUFF_SZ);
	nn = n;

	/* check buffer if it matches expected value... */
	r = strlen(line);
	l = 0;
	bad = 0;
	for (n = 0; n < BUFF_SZ; n += r+1) {
		l++;
		if (memcmp(buff + n, line, n + r < BUFF_SZ ? r : BUFF_SZ - n) != 0) {
			// TODO: determine position of breakage!
			fprintf(stderr, "Line data mismatch for line %d!\n", l);
			bad = 1;
			break;
		}
		if (n + r + 1 < BUFF_SZ && buff[n+r] != '\n') {
			fprintf(stderr, "Expecting '\\n' at end of line %d, but found 0x%hhx\n",
			        l, buff[n+r]);
			bad = 1;
			break;
		}
	}
//	fprintf(stderr, "Buffer seen by slave is:\n");
//	fwrite(buff, BUFF_SZ, 1, stdout);
	if (bad)
		write_buffer(buff, nn, "/tmp/ptmx_out.txt");
}

void ptmx_master_test(int pty, const char *line, size_t wsz)
{
	char *buff = malloc(BUFF_SZ);
	size_t n = 0;
	ssize_t r;

	if (!buff) {
		fprintf(stderr, "Failed to malloc(3): %s\n", strerror(errno));
		return;
	}
	/* initialize buffer */
	r = strlen(line);
	for (n = 0; n < BUFF_SZ; n += r+1) {
		memcpy(buff + n, line, n + r < BUFF_SZ ? r : BUFF_SZ - n);
		if (n + r + 1 < BUFF_SZ)
			buff[n+r] = '\n';
	}
	n = 0;
	do {
		r = write(pty, buff + n, wsz + n > BUFF_SZ ? BUFF_SZ - n : wsz);
		if (r == -1) {
			if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
				continue;
			fprintf(stderr, "Failed to write(2): %s\n", strerror(errno));
			return;
		} else if (r == 0) {
			break;
		} else {
			n += r;
		}
	} while (n < BUFF_SZ);
	close(pty);
	write_buffer(buff, BUFF_SZ, "/tmp/ptmx_in.txt");
}

int main()
{
	const char *line = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
	const char *ptsdname = NULL;
	int pty, pid;
	size_t rsz = 128, wsz = 1024;

	pty = open("/dev/ptmx", O_RDWR | O_CLOEXEC);
	if (pty == -1) {
		fprintf(stderr, "Failed to open(3) /dev/ptmx: %s\n", strerror(errno));
		return 1;
	}
	ptsdname = ptsname(pty);
	if (!ptsdname) {
		fprintf(stderr, "Failed to ptsname(3): %s\n", strerror(errno));
		close(pty);
		return 1;
	}
	if (grantpt(pty) == -1) {
		fprintf(stderr, "Failed to grantpt(3): %s\n", strerror(errno));
		close(pty);
		return 1;
	}
	if (unlockpt(pty) == -1) {
		fprintf(stderr, "Failed to unlockpt(3): %s\n", strerror(errno));
		close(pty);
		return 1;
	}
	pid = fork();
	if (pid == -1) {
		fprintf(stderr, "Failed to fork(3): %s\n", strerror(errno));
		close(pty);
		return 1;
	} else if (pid == 0) {
		close(pty);
		pty = open(ptsdname, O_RDWR | O_CLOEXEC);
		if (pty == -1) {
			fprintf(stderr, "Failed to open(3) %s: %s\n", ptsdname, strerror(errno));
			return 1;
		}
		ptmx_slave_test(pty, line, rsz);
		close(pty);
		return 0;
	} else {
		int s;

		ptmx_master_test(pty, line, wsz);
		if (waitpid(pid, &s, 0) == -1) {
			fprintf(stderr, "Failed to waitpid(2) for %d: %s\n", pid, strerror(errno));
			return 1;
		}
		if (WIFEXITED(s) && WEXITSTATUS(s) == 0)
			return 0;
		if (WIFEXITED(s))
			fprintf(stderr, "Child exited with %d\n", WEXITSTATUS(s));
		else if (WIFSIGNALED(s))
			fprintf(stderr, "Child died with signal %d\n", WTERMSIG(s));
		else
			fprintf(stderr, "Child terminated in an unknown way with status %d\n", s);
		return 1;
	}
}

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-17 21:57           ` Bruno Prémont
@ 2012-02-19 20:55             ` Egmont Koblinger
  2012-02-19 21:14               ` Bruno Prémont
  0 siblings, 1 reply; 18+ messages in thread
From: Egmont Koblinger @ 2012-02-19 20:55 UTC (permalink / raw)
  To: Bruno Prémont; +Cc: Pavel Machek, Greg KH, linux-kernel

Hi Bruno,

Unfortunately the lost tail is a different thing: the terminal is in
cooked mode by default, so the kernel intentionally keeps the data in
its buffer until it sees a complete line.  A quick-and-dirty way of
changing to byte-based transmission (I'm too lazy to look up the actual
system calls, apologies for the terribly ugly way of doing this) is:

  pty = open(ptsdname, O_RDWR);
  if (pty == -1) { ... }
+ char cmd[100];
+ sprintf(cmd, "stty raw <>%s", ptsdname);
+ system(cmd);
  ptmx_slave_test(pty, line, rsz);

Anyway, thanks very much for your test program, I'll try to modify it
to trigger the data corruption bug.

egmont

On Fri, Feb 17, 2012 at 22:57, Bruno Prémont <bonbons@linux-vserver.org> wrote:
> Hi,
>
> On Fri, 17 February 2012 Pavel Machek <pavel@ucw.cz> wrote:
>> > > Sorry, I didn't emphasize the point that makes me suspect it's a
>> > > kernel issue:
>> > >
>> > > - strace reveals that the terminal emulator writes the correct data
>> > > into /dev/ptmx, and the kernel reports no short writes(!), all the
>> > > write(..., ..., 68) calls actually return 68 (the length of the
>> > > example file's lines incl. newline; I'm naively assuming I can trust
>> > > strace here.)
>> > > - strace reveals that the receiving application (bash) doesn't receive
>> > > all the data from /dev/pts/N.
>> > > - so: the data gets lost after writing to /dev/ptmx, but before
>> > > reading it out from /dev/pts/N.
>> >
>> > Which it will, if the reader doesn't read fast enough, right?  Is the
>> > data somewhere guaranteed to never "overrun" the buffer?  If so, how do
>> > we handle not just running out of memory?
>>
>> Start blocking the writer?
>
> I did quickly write a small test program (attached).  It forks a reader
> child and sends data over to it; at the end both write down their copy
> of the buffer to a /tmp/ptmx_{in,out}.txt file for manual comparison of
> the results (in addition to basic output of the mismatch start line).
>
> From the time it took the writer to write larger buffers (as seen using
> strace) it seems there *is* some kind of blocking, but it's not blocking
> long enough, or it unblocks too early if the reader does not keep up.
>
> For quick and dirty testing of the effects of buffer sizes, tune "rsz",
> "wsz" and "line" in main() as well as the total size with the BUFF_SZ
> define.
>
> The effects for me are that the writer writes all data but the reader
> never sees the tail of the written data (how much is seen varies,
> probably a matter of scheduling, frequency scaling and similar racing
> factors).
>
> My test system is a single-core uniprocessor Centrino laptop (32-bit
> x86) with a 3.2.5 kernel.
>
> Bruno

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-19 20:55             ` Egmont Koblinger
@ 2012-02-19 21:14               ` Bruno Prémont
  2012-02-19 21:35                 ` Alan Cox
  2012-02-19 21:41                 ` Egmont Koblinger
  0 siblings, 2 replies; 18+ messages in thread
From: Bruno Prémont @ 2012-02-19 21:14 UTC (permalink / raw)
  To: Egmont Koblinger; +Cc: Pavel Machek, Greg KH, linux-kernel

Hi Egmont,

On Sun, 19 February 2012 Egmont Koblinger <egmont@gmail.com> wrote:
> Unfortunately the lost tail is a different thing: the terminal is in
> cooked mode by default, so the kernel intentionally keeps the data in
> its buffer until it sees a complete line.  A quick-and-dirty way of
> changing to byte-based transmission (I'm too lazy to look up the actual
> system calls, apologies for the terribly ugly way of doing this) is:
>   pty = open(ptsdname, O_RDWR);
>   if (pty == -1) { ... }
> + char cmd[100];
> + sprintf(cmd, "stty raw <>%s", ptsdname);
> + system(cmd);
>   ptmx_slave_test(pty, line, rsz);
>
> Anyway, thanks very much for your test program, I'll try to modify it
> to trigger the data corruption bug.

Well, I'm not sure, but closing the ptmx on the sender side should force
the kernel to flush whatever remains, independently of any end-of-line.
(I was thinking I should push an EOF over the ptmx instead of closing it
before waiting for the child process, though I have not yet looked up
how to do that!)

The amount of missing tail in my few runs of the test program was of
varying length, but in all cases way more than a single line, so I would
hope it's not line-buffering by the kernel that causes the missing data!
Bruno > egmont > > On Fri, Feb 17, 2012 at 22:57, Bruno Prémont <bonbons@linux-vserver.org> wrote: > > Hi, > > > > On Fri, 17 February 2012 Pavel Machek <pavel@ucw.cz> wrote: > >> > > Sorry, I didn't emphasize the point that makes me suspect it's a kernel issue: > >> > > > >> > > - strace reveals that the terminal emulator writes the correct data > >> > > into /dev/ptmx, and the kernel reports no short writes(!), all the > >> > > write(..., ..., 68) calls actually return 68 (the length of the > >> > > example file's lines incl. newline; I'm naively assuming I can trust > >> > > strace here.) > >> > > - strace reveals that the receiving application (bash) doesn't receive > >> > > all the data from /dev/pts/N. > >> > > - so: the data gets lost after writing to /dev/ptmx, but before > >> > > reading it out from /dev/pts/N. > >> > > >> > Which it will, if the reader doesn't read fast enough, right? Is the > >> > data somewhere guaranteed to never "overrun" the buffer? If so, how do > >> > we handle not just running out of memory? > >> > >> Start blocking the writer? > > > > I did quickly write a small test program (attached). It forks a reader child > > and sends data over to it, at the end both write down their copy of the buffer > > to a /tmp/ptmx_{in,out}.txt file for manual comparing results (in addition > > to basic output of mismatch start line) > > > > From the time it took the writer to write larger buffers (as seen using strace) > > it seems there *is* some kind of blocking, but it's not blocking long enough > > or unblocking too early if the reader does not keep up. > > > > > > For quick and dirty testing of effects of buffer sizes, tune "rsz", "wsz" > > and "line" in main() as well as total size with BUFF_SZ define. > > > > > > The effects for me are that writer writes all data but reader never sees tail > > of written data (how much is being seen seems variable, probably matter of > > scheduling, frequency scaling and similar racing factors). 
> > > > My test system is single-core uniprocessor centrino laptop (32bit x86) with > > 3.2.5 kernel. > > > > Bruno ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-19 21:14 ` Bruno Prémont @ 2012-02-19 21:35 ` Alan Cox 2012-02-19 21:41 ` Egmont Koblinger 1 sibling, 0 replies; 18+ messages in thread From: Alan Cox @ 2012-02-19 21:35 UTC (permalink / raw) To: Bruno Prémont; +Cc: Egmont Koblinger, Pavel Machek, Greg KH, linux-kernel

> Well, not sure but the closing of ptmx on sender side should force kernel
> to flush whatever is remaining independently on end-of-line (I was
> thinking I should push an EOF over the ptmx instead of closing it before
> waiting for child process though I have not yet looked-up how to do so!).

The behaviour for the master side on a close is to hang up the child. You would normally wait for the child to exit first.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-19 21:14 ` Bruno Prémont 2012-02-19 21:35 ` Alan Cox @ 2012-02-19 21:41 ` Egmont Koblinger 2012-02-20 17:18 ` Egmont Koblinger 1 sibling, 1 reply; 18+ messages in thread From: Egmont Koblinger @ 2012-02-19 21:41 UTC (permalink / raw) To: Bruno Prémont; +Cc: Pavel Machek, Greg KH, linux-kernel Hi Bruno, On Sun, Feb 19, 2012 at 22:14, Bruno Prémont <bonbons@linux-vserver.org> wrote: > Hi Egmont, > > On Sun, 19 February 2012 Egmont Koblinger <egmont@gmail.com> wrote: >> Unfortunately the lost tail is a different thing: the terminal is in >> cooked mode by default, so the kernel intentionally keeps the data in >> its buffer until it sees a complete line. A quick-and-dirty way of >> changing to byte-based transmission (I'm lazy to look up the actual >> system calls, apologies for the terribly ugly way of doing this) is: >> pty = open(ptsdname, O_RDWR): >> if (pty == -1) { ... } >> + char cmd[100]; >> + sprintf(cmd, "stty raw <>%s", ptsdname); >> + system(cmd); >> ptmx_slave_test(pty, line, rsz); >> >> Anyway, thanks very much for your test program, I'll try to modify it >> to trigger the data corruption bug. > > Well, not sure but the closing of ptmx on sender side should force kernel > to flush whatever is remaining independently on end-of-line (I was > thinking I should push an EOF over the ptmx instead of closing it before > waiting for child process though I have not yet looked-up how to do so!). As Alan also pointed out, the way to close stuff is not handled very nicely in the example. However, I didn't face a problem with that - I'm not particularly interested in whether the application receives all the data if I kill the underlying terminal. My problem is data corruption way before the end of the stream, and actually incorrect bytes received by the application (not just an early eof due to a closed terminal). 
I'm trying hard to reproduce that with a single example, but I haven't succeeded so far. Note that I've triggered the bug with 4 apps so far: emacs (which is always in char-based input mode), and three readline apps (which keep switching back and forth between the two modes). I have no clue yet whether the bug itself is related to raw char-based mode or not, but I guess switching to this mode might not hurt. egmont > > The amount of missing tail for my few runs of the test program were of > varying length, but in all cases way more than a single line, thus I would > hope it's not line-buffering by the kernel which causes the missing data! > > Bruno > > >> egmont >> >> On Fri, Feb 17, 2012 at 22:57, Bruno Prémont <bonbons@linux-vserver.org> wrote: >> > Hi, >> > >> > On Fri, 17 February 2012 Pavel Machek <pavel@ucw.cz> wrote: >> >> > > Sorry, I didn't emphasize the point that makes me suspect it's a kernel issue: >> >> > > >> >> > > - strace reveals that the terminal emulator writes the correct data >> >> > > into /dev/ptmx, and the kernel reports no short writes(!), all the >> >> > > write(..., ..., 68) calls actually return 68 (the length of the >> >> > > example file's lines incl. newline; I'm naively assuming I can trust >> >> > > strace here.) >> >> > > - strace reveals that the receiving application (bash) doesn't receive >> >> > > all the data from /dev/pts/N. >> >> > > - so: the data gets lost after writing to /dev/ptmx, but before >> >> > > reading it out from /dev/pts/N. >> >> > >> >> > Which it will, if the reader doesn't read fast enough, right? Is the >> >> > data somewhere guaranteed to never "overrun" the buffer? If so, how do >> >> > we handle not just running out of memory? >> >> >> >> Start blocking the writer? >> > >> > I did quickly write a small test program (attached). 
It forks a reader child >> > and sends data over to it, at the end both write down their copy of the buffer >> > to a /tmp/ptmx_{in,out}.txt file for manual comparing results (in addition >> > to basic output of mismatch start line) >> > >> > From the time it took the writer to write larger buffers (as seen using strace) >> > it seems there *is* some kind of blocking, but it's not blocking long enough >> > or unblocking too early if the reader does not keep up. >> > >> > >> > For quick and dirty testing of effects of buffer sizes, tune "rsz", "wsz" >> > and "line" in main() as well as total size with BUFF_SZ define. >> > >> > >> > The effects for me are that writer writes all data but reader never sees tail >> > of written data (how much is being seen seems variable, probably matter of >> > scheduling, frequency scaling and similar racing factors). >> > >> > My test system is single-core uniprocessor centrino laptop (32bit x86) with >> > 3.2.5 kernel. >> > >> > Bruno ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-19 21:41 ` Egmont Koblinger @ 2012-02-20 17:18 ` Egmont Koblinger 2012-02-20 17:31 ` Pavel Machek 0 siblings, 1 reply; 18+ messages in thread From: Egmont Koblinger @ 2012-02-20 17:18 UTC (permalink / raw) To: Bruno Prémont; +Cc: Pavel Machek, Greg KH, linux-kernel

Further investigation reveals that:

- In the case of emacs, strace shows that it receives the correct data on its standard input, so it's an emacs bug, not a kernel one. My bad.

- For the three remaining readline-based apps (bash, python, bc), strace shows that whenever the data is correct, lines are terminated by '\r' (as is standard in raw terminal mode; the terminal emulator always sends this character), whereas as soon as it goes wrong, the received character becomes a '\n' (as is the way in cooked terminal mode). Here's an excerpt of 'strace bash', grepping only the reads from stdin:

read(0, "8", 1) = 1
read(0, "9", 1) = 1
read(0, ")", 1) = 1
read(0, "\r", 1) = 1   <-- everything's fine
read(0, "a", 1) = 1
read(0, "=", 1) = 1
read(0, "(", 1) = 1
read(0, "1", 1) = 1
...
read(0, "2", 1) = 1
read(0, "3", 1) = 1    <-- a line shouldn't end with '3',
read(0, "\n", 1) = 1   <-- and it's a '\n' where it's buggy
read(0, "a", 1) = 1
read(0, "=", 1) = 1
read(0, "(", 1) = 1
read(0, "1", 1) = 1
read(0, "2", 1) = 1

- This, in combination with the fact that we haven't been able to reproduce the bug with a raw-only or cooked-only terminal, suggests that there's somehow a race condition when writes, reads and termios changes are all involved.

I'll keep investigating. There's quite a lot for me to learn; e.g. I'm wondering whether readline uses the TCSETS* ioctls incorrectly?
Right now readline only uses TCSETSW to change the terminal values, it toggles back-n-forth between two states (raw when expecting user input, cooked when processing a command), and only read()s in the raw state, is this the correct behavior? Even if it uses the wrong one, would it explain data missing from the input stream? TCSETSF seems to be one that can cause data to be dropped, but according to strace, readline doesn't use this. I'm quite new to this area, so any hint from terminal experts on how it should work would be appreciated. thanks a lot, egmont On Sun, Feb 19, 2012 at 22:41, Egmont Koblinger <egmont@gmail.com> wrote: > Hi Bruno, > > On Sun, Feb 19, 2012 at 22:14, Bruno Prémont <bonbons@linux-vserver.org> wrote: >> Hi Egmont, >> >> On Sun, 19 February 2012 Egmont Koblinger <egmont@gmail.com> wrote: >>> Unfortunately the lost tail is a different thing: the terminal is in >>> cooked mode by default, so the kernel intentionally keeps the data in >>> its buffer until it sees a complete line. A quick-and-dirty way of >>> changing to byte-based transmission (I'm lazy to look up the actual >>> system calls, apologies for the terribly ugly way of doing this) is: >>> pty = open(ptsdname, O_RDWR): >>> if (pty == -1) { ... } >>> + char cmd[100]; >>> + sprintf(cmd, "stty raw <>%s", ptsdname); >>> + system(cmd); >>> ptmx_slave_test(pty, line, rsz); >>> >>> Anyway, thanks very much for your test program, I'll try to modify it >>> to trigger the data corruption bug. >> >> Well, not sure but the closing of ptmx on sender side should force kernel >> to flush whatever is remaining independently on end-of-line (I was >> thinking I should push an EOF over the ptmx instead of closing it before >> waiting for child process though I have not yet looked-up how to do so!). > > As Alan also pointed out, the way to close stuff is not handled very > nicely in the example. 
However, I didn't face a problem with that - > I'm not particularly interested in whether the application receives > all the data if I kill the underlying terminal. My problem is data > corruption way before the end of the stream, and actually incorrect > bytes received by the application (not just an early eof due to a > closed terminal). I'm trying hard to reproduce that with a single > example, but I haven't succeeded so far. > > Note that I've triggered the bug with 4 apps so far: emacs (which is > always in char-based input mode), and three readline apps (which keep > switching back and forth between the two modes). I have no clue yet > whether the bug itself is related to raw char-based mode or not, but I > guess switching to this mode might not hurt. > > > egmont > >> >> The amount of missing tail for my few runs of the test program were of >> varying length, but in all cases way more than a single line, thus I would >> hope it's not line-buffering by the kernel which causes the missing data! >> >> Bruno >> >> >>> egmont >>> >>> On Fri, Feb 17, 2012 at 22:57, Bruno Prémont <bonbons@linux-vserver.org> wrote: >>> > Hi, >>> > >>> > On Fri, 17 February 2012 Pavel Machek <pavel@ucw.cz> wrote: >>> >> > > Sorry, I didn't emphasize the point that makes me suspect it's a kernel issue: >>> >> > > >>> >> > > - strace reveals that the terminal emulator writes the correct data >>> >> > > into /dev/ptmx, and the kernel reports no short writes(!), all the >>> >> > > write(..., ..., 68) calls actually return 68 (the length of the >>> >> > > example file's lines incl. newline; I'm naively assuming I can trust >>> >> > > strace here.) >>> >> > > - strace reveals that the receiving application (bash) doesn't receive >>> >> > > all the data from /dev/pts/N. >>> >> > > - so: the data gets lost after writing to /dev/ptmx, but before >>> >> > > reading it out from /dev/pts/N. >>> >> > >>> >> > Which it will, if the reader doesn't read fast enough, right? 
Is the >>> >> > data somewhere guaranteed to never "overrun" the buffer? If so, how do >>> >> > we handle not just running out of memory? >>> >> >>> >> Start blocking the writer? >>> > >>> > I did quickly write a small test program (attached). It forks a reader child >>> > and sends data over to it, at the end both write down their copy of the buffer >>> > to a /tmp/ptmx_{in,out}.txt file for manual comparing results (in addition >>> > to basic output of mismatch start line) >>> > >>> > From the time it took the writer to write larger buffers (as seen using strace) >>> > it seems there *is* some kind of blocking, but it's not blocking long enough >>> > or unblocking too early if the reader does not keep up. >>> > >>> > >>> > For quick and dirty testing of effects of buffer sizes, tune "rsz", "wsz" >>> > and "line" in main() as well as total size with BUFF_SZ define. >>> > >>> > >>> > The effects for me are that writer writes all data but reader never sees tail >>> > of written data (how much is being seen seems variable, probably matter of >>> > scheduling, frequency scaling and similar racing factors). >>> > >>> > My test system is single-core uniprocessor centrino laptop (32bit x86) with >>> > 3.2.5 kernel. >>> > >>> > Bruno ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-20 17:18 ` Egmont Koblinger @ 2012-02-20 17:31 ` Pavel Machek 2012-02-20 21:11 ` Egmont Koblinger 0 siblings, 1 reply; 18+ messages in thread From: Pavel Machek @ 2012-02-20 17:31 UTC (permalink / raw) To: Egmont Koblinger; +Cc: Bruno Prémont, Greg KH, linux-kernel > I'm quite new to this area, so any hint from terminal experts on how > it should work would be appreciated. One particular chance would be trying with some old kernel (like 2.4.0 or 2.6.0); if readline works as expected there, we have kernel regression. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-20 17:31 ` Pavel Machek @ 2012-02-20 21:11 ` Egmont Koblinger 2012-02-20 21:29 ` Egmont Koblinger 0 siblings, 1 reply; 18+ messages in thread From: Egmont Koblinger @ 2012-02-20 21:11 UTC (permalink / raw) To: Pavel Machek; +Cc: Bruno Prémont, Greg KH, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2404 bytes --]

Hi,

I attach a simple self-contained test case that triggers the bug most of the time. Moreover, it turns out that we're facing a data corruption plus a deadlock issue -- often the test randomly triggers one of them.

The test is a slight modification of Bruno's example (thanks!). The most important change: it emulates a readline app by setting the terminal to cooked mode and doing some "work" (1 millisecond of sleep) after every newline, then reverting it to raw mode. Minor changes include ignoring the last 100 bytes (potentially an incomplete line that stays in the kernel's buffer, which the slave doesn't expect to arrive), plus a long sleep on the master after writing its output (an ugly hack, but definitely long enough to give the slave time to read everything).

The behavior is:

- Often: corrupt data is read (\r versus \n changes, as well as actual loss of data), as reported by the slave.

- Often: deadlock; the slave hangs in a read() from the terminal, while the master hangs in its write() at the same time.

You can play with parameters like the buffer size, the write size (wsz), the blocking vs. nonblocking mode of write, TCSETS versus TCSETSW -- they don't make much of a difference. What does make a difference, though, is the read size (rsz). The bug is reproducible if and only if the read size is a divisor of the length of the line excluding the terminating newline (i.e. the length of the full line minus one); that is, a divisor of 62 in this example. So a read size of 1 (which is used by readline) triggers the bug with all kinds of data; larger read sizes only with certain well-crafted buffers. Also, the bug is still only reproducible after writing at least 4kB.

This gives me a gut feeling (without actually having studied the kernel's source) that it might be some circular buffer overrun: whenever there's only 1 byte left in the buffer, for the final newline of a line, the writer can incorrectly wrap around in a 4k buffer and overwrite that -- does this make any sense?

Interestingly, the test uses \n and \r reversed compared to real life (the buffer should contain \r instead of \n, and ICRNL should be used instead of INLCR) -- for some reason this test didn't trigger the bug for me after swapping the two, I don't know why.

Anyway, I hope that this test case and my findings about the read size help catch and fix the bug.

Thanks a lot,
egmont

[-- Attachment #2: ptmx2.c --]
[-- Type: text/x-csrc, Size: 6151 bytes --]

#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>
#include <sys/select.h>
#include <sys/ioctl.h>
#include <termios.h>

#define BUFF_SZ (4096*256)
// Expect fewer bytes than the master writes, because the last incomplete line
// is not sent to the slave in cooked mode.
#define READ_BUFF_SZ (BUFF_SZ - 100)

void raw(int pty)
{
    struct termios t;
    ioctl(pty, TCGETS, &t);
    t.c_lflag &= ~ICANON;
    t.c_iflag &= ~INLCR;
    ioctl(pty, TCSETSW, &t);
}

void cooked(int pty)
{
    struct termios t;
    ioctl(pty, TCGETS, &t);
    t.c_lflag |= ICANON;
    t.c_iflag |= INLCR;
    ioctl(pty, TCSETSW, &t);
}

void write_buffer(const char *buff, size_t buff_sz, const char *fname)
{
    int fd = open(fname, O_CREAT | O_WRONLY | O_TRUNC, 0664);
    size_t n = 0;
    ssize_t r;
    if (fd == -1) {
        fprintf(stderr, "Failed to open(2) %s: %s\n", fname, strerror(errno));
        return;
    }
    do {
        r = write(fd, buff + n, buff_sz - n);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            fprintf(stderr, "Failed to write(2): %s\n", strerror(errno));
            return;
        } else if (r == 0) {
            break;
        } else {
            n += r;
        }
    } while (n < buff_sz);
    close(fd);
}

void ptmx_slave_test(int pty, const char *line, size_t rsz)
{
    char *buff = malloc(READ_BUFF_SZ);
    size_t n = 0, nn;
    ssize_t r;
    int l, bad;
    struct timespec slen;
    if (!buff) {
        fprintf(stderr, "Failed to malloc(3): %s\n", strerror(errno));
        return;
    }
    memset(buff, 0, READ_BUFF_SZ);
    raw(pty);  // emulate the initialization of a readline app
    do {
        r = read(pty, buff + n, rsz + n > READ_BUFF_SZ ? READ_BUFF_SZ - n : rsz);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            fprintf(stderr, "Failed to read(2) after reading %zu bytes: %s\n",
                    n, strerror(errno));
            break;  // Despite the error, compare the buffer against the reference.
        } else if (r == 0) {
            if (n < READ_BUFF_SZ)
                fprintf(stderr, "Read %zu bytes, expected %zu!\n",
                        n, (size_t)READ_BUFF_SZ);
            break;
        } else {
            if (buff[n] == '\n') {
                // emulate a readline app taking action on the input
                cooked(pty);
                memset(&slen, 0, sizeof(slen));
                slen.tv_nsec = 1000 * 1000;
                nanosleep(&slen, NULL);
                raw(pty);
            }
            n += r;
        }
    } while (n < READ_BUFF_SZ);
    nn = n;

    /* check buffer if it matches expected value... */
    r = strlen(line);
    l = 0;
    bad = 0;
    for (n = 0; n < READ_BUFF_SZ; n += r+1) {
        l++;
        if (memcmp(buff + n, line, n + r < READ_BUFF_SZ ? r : READ_BUFF_SZ - n) != 0) {
            // TODO: determine position of breakage!
            fprintf(stderr, "Line data mismatch for line %d!\n", l);
            bad = 1;
            break;
        }
        if (n + r + 1 < READ_BUFF_SZ && buff[n+r] != '\n') {
            if (!bad)
                fprintf(stderr, "Expecting '\\n' at end of line %d, but found 0x%hhx\n",
                        l, buff[n+r]);
            bad = 1;
            // Don't break, see if there's a more serious mistake than a \r -> \n.
        }
    }
    // fprintf(stderr, "Buffer seen by slave is:\n");
    // fwrite(buff, READ_BUFF_SZ, 1, stdout);
    if (bad) {
        write_buffer(buff, nn, "/tmp/ptmx_out.txt");
        fprintf(stderr, "See payload in /tmp/ptmx_out.txt\n");
    } else
        fprintf(stderr, "slave says: everything's okay\n");
}

void ptmx_master_test(int pty, const char *line, size_t wsz)
{
    char *buff = malloc(BUFF_SZ);
    size_t n = 0;
    ssize_t r;
    if (!buff) {
        fprintf(stderr, "Failed to malloc(3): %s\n", strerror(errno));
        return;
    }
    /* initialize buffer */
    r = strlen(line);
    for (n = 0; n < BUFF_SZ; n += r+1) {
        memcpy(buff + n, line, n + r < BUFF_SZ ? r : BUFF_SZ - n);
        if (n + r + 1 < BUFF_SZ)
            buff[n+r] = '\n';
    }
    n = 0;
    do {
        fprintf(stderr, "write %zu\n", wsz + n > BUFF_SZ ? BUFF_SZ - n : wsz);
        r = write(pty, buff + n, wsz + n > BUFF_SZ ? BUFF_SZ - n : wsz);
        fprintf(stderr, " -> wrote %zd\n", r);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
                fd_set write_fds;
                FD_ZERO(&write_fds);
                FD_SET(pty, &write_fds);
                select(pty+1, NULL, &write_fds, NULL, NULL);
                continue;
            }
            fprintf(stderr, "Failed to write(2): %s\n", strerror(errno));
            return;
        } else if (r == 0) {
            break;
        } else {
            n += r;
        }
    } while (n < BUFF_SZ);
    fprintf(stderr, "master is sleeping now...\n");
    sleep(10);
    fprintf(stderr, "master exiting\n");
    close(pty);
    write_buffer(buff, BUFF_SZ, "/tmp/ptmx_in.txt");
}

int main()
{
    const char *line = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *ptsdname = NULL;
    int pty, pid;
    size_t rsz = 31, wsz = 4096 * 256;

    pty = open("/dev/ptmx", O_RDWR /* | O_NONBLOCK */);
    if (pty == -1) {
        fprintf(stderr, "Failed to open(2) /dev/ptmx: %s\n", strerror(errno));
        return 1;
    }
    ptsdname = ptsname(pty);
    fprintf(stderr, "ptsname = %s\n", ptsdname);
    if (!ptsdname) {
        fprintf(stderr, "Failed to ptsname(3): %s\n", strerror(errno));
        close(pty);
        return 1;
    }
    if (grantpt(pty) == -1) {
        fprintf(stderr, "Failed to grantpt(3): %s\n", strerror(errno));
        close(pty);
        return 1;
    }
    if (unlockpt(pty) == -1) {
        fprintf(stderr, "Failed to unlockpt(3): %s\n", strerror(errno));
        close(pty);
        return 1;
    }
    pid = fork();
    if (pid == -1) {
        fprintf(stderr, "Failed to fork(2): %s\n", strerror(errno));
        close(pty);
        return 1;
    } else if (pid == 0) {
        close(pty);
        pty = open(ptsdname, O_RDWR);
        if (pty == -1) {
            fprintf(stderr, "Failed to open(2) %s: %s\n", ptsdname, strerror(errno));
            return 1;
        }
        ptmx_slave_test(pty, line, rsz);
        close(pty);
        return 0;
    } else {
        int s;
        ptmx_master_test(pty, line, wsz);
        if (waitpid(pid, &s, 0) == -1) {
            fprintf(stderr, "Failed to waitpid(2) for %d: %s\n", pid, strerror(errno));
            return 1;
        }
        if (WIFEXITED(s) && WEXITSTATUS(s) == 0)
            return 0;
        if (WIFEXITED(s))
            fprintf(stderr, "Child exited with %d\n", WEXITSTATUS(s));
        else if (WIFSIGNALED(s))
            fprintf(stderr, "Child died with signal %d\n", WTERMSIG(s));
        else
            fprintf(stderr, "Child terminated in an unknown way with status %d\n", s);
        return 1;
    }
}

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-20 21:11 ` Egmont Koblinger @ 2012-02-20 21:29 ` Egmont Koblinger 0 siblings, 0 replies; 18+ messages in thread From: Egmont Koblinger @ 2012-02-20 21:29 UTC (permalink / raw) To: Pavel Machek; +Cc: Bruno Prémont, Greg KH, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 689 bytes --]

> What does make a difference, though, is the read size (rsz). The bug
> is reproducible if and only if the read size is a divisor of the
> length of the line excluding the terminating newline (i.e. the length
> of the full line minus one); that is, a divisor of 62 in this example.

Errr, forget this paragraph, this was a bug in my code. When trying to emulate readline, rsz should really be 1. I was not scanning through the whole buffer looking for a newline, and was looking at an incorrect offset. Sorry for the pebkac. Fixed version attached.

Anyway, the point is still the same: the data corruption and the deadlock are still triggered the same way.

thx,
egmont

[-- Attachment #2: ptmx3.c --]
[-- Type: text/x-csrc, Size: 6152 bytes --]

#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>
#include <sys/select.h>
#include <sys/ioctl.h>
#include <termios.h>

#define BUFF_SZ (4096*256)
// Expect fewer bytes than the master writes, because the last incomplete line
// is not sent to the slave in cooked mode.
#define READ_BUFF_SZ (BUFF_SZ - 100)

void raw(int pty)
{
    struct termios t;
    ioctl(pty, TCGETS, &t);
    t.c_lflag &= ~ICANON;
    t.c_iflag &= ~INLCR;
    ioctl(pty, TCSETSW, &t);
}

void cooked(int pty)
{
    struct termios t;
    ioctl(pty, TCGETS, &t);
    t.c_lflag |= ICANON;
    t.c_iflag |= INLCR;
    ioctl(pty, TCSETSW, &t);
}

void write_buffer(const char *buff, size_t buff_sz, const char *fname)
{
    int fd = open(fname, O_CREAT | O_WRONLY | O_TRUNC, 0664);
    size_t n = 0;
    ssize_t r;
    if (fd == -1) {
        fprintf(stderr, "Failed to open(2) %s: %s\n", fname, strerror(errno));
        return;
    }
    do {
        r = write(fd, buff + n, buff_sz - n);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            fprintf(stderr, "Failed to write(2): %s\n", strerror(errno));
            return;
        } else if (r == 0) {
            break;
        } else {
            n += r;
        }
    } while (n < buff_sz);
    close(fd);
}

void ptmx_slave_test(int pty, const char *line, size_t rsz)
{
    char *buff = malloc(READ_BUFF_SZ);
    size_t n = 0, nn;
    ssize_t r;
    int l, bad;
    struct timespec slen;
    if (!buff) {
        fprintf(stderr, "Failed to malloc(3): %s\n", strerror(errno));
        return;
    }
    memset(buff, 0, READ_BUFF_SZ);
    raw(pty);  // emulate the initialization of a readline app
    do {
        r = read(pty, buff + n, rsz + n > READ_BUFF_SZ ? READ_BUFF_SZ - n : rsz);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            fprintf(stderr, "Failed to read(2) after reading %zu bytes: %s\n",
                    n, strerror(errno));
            break;  // Despite the error, compare the buffer against the reference.
        } else if (r == 0) {
            if (n < READ_BUFF_SZ)
                fprintf(stderr, "Read %zu bytes, expected %zu!\n",
                        n, (size_t)READ_BUFF_SZ);
            break;
        } else {
            n += r;
            if (buff[n-1] == '\n') {
                // emulate a readline app taking action on the input
                cooked(pty);
                memset(&slen, 0, sizeof(slen));
                slen.tv_nsec = 1000 * 1000;
                nanosleep(&slen, NULL);
                raw(pty);
            }
        }
    } while (n < READ_BUFF_SZ);
    nn = n;

    /* check buffer if it matches expected value... */
    r = strlen(line);
    l = 0;
    bad = 0;
    for (n = 0; n < READ_BUFF_SZ; n += r+1) {
        l++;
        if (memcmp(buff + n, line, n + r < READ_BUFF_SZ ? r : READ_BUFF_SZ - n) != 0) {
            // TODO: determine position of breakage!
            fprintf(stderr, "Line data mismatch for line %d!\n", l);
            bad = 1;
            break;
        }
        if (n + r + 1 < READ_BUFF_SZ && buff[n+r] != '\n') {
            if (!bad)
                fprintf(stderr, "Expecting '\\n' at end of line %d, but found 0x%hhx\n",
                        l, buff[n+r]);
            bad = 1;
            // Don't break, see if there's a more serious mistake than a \r -> \n.
        }
    }
    // fprintf(stderr, "Buffer seen by slave is:\n");
    // fwrite(buff, READ_BUFF_SZ, 1, stdout);
    if (bad) {
        write_buffer(buff, nn, "/tmp/ptmx_out.txt");
        fprintf(stderr, "See payload in /tmp/ptmx_out.txt\n");
    } else
        fprintf(stderr, "slave says: everything's okay\n");
}

void ptmx_master_test(int pty, const char *line, size_t wsz)
{
    char *buff = malloc(BUFF_SZ);
    size_t n = 0;
    ssize_t r;
    if (!buff) {
        fprintf(stderr, "Failed to malloc(3): %s\n", strerror(errno));
        return;
    }
    /* initialize buffer */
    r = strlen(line);
    for (n = 0; n < BUFF_SZ; n += r+1) {
        memcpy(buff + n, line, n + r < BUFF_SZ ? r : BUFF_SZ - n);
        if (n + r + 1 < BUFF_SZ)
            buff[n+r] = '\n';
    }
    n = 0;
    do {
        fprintf(stderr, "write %zu\n", wsz + n > BUFF_SZ ? BUFF_SZ - n : wsz);
        r = write(pty, buff + n, wsz + n > BUFF_SZ ? BUFF_SZ - n : wsz);
        fprintf(stderr, " -> wrote %zd\n", r);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
                fd_set write_fds;
                FD_ZERO(&write_fds);
                FD_SET(pty, &write_fds);
                select(pty+1, NULL, &write_fds, NULL, NULL);
                continue;
            }
            fprintf(stderr, "Failed to write(2): %s\n", strerror(errno));
            return;
        } else if (r == 0) {
            break;
        } else {
            n += r;
        }
    } while (n < BUFF_SZ);
    fprintf(stderr, "master is sleeping now...\n");
    sleep(10);
    fprintf(stderr, "master exiting\n");
    close(pty);
    write_buffer(buff, BUFF_SZ, "/tmp/ptmx_in.txt");
}

int main()
{
    const char *line = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *ptsdname = NULL;
    int pty, pid;
    size_t rsz = 1, wsz = 4096 * 256;

    pty = open("/dev/ptmx", O_RDWR /* | O_NONBLOCK */);
    if (pty == -1) {
        fprintf(stderr, "Failed to open(2) /dev/ptmx: %s\n", strerror(errno));
        return 1;
    }
    ptsdname = ptsname(pty);
    fprintf(stderr, "ptsname = %s\n", ptsdname);
    if (!ptsdname) {
        fprintf(stderr, "Failed to ptsname(3): %s\n", strerror(errno));
        close(pty);
        return 1;
    }
    if (grantpt(pty) == -1) {
        fprintf(stderr, "Failed to grantpt(3): %s\n", strerror(errno));
        close(pty);
        return 1;
    }
    if (unlockpt(pty) == -1) {
        fprintf(stderr, "Failed to unlockpt(3): %s\n", strerror(errno));
        close(pty);
        return 1;
    }
    pid = fork();
    if (pid == -1) {
        fprintf(stderr, "Failed to fork(2): %s\n", strerror(errno));
        close(pty);
        return 1;
    } else if (pid == 0) {
        close(pty);
        pty = open(ptsdname, O_RDWR);
        if (pty == -1) {
            fprintf(stderr, "Failed to open(2) %s: %s\n", ptsdname, strerror(errno));
            return 1;
        }
        ptmx_slave_test(pty, line, rsz);
        close(pty);
        return 0;
    } else {
        int s;
        ptmx_master_test(pty, line, wsz);
        if (waitpid(pid, &s, 0) == -1) {
            fprintf(stderr, "Failed to waitpid(2) for %d: %s\n", pid, strerror(errno));
            return 1;
        }
        if (WIFEXITED(s) && WEXITSTATUS(s) == 0)
            return 0;
        if (WIFEXITED(s))
            fprintf(stderr, "Child exited with %d\n", WEXITSTATUS(s));
        else if (WIFSIGNALED(s))
            fprintf(stderr, "Child died with signal %d\n", WTERMSIG(s));
        else
            fprintf(stderr, "Child terminated in an unknown way with status %d\n", s);
        return 1;
    }
}

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal 2012-02-15 18:50 ` PROBLEM: Data corruption when pasting large data to terminal Egmont Koblinger 2012-02-15 23:30 ` Greg KH @ 2012-02-15 23:58 ` Parag Warudkar 2012-02-16 0:10 ` Greg KH 1 sibling, 1 reply; 18+ messages in thread From: Parag Warudkar @ 2012-02-15 23:58 UTC (permalink / raw) To: Egmont Koblinger; +Cc: gregkh, linux-kernel On Wed, Feb 15, 2012 at 1:50 PM, Egmont Koblinger <egmont@gmail.com> wrote: > Hi, > > Short summary: When pasting large amount of data (>4kB) to terminals, > often the data gets mangled. > > How to reproduce: > Create a text file that contains this line about 100 times: > a=(123456789123456789123456789123456789123456789123456789123456789) > (also available at http://pastebin.com/LAH2bmaw for a while) > and then copy-paste its entire contents in one step into a "bash" or > "python" running in a graphical terminal. > FWIW, this also works fine on cygwin / Windows 7. No errors. $ bash --version GNU bash, version 4.1.10(4)-release (i686-pc-cygwin) Unsure what that means though - probably nothing! Greg - when you said it works in vim - since of course vim isn't 'parsing' the input may be you did not see an error - or did you actually verify all 4KB somehow? ;) Parag ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-15 23:58 ` Parag Warudkar
@ 2012-02-16  0:10 ` Greg KH
  2012-02-16 11:42   ` Egmont Koblinger
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2012-02-16  0:10 UTC (permalink / raw)
To: Parag Warudkar; +Cc: Egmont Koblinger, linux-kernel

On Wed, Feb 15, 2012 at 06:58:12PM -0500, Parag Warudkar wrote:
> On Wed, Feb 15, 2012 at 1:50 PM, Egmont Koblinger <egmont@gmail.com> wrote:
> > Hi,
> >
> > Short summary: When pasting large amount of data (>4kB) to terminals,
> > often the data gets mangled.
> >
> > How to reproduce:
> > Create a text file that contains this line about 100 times:
> > a=(123456789123456789123456789123456789123456789123456789123456789)
> > (also available at http://pastebin.com/LAH2bmaw for a while)
> > and then copy-paste its entire contents in one step into a "bash" or
> > "python" running in a graphical terminal.
>
> FWIW, this also works fine on cygwin / Windows 7. No errors.
>
> $ bash --version
> GNU bash, version 4.1.10(4)-release (i686-pc-cygwin)
>
> Unsure what that means though - probably nothing!
>
> Greg - when you said it works in vim - since of course vim isn't
> 'parsing' the input may be you did not see an error - or did you
> actually verify all 4KB somehow? ;)

I verified that the input actually matched the paste buffer.  It's
pretty trivial to do so.

Odds are emacs also does this correctly, anyone care to verify that?

thanks,

greg k-h
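[Editor's note: one assumed workflow for the verification Greg describes — paste into an editor, save the result, then compare byte-for-byte with the source file. The file names are illustrative, and this is not necessarily how he checked.]

# exit status 0 (and "paste intact") only if the files are byte-identical
cmp -s original.txt pasted.txt && echo "paste intact" || echo "paste corrupted"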
* Re: PROBLEM: Data corruption when pasting large data to terminal
  2012-02-16  0:10 ` Greg KH
@ 2012-02-16 11:42 ` Egmont Koblinger
  0 siblings, 0 replies; 18+ messages in thread
From: Egmont Koblinger @ 2012-02-16 11:42 UTC (permalink / raw)
To: Greg KH; +Cc: Parag Warudkar, linux-kernel

On Thu, Feb 16, 2012 at 01:10, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> Odds are emacs also does this correctly, anyone care to verify that?

Actually, thanks for the hint: emacs is the first editor where I could
easily trigger the bug.  I made the input data a bit longer (200 lines),
and it's being pasted incorrectly all the time.

Stracing xterm shows exactly this, with the "= 68" at the end, 200 times:

write(5, "a=(12345678912345678912345678912"..., 68) = 68

Contrary to xterm, gnome-terminal doesn't split the buffer at newlines;
it tries to write to /dev/ptmx in one single step.  In this example I
extended the file to 1000 lines (68000 bytes); here's what strace
gnome-terminal says:

open("/dev/ptmx", O_RDWR)               = 17
[...]
write(17, "a=(12345678912345678912345678912"..., 68000) = 65280
[...]
write(17, "a=(12345678912345678912345678912"..., 2720) = 2720

So copy-paste starts misbehaving after 4kB, which suggests a buffer of
around that size; on the other hand, writes to /dev/ptmx can return up
to almost 64kB, which suggests a much larger terminal buffer and makes
it strange that the misbehavior starts as early as 4kB.

On a side note, apparently gnome-terminal handles short writes
correctly.  (It's a mere accident that the size of the first write,
65280, is divisible by the line length of my example, 68.  A different
input shows that the second write indeed continues sending the buffer
from the correct offset, from the middle of a line.  The return value
of the first write is the same, 65280, in that case too.)

thx,
egmont
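[Editor's note: the strace output above shows gnome-terminal resuming the second write(2) at the offset where the short write stopped. A minimal sketch of that resume-from-offset loop follows — illustrative code, not gnome-terminal's actual source; a real terminal would poll/select before retrying transient errors.]

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, resuming after short writes and retrying
 * transient errors -- the behavior the strace above demonstrates. */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t r = write(fd, buf + off, len - off);
        if (r < 0) {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;       /* transient; retry (ideally after select()) */
            return -1;          /* hard error */
        }
        off += (size_t)r;       /* short write: continue from the new offset */
    }
    return 0;
}

int main(void)
{
    int fds[2];
    char buf[16];
    if (pipe(fds) != 0)
        return 1;
    if (write_all(fds[1], "a=(123)\n", 8) != 0)
        return 1;
    ssize_t n = read(fds[0], buf, sizeof buf);
    printf("delivered %zd bytes\n", n);    /* prints "delivered 8 bytes" */
    return 0;
}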
end of thread, newest: 2012-02-20 21:30 UTC
Thread overview: 18+ messages
[not found] <CAGWcZkJs2uQHM=7wmf1JOLmUeS3Mxo5L4arMuMQSez1mvJLKQA@mail.gmail.com>
2012-02-15 18:50 ` PROBLEM: Data corruption when pasting large data to terminal Egmont Koblinger
2012-02-15 23:30 ` Greg KH
2012-02-16 0:39 ` Egmont Koblinger
2012-02-16 0:54 ` Greg KH
2012-02-16 1:12 ` Egmont Koblinger
2012-02-17 19:28 ` Pavel Machek
2012-02-17 21:57 ` Bruno Prémont
2012-02-19 20:55 ` Egmont Koblinger
2012-02-19 21:14 ` Bruno Prémont
2012-02-19 21:35 ` Alan Cox
2012-02-19 21:41 ` Egmont Koblinger
2012-02-20 17:18 ` Egmont Koblinger
2012-02-20 17:31 ` Pavel Machek
2012-02-20 21:11 ` Egmont Koblinger
2012-02-20 21:29 ` Egmont Koblinger
2012-02-15 23:58 ` Parag Warudkar
2012-02-16 0:10 ` Greg KH
2012-02-16 11:42 ` Egmont Koblinger