* pty_chars_in_buffer NULL pointer (kernel oops) @ 2005-02-18 8:56 nuclearcat 2005-02-25 23:04 ` Marcelo Tosatti 0 siblings, 1 reply; 9+ messages in thread From: nuclearcat @ 2005-02-18 8:56 UTC (permalink / raw) To: linux-kernel Dear, linux-kernel. Is discussed at http://kerneltrap.org/mailarchive/1/message/12508/thread bug fixed in 2.4.x tree? Cause seems i have downloaded 2.4.29, and it is not fixed (still my kernel on vpn server crashing almost at start), i have grepped fast pre and bk patches, but didnt found any fixed related to tty/pty. Provided in thread patch from Linus working, but after night i have checked server, and see load average jumped to 700. Can anybody help in that? I am not kernel guru to provide a patch, but seems by search in google it is actual problem for people, who own poptop vpn servers, it is really causing serious instability for servers. -- With best regards, GlobalProof Globax Division Manager, Denys Fedoryshchenko mailto:denys@globalproof.net ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-18 8:56 pty_chars_in_buffer NULL pointer (kernel oops) nuclearcat @ 2005-02-25 23:04 ` Marcelo Tosatti 2005-02-26 3:42 ` Linus Torvalds 2005-02-27 14:52 ` Re[2]: " nuclearcat 0 siblings, 2 replies; 9+ messages in thread From: Marcelo Tosatti @ 2005-02-25 23:04 UTC (permalink / raw) To: nuclearcat, torvalds, Alan Cox; +Cc: linux-kernel Hi, On Fri, Feb 18, 2005 at 10:56:53AM +0200, nuclearcat wrote: > Is discussed at > http://kerneltrap.org/mailarchive/1/message/12508/thread > bug fixed in 2.4.x tree? Cause seems i have downloaded 2.4.29, and it > is not fixed (still my kernel on vpn server crashing almost at start), > i have grepped fast pre and bk patches, but didnt found any fixed > related to tty/pty. Can you please post the oops? Have you done so already? What makes you think it is the same race discussed in the above thread? BTW, I fail to see any drivers/char/pty.c change related to the race which triggers the pty_chars_in_buffer->0 oops. Quoting the first message from thread you mention: "That last call trace entry is the call in pty_chars_in_buffer() to /* The ldisc must report 0 if no characters available to be read */ count = to->ldisc.chars_in_buffer(to); " Alan, Linus, what correction to the which the above thread discusses has been deployed? > Provided in thread patch from Linus working, but after night i have > checked server, and see load average jumped to 700. > Can anybody help in that? I am not kernel guru to provide a patch, but > seems by search in google it is actual problem for people, who own > poptop vpn servers, it is really causing serious instability for > servers. Can you compile a list of such v2.4 reports? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-25 23:04 ` Marcelo Tosatti @ 2005-02-26 3:42 ` Linus Torvalds 2005-02-26 11:18 ` Alan Cox 2005-02-27 14:52 ` Re[2]: " nuclearcat 1 sibling, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2005-02-26 3:42 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: nuclearcat, Alan Cox, linux-kernel On Fri, 25 Feb 2005, Marcelo Tosatti wrote: > > BTW, I fail to see any drivers/char/pty.c change related to the race which triggers > the pty_chars_in_buffer->0 oops. Indeed, I don't think 2.6.x got that merged, because it was never really clear _which_ fix was the right one (the extra locking was absolutely deadly for performance, the hacky one was a tad _too_ hacky ;) Alan, did you ever decide what the proper locking would be? I've applied the hacky "works by hiding the problem" patch for 2.6.11 which didn't have any subtle performance issues associated with it. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-26 3:42 ` Linus Torvalds @ 2005-02-26 11:18 ` Alan Cox 0 siblings, 0 replies; 9+ messages in thread From: Alan Cox @ 2005-02-26 11:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Marcelo Tosatti, nuclearcat, Linux Kernel Mailing List I couldnt duplicate the performance hit so I believe the proper fix not the hack is the right one ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re[2]: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-25 23:04 ` Marcelo Tosatti 2005-02-26 3:42 ` Linus Torvalds @ 2005-02-27 14:52 ` nuclearcat 2005-02-27 15:03 ` Re[3]: " nuclearcat 1 sibling, 1 reply; 9+ messages in thread From: nuclearcat @ 2005-02-27 14:52 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: torvalds, Alan Cox, linux-kernel Dear, Marcelo. You wrote Saturday, February 26, 2005, 1:04:32 AM: Sorry about delay, i had switched kernel to non-SMP mode. I cannot debug on kernel (it is loaded VPN server, and there is no redundancy for now). I have only few old oopses, saved before (it is on old redhat kernel) Feb 16 06:44:41 nss kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Feb 16 06:44:41 nss kernel: printing eip: Feb 16 06:44:41 nss kernel: 00000000 Feb 16 06:44:41 nss kernel: *pde = 00000000 Feb 16 06:44:41 nss kernel: Oops: 0000 Feb 16 06:44:41 nss kernel: cls_u32 sch_sfq sch_cbq ip_nat_ftp ip_conntrack_ftp tun ipt_REJECT ipt_REDIRECT nls_iso8859-1 loop cipcb ip_gre ipip ppp_async pp Feb 16 06:44:41 nss kernel: CPU: 3 Feb 16 06:44:41 nss kernel: EIP: 0060:[<00000000>] Not tainted Feb 16 06:44:41 nss kernel: EFLAGS: 00010286 Feb 16 06:44:41 nss kernel: Feb 16 06:44:41 nss kernel: EIP is at [unresolved] (2.4.20-20.9smp) Feb 16 06:44:41 nss kernel: eax: d4b26000 ebx: ce7fe000 ecx: c01997c0 edx: ef7c6b80 Feb 16 06:44:41 nss kernel: esi: 00000000 edi: ce7fe000 ebp: e040a880 esp: cfdf7ee0 Feb 16 06:44:41 nss kernel: ds: 0068 es: 0068 ss: 0068 Feb 16 06:44:41 nss kernel: Process pptpctrl (pid: 15960, stackpage=cfdf7000) Feb 16 06:44:41 nss kernel: Stack: c019d839 d4b26000 00000000 c019b2e6 ce7fe000 ce7fe974 cfdf7f48 ce7fe000 Feb 16 06:44:41 nss kernel: e040a880 00000004 00000010 c0197a15 ce7fe000 e040a880 00000000 00000000 Feb 16 06:44:41 nss kernel: e040a880 c01662b7 e040a880 00000000 cfdf6000 00000145 cfdf6000 00001962 Feb 16 06:44:41 nss kernel: Call Trace: [<c019d839>] pty_chars_in_buffer [kernel] 0x39 (0xcfdf7ee0)) Feb 16 06:44:41 nss kernel: [<c019b2e6>] normal_poll [kernel] 0x106 (0xcfdf7eec)) Feb 16 06:44:41 nss kernel: [<c0197a15>] tty_poll [kernel] 0x85 (0xcfdf7f0c)) Feb 16 06:44:41 nss kernel: [<c01662b7>] do_select [kernel] 0x247 (0xcfdf7f24)) Feb 16 06:44:41 nss kernel: [<c016663e>] sys_select [kernel] 0x34e (0xcfdf7f60)) Feb 16 06:44:41 nss kernel: [<c01098cf>] system_call [kernel] 0x33 (0xcfdf7fc0)) Feb 16 06:44:41 nss kernel: Feb 16 06:44:41 nss kernel: Feb 16 06:44:41 nss kernel: Code: Bad EIP value. in new kernel there is no debug messages to find where is problem, but problem looks very similar Feb 17 13:13:54 nss kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Feb 17 13:13:54 nss kernel: printing eip: Feb 17 13:13:54 nss kernel: 00000000 Feb 17 13:13:54 nss kernel: *pde = 00000000 Feb 17 13:13:54 nss kernel: Oops: 0000 Feb 17 13:13:54 nss kernel: CPU: 3 Feb 17 13:13:54 nss kernel: EIP: 0010:[<00000000>] Not tainted Feb 17 13:13:54 nss kernel: EFLAGS: 00010286 Feb 17 13:13:54 nss kernel: eax: ec32e000 ebx: f6891000 ecx: c01f56a0 edx: d6547980 Feb 17 13:13:54 nss kernel: esi: f6891000 edi: 00000000 ebp: f39c4c00 esp: d66bbed8 Feb 17 13:13:54 nss kernel: ds: 0018 es: 0018 ss: 0018 Feb 17 13:13:54 nss kernel: Process pptpctrl (pid: 10632, stackpage=d66bb000) Feb 17 13:13:54 nss kernel: Stack: c01f9829 ec32e000 00000000 c01f7e66 f6891000 00000010 00000202 f68910c0 Feb 17 13:13:54 nss kernel: f6891000 f39c4c00 00000000 c01f3bb0 f6891000 f39c4c00 00000000 00000000 Feb 17 13:13:54 nss kernel: f39c4c00 00000004 00000010 c0153a87 f39c4c00 00000000 d66ba000 00000145 Feb 17 13:13:54 nss kernel: Call Trace: [<c01f9829>] [<c01f7e66>] [<c01f3bb0>] [<c0153a87>] [<c0153e0e>] Feb 17 13:13:54 nss kernel: [<c010ae99>] [<c0108f67>] Feb 17 13:13:54 nss kernel: Feb 17 13:13:54 nss kernel: Code: Bad EIP value. And problem disappearing in non-SMP kernel. > Hi, > On Fri, Feb 18, 2005 at 10:56:53AM +0200, nuclearcat wrote: >> Is discussed at >> http://kerneltrap.org/mailarchive/1/message/12508/thread >> bug fixed in 2.4.x tree? Cause seems i have downloaded 2.4.29, and it >> is not fixed (still my kernel on vpn server crashing almost at start), >> i have grepped fast pre and bk patches, but didnt found any fixed >> related to tty/pty. > Can you please post the oops? Have you done so already? > What makes you think it is the same race discussed in the above thread? > BTW, I fail to see any drivers/char/pty.c change related to the race which triggers > the pty_chars_in_buffer->0 oops. > Quoting the first message from thread you mention: > "That last call trace entry is the call in pty_chars_in_buffer() to > /* The ldisc must report 0 if no characters available to be read */ > count = to->ldisc.chars_in_buffer(to); > " > Alan, Linus, what correction to the which the above thread discusses has > been deployed? >> Provided in thread patch from Linus working, but after night i have >> checked server, and see load average jumped to 700. >> Can anybody help in that? I am not kernel guru to provide a patch, but >> seems by search in google it is actual problem for people, who own >> poptop vpn servers, it is really causing serious instability for >> servers. > Can you compile a list of such v2.4 reports? > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- With best regards, GlobalProof Globax Division Manager, Denys Fedoryshchenko mailto:denys@globalproof.net ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re[3]: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-27 14:52 ` Re[2]: " nuclearcat @ 2005-02-27 15:03 ` nuclearcat 0 siblings, 0 replies; 9+ messages in thread From: nuclearcat @ 2005-02-27 15:03 UTC (permalink / raw) To: nuclearcat; +Cc: linux-kernel Dear, nuclearcat. You wrote Sunday, February 27, 2005, 4:52:54 PM: P.S. new kernel - 2.4.29 vanilla > Dear, Marcelo. > You wrote Saturday, February 26, 2005, 1:04:32 AM: > Sorry about delay, i had switched kernel to non-SMP mode. > I cannot debug on kernel (it is loaded VPN server, and there is no > redundancy for now). > I have only few old oopses, saved before (it is on old redhat kernel) > Feb 16 06:44:41 nss kernel: Unable to handle kernel NULL pointer > dereference at virtual address 00000000 > Feb 16 06:44:41 nss kernel: printing eip: > Feb 16 06:44:41 nss kernel: 00000000 > Feb 16 06:44:41 nss kernel: *pde = 00000000 > Feb 16 06:44:41 nss kernel: Oops: 0000 > Feb 16 06:44:41 nss kernel: cls_u32 sch_sfq sch_cbq ip_nat_ftp > ip_conntrack_ftp tun ipt_REJECT ipt_REDIRECT nls_iso8859-1 loop > cipcb ip_gre ipip ppp_async pp > Feb 16 06:44:41 nss kernel: CPU: 3 > Feb 16 06:44:41 nss kernel: EIP: 0060:[<00000000>] Not tainted > Feb 16 06:44:41 nss kernel: EFLAGS: 00010286 > Feb 16 06:44:41 nss kernel: > Feb 16 06:44:41 nss kernel: EIP is at [unresolved] (2.4.20-20.9smp) > Feb 16 06:44:41 nss kernel: eax: d4b26000 ebx: ce7fe000 ecx: c01997c0 edx: ef7c6b80 > Feb 16 06:44:41 nss kernel: esi: 00000000 edi: ce7fe000 ebp: e040a880 esp: cfdf7ee0 > Feb 16 06:44:41 nss kernel: ds: 0068 es: 0068 ss: 0068 > Feb 16 06:44:41 nss kernel: Process pptpctrl (pid: 15960, stackpage=cfdf7000) > Feb 16 06:44:41 nss kernel: Stack: c019d839 d4b26000 00000000 > c019b2e6 ce7fe000 ce7fe974 cfdf7f48 ce7fe000 > Feb 16 06:44:41 nss kernel: e040a880 00000004 00000010 > c0197a15 ce7fe000 e040a880 00000000 00000000 > Feb 16 06:44:41 nss kernel: e040a880 c01662b7 e040a880 > 00000000 cfdf6000 00000145 cfdf6000 00001962 > Feb 16 06:44:41 nss kernel: Call Trace: [<c019d839>] > pty_chars_in_buffer [kernel] 0x39 (0xcfdf7ee0)) > Feb 16 06:44:41 nss kernel: [<c019b2e6>] normal_poll [kernel] 0x106 (0xcfdf7eec)) > Feb 16 06:44:41 nss kernel: [<c0197a15>] tty_poll [kernel] 0x85 (0xcfdf7f0c)) > Feb 16 06:44:41 nss kernel: [<c01662b7>] do_select [kernel] 0x247 (0xcfdf7f24)) > Feb 16 06:44:41 nss kernel: [<c016663e>] sys_select [kernel] 0x34e (0xcfdf7f60)) > Feb 16 06:44:41 nss kernel: [<c01098cf>] system_call [kernel] 0x33 (0xcfdf7fc0)) > Feb 16 06:44:41 nss kernel: > Feb 16 06:44:41 nss kernel: > Feb 16 06:44:41 nss kernel: Code: Bad EIP value. > in new kernel there is no debug messages to find where is problem, but > problem looks very similar > Feb 17 13:13:54 nss kernel: Unable to handle kernel NULL pointer > dereference at virtual address 00000000 > Feb 17 13:13:54 nss kernel: printing eip: > Feb 17 13:13:54 nss kernel: 00000000 > Feb 17 13:13:54 nss kernel: *pde = 00000000 > Feb 17 13:13:54 nss kernel: Oops: 0000 > Feb 17 13:13:54 nss kernel: CPU: 3 > Feb 17 13:13:54 nss kernel: EIP: 0010:[<00000000>] Not tainted > Feb 17 13:13:54 nss kernel: EFLAGS: 00010286 > Feb 17 13:13:54 nss kernel: eax: ec32e000 ebx: f6891000 ecx: c01f56a0 edx: d6547980 > Feb 17 13:13:54 nss kernel: esi: f6891000 edi: 00000000 ebp: f39c4c00 esp: d66bbed8 > Feb 17 13:13:54 nss kernel: ds: 0018 es: 0018 ss: 0018 > Feb 17 13:13:54 nss kernel: Process pptpctrl (pid: 10632, stackpage=d66bb000) > Feb 17 13:13:54 nss kernel: Stack: c01f9829 ec32e000 00000000 > c01f7e66 f6891000 00000010 00000202 f68910c0 > Feb 17 13:13:54 nss kernel: f6891000 f39c4c00 00000000 > c01f3bb0 f6891000 f39c4c00 00000000 00000000 > Feb 17 13:13:54 nss kernel: f39c4c00 00000004 00000010 > c0153a87 f39c4c00 00000000 d66ba000 00000145 > Feb 17 13:13:54 nss kernel: Call Trace: [<c01f9829>] > [<c01f7e66>] [<c01f3bb0>] [<c0153a87>] [<c0153e0e>] > Feb 17 13:13:54 nss kernel: [<c010ae99>] [<c0108f67>] > Feb 17 13:13:54 nss kernel: > Feb 17 13:13:54 nss kernel: Code: Bad EIP value. > And problem disappearing in non-SMP kernel. >> Hi, >> On Fri, Feb 18, 2005 at 10:56:53AM +0200, nuclearcat wrote: >>> Is discussed at >>> http://kerneltrap.org/mailarchive/1/message/12508/thread >>> bug fixed in 2.4.x tree? Cause seems i have downloaded 2.4.29, and it >>> is not fixed (still my kernel on vpn server crashing almost at start), >>> i have grepped fast pre and bk patches, but didnt found any fixed >>> related to tty/pty. >> Can you please post the oops? Have you done so already? >> What makes you think it is the same race discussed in the above thread? >> BTW, I fail to see any drivers/char/pty.c change related to the race which triggers >> the pty_chars_in_buffer->0 oops. >> Quoting the first message from thread you mention: >> "That last call trace entry is the call in pty_chars_in_buffer() to >> /* The ldisc must report 0 if no characters available to be read */ >> count = to->ldisc.chars_in_buffer(to); >> " >> Alan, Linus, what correction to the which the above thread discusses has >> been deployed? >>> Provided in thread patch from Linus working, but after night i have >>> checked server, and see load average jumped to 700. >>> Can anybody help in that? I am not kernel guru to provide a patch, but >>> seems by search in google it is actual problem for people, who own >>> poptop vpn servers, it is really causing serious instability for >>> servers. >> Can you compile a list of such v2.4 reports? >> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ -- With best regards, GlobalProof Globax Division Manager, Denys Fedoryshchenko mailto:denys@globalproof.net ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pty_chars_in_buffer NULL pointer (kernel oops) @ 2005-02-27 10:00 Marcelo Tosatti 2005-02-27 19:46 ` Linus Torvalds 0 siblings, 1 reply; 9+ messages in thread From: Marcelo Tosatti @ 2005-02-27 10:00 UTC (permalink / raw) To: nuclearcat, torvalds, Alan Cox; +Cc: linux-kernel (resending since first message didnt seem to go through) Hi, On Fri, Feb 18, 2005 at 10:56:53AM +0200, nuclearcat wrote: > Is discussed at > http://kerneltrap.org/mailarchive/1/message/12508/thread > bug fixed in 2.4.x tree? Cause seems i have downloaded 2.4.29, and it > is not fixed (still my kernel on vpn server crashing almost at start), > i have grepped fast pre and bk patches, but didnt found any fixed > related to tty/pty. Can you please post the oops? Have you done so already? What makes you think it is the same race discussed in the above thread? BTW, I fail to see any drivers/char/pty.c change related to the race which triggers the pty_chars_in_buffer->0 oops. Quoting the first message from thread you mention: "That last call trace entry is the call in pty_chars_in_buffer() to /* The ldisc must report 0 if no characters available to be read */ count = to->ldisc.chars_in_buffer(to); " Alan, Linus, what correction to the which the above thread discusses has been deployed? > Provided in thread patch from Linus working, but after night i have > checked server, and see load average jumped to 700. > Can anybody help in that? I am not kernel guru to provide a patch, but > seems by search in google it is actual problem for people, who own > poptop vpn servers, it is really causing serious instability for > servers. Can you compile a list of such v2.4 reports? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-27 10:00 Marcelo Tosatti @ 2005-02-27 19:46 ` Linus Torvalds 2005-04-13 18:43 ` Jason Baron 0 siblings, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2005-02-27 19:46 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: nuclearcat, Alan Cox, linux-kernel On Sun, 27 Feb 2005, Marcelo Tosatti wrote: > > Alan, Linus, what correction to the which the above thread discusses has > been deployed? This is the hacky "hide the problem" patch that is in my current tree (and was discussed in the original thread some time ago). It's in no way "correct", in that the race hasn't actually gone away by this patch, but the patch makes it unimportant. We may end up calling a stale line discipline, which is still very wrong, but it so happens that we don't much care in practice. I think that in a 2.4.x tree there are some theoretical SMP races with module unloading etc (which the 2.6.x code doesn't have because module unload stops the other CPU's - maybe that part got backported to 2.4.x?), but quite frankly, I suspect that even in 2.4.x they are entirely theoretical and impossible to actually hit. And again, in theory some line discipline might do something strange in it's "chars_in_buffer" routine that would be problematic. In practice that's just not the case: the "chars_in_buffer()" routine might return a bogus _value_ for a stale line discipline thing, but none of them seem to follow any pointers that might have become invalid (and in fact, most ldiscs don't even have that function). So while this patch is wrogn in theory, it does have the advantage of being (a) very safe minimal patch and (b) fixing the problem in practice with no performance downside. I still feel a bit guilty about it, though. Linus --- # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2005/02/25 19:39:39-08:00 torvalds@ppc970.osdl.org # Fix possible pty line discipline race. # # This ain't pretty. Real fix under discussion. # # drivers/char/pty.c # 2005/02/25 19:39:32-08:00 torvalds@ppc970.osdl.org +4 -2 # Fix possible pty line discipline race. # # This ain't pretty. Real fix under discussion. # diff -Nru a/drivers/char/pty.c b/drivers/char/pty.c --- a/drivers/char/pty.c 2005-02-27 11:31:57 -08:00 +++ b/drivers/char/pty.c 2005-02-27 11:31:57 -08:00 @@ -149,13 +149,15 @@ static int pty_chars_in_buffer(struct tty_struct *tty) { struct tty_struct *to = tty->link; + ssize_t (*chars_in_buffer)(struct tty_struct *); int count; - if (!to || !to->ldisc.chars_in_buffer) + /* We should get the line discipline lock for "tty->link" */ + if (!to || !(chars_in_buffer = to->ldisc.chars_in_buffer)) return 0; /* The ldisc must report 0 if no characters available to be read */ - count = to->ldisc.chars_in_buffer(to); + count = chars_in_buffer(to); if (tty->driver->subtype == PTY_TYPE_SLAVE) return count; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pty_chars_in_buffer NULL pointer (kernel oops) 2005-02-27 19:46 ` Linus Torvalds @ 2005-04-13 18:43 ` Jason Baron 0 siblings, 0 replies; 9+ messages in thread From: Jason Baron @ 2005-04-13 18:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: Marcelo Tosatti, nuclearcat, Alan Cox, linux-kernel On Sun, 27 Feb 2005, Linus Torvalds wrote: > > > On Sun, 27 Feb 2005, Marcelo Tosatti wrote: > > > > Alan, Linus, what correction to the which the above thread discusses has > > been deployed? > > This is the hacky "hide the problem" patch that is in my current tree (and > was discussed in the original thread some time ago). > > It's in no way "correct", in that the race hasn't actually gone away by > this patch, but the patch makes it unimportant. We may end up calling a > stale line discipline, which is still very wrong, but it so happens that > we don't much care in practice. > > I think that in a 2.4.x tree there are some theoretical SMP races with > module unloading etc (which the 2.6.x code doesn't have because module > unload stops the other CPU's - maybe that part got backported to 2.4.x?), > but quite frankly, I suspect that even in 2.4.x they are entirely > theoretical and impossible to actually hit. > > And again, in theory some line discipline might do something strange in > it's "chars_in_buffer" routine that would be problematic. In practice > that's just not the case: the "chars_in_buffer()" routine might return a > bogus _value_ for a stale line discipline thing, but none of them seem to > follow any pointers that might have become invalid (and in fact, most > ldiscs don't even have that function). > > So while this patch is wrogn in theory, it does have the advantage of > being (a) very safe minimal patch and (b) fixing the problem in practice > with no performance downside. > > I still feel a bit guilty about it, though. > > Linus > > --- > # This is a BitKeeper generated diff -Nru style patch. > # > # ChangeSet > # 2005/02/25 19:39:39-08:00 torvalds@ppc970.osdl.org > # Fix possible pty line discipline race. > # > # This ain't pretty. Real fix under discussion. > # > # drivers/char/pty.c > # 2005/02/25 19:39:32-08:00 torvalds@ppc970.osdl.org +4 -2 > # Fix possible pty line discipline race. > # > # This ain't pretty. Real fix under discussion. > # > diff -Nru a/drivers/char/pty.c b/drivers/char/pty.c > --- a/drivers/char/pty.c 2005-02-27 11:31:57 -08:00 > +++ b/drivers/char/pty.c 2005-02-27 11:31:57 -08:00 > @@ -149,13 +149,15 @@ > static int pty_chars_in_buffer(struct tty_struct *tty) > { > struct tty_struct *to = tty->link; > + ssize_t (*chars_in_buffer)(struct tty_struct *); > int count; > > - if (!to || !to->ldisc.chars_in_buffer) > + /* We should get the line discipline lock for "tty->link" */ > + if (!to || !(chars_in_buffer = to->ldisc.chars_in_buffer)) > return 0; > > /* The ldisc must report 0 if no characters available to be read */ > - count = to->ldisc.chars_in_buffer(to); > + count = chars_in_buffer(to); > > if (tty->driver->subtype == PTY_TYPE_SLAVE) return count; > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > hi, I've implented another approach to this issue, based on some previous suggestions. The idea is to lock both sides of a ptmx/pty pair during line discipline changing. As previously discussed this has the advantage of being on a less often used path, as opposed to locking the ptmx/pty pair on read/write/poll paths. The patch below, takes both ldisc locks in either order b/c the locks are both taken under the same spinlock(). I thought about locking the ptmx/pty separately, such as master always first but that introduces a 3 way deadlock. For example, process 1 does a blocking read on the slave side. Then, process 2 does an ldisc change on the slave side, which acquires the master ldisc lock but not the slave's. Finally, process 3 does a write which blocks on the process 2's ldisc reference. This patch does introduce some changes in semantics. For example, a line discipline change on side 'a' of a ptmx/pty pair, will now wait for a read/write to complete on the other side, or side 'b'. The current behavior is to simply wait for any read/writes on only side 'a', not both sides 'a' and 'b'. I think this behavior makes sense, but i wanted to point it out. I've tested the patch with a bunch of read/write/poll while changing the line discipline out from underneath. This patch obviates the need for the above "hide the problem" patch. thanks, -Jason --- linux/drivers/char/tty_io.c.bak +++ linux/drivers/char/tty_io.c @@ -458,21 +458,19 @@ static void tty_ldisc_enable(struct tty_ static int tty_set_ldisc(struct tty_struct *tty, int ldisc) { - int retval = 0; - struct tty_ldisc o_ldisc; + int retval = 0; + struct tty_ldisc o_ldisc; char buf[64]; int work; unsigned long flags; struct tty_ldisc *ld; + struct tty_struct *o_tty; if ((ldisc < N_TTY) || (ldisc >= NR_LDISCS)) return -EINVAL; restart: - if (tty->ldisc.num == ldisc) - return 0; /* We are already in the desired discipline */ - ld = tty_ldisc_get(ldisc); /* Eduardo Blanco <ejbs@cs.cs.com.uy> */ /* Cyrus Durgin <cider@speakeasy.org> */ @@ -483,45 +481,74 @@ restart: if (ld == NULL) return -EINVAL; - o_ldisc = tty->ldisc; - tty_wait_until_sent(tty, 0); + if (tty->ldisc.num == ldisc) { + tty_ldisc_put(ldisc); + return 0; + } + + o_ldisc = tty->ldisc; + o_tty = tty->link; + /* * Make sure we don't change while someone holds a * reference to the line discipline. The TTY_LDISC bit * prevents anyone taking a reference once it is clear. * We need the lock to avoid racing reference takers. */ - + spin_lock_irqsave(&tty_ldisc_lock, flags); - if(tty->ldisc.refcount) - { - /* Free the new ldisc we grabbed. Must drop the lock - first. */ + if (tty->ldisc.refcount || (o_tty && o_tty->ldisc.refcount)) { + if(tty->ldisc.refcount) { + /* Free the new ldisc we grabbed. Must drop the lock + first. */ + spin_unlock_irqrestore(&tty_ldisc_lock, flags); + tty_ldisc_put(ldisc); + /* + * There are several reasons we may be busy, including + * random momentary I/O traffic. We must therefore + * retry. We could distinguish between blocking ops + * and retries if we made tty_ldisc_wait() smarter. That + * is up for discussion. + */ + if (wait_event_interruptible(tty_ldisc_wait, tty->ldisc.refcount == 0) < 0) + return -ERESTARTSYS; + goto restart; + } + if(o_tty && o_tty->ldisc.refcount) { + spin_unlock_irqrestore(&tty_ldisc_lock, flags); + tty_ldisc_put(ldisc); + if (wait_event_interruptible(tty_ldisc_wait, o_tty->ldisc.refcount == 0) < 0) + return -ERESTARTSYS; + goto restart; + } + } + + /* if the TTY_LDISC bit is set, then we are racing against another ldisc change */ + + if (!test_bit(TTY_LDISC, &tty->flags)) { spin_unlock_irqrestore(&tty_ldisc_lock, flags); tty_ldisc_put(ldisc); - /* - * There are several reasons we may be busy, including - * random momentary I/O traffic. We must therefore - * retry. We could distinguish between blocking ops - * and retries if we made tty_ldisc_wait() smarter. That - * is up for discussion. - */ - if(wait_event_interruptible(tty_ldisc_wait, tty->ldisc.refcount == 0) < 0) - return -ERESTARTSYS; + ld = tty_ldisc_ref_wait(tty); + tty_ldisc_deref(ld); goto restart; } - clear_bit(TTY_LDISC, &tty->flags); + + clear_bit(TTY_LDISC, &tty->flags); clear_bit(TTY_DONT_FLIP, &tty->flags); - spin_unlock_irqrestore(&tty_ldisc_lock, flags); - + if (o_tty) { + clear_bit(TTY_LDISC, &o_tty->flags); + clear_bit(TTY_DONT_FLIP, &o_tty->flags); + } + spin_unlock_irqrestore(&tty_ldisc_lock, flags); + /* * From this point on we know nobody has an ldisc * usage reference, nor can they obtain one until * we say so later on. */ - + work = cancel_delayed_work(&tty->flip.work); /* * Wait for ->hangup_work and ->flip.work handlers to terminate @@ -572,10 +599,12 @@ restart: */ tty_ldisc_enable(tty); + if (o_tty) + tty_ldisc_enable(o_tty); /* Restart it in case no characters kick it off. Safe if already running */ - if(work) + if (work) schedule_delayed_work(&tty->flip.work, 1); return retval; } ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2005-04-13 19:44 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-02-18 8:56 pty_chars_in_buffer NULL pointer (kernel oops) nuclearcat 2005-02-25 23:04 ` Marcelo Tosatti 2005-02-26 3:42 ` Linus Torvalds 2005-02-26 11:18 ` Alan Cox 2005-02-27 14:52 ` Re[2]: " nuclearcat 2005-02-27 15:03 ` Re[3]: " nuclearcat -- strict thread matches above, loose matches on Subject: below -- 2005-02-27 10:00 Marcelo Tosatti 2005-02-27 19:46 ` Linus Torvalds 2005-04-13 18:43 ` Jason Baron
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox