* Suspecting kernel timing bug in ppp or tty drivers
@ 2014-03-10 13:36 Mats Kärrman
2014-03-10 19:40 ` Greg KH
0 siblings, 1 reply; 8+ messages in thread
From: Mats Kärrman @ 2014-03-10 13:36 UTC (permalink / raw)
To: linux-ppp@vger.kernel.org, linux-serial@vger.kernel.org
Hi Linux serial & PPP experts,
For a long time now I have been chasing a problem in my PPP connection.
The fault scenario is as follows:
- After reboot establishing a PPP connection works OK
- Network communication over PPP works OK
- The PPP connection is restarted - could be for any of several reasons; peer
drops the connection gracefully, local restart due to network problems,
local restart due to user request, et.c.
- After closing down the connection, it is not possible to re-connect again.
The Error messages from pppd looks something like:
Jan 1 01:04:42 mymachine local2.err chat[631]: Can't get terminal parameters: Input/output error
Jan 1 01:04:42 mymachine daemon.debug pppd[629]: Script /usr/sbin/chat -v -f /etc/ppp/chats/gprs-conn -T my-apn (pid 630), status = 0x2
Jan 1 01:04:42 mymachine daemon.err pppd[629]: Connect script failed
Some data about the system:
- PPP daemon v2.4.5
- Linux kernel v3.2.51 (same error with v2.6.35 though)
- CPU PowerPC e300c4 (MPC5125)
- Cinterion GPRS modem
- The serial device driver is based on main-line drivers for MPC5200 and
MPC5121 (mpc52xx_uart.c) but with some custom extensions. Works fine for
other applications.
By running strace I have found that it is the ioctl's the PPPD tries to
execute on the tty device that fails after disconnect (I have also tried
including fcntl64 in the strace trace list but not found any such calls that
fails):
-------------------------------------------------------------------------------
root@mymachine:~# strace -e trace=open,close,ioctl pppd dump debug nodetach call gprs
<snip>
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
close(4) = 0
open("/home/root/.ppprc", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ppp/peers/gprs", O_RDONLY) = 4
open("/etc/ppp/peers/ip-connect", O_RDONLY) = 5
close(5) = 0
close(4) = 0
open("/etc/ppp/options.ttyPSC4", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/dev/ppp", O_RDWR) = 4
close(4) = 0
open("/usr/share/zoneinfo/UTC", O_RDONLY) = -1 ENOENT (No such file or directory)
pppd options in effect:
debug # (from command line)
nodetach # (from command line)
holdoff 2 # (from /etc/ppp/peers/gprs)
linkname gprs # (from /etc/ppp/peers/gprs)
maxfail 5 # (from /etc/ppp/peers/gprs)
unit 0 # (from /etc/ppp/peers/gprs)
dump # (from command line)
noauth # (from /etc/ppp/peers/gprs)
name mycon # (from /etc/ppp/peers/ip-connect)
/dev/ttyPSC4 # (from /etc/ppp/peers/gprs)
115200 # (from /etc/ppp/peers/gprs)
lock # (from /etc/ppp/options)
connect /usr/sbin/chat -v -f /etc/ppp/chats/gprs-conn -T my-apn # (from /etc/ppp/peers/ip-connect)
crtscts # (from /etc/ppp/peers/gprs)
modem # (from /etc/ppp/peers/gprs)
lcp-echo-failure 3 # (from /etc/ppp/peers/gprs)
lcp-echo-interval 10 # (from /etc/ppp/peers/gprs)
ipcp-accept-local # (from /etc/ppp/peers/gprs)
ipcp-accept-remote # (from /etc/ppp/peers/gprs)
noipdefault # (from /etc/ppp/peers/gprs)
defaultroute # (from /etc/ppp/peers/gprs)
replacedefaultroute # (from /etc/ppp/peers/gprs)
usepeerdns # (from /etc/ppp/peers/gprs)
noccp # (from /etc/ppp/peers/gprs)
open("/dev/null", O_RDWR) = 4
open("/var/run/pppd2.tdb", O_RDWR|O_CREAT, 0644) = 6
ioctl(0, TCGETS, {B115200 opost isig icanon echo ...}) = 0
open("/var/run/utmp", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 7
close(7) = 0
open("/var/run/ppp-gprs.pid", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
close(7) = 0
open("/var/lock/LCK..ttyPSC4", O_RDWR|O_CREAT|O_EXCL, 0644) = 7
close(7) = 0
open("/dev/ttyPSC4", O_RDWR|O_NONBLOCK) = 7
ioctl(7, TIOCMBIS, [TIOCM_DTR]) = -1 EIO (Input/output error)
ioctl(7, TCGETS, 0xbf817378) = -1 EIO (Input/output error)
ioctl(7, TIOCMBIC, [TIOCM_DTR]) = -1 EIO (Input/output error)
ioctl(7, TIOCMBIS, [TIOCM_DTR]) = -1 EIO (Input/output error)
close(8) = 0
close(9) = 0
close(8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
Script /usr/sbin/chat -v -f /etc/ppp/chats/gprs-conn -T my-apn finished (pid 3770), status = 0x2
Connect script failed
ioctl(7, TCFLSH, 0x2) = -1 EIO (Input/output error)
ioctl(7, TIOCMBIC, [TIOCM_DTR]) = -1 EIO (Input/output error)
close(7) = 0
root@mymachine:~#
-------------------------------------------------------------------------------
Something that makes the error difficult to debug is that it's occurence is
highly timing dependent. E.g. the use of strace can make the error occur when
it didn't or make it disappear when it did. Also it seems like something in
the network (timing?) also affects the occurance. Given an identical test
setup, some days it won't happen at all, some days it's intermittent and other
days it happens consistently.
Rebooting linux makes it possible to connect again. Sometimes it is enough to
let pppd retry by itself (persist) for a couple of times and the error will go
away but sometimes this is not enough and the pppd will continue trying forever
with the same error.
I have tried chasing the error by inserting printk statements in the kernel code
(mainly in tty_io.c and serial_core.c) but every time the error has suddenly
stopped occuring, I guess as an effect of the changed timing caused by the
printk's.
The gut feeling I have (no hard evidence though) is that there is a state
corruption in the kernel. After the PPP is finnished negotiating a disconnect
it matters when the tty is "hung up". If one goes well before the other
everything is OK but sometimes they collide and leave the tty hanging in limbo.
The following is from a test-run where the peer disconnects and the pppd tries
to reconnect (persist) and fails on the first attempt but not the second
(additional printk's labeled "MAKR:", the calls to "hung_up_tty_ioctl" below
would all have returned EIO, kernel v2.6.35):
-------------------------------------------------------------------------------
<snip>
Jan 7 17:22:52 mymachine daemon.debug pppd[609]: Script /etc/ppp/ip-up finished (pid 32381), status = 0x0
Jan 7 18:22:51 mymachine daemon.debug pppd[609]: rcvd [LCP TermReq id=0x0 "Normal Termination by NCP"]
Jan 7 18:22:51 mymachine daemon.info pppd[609]: LCP terminated by peer (Normal Termination by NCP)
Jan 7 18:22:51 mymachine daemon.info pppd[609]: Connect time 60.0 minutes.
Jan 7 18:22:51 mymachine daemon.info pppd[609]: Sent 0 bytes, received 0 bytes.
Jan 7 18:22:51 mymachine daemon.debug pppd[609]: Script /etc/ppp/ip-down started (pid 1421)
Jan 7 18:22:51 mymachine daemon.debug pppd[609]: sent [LCP TermAck id=0x0]
Jan 7 18:22:51 mymachine daemon.debug pppd[609]: Script /etc/ppp/ip-down finished (pid 1421), status = 0x0
Jan 7 18:22:51 mymachine daemon.info pppd[609]: Hangup (SIGHUP) code 128
Jan 7 18:22:51 mymachine daemon.notice pppd[609]: Modem hangup
Jan 7 18:22:51 mymachine daemon.notice pppd[609]: Connection terminated.
Jan 7 18:22:51 mymachine user.debug kernel: MAKR: uart_shutdown set ttyPSC4 40a0a
Jan 7 18:22:51 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x5417,0xbfba0418) ENOTTY/EIO 1 ?
Jan 7 18:22:52 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x802c7416,0xbfba03d8) ENOTTY/EIO 1 ?
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: uart_startup set ttyPSC4 a02
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: uart_startup clr ttyPSC4 a00
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: uart_block_til_ready
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: tty_ioctl 0x5416 on ttyPSC4 ret 0 at 2
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: tty_ioctl 0x402c7413 on ttyPSC4 ret 0 at 5
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: tty_ioctl 0x802c7416 on ttyPSC4 ret 0 at 5
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: tty_ioctl 0x5417 on ttyPSC4 ret 0 at 2
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: uart_shutdown set ttyPSC4 40a02
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x5416,0xbfba0398) ENOTTY/EIO 1 ?
Jan 7 18:22:55 mymachine local2.err chat[1435]: Can't get terminal parameters: Input/output error
Jan 7 18:22:55 mymachine daemon.debug pppd[609]: Script /usr/sbin/chat -v -f /etc/ppp/chats/gprs-conn -T my-apn finished (pid 1434), status = 0x2
Jan 7 18:22:55 mymachine daemon.err pppd[609]: Connect script failed
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x402c7413,0xbf893768) ENOTTY/EIO 1 ?
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x2000741f,0x2) ENOTTY/EIO 1 ?
Jan 7 18:22:55 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x5417,0xbfba0428) ENOTTY/EIO 1 ?
Jan 7 18:22:56 mymachine daemon.info pppd[609]: Hangup (SIGHUP) code 128
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: hung_up_tty_ioctl(0x802c7416,0xbfba03e8) ENOTTY/EIO 1 ?
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: uart_startup set ttyPSC4 a02
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: uart_startup clr ttyPSC4 a00
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: uart_block_til_ready
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: tty_ioctl 0x5416 on ttyPSC4 ret 0 at 2
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: tty_ioctl 0x402c7413 on ttyPSC4 ret 0 at 5
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: tty_ioctl 0x802c7416 on ttyPSC4 ret 0 at 5
Jan 7 18:22:56 mymachine user.debug kernel: MAKR: tty_ioctl 0x5417 on ttyPSC4 ret 0 at 2
Jan 7 18:22:57 mymachine user.debug kernel: MAKR: tty_ioctl 0x5416 on ttyPSC4 ret 0 at 2
Jan 7 18:22:57 mymachine local2.info chat[1437]: abort on (ERROR)
Jan 7 18:22:57 mymachine local2.info chat[1437]: timeout set to 10 seconds
Jan 7 18:22:57 mymachine local2.info chat[1437]: send (AT^M)
Jan 7 18:22:57 mymachine user.debug kernel: MAKR: tty_ioctl 0x402c7413 on ttyPSC4 ret 0 at 5
Jan 7 18:22:57 mymachine user.debug kernel: MAKR: tty_ioctl 0x802c7414 on ttyPSC4 ret 0 at 5
Jan 7 18:22:57 mymachine local2.info chat[1437]: expect (OK)
Jan 7 18:22:57 mymachine local2.info chat[1437]: AT^M^M
Jan 7 18:22:57 mymachine local2.info chat[1437]: OK
<snip>
-------------------------------------------------------------------------------
If anyone have any good suggestion about where to go next, I'm eager to hear.
Need more information? Please ask!
Best regards,
Mats
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Suspecting kernel timing bug in ppp or tty drivers
2014-03-10 13:36 Suspecting kernel timing bug in ppp or tty drivers Mats Kärrman
@ 2014-03-10 19:40 ` Greg KH
2014-03-11 9:41 ` Mats Kärrman
0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2014-03-10 19:40 UTC (permalink / raw)
To: Mats Kärrman; +Cc: linux-ppp@vger.kernel.org, linux-serial@vger.kernel.org
On Mon, Mar 10, 2014 at 01:36:36PM +0000, Mats Kärrman wrote:
> Hi Linux serial & PPP experts,
>
> For a long time now I have been chasing a problem in my PPP connection.
> The fault scenario is as follows:
>
> - After reboot establishing a PPP connection works OK
> - Network communication over PPP works OK
> - The PPP connection is restarted - could be for any of several reasons; peer
> drops the connection gracefully, local restart due to network problems,
> local restart due to user request, et.c.
> - After closing down the connection, it is not possible to re-connect again.
> The Error messages from pppd looks something like:
>
> Jan 1 01:04:42 mymachine local2.err chat[631]: Can't get terminal parameters: Input/output error
> Jan 1 01:04:42 mymachine daemon.debug pppd[629]: Script /usr/sbin/chat -v -f /etc/ppp/chats/gprs-conn -T my-apn (pid 630), status = 0x2
> Jan 1 01:04:42 mymachine daemon.err pppd[629]: Connect script failed
>
>
> Some data about the system:
> - PPP daemon v2.4.5
> - Linux kernel v3.2.51 (same error with v2.6.35 though)
That's a really old kernel version, can you try 3.13 or newer? Lots of
work has been done in the tty layer since 3.2 was released.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: Suspecting kernel timing bug in ppp or tty drivers
2014-03-10 19:40 ` Greg KH
@ 2014-03-11 9:41 ` Mats Kärrman
2014-03-11 18:50 ` Greg KH
0 siblings, 1 reply; 8+ messages in thread
From: Mats Kärrman @ 2014-03-11 9:41 UTC (permalink / raw)
To: Greg KH; +Cc: linux-ppp@vger.kernel.org, linux-serial@vger.kernel.org
On Monday, March 10, 2014 8:40 PM, Greg KH wrote:
> On Mon, Mar 10, 2014 at 01:36:36PM +0000, Mats Kärrman wrote:
>> Some data about the system:
>> - PPP daemon v2.4.5
>> - Linux kernel v3.2.51 (same error with v2.6.35 though)
>
> That's a really old kernel version, can you try 3.13 or newer? Lots of
> work has been done in the tty layer since 3.2 was released.
>
Thanks for your reply Greg!
For various reasons I'd prefer to go to the long-term release 3.12.y, would
that be sufficient or do you think the changes made to 3.13 are critical?
(Btw. You recently set a tag for 3.12.14 but the tip of 3.12.y is 3.12.13
according to git pull and kernel.org cgit, how come?)
As the MPC5125 is not main-line and I have some drivers for proprietary
hardware the porting and re-testing will take me at least a week I guess.
Best regards,
Mats
--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Suspecting kernel timing bug in ppp or tty drivers
2014-03-11 9:41 ` Mats Kärrman
@ 2014-03-11 18:50 ` Greg KH
2014-03-12 7:54 ` Mats Kärrman
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Greg KH @ 2014-03-11 18:50 UTC (permalink / raw)
To: Mats Kärrman; +Cc: linux-ppp@vger.kernel.org, linux-serial@vger.kernel.org
On Tue, Mar 11, 2014 at 09:41:56AM +0000, Mats Kärrman wrote:
> On Monday, March 10, 2014 8:40 PM, Greg KH wrote:
> > On Mon, Mar 10, 2014 at 01:36:36PM +0000, Mats Kärrman wrote:
> >> Some data about the system:
> >> - PPP daemon v2.4.5
> >> - Linux kernel v3.2.51 (same error with v2.6.35 though)
> >
> > That's a really old kernel version, can you try 3.13 or newer? Lots of
> > work has been done in the tty layer since 3.2 was released.
> >
>
> Thanks for your reply Greg!
>
> For various reasons I'd prefer to go to the long-term release 3.12.y, would
> that be sufficient or do you think the changes made to 3.13 are critical?
It's a good place to start testing, and better than 3.2 :)
> (Btw. You recently set a tag for 3.12.14 but the tip of 3.12.y is 3.12.13
> according to git pull and kernel.org cgit, how come?)
I messed up and forgot to push the tree, should be fine now.
> As the MPC5125 is not main-line and I have some drivers for proprietary
> hardware the porting and re-testing will take me at least a week I guess.
Why aren't these drivers and arch code upstream?
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: Suspecting kernel timing bug in ppp or tty drivers
2014-03-11 18:50 ` Greg KH
@ 2014-03-12 7:54 ` Mats Kärrman
2014-03-12 12:14 ` terry white
2014-03-25 9:51 ` Mats Kärrman
2 siblings, 0 replies; 8+ messages in thread
From: Mats Kärrman @ 2014-03-12 7:54 UTC (permalink / raw)
To: Greg KH; +Cc: linux-ppp@vger.kernel.org, linux-serial@vger.kernel.org
On Tuesday, March 11, 2014 7:50 PM, Greg KH wrote:
> On Tue, Mar 11, 2014 at 09:41:56AM +0000, Mats Kärrman wrote:
>> As the MPC5125 is not main-line and I have some drivers for proprietary
>> hardware the porting and re-testing will take me at least a week I guess.
>
> Why aren't these drivers and arch code upstream?
Well, the proprietary drivers are for hardware that only exist on our proprietary
board so I guess community interest is limited. Of course the buyers of
the product will get access to the source if they want to.
The MPC5125 adaptions partly comes from the Freescale Linux demo for
the TWR5125 board so they are not mine to contribute. However I was
surprised to see that someone has started contributing 5125 stuff since
v3.2 so maybe there will be an opportunity for me to contribute as well,
once I get the new kernel running.
:) // Mats--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Suspecting kernel timing bug in ppp or tty drivers
2014-03-11 18:50 ` Greg KH
2014-03-12 7:54 ` Mats Kärrman
@ 2014-03-12 12:14 ` terry white
2014-03-12 14:04 ` Peter Hurley
2014-03-25 9:51 ` Mats Kärrman
2 siblings, 1 reply; 8+ messages in thread
From: terry white @ 2014-03-12 12:14 UTC (permalink / raw)
To: linux-ppp, linux-serial
... ciao:
" Some data about the system:
- PPP daemon v2.4.5
- Linux kernel v3.2.51 (same error with v2.6.35 though)
That's a really old kernel version, can you try 3.13 or newer?
Lots of work has been done in the tty layer since 3.2 was released."
my reading of this 'thread' suggests that "pppd" and/or "serial" have
started "DEMANDING" bleeding edge kernel versions. if that 'correct', for
me anyway, it's going to prove problematic. that aside, it doesn't seem
like the most elegant solution, to problems it's meant to address ...
--
... it's not what you see ,
but in stead , notice ...
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Suspecting kernel timing bug in ppp or tty drivers
2014-03-12 12:14 ` terry white
@ 2014-03-12 14:04 ` Peter Hurley
0 siblings, 0 replies; 8+ messages in thread
From: Peter Hurley @ 2014-03-12 14:04 UTC (permalink / raw)
To: terry white; +Cc: linux-ppp, linux-serial, Mats Kärrman, Greg KH
On 03/12/2014 08:14 AM, terry white wrote:
> my reading of this 'thread' suggests that "pppd" and/or "serial" have
> started "DEMANDING" bleeding edge kernel versions. if that 'correct', for
> me anyway, it's going to prove problematic. that aside, it doesn't seem
> like the most elegant solution, to problems it's meant to address ...
There have been _hundreds of changes_ to the tty core since 3.2, many
to specifically fix the type of error reported.
Very few of these changes are trivial; there are at least a dozen patchsets
with 10 or more patches. I doubt these will ever be backported to -stable;
the changes are just too extensive.
It makes sense to establish if these changes have indeed fixed the observed
error already or if more in-depth debugging is necessary to uncover the root
cause.
Regards,
Peter Hurley
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: Suspecting kernel timing bug in ppp or tty drivers
2014-03-11 18:50 ` Greg KH
2014-03-12 7:54 ` Mats Kärrman
2014-03-12 12:14 ` terry white
@ 2014-03-25 9:51 ` Mats Kärrman
2 siblings, 0 replies; 8+ messages in thread
From: Mats Kärrman @ 2014-03-25 9:51 UTC (permalink / raw)
To: Greg KH; +Cc: linux-ppp@vger.kernel.org, linux-serial@vger.kernel.org
On Tuesday, March 11, 2014 7:50 PM, Greg KH wrote:
> On Tue, Mar 11, 2014 at 09:41:56AM +0000, Mats Kärrman wrote:
> > On Monday, March 10, 2014 8:40 PM, Greg KH wrote:
> > > On Mon, Mar 10, 2014 at 01:36:36PM +0000, Mats Kärrman wrote:
> > >> Some data about the system:
> > >> - PPP daemon v2.4.5
> > >> - Linux kernel v3.2.51 (same error with v2.6.35 though)
> > >
> > > That's a really old kernel version, can you try 3.13 or newer? Lots of
> > > work has been done in the tty layer since 3.2 was released.
> > >
> >
> > Thanks for your reply Greg!
> >
> > For various reasons I'd prefer to go to the long-term release 3.12.y, would
> > that be sufficient or do you think the changes made to 3.13 are critical?
>
> It's a good place to start testing, and better than 3.2 :)
After porting to 3.12.14 and testing for about a week I have been unable to
reproduce the problem. Hopefully this means the issue is solved...
BR // Mats
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-03-25 9:51 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-10 13:36 Suspecting kernel timing bug in ppp or tty drivers Mats Kärrman
2014-03-10 19:40 ` Greg KH
2014-03-11 9:41 ` Mats Kärrman
2014-03-11 18:50 ` Greg KH
2014-03-12 7:54 ` Mats Kärrman
2014-03-12 12:14 ` terry white
2014-03-12 14:04 ` Peter Hurley
2014-03-25 9:51 ` Mats Kärrman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).