* Timeout/Reliability issue with my plugin
@ 2016-03-12 15:03 Holger Freyther
2016-03-12 15:52 ` Denis Kenzior
0 siblings, 1 reply; 5+ messages in thread
From: Holger Freyther @ 2016-03-12 15:03 UTC (permalink / raw)
To: ofono
[-- Attachment #1: Type: text/plain, Size: 1901 bytes --]
Hi,
my plugin is opening the serial on _enable and on _disable it is issuing a
AT+CFUN=0 and waits for the result before discarding the atchat.
$ ./tests/enable-modem /wavecom_0
...
$ lsof | grep ttyUSB0 | wc -l
1
$ ./tests/disable-modem /wavecom_0
$ lsof | grep ttyUSB0 | wc -l
0
So far so good. But if I run the above in a loop.. then I see dbus timeouts and
when I execute lsof.. I see that ofonod has 1023 FDs open. So at some point in
time either the state is messed up or the AT+CFUN doesn't respond.. but we open
the same device more than once.
Is this known? Does ofono start timers? Does the GAtChat have a way to timeout
commands and then disclose it? How do other modems handle such situation?
kind regards
holger
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: Timeout/Reliability issue with my plugin 2016-03-12 15:03 Timeout/Reliability issue with my plugin Holger Freyther @ 2016-03-12 15:52 ` Denis Kenzior 2016-03-12 17:10 ` Holger Freyther 0 siblings, 1 reply; 5+ messages in thread From: Denis Kenzior @ 2016-03-12 15:52 UTC (permalink / raw) To: ofono [-- Attachment #1: Type: text/plain, Size: 1621 bytes --] Hi Holger, On 03/12/2016 09:03 AM, Holger Freyther wrote: > Hi, > > my plugin is opening the serial on _enable and on _disable it is issuing a > AT+CFUN=0 and waits for the result before discarding the atchat. > > $ ./tests/enable-modem /wavecom_0 > ... > $ lsof | grep ttyUSB0 | wc -l > 1 > > $ ./tests/disable-modem /wavecom_0 > $ lsof | grep ttyUSB0 | wc -l > 0 > > So far so good. But if I run the above in a loop.. then I see dbus timeouts and > when I execute lsof.. I see that ofonod has 1023 FDs open. So at some point in > time either the state is messed up or the AT+CFUN doesn't respond.. but we open > the same device more than once. Without seeing the modem driver code, I can't comment. The core does set a timeout whenever .set_powered() method is called. If the method does not return in that time, the core does what cleanup that it can. Refer to src/modem.c for details. However, the modem driver is still responsible for cleaning up after itself, since the core has no visibility into what resources are being used by the modem driver. > > Is this known? Does ofono start timers? Does the GAtChat have a way to timeout > commands and then disclose it? How do other modems handle such situation? > GAtChat does not timeout the individual commands. The atom drivers would have no idea what to do with commands that timeout anyway. And we have never needed this capability in practice. Your modem driver would need to handle any modem idiosyncrasies, such as the modem firmware resetting on CFUN=0 and never returning an OK. Regards, -Denis ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Timeout/Reliability issue with my plugin 2016-03-12 15:52 ` Denis Kenzior @ 2016-03-12 17:10 ` Holger Freyther 2016-03-12 17:23 ` Denis Kenzior 0 siblings, 1 reply; 5+ messages in thread From: Holger Freyther @ 2016-03-12 17:10 UTC (permalink / raw) To: ofono [-- Attachment #1: Type: text/plain, Size: 3317 bytes --] Denis Kenzior <denkenz@...> writes: Dear Denis, > thank you for the quick answer. > Without seeing the modem driver code, I can't comment. The core does > set a timeout whenever .set_powered() method is called. If the method > does not return in that time, the core does what cleanup that it can. > Refer to src/modem.c for details. > > However, the modem driver is still responsible for cleaning up after > itself, since the core has no visibility into what resources are being > used by the modem driver. How would it do it? Should it start it's own glib timer next to the AT+CFUN? I have added an abort in the code. As you have guessed it is a standard AT modem. What happens is that somehow the _enable times out, on disable the _disable will not be called (as the modem was probably never powered up) and then on the next _enable we create a second GAtChat. test/enable-modem test/disable-modem in a loop Connecting modem /wavecom_15... Traceback (most recent call last): File "./test/enable-modem", line 20, in <module> modem.SetProperty("Powered", dbus.Boolean(1), timeout = 120) File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 70, in __call__ return self._proxy_method(*args, **keywords) File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 145, in __call__ **keywords) File "/usr/lib/python3/dist-packages/dbus/connection.py", line 651, in call_blocking message, timeout) dbus.exceptions.DBusException: org.ofono.Error.Timedout: Operation failure due to timeout Disconnecting modem /wavecom_15... Connecting modem /wavecom_15... printf + at commands ofonod[18642]: 0x7955f0: wavecom_enable ofonod[18642]: 0x78f300 0x7955f0 reset res 2 ofonod[18642]: > AT+WIND=32767\r ofonod[18642]: < \r\nOK\r\n ofonod[18642]: wind_set ofonod[18642]: > AT+CFUN=1,1\r ofonod[18642]: 0x7955f0: wavecom_enable ofonod[18642]: 0x7955f0 the chat is still there!!!! so _enable is called twice without a _disable inbetween. The _enable routine returns -EINPROGRESS and the AT+CFUN=1,1 probably too much time then or failed somehow differently. In this specific case the GAtChat will not be unrefed which means the fd will be leaked. So how can the plugin figure out that the core has decided not to move forward? holger ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Timeout/Reliability issue with my plugin 2016-03-12 17:10 ` Holger Freyther @ 2016-03-12 17:23 ` Denis Kenzior 2016-03-12 19:20 ` Holger Freyther 0 siblings, 1 reply; 5+ messages in thread From: Denis Kenzior @ 2016-03-12 17:23 UTC (permalink / raw) To: ofono [-- Attachment #1: Type: text/plain, Size: 2916 bytes --] Hi Holger, >> However, the modem driver is still responsible for cleaning up after >> itself, since the core has no visibility into what resources are being >> used by the modem driver. > > How would it do it? Should it start it's own glib timer next to the AT+CFUN? > I have added an abort in the code. As you have guessed it is a standard AT > modem. What happens is that somehow the _enable times out, on disable > the _disable will not be called (as the modem was probably never powered > up) and then on the next _enable we create a second GAtChat. You could do a timer, or forcefully unref the g_at_chat object if the GAtChat already exists when set_powered() is being called. However, its best to drive a modem in such a way that the firmware doesn't reset and keeps responding to AT commands properly. Sometimes this is easier said than done :) > > > test/enable-modem test/disable-modem in a loop > > Connecting modem /wavecom_15... > Traceback (most recent call last): > File "./test/enable-modem", line 20, in <module> > modem.SetProperty("Powered", dbus.Boolean(1), timeout = 120) > File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 70, in __call__ > return self._proxy_method(*args, **keywords) > File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 145, in __call__ > **keywords) > File "/usr/lib/python3/dist-packages/dbus/connection.py", line 651, in > call_blocking message, timeout) > dbus.exceptions.DBusException: org.ofono.Error.Timedout: Operation failure due > to timeout > Disconnecting modem /wavecom_15... > Connecting modem /wavecom_15... > > > printf + at commands > > ofonod[18642]: 0x7955f0: wavecom_enable > ofonod[18642]: 0x78f300 0x7955f0 reset res 2 > ofonod[18642]: > AT+WIND=32767\r > ofonod[18642]: < \r\nOK\r\n > ofonod[18642]: wind_set > ofonod[18642]: > AT+CFUN=1,1\r Doesn't this second ,1 parameter cause a reset? > ofonod[18642]: 0x7955f0: wavecom_enable > ofonod[18642]: 0x7955f0 the chat is still there!!!! > > so _enable is called twice without a _disable inbetween. The _enable routine > returns -EINPROGRESS and the AT+CFUN=1,1 probably too much time then > or failed somehow differently. In this specific case the GAtChat will not be > unrefed which means the fd will be leaked. Try using CFUN=1, and CFUN=4. > > So how can the plugin figure out that the core has decided not to move forward? > The set_powered() method should succeed in a second or two, maybe 5. I think the core uses a 20 second timer, so if you're hitting that, then there's something fundamentally wrong. Fundamentally the driver needs to ensure that the core callback is called within a decent time frame. The timeout handling in the core is just a failsafe for the applications to be notified that the hardware is not available. Regards, -Denis ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Timeout/Reliability issue with my plugin 2016-03-12 17:23 ` Denis Kenzior @ 2016-03-12 19:20 ` Holger Freyther 0 siblings, 0 replies; 5+ messages in thread From: Holger Freyther @ 2016-03-12 19:20 UTC (permalink / raw) To: ofono [-- Attachment #1: Type: text/plain, Size: 1654 bytes --] Denis Kenzior <denkenz@...> writes: > > Hi Holger, Denis! > You could do a timer, or forcefully unref the g_at_chat object if the > GAtChat already exists when set_powered() is being called. However, its > best to drive a modem in such a way that the firmware doesn't reset and > keeps responding to AT commands properly. > > Sometimes this is easier said than done :) yes, please keep in mind that this is in an infinite loop to enable/disable the modem. I have opted for unrefing the atchat. In terms from a driver model the "tiemout" could be communicated to the plugin? E.g. a _reset? > > Doesn't this second ,1 parameter cause a reset? Yes, it does. I think AT+CFUN=1,0 didn't always work. I will retry it. Sadly I have to go to AT+CFUN=1 and only then can switch to CFUN=4 > The set_powered() method should succeed in a second or two, maybe 5. I > think the core uses a 20 second timer, so if you're hitting that, then > there's something fundamentally wrong. > > Fundamentally the driver needs to ensure that the core callback is > called within a decent time frame. The timeout handling in the core is > just a failsafe for the applications to be notified that the hardware is > not available. It is a "load" test right now. The modem might do intensive wear leveling of the flash. In general it is a lot closer to the 5s seconds (it would be cool if OFONO_DEBUG_AT would print timing and 'modem or at least the atchat). Okay, so I am sticking with checking if the atchat is present, cancel everything and then unref it. thank you for the quick responses! holger ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-03-12 19:20 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-12 15:03 Timeout/Reliability issue with my plugin Holger Freyther 2016-03-12 15:52 ` Denis Kenzior 2016-03-12 17:10 ` Holger Freyther 2016-03-12 17:23 ` Denis Kenzior 2016-03-12 19:20 ` Holger Freyther
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.