All of lore.kernel.org
 help / color / mirror / Atom feed
* Timeout/Reliability issue with my plugin
@ 2016-03-12 15:03 Holger Freyther
  2016-03-12 15:52 ` Denis Kenzior
  0 siblings, 1 reply; 5+ messages in thread
From: Holger Freyther @ 2016-03-12 15:03 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 1901 bytes --]

Hi,                                                                             
                                                                                
my plugin is opening the serial on _enable and on _disable it is issuing a      
AT+CFUN=0 and waits for the result before discarding the atchat.                
                                                                                
$ ./tests/enable-modem /wavecom_0                                               
...                                                                             
$ lsof | grep ttyUSB0 | wc -l                                                   
1                                                                               
                                                                                
$ ./tests/disable-modem /wavecom_0                                              
$ lsof | grep ttyUSB0 | wc -l                                                   
0                                                                               
                                                                                
So far so good. But if I run the above in a loop.. then I see dbus timeouts and 
when I execute lsof.. I see that ofonod has 1023 FDs open. So at some point in  
time either the state is messed up or the AT+CFUN doesn't respond.. but we open 
the same device more than once.                                                 
                                                                                
Is this known? Does ofono start timers? Does the GAtChat have a way to timeout  
commands and then disclose it? How do other modems handle such situation?       
                                                                                
kind regards                                                                    
  holger   


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Timeout/Reliability issue with my plugin
  2016-03-12 15:03 Timeout/Reliability issue with my plugin Holger Freyther
@ 2016-03-12 15:52 ` Denis Kenzior
  2016-03-12 17:10   ` Holger Freyther
  0 siblings, 1 reply; 5+ messages in thread
From: Denis Kenzior @ 2016-03-12 15:52 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 1621 bytes --]

Hi Holger,

On 03/12/2016 09:03 AM, Holger Freyther wrote:
> Hi,
>
> my plugin is opening the serial on _enable and on _disable it is issuing a
> AT+CFUN=0 and waits for the result before discarding the atchat.
>
> $ ./tests/enable-modem /wavecom_0
> ...
> $ lsof | grep ttyUSB0 | wc -l
> 1
>
> $ ./tests/disable-modem /wavecom_0
> $ lsof | grep ttyUSB0 | wc -l
> 0
>
> So far so good. But if I run the above in a loop.. then I see dbus timeouts and
> when I execute lsof.. I see that ofonod has 1023 FDs open. So at some point in
> time either the state is messed up or the AT+CFUN doesn't respond.. but we open
> the same device more than once.

Without seeing the modem driver code, I can't comment.  The core does 
set a timeout whenever .set_powered() method is called.  If the method 
does not return in that time, the core does what cleanup that it can. 
Refer to src/modem.c for details.

However, the modem driver is still responsible for cleaning up after 
itself, since the core has no visibility into what resources are being 
used by the modem driver.

>
> Is this known? Does ofono start timers? Does the GAtChat have a way to timeout
> commands and then disclose it? How do other modems handle such situation?
>

GAtChat does not timeout the individual commands.  The atom drivers 
would have no idea what to do with commands that timeout anyway.  And we 
have never needed this capability in practice.  Your modem driver would 
need to handle any modem idiosyncrasies, such as the modem firmware 
resetting on CFUN=0 and never returning an OK.

Regards,
-Denis

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Timeout/Reliability issue with my plugin
  2016-03-12 15:52 ` Denis Kenzior
@ 2016-03-12 17:10   ` Holger Freyther
  2016-03-12 17:23     ` Denis Kenzior
  0 siblings, 1 reply; 5+ messages in thread
From: Holger Freyther @ 2016-03-12 17:10 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 3317 bytes --]

Denis Kenzior <denkenz@...> writes:


Dear Denis,

>

thank you for the quick answer.
 
> Without seeing the modem driver code, I can't comment.  The core does 
> set a timeout whenever .set_powered() method is called.  If the method 
> does not return in that time, the core does what cleanup that it can. 
> Refer to src/modem.c for details.
> 
> However, the modem driver is still responsible for cleaning up after 
> itself, since the core has no visibility into what resources are being 
> used by the modem driver.

How would it do it? Should it start it's own glib timer next to the AT+CFUN?
I have added an abort in the code. As you have guessed it is a standard AT
modem. What happens is that somehow the _enable times out, on disable
the _disable will not be called (as the modem was probably never powered
up) and then on the next _enable we create a second GAtChat.


test/enable-modem test/disable-modem in a loop

Connecting modem /wavecom_15...                                                 
Traceback (most recent call last):                                              
  File "./test/enable-modem", line 20, in <module>                              
    modem.SetProperty("Powered", dbus.Boolean(1), timeout = 120)                
  File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 70, in __call__   
    return self._proxy_method(*args, **keywords)                                
  File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 145, in __call__  
    **keywords)                                                                 
  File "/usr/lib/python3/dist-packages/dbus/connection.py", line 651, in        
    call_blocking message, timeout)                                             
dbus.exceptions.DBusException: org.ofono.Error.Timedout: Operation failure due   
to timeout                                                                      
Disconnecting modem /wavecom_15...                                              
Connecting modem /wavecom_15...                                                 
                                                                                

printf + at commands
                                                                                
ofonod[18642]: 0x7955f0: wavecom_enable                                         
ofonod[18642]: 0x78f300 0x7955f0 reset res 2                                    
ofonod[18642]: > AT+WIND=32767\r                                                
ofonod[18642]: < \r\nOK\r\n                                                     
ofonod[18642]: wind_set                                                         
ofonod[18642]: > AT+CFUN=1,1\r                                                  
ofonod[18642]: 0x7955f0: wavecom_enable                                         
ofonod[18642]: 0x7955f0 the chat is still there!!!!                             

so _enable is called twice without a _disable inbetween. The _enable routine
returns -EINPROGRESS and the AT+CFUN=1,1 probably too much time then
or failed somehow differently. In this specific case the GAtChat will not be
unrefed which means the fd will be leaked.

So how can the plugin figure out that the core has decided not to move forward?

holger



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Timeout/Reliability issue with my plugin
  2016-03-12 17:10   ` Holger Freyther
@ 2016-03-12 17:23     ` Denis Kenzior
  2016-03-12 19:20       ` Holger Freyther
  0 siblings, 1 reply; 5+ messages in thread
From: Denis Kenzior @ 2016-03-12 17:23 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 2916 bytes --]

Hi Holger,

 >> However, the modem driver is still responsible for cleaning up after
>> itself, since the core has no visibility into what resources are being
>> used by the modem driver.
>
> How would it do it? Should it start it's own glib timer next to the AT+CFUN?
> I have added an abort in the code. As you have guessed it is a standard AT
> modem. What happens is that somehow the _enable times out, on disable
> the _disable will not be called (as the modem was probably never powered
> up) and then on the next _enable we create a second GAtChat.

You could do a timer, or forcefully unref the g_at_chat object if the 
GAtChat already exists when set_powered() is being called.  However, its 
best to drive a modem in such a way that the firmware doesn't reset and 
keeps responding to AT commands properly.

Sometimes this is easier said than done :)

>
>
> test/enable-modem test/disable-modem in a loop
>
> Connecting modem /wavecom_15...
> Traceback (most recent call last):
>    File "./test/enable-modem", line 20, in <module>
>      modem.SetProperty("Powered", dbus.Boolean(1), timeout = 120)
>    File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 70, in __call__
>      return self._proxy_method(*args, **keywords)
>    File "/usr/lib/python3/dist-packages/dbus/proxies.py", line 145, in __call__
>      **keywords)
>    File "/usr/lib/python3/dist-packages/dbus/connection.py", line 651, in
>      call_blocking message, timeout)
> dbus.exceptions.DBusException: org.ofono.Error.Timedout: Operation failure due
> to timeout
> Disconnecting modem /wavecom_15...
> Connecting modem /wavecom_15...
>
>
> printf + at commands
>
> ofonod[18642]: 0x7955f0: wavecom_enable
> ofonod[18642]: 0x78f300 0x7955f0 reset res 2
> ofonod[18642]: > AT+WIND=32767\r
> ofonod[18642]: < \r\nOK\r\n
> ofonod[18642]: wind_set
> ofonod[18642]: > AT+CFUN=1,1\r

Doesn't this second ,1 parameter cause a reset?

> ofonod[18642]: 0x7955f0: wavecom_enable
> ofonod[18642]: 0x7955f0 the chat is still there!!!!
>
> so _enable is called twice without a _disable inbetween. The _enable routine
> returns -EINPROGRESS and the AT+CFUN=1,1 probably too much time then
> or failed somehow differently. In this specific case the GAtChat will not be
> unrefed which means the fd will be leaked.

Try using CFUN=1, and CFUN=4.

>
> So how can the plugin figure out that the core has decided not to move forward?
>

The set_powered() method should succeed in a second or two, maybe 5.  I 
think the core uses a 20 second timer, so if you're hitting that, then 
there's something fundamentally wrong.

Fundamentally the driver needs to ensure that the core callback is 
called within a decent time frame.  The timeout handling in the core is 
just a failsafe for the applications to be notified that the hardware is 
not available.

Regards,
-Denis


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Timeout/Reliability issue with my plugin
  2016-03-12 17:23     ` Denis Kenzior
@ 2016-03-12 19:20       ` Holger Freyther
  0 siblings, 0 replies; 5+ messages in thread
From: Holger Freyther @ 2016-03-12 19:20 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 1654 bytes --]

Denis Kenzior <denkenz@...> writes:

> 
> Hi Holger,

Denis!


> You could do a timer, or forcefully unref the g_at_chat object if the 
> GAtChat already exists when set_powered() is being called.  However, its 
> best to drive a modem in such a way that the firmware doesn't reset and 
> keeps responding to AT commands properly.
> 
> Sometimes this is easier said than done :)

yes, please keep in mind that this is in an infinite loop to enable/disable the
modem. I have opted for unrefing the atchat. In terms from a driver model
the "tiemout" could be communicated to the plugin? E.g. a _reset?



> 
> Doesn't this second ,1 parameter cause a reset?

Yes, it does. I think AT+CFUN=1,0 didn't always work. I will retry it. Sadly
I have to go to AT+CFUN=1 and only then can switch to CFUN=4



> The set_powered() method should succeed in a second or two, maybe 5.  I 
> think the core uses a 20 second timer, so if you're hitting that, then 
> there's something fundamentally wrong.
> 
> Fundamentally the driver needs to ensure that the core callback is 
> called within a decent time frame.  The timeout handling in the core is 
> just a failsafe for the applications to be notified that the hardware is 
> not available.

It is a "load" test right now. The modem might do intensive wear leveling
of the flash. In general it is a lot closer to the 5s seconds (it would be cool
if OFONO_DEBUG_AT would print timing and 'modem or at least the atchat).

Okay, so I am sticking with checking if the atchat is present, cancel everything
and then unref it.

thank you for the quick responses!

  holger


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-03-12 19:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-12 15:03 Timeout/Reliability issue with my plugin Holger Freyther
2016-03-12 15:52 ` Denis Kenzior
2016-03-12 17:10   ` Holger Freyther
2016-03-12 17:23     ` Denis Kenzior
2016-03-12 19:20       ` Holger Freyther

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.