netdev.vger.kernel.org archive mirror
* qmi-wwan bug
@ 2012-10-24 18:50 Shawn J. Goff
  2012-10-24 21:19 ` Bjørn Mork
  0 siblings, 1 reply; 6+ messages in thread
From: Shawn J. Goff @ 2012-10-24 18:50 UTC
  To: netdev

I've backported qmi-wwan to 2.6.39 (it's here: 
https://bitbucket.org/accelecon/linux-at91/changesets), and it mostly 
works, but I've come across a problem. The modem will sometimes stop 
responding to any qmi data (but the AT commands on the TTY ports keep 
working). This only happens when there is significant traffic flowing 
through the device (downloading a large file) while at the same time, AT 
commands are sent to one of the TTY ports (I first noticed with my own 
modem query program, but I can reproduce it using microcom to send 
"ATI\r" in a loop). I see this problem with different devices from 
different manufacturers. I've only made it happen on my kernel - I tried 
on 3.6.2, but it seems to not happen there. I've also tried using a 
similar modem that uses a different driver (sierra-net) and that doesn't 
have the same problem.
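
The AT-command half of the reproduction described above can be sketched in
Python instead of microcom. This is only an illustration: the function name
and its parameters are made up for this sketch, it does not read the modem's
replies, and the large download has to be started separately.

```python
import os
import time


def send_at_loop(fd, command=b"ATI\r", count=100, delay=0.1):
    """Write `command` to the open file descriptor `fd`, `count` times.

    Returns the total number of bytes written. Replies from the modem
    are not read here; this only generates traffic on the TTY port.
    """
    total = 0
    for _ in range(count):
        total += os.write(fd, command)
        time.sleep(delay)
    return total
```

To use it against a modem, open the AT port first, for example with
os.open(path, os.O_RDWR | os.O_NOCTTY), where path is whichever TTY device
the modem exposes, and pass the resulting descriptor in.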

When it is in failure, if I try to ping an address, the system sends out 
several ARP requests but gets no response. To get the device to 
respond again, I have to administratively set the wwan interface down, 
then up, use libqmi to get the connection going again, then dhcp to get 
an address.
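
The recovery sequence above can be written down as an ordered command list.
This is a sketch under assumptions: the interface name, the reconnect command
and the DHCP client command are all supplied by the caller, because the exact
libqmi invocation depends on the tool version in use. Only the two
"ip link set" steps are spelled out.

```python
import subprocess


def recovery_commands(iface, reconnect_cmd, dhcp_cmd):
    """Return the recovery steps, in order, as argv lists.

    reconnect_cmd and dhcp_cmd are caller-supplied argv lists; this
    sketch does not assume any particular libqmi or DHCP client syntax.
    """
    return [
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "up"],
        list(reconnect_cmd),
        list(dhcp_cmd),
    ]


def run_recovery(commands, runner=subprocess.check_call):
    """Run each step in order; check_call stops at the first failure."""
    for argv in commands:
        runner(argv)
```

Passing a different runner (for example a list's append method) gives a dry
run that records the steps without touching the interface.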

I also have some USB traces of the failure and recovery process. I'm not 
familiar with CDC or QMI, so it's not yet clear to me exactly what's 
happening, but it looks like the modem just stops sending anything on 
its QMI endpoint for no reason.

How might I dive further into the issue? So far, my next step is to look 
into CDC and QMI and try to decipher the USB traces. If anyone is 
interested, I can share a tcpdump or a USB trace.

-- 
Shawn J. Goff | Accelerated Concepts | Embedded Systems Engineer | 1-813-983-7501


* Re: qmi-wwan bug
  2012-10-24 18:50 qmi-wwan bug Shawn J. Goff
@ 2012-10-24 21:19 ` Bjørn Mork
  2012-10-24 22:28   ` Shawn J. Goff
  0 siblings, 1 reply; 6+ messages in thread
From: Bjørn Mork @ 2012-10-24 21:19 UTC
  To: Shawn J. Goff, netdev



"Shawn J. Goff" <shawn.goff@accelecon.com> wrote:

>I've backported qmi-wwan to 2.6.39 (it's here:
>https://bitbucket.org/accelecon/linux-at91/changesets), and it mostly
>works, but I've come across a problem. The modem will sometimes stop
>responding to any qmi data (but the AT commands on the TTY ports keep
>working). This only happens when there is significant traffic flowing
>through the device (downloading a large file) while at the same time,
>AT commands are sent to one of the TTY ports (I first noticed with my
>own modem query program, but I can reproduce it using microcom to send
>"ATI\r" in a loop).

Using the tty ports should be completely independent of any qmi activity from the host perspective. I am tempted to claim this indicates a firmware bug.

> I see this problem with different devices from 
>different manufacturers. 

Which still may use pretty much the same firmware, although a little less likely.

>I've only made it happen on my kernel - I tried
>on 3.6.2, but it seems to not happen there.

Good. But I have a feeling that you switched more than just the kernel. Do you see the issue if you run your backport on the same hardware you tested 3.6.2 on?

>I've also tried using a
>similar modem that uses a different driver (sierra-net) and that
>doesn't have the same problem.

Well, that is an entirely different firmware application and driver, even if the hardware is similar or even identical.

>When it is in failure, if I try to ping an address, the system sends
>out several ARP requests but gets no response. To get the device to
>respond again, I have to administratively set the wwan interface down,
>then up, use libqmi to get the connection going again, then dhcp to get
>an address.

Which sounds like the connection died. Does QMI work at this point, or is that dead too?

>I also have some USB traces of the failure and recovery process. I'm
>not familiar with CDC or QMI, so it's not yet clear to me exactly what's
>happening, but it looks like the modem just stops sending anything on
>its QMI endpoint for no reason.
>
>How might I dive further into the issue? So far, my next step is to
>look into CDC and QMI and try to decipher the USB traces. If anyone is
>interested, I can share a tcpdump or a USB trace.

I think a test of the backport on the same host hardware you run 3.6.2 on would be the best place to start.

Bjørn


* Re: qmi-wwan bug
  2012-10-24 21:19 ` Bjørn Mork
@ 2012-10-24 22:28   ` Shawn J. Goff
  2012-10-25  7:40     ` Bjørn Mork
  2012-10-25 10:36     ` Aleksander Morgado
  0 siblings, 2 replies; 6+ messages in thread
From: Shawn J. Goff @ 2012-10-24 22:28 UTC
  To: Bjørn Mork; +Cc: netdev


On 10/24/2012 05:19 PM, Bjørn Mork wrote:
>
> "Shawn J. Goff" <shawn.goff@accelecon.com> wrote:
>
>> I've backported qmi-wwan to 2.6.39 (it's here:
>> https://bitbucket.org/accelecon/linux-at91/changesets), and it mostly
>> works, but I've come across a problem. The modem will sometimes stop
>> responding to any qmi data (but the AT commands on the TTY ports keep
>> working). This only happens when there is significant traffic flowing
>> through the device (downloading a large file) while at the same time,
>> AT commands are sent to one of the TTY ports (I first noticed with my own
>> modem query program, but I can reproduce it using microcom to send
>> "ATI\r" in a loop).
> Using the tty ports should be completely independent of any qmi activity from the host perspective. I am tempted to claim this indicates a firmware bug.
>
>> I see this problem with different devices from
>> different manufacturers.
> Which still may use pretty much the same firmware, although a little less likely.
>
>> I've only made it happen on my kernel - I tried
>> on 3.6.2, but it seems to not happen there.
> Good. But I have a feeling that you switched more than just the kernel. Do you see the issue if you run your backport on the same hardware you tested 3.6.2 on?

I just tested my 2.6.39 kernel on the same hardware that had 3.6.2; the 
problem is absent there.

>
>> I've also tried using a
>> similar modem that uses a different driver (sierra-net) and that
>> doesn't have the same problem.
> Well, that is an entirely different firmware application and driver, even if the hardware is similar or even identical.

Yes - I wanted to eliminate anything lower (such as usb-net? not sure if 
qmi-wwan uses that) from being a suspect contributor to the problem.

>
>> When it is in failure, if I try to ping an address, the system sends
>> out several ARP requests but gets no response. To get the device to
>> respond again, I have to administratively set the wwan interface down,
>> then up, use libqmi to get the connection going again, then dhcp to get
>> an address.
> Which sounds like the connection died. Does QMI work at this point, or is that dead too?

Looks like qmi works. I can do --nas-get-signal-strength and it gives me 
good numbers. --wds-get-packet-service-status returns "Connection 
status: '2'"
>
>> I also have some USB traces of the failure and recovery process. I'm
>> not familiar with CDC or QMI, so it's not yet clear to me exactly what's
>> happening, but it looks like the modem just stops sending anything on
>> its QMI endpoint for no reason.
>>
>> How might I dive further into the issue? So far, my next step is to
>> look into CDC and QMI and try to decipher the USB traces. If anyone is
>> interested, I can share a tcpdump or a USB trace.
> I think a test of the backport on the same host hardware you run 3.6.2 on would be the best place to start.
>
> Bjørn
Thanks for your help.


* Re: qmi-wwan bug
  2012-10-24 22:28   ` Shawn J. Goff
@ 2012-10-25  7:40     ` Bjørn Mork
  2012-10-25 10:19       ` Shawn J. Goff
  2012-10-25 10:36     ` Aleksander Morgado
  1 sibling, 1 reply; 6+ messages in thread
From: Bjørn Mork @ 2012-10-25  7:40 UTC
  To: Shawn J. Goff; +Cc: netdev

"Shawn J. Goff" <shawn.goff@accelecon.com> writes:
> On 10/24/2012 05:19 PM, Bjørn Mork wrote:
>> "Shawn J. Goff" <shawn.goff@accelecon.com> wrote:
>>
>>> I've only made it happen on my kernel - I tried
>>> on 3.6.2, but it seems to not happen there.
>>
>> Good. But I have a feeling that you switched more than just the
>> kernel. Do you see the issue if you run your backport on the same
>> hardware you tested 3.6.2 on?
>
> I just tested my 2.6.39 kernel on the same hardware that had 3.6.2;
> the problem is absent there.

As I suspected.  Then I believe this issue is more likely related to
your hardware platform and/or its other lower layer drivers, and not to
the backported qmi_wwan driver directly.

Maybe an obscure firmware issue related to timing or other differences?
That is going to be difficult to track down, if it really is the cause.

>>> I've also tried using a
>>> similar modem that uses a different driver (sierra-net) and that
>>> doesn't have the same problem.
>>
>> Well, that is an entirely different firmware application and driver,
>> even if the hardware is similar or even identical.
>
> Yes - I wanted to eliminate anything lower (such as usb-net? not sure
> if qmi-wwan uses that) from being a suspect contributor to the
> problem.

Yes, that is useful.  You are perfectly right that most of the host side
drivers are common. Both sierra_net and qmi_wwan are usbnet minidrivers,
and almost all network device functionality is served by the shared
usbnet framework.  So this test pretty much eliminates the host USB
stack.

Which IMHO points to the firmware implementation as the major
difference.

>>> When it is in failure, if I try to ping an address, the system sends
>>> out several ARP requests but gets no response. To get the device to
>>> respond again, I have to administratively set the wwan interface down,
>>> then up, use libqmi to get the connection going again, then dhcp to get
>>> an address.
>> Which sounds like the connection died. Does QMI work at this point, or is that dead too?
>
> Looks like qmi works. I can do --nas-get-signal-strength and it gives
> me good numbers. --wds-get-packet-service-status returns "Connection
> status: '2'"

Ah, interesting.  Then we only need to find out where the bulk URBs end
up.

I would have tried to enable what I could of USB and usbnet debugging to
see if there are any hints there.  A usbmon trace may also show the
problem.  I don't know.  But in any case, I believe this is a USB
problem and probably not too interesting for netdev...
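
One way to look for the missing bulk URBs in a usbmon text trace (typically
read from /sys/kernel/debug/usb/usbmon/<bus>u once the usbmon module is
loaded) is to record the last completed bulk-IN transfer per endpoint. The
sketch below is a minimal illustration only: it relies on the usbmon text
format's leading fields (URB tag, timestamp in microseconds, event type, then
an address word such as "Bi:1:005:1"), and the SAMPLE lines are synthetic,
not taken from the traces discussed in this thread.

```python
def last_bulk_in_completions(lines):
    """Map (bus, device, endpoint) to the last bulk-IN completion timestamp.

    Only lines with event type "C" (completion) and an address word
    starting with "Bi" (bulk, IN) are counted; everything else is skipped.
    """
    last = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        timestamp, event, address = fields[1], fields[2], fields[3]
        parts = address.split(":")
        if event != "C" or len(parts) != 4 or parts[0] != "Bi":
            continue
        key = (int(parts[1]), int(parts[2]), int(parts[3]))
        last[key] = int(timestamp)
    return last


# Synthetic example lines in usbmon text format (not from a real capture).
SAMPLE = [
    "f1a2b3c0 1000000 S Bi:1:005:1 -115 2048 <",
    "f1a2b3c0 1000450 C Bi:1:005:1 0 1500 = 45000054",
    "f1a2b3c4 1000500 S Bo:1:005:1 -115 64 = 45000054",
    "f1a2b3c4 1000520 C Bo:1:005:1 0 64 >",
    "f1a2b3c0 1000600 S Bi:1:005:1 -115 2048 <",
]
```

In the sample, the second bulk-IN submission never completes, so the last
completion stays at the earlier timestamp. A long gap between that timestamp
and the end of a real trace would suggest the device stopped answering on
that endpoint.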


Bjørn


* Re: qmi-wwan bug
  2012-10-25  7:40     ` Bjørn Mork
@ 2012-10-25 10:19       ` Shawn J. Goff
  0 siblings, 0 replies; 6+ messages in thread
From: Shawn J. Goff @ 2012-10-25 10:19 UTC
  To: Bjørn Mork; +Cc: netdev

On 10/25/2012 03:40 AM, Bjørn Mork wrote:
> "Shawn J. Goff" <shawn.goff@accelecon.com> writes:
>> On 10/24/2012 05:19 PM, Bjørn Mork wrote:
>>> "Shawn J. Goff" <shawn.goff@accelecon.com> wrote:
>>>
>>>> I've only made it happen on my kernel - I tried
>>>> on 3.6.2, but it seems to not happen there.
>>> Good. But I have a feeling that you switched more than just the
>>> kernel. Do you see the issue if you run your backport on the same
>>> hardware you tested 3.6.2 on?
>> I just tested my 2.6.39 kernel on the same hardware that had 3.6.2;
>> the problem is absent there.
> As I suspected.  Then I believe this issue is more likely related to
> your hardware platform and/or its other lower layer drivers, and not to
> the backported qmi_wwan driver directly.
>
> Maybe an obscure firmware issue related to timing or other differences?
> That is going to be difficult to track down, if it really is the cause.

That's what I figured when I couldn't reproduce it on the other hardware. 
The first thing I'm going to do is check the power - I noticed the USB 
+5V supply can drop to around 4.8V when the modem is under load. I'll 
apply a bench power supply and see if that fixes things.

>>>> I've also tried using a
>>>> similar modem that uses a different driver (sierra-net) and that
>>>> doesn't have the same problem.
>>> Well, that is an entirely different firmware application and driver,
>>> even if the hardware is similar or even identical.
>> Yes - I wanted to eliminate anything lower (such as usb-net? not sure
>> if qmi-wwan uses that) from being a suspect contributor to the
>> problem.
> Yes, that is useful.  You are perfectly right that most of the host side
> drivers are common. Both sierra_net and qmi_wwan are usbnet minidrivers,
> and almost all network device functionality is served by the shared
> usbnet framework.  So this test pretty much eliminates the host USB
> stack.
>
> Which IMHO points to the firmware implementation as the major
> difference.
>
>>>> When it is in failure, if I try to ping an address, the system sends
>>>> out several ARP requests but gets no response. To get the device to
>>>> respond again, I have to administratively set the wwan interface down,
>>>> then up, use libqmi to get the connection going again, then dhcp to get
>>>> an address.
>>> Which sounds like the connection died. Does QMI work at this point, or is that dead too?
>> Looks like qmi works. I can do --nas-get-signal-strength and it gives
>> me good numbers. --wds-get-packet-service-status returns "Connection
>> status: '2'"
> Ah, interesting.  Then we only need to find out where the bulk URBs end
> up.
>
> I would have tried to enable what I could of USB and usbnet debugging to
> see if there are any hints there.  A usbmon trace may also show the
> problem.  I don't know.  But in any case, I believe this is a USB
> problem and probably not too interesting for netdev...
After power, I'll move on to enabling whatever USB debugging I can find.
Thanks!


* Re: qmi-wwan bug
  2012-10-24 22:28   ` Shawn J. Goff
  2012-10-25  7:40     ` Bjørn Mork
@ 2012-10-25 10:36     ` Aleksander Morgado
  1 sibling, 0 replies; 6+ messages in thread
From: Aleksander Morgado @ 2012-10-25 10:36 UTC
  To: Shawn J. Goff; +Cc: Bjørn Mork, netdev


>>> When it is in failure, if I try to ping an address, the system sends
>>> out several ARP requests but gets no response. To get the device to
>>> respond again, I have to administratively set the wwan interface down,
>>> then up, use libqmi to get the connection going again, then dhcp to get
>>> an address.
>> Which sounds like the connection died. Does QMI work at this point, or
>> is that dead too?
> 
> Looks like qmi works. I can do --nas-get-signal-strength and it gives me
> good numbers. --wds-get-packet-service-status returns "Connection
> status: '2'"

Off topic...  I just fixed qmicli so that it prints the enum nickname
string instead of the integer value.
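
For output from an older qmicli that still prints the integer, the mapping
can be applied by hand. The sketch below is illustrative only: the value
table is assumed to match libqmi's QmiWdsConnectionStatus enum and should be
checked against the libqmi headers, and the function name is made up for
this example.

```python
import re

# Assumed to match libqmi's QmiWdsConnectionStatus enum; verify against
# the libqmi headers before relying on it.
WDS_CONNECTION_STATUS = {
    0: "unknown",
    1: "disconnected",
    2: "connected",
    3: "suspended",
    4: "authenticating",
}


def connection_status_nickname(output):
    """Extract the numeric status from qmicli output and name it.

    Returns None when no "Connection status: '<n>'" text is found.
    """
    match = re.search(r"Connection status: '(\d+)'", output)
    if match is None:
        return None
    value = int(match.group(1))
    return WDS_CONNECTION_STATUS.get(value, "unrecognized (%d)" % value)
```

Under that assumed table, the "Connection status: '2'" reported earlier in
the thread would read as "connected", which fits the observation that QMI
itself still works while the data path is stuck.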

-- 
Aleksander

