All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Bjørn Mork" <bjorn@mork.no>
To: Oliver Neukum <oneukum@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] cdc-wdm: fix "out-of-sync" due to missing notifications
Date: Mon, 04 Jul 2016 19:01:07 +0200	[thread overview]
Message-ID: <87shvpcp7w.fsf@nemi.mork.no> (raw)
In-Reply-To: <871t39eehv.fsf@nemi.mork.no> ("Bjørn Mork"'s message of "Mon, 04 Jul 2016 15:09:48 +0200")

Bjørn Mork <bjorn@mork.no> writes:
> Oliver Neukum <oneukum@suse.com> writes:
>> On Mon, 2016-07-04 at 13:54 +0200, Bjørn Mork wrote:
>>> Oliver Neukum <oneukum@suse.com> writes:
>>> 
>
>>> > If you go for that approack you also need to do it
>>> > in resume()
>>> 
>>> Yes, I wondered about that...  But I didn't add it because I am unable
>>> to provoke the problem there, so I cannot really test if it has any
>>> positive effect.  Do you still want it?
>>
>> Yes. I think the window is small, but we don't want strange
>> irreproducible flukes.
>
> OK.  Will post a v3 with the same code in resume once I've done some
> basic testing.

This isn't going very well.

First, this whole thing is a hack for something that should not be an
issue in the first place. Firmware should never queue data without
sending a notification.  The problem does not affect any CDC WDM device,
and I don't think it is a common problem for QMI devices either.  I have
mostly observed it with MBIM.  But there it is very real and extremely
annoying.  I can easily provoke it with two different firmware bases
(from Intel and Qualcomm).

The proposed driver solution is a hack, allowing us to resyncronize
without resetting the firmware. We have no reliable way to reset the
firmware and doing so will have large side effects anyway.  The hack
allows us to continue, with a possibly lost message or two as the only
consequences.  We don't have to break any existing session etc.

But the hack is far from perfect. We race against the firmware every
time it runs.  We risk reading data which the firmware has just prepared
and are sending us a notification for, only to get no data when we
respond to the notification.  This is no problem for any CDC WDM
device.  One of the reads will turn up emtpy, but it doesn't matter if
that's the one before or after the notification.  The spec explicitly
allows empty reads.

But it turns out that it is a real problem for the very same MBIM
firmwares which caused us to need the hack in the first place: They
stall when you try to read and there is no data.  Which means that a
lost race can end up with an EPIPE being returned to _read(), where it
is translated and returned to userspace.

We don't want that.  I consider it an acceptable risk for every open(),
given that I have no better way to do this resyncronization.  But it is
not acceptable that it can happen on every autosuspend sequence, which
is the consequence of adding the hack to resume.  I've now been running
a few hours with it, and the result is occasional unexpected read errors
after resuming.

This is a mess.  I fully agree that it looks like this either should go
into both open() and resume(), or none.  The real answer is 'none'.

But we need the hack for the yucky MBIM firmwares.  Putting it in open()
is defining open() as a syncronization point for something which should
not need syncronization in a perfect world.  I can expand the comment
there a bit to make this clear. Making resume() a syncronization point
as well is only making things worse.


Bjørn


  reply	other threads:[~2016-07-04 17:02 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-03 19:59 [PATCH v2] cdc-wdm: fix "out-of-sync" due to missing notifications Bjørn Mork
2016-07-04  8:04 ` Oliver Neukum
2016-07-04 11:54   ` Bjørn Mork
2016-07-04 12:48     ` Oliver Neukum
2016-07-04 13:09       ` Bjørn Mork
2016-07-04 17:01         ` Bjørn Mork [this message]
2016-07-05 12:52           ` Oliver Neukum
2016-07-09 18:31             ` Bjørn Mork
2016-07-10 12:47               ` Oliver Neukum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87shvpcp7w.fsf@nemi.mork.no \
    --to=bjorn@mork.no \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=oneukum@suse.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.