Re: [PATCH] cdc-wdm: fix "out-of-sync" due to missing notifications

public inbox for stable@vger.kernel.org

From: Oliver Neukum <oneukum@suse.com>
To: "Bjørn Mork" <bjorn@mork.no>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] cdc-wdm: fix "out-of-sync" due to missing notifications
Date: Wed, 18 May 2016 10:12:32 +0200
Message-ID: <1463559152.22748.10.camel@suse.com>
In-Reply-To: <878tz81ch4.fsf@nemi.mork.no>

On Wed, 2016-05-18 at 01:39 +0200, Bjørn Mork wrote:
> Oliver Neukum <oneukum@suse.com> writes:
> > On Tue, 2016-05-17 at 21:24 +0200, Bjørn Mork wrote:
> >> Oliver Neukum <oneukum@suse.com> writes:
> >> 
> >> > On Fri, 2016-05-13 at 18:59 +0200, Bjørn Mork wrote:
> >> >> Bjørn Mork <bjorn@mork.no> writes:
> >> >> 
> >> >> > The driver enforces a strict one-to-one relationship between the
> >> >> > received RESPONSE_AVAILABLE notifications and messages read from
> >> >> > the device. At the same time, it will cancel the interrupt URB
> >> >> > when there is no client holding the character device open.
> >> >> 
> >> >> Never mind.  Forget it.
> >> >> 
> >> >> This patch breaks other devices again.  The immediate and unconditional
> >> >> reading make them barf. I guess it can be worked around by delaying the
> >> >> flushing until at least one notification is received, but I obviously
> >> >> have to test this theory thoroughly on all devices I have.
> >> >
> >> > Hi,
> >> >
> >> > I think the best approach would be to keep the interrupt URB always
> >> > active. I didn't do this to conserve bandwidth, but if it makes devices
> >> > work, it certainly would be the best option.
> >> 
> >> Yes, I considered that.  But this implies purging the device message
> >> queue without telling userspace that we did so.  At least with the
> >> current driver design, which is based on a single limited size
> >> buffer. If the device queues a number of unsolicited messages between
> >> two userspace requests, then we really want all those unsolicited
> >> messages delivered to the userspace program on the second request.
> >
> > You might argue that if user space wants the data it should open the
> > device.
> 
> Maybe.  It's a variant of the current situation, where userspace must
> not close the device while a session is in progress.
> 
> The issue here is that userspace (and the driver) knows nothing about
> what kind of messages the device decides to send, or when.  So how can
> userspace know that it wants the data?  It can't.  It has to keep the
> device open just in case there is something interesting happening.

Data is produced. If it is not processed, it must eventually be dropped.
The only question is how soon.
> 
> This is not the kind of semantics I'd like to present to any userspace
> developer.  We present a character device as an abstraction of a
> hardware device. I believe a reasonable assumption from a userspace
> developer is that the driver forwards all messages it reads from the
> hardware to the character device.  So either we don't read from hardware
> when the character device is closed, or we cache everything we read
> until the character device is open.

Well, no, we cannot meet such guarantee unless we have flow control.
Why would it matter whether the kernel or the device drop data?

> 
> >> And I do think the original bandwidth (and power) conservative approach
> >> is worth keeping too.  There is no point in waking up these devices
> >> unless there actually is an interested userspace application.
> >
> > They can sleep just fine. I did not imply that runtime PM should
> > be disabled.
> 
> Yes, which means that we cancel the URBs..  I haven't been able to
> reproduce it yet, but I think we might occasionally miss a notification
> during suspend/resume too. But this is timing sensitive, and device
> timing sensitive, so it's difficult to trigger on purpose.

System or runtime resume? Possibly we should just request a response
when we resume.

> For now I've ignored it.  But I wouldn't be surprised if we end up
> having to do the same "flush queue" exercise on every resume too.

Yes.
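
A rough sketch of what such a flush on resume might look like, purely as an
illustration. This is plain userspace C, not driver code: drain_queue(),
read_response_fn and the fake device are hypothetical names and do not exist
in cdc-wdm. It assumes an empty queue shows up as either a zero-length read
or a stall (-EPIPE), and it simply discards what it reads, which is exactly
the behaviour being debated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback standing in for one GetEncapsulatedResponse
 * transfer: returns bytes read, 0 if nothing was queued, or -errno.
 */
typedef int (*read_response_fn)(void *ctx, unsigned char *buf, size_t len);

/*
 * Read until the device reports an empty queue (0 bytes or a stall),
 * a real error occurs, or max_reads messages have been consumed.
 * Returns the number of messages drained, or a negative errno.
 * Each message overwrites the previous one in buf, i.e. it is dropped.
 */
static int drain_queue(read_response_fn read_one, void *ctx,
                       unsigned char *buf, size_t len, int max_reads)
{
    int drained = 0;

    assert(read_one != NULL && buf != NULL);

    while (drained < max_reads) {
        int ret = read_one(ctx, buf, len);

        if (ret == 0 || ret == -EPIPE)
            break;              /* assumed to mean "queue empty" */
        if (ret < 0)
            return ret;         /* any other error is passed up */
        drained++;
    }
    return drained;
}

/* Fake device for illustration: 'pending' queued messages, then a stall. */
struct fake_dev {
    int pending;
};

static int fake_read(void *ctx, unsigned char *buf, size_t len)
{
    struct fake_dev *d = ctx;

    if (d->pending == 0)
        return -EPIPE;
    d->pending--;
    if (len > 0)
        buf[0] = 0x01;          /* placeholder payload byte */
    return 1;
}
```

The max_reads bound is there so a device that never reports an empty queue
cannot keep the loop running forever. Whether drained messages should be
dropped or cached for the next reader is left open here.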

> >> FWIW, my initial analysis of the problem with the patch was too quick and
> >> imprecise. The problem is simply the -EPIPE status we inevitably will
> >> hit when the queue is empty, as I should have anticipated. It will be
> >> returned to userspace translated to -EIO.  I am currently testing a
> >> version taking care of that, and it seems to behave well so far. I'll
> >> submit it as soon as I am absolutely sure that it works on all WDM, QMI
> >> and MBIM devices I have.  Might take some time, since I am running out
> >> of mini-PCIe and m.2 adapters..
> >
> > That looks a bit risky. Firstly, if you get -EPIPE after a notification
> > it is an error and must be reported as such, so you need an additional
> > state.
> 
> Yes, -EPIPE should be reported if it occurs later when polling after a
> notification.  But no additional state is needed.  That info is already
> available.
> 
> > And what do you do after -EPIPE? Do you clean up the stall
> > or not? And the fun really starts if you get a notification while
> > you clean the stall.
> 
> No cleanup necessary/possible AFAICS:  This is endpoint 0.

True. But they shouldn't stall in the first place.
This is a rathole.

> 
> > And are you sure all devices can cope with an unsolicited request?
> 
> Nope. I am not sure about anything when it comes to USB device firmware
> ;)
> 
> Broad testing is definitely necessary.  But realistically: How can it
> possibly fail in other ways than returning 0 data bytes or stalling?
> 
> Wait... Don't answer that.  Yes, I know.  Some device will do something
> completely wild.  I'm just not sure that it is worth caring about...

As long as we can maintain a sensible behavior with a black list
I agree.

>   The firmware shall send ResponseAvailable notifications periodically,
>   using any appropriate algorithm, to inform the host that there is data
>   available in the reply buffer. The firmware is allowed to send
>   ResponseAvailable notifications even if there is no data available,
>   but this will obviously reduce overall performance."

That was the original reason to depend on the notifications.

> It remains to see if there are any devices which cannot cope with an
> unexpected GetEncapsulatedResponse.

As long as they can be somehow dealt with, I am open to all
suggestions.

	Regards
		Oliver
