Re: [PATCH] cdc-wdm: fix "out-of-sync" due to missing notifications

From: Oliver Neukum <oneukum@suse.com>
To: "Bjørn Mork" <bjorn@mork.no>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] cdc-wdm: fix "out-of-sync" due to missing notifications
Date: Wed, 18 May 2016 10:12:32 +0200
Message-ID: <1463559152.22748.10.camel@suse.com>
In-Reply-To: <878tz81ch4.fsf@nemi.mork.no>

On Wed, 2016-05-18 at 01:39 +0200, Bjørn Mork wrote:
> Oliver Neukum <oneukum@suse.com> writes:
> > On Tue, 2016-05-17 at 21:24 +0200, Bjørn Mork wrote:
> >> Oliver Neukum <oneukum@suse.com> writes:
> >> 
> >> > On Fri, 2016-05-13 at 18:59 +0200, Bjørn Mork wrote:
> >> >> Bjørn Mork <bjorn@mork.no> writes:
> >> >> 
> >> >> > The driver enforces a strict one-to-one relationship between the
> >> >> > received RESPONSE_AVAILABLE notifications and messages read from
> >> >> > the device. At the same time, it will cancel the interrupt URB
> >> >> > when there is no client holding the character device open.
> >> >> 
> >> >> Never mind.  Forget it.
> >> >> 
> >> >> This patch breaks other devices again.  The immediate and unconditional
> >> >> reading makes them barf. I guess it can be worked around by delaying the
> >> >> flushing until at least one notification is received, but I obviously
> >> >> have to test this theory thoroughly on all devices I have.
> >> >
> >> > Hi,
> >> >
> >> > I think the best approach would be to keep the interrupt URB always
> >> > active. I didn't do this to conserve bandwidth, but if it makes devices
> >> > work, it certainly would be the best option.
> >> 
> >> Yes, I considered that.  But this implies purging the device message
> >> queue without telling userspace that we did so.  At least with the
> >> current driver design, which is based on a single limited size
> >> buffer. If the device queues a number of unsolicited messages between
> >> two userspace requests, then we really want all those unsolicited
> >> messages delivered to the userspace program on the second request.
> >
> > You might argue that if user space wants the data it should open the
> > device.
> 
> Maybe.  It's a variant of the current situation, where userspace must
> not close the device while a session is in progress.
> 
> The issue here is that userspace (and the driver) knows nothing about
> what kind of messages the device decides to send, or when.  So how can
> userspace know that it wants the data?  It can't.  It has to keep the
> device open just in case there is something interesting happening.

Data is produced. If it is not processed, it must eventually be dropped.
The only question is how soon.

>
> This is not the kind of semantics I'd like to present to any userspace
> developer.  We present a character device as an abstraction of a
> hardware device. I believe a reasonable assumption from a userspace
> developer is that the driver forwards all messages it reads from the
> hardware to the character device.  So either we don't read from hardware
> when the character device is closed, or we cache everything we read
> until the character device is open.

Well, no, we cannot meet such a guarantee unless we have flow control.
Why would it matter whether the kernel or the device drops data?

> 
> >> And I do think the original bandwidth (and power) conservative approach
> >> is worth keeping too.  There is no point in waking up these devices
> >> unless there actually is an interested userspace application.
> >
> > They can sleep just fine. I did not imply that runtime PM should
> > be disabled.
> 
> Yes, which means that we cancel the URBs.  I haven't been able to
> reproduce it yet, but I think we might occasionally miss a notification
> during suspend/resume too. But this is timing sensitive, and device
> timing sensitive, so it's difficult to trigger on purpose.

System or runtime resume? Possibly we should just request a response
when we resume.

> For now I've ignored it.  But I wouldn't be surprised if we end up
> having to do the same "flush queue" exercise on every resume too.

Yes.

> >> FWIW, my initial analysis of the problem with the patch was too quick
> >> and imprecise. The problem is simply the -EPIPE status we inevitably will
> >> hit when the queue is empty, as I should have anticipated. It will be
> >> returned to userspace translated to -EIO.  I am currently testing a
> >> version taking care of that, and it seems to behave well so far. I'll
> >> submit it as soon as I am absolutely sure that it works on all WDM, QMI
> >> and MBIM devices I have.  Might take some time, since I am running out
> >> of mini-PCIe and m.2 adapters..
> >
> > That looks a bit risky. Firstly, if you get -EPIPE after a notification
> > it is an error and must be reported as such, so you need an additional
> > state.
> 
> Yes, -EPIPE should be reported if it occurs later when polling after a
> notification.  But no additional state is needed.  That info is already
> available.
> 
> > And what do you do after -EPIPE? Do you clean up the stall
> > or not? And the fun really starts if you get a notification while
> > you clean the stall.
> 
> No cleanup necessary/possible AFAICS:  This is endpoint 0.

True. But they shouldn't stall in the first place.
This is a rathole.

> 
> > And are you sure all devices can cope with an unsolicited request?
> 
> Nope. I am not sure about anything when it comes to USB device firmware
> ;)
> 
> Broad testing is definitely necessary.  But realistically: How can it
> possibly fail in other ways than returning 0 data bytes or stalling?
> 
> Wait... Don't answer that.  Yes, I know.  Some device will do something
> completely wild.  I'm just not sure that it is worth caring about...

As long as we can maintain sensible behavior with a blacklist,
I agree.

>   The firmware shall send ResponseAvailable notifications periodically,
>   using any appropriate algorithm, to inform the host that there is data
>   available in the reply buffer. The firmware is allowed to send
>   ResponseAvailable notifications even if there is no data available,
>   but this will obviously reduce overall performance."

That was the original reason to depend on the notifications.

> It remains to be seen if there are any devices which cannot cope with an
> unexpected GetEncapsulatedResponse.

As long as they can be somehow dealt with, I am open to all
suggestions.

	Regards
		Oliver
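
Illustrative note on the -EPIPE question discussed in this thread: a stall
is expected when the queue is polled without a preceding notification and
turns out to be empty, but must be reported as an error when it follows a
RESPONSE_AVAILABLE notification. A minimal user-space sketch of that
decision, in C, might look like the following. It is not the cdc-wdm code;
the names classify_read, errno_for_userspace and enum read_outcome are
hypothetical, and treating a zero-length reply as "queue empty" is an
assumption made only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of one GetEncapsulatedResponse attempt. */
enum read_outcome {
	READ_DATA,        /* a message was received; hand it to the reader */
	READ_QUEUE_EMPTY, /* nothing was queued; stop flushing quietly */
	READ_ERROR        /* genuine failure; report it to userspace */
};

/*
 * Classify the result of a control transfer.
 *
 * urb_status:    0 on success, otherwise a negative errno value
 * actual_length: number of bytes returned by the device
 * notified:      true if a RESPONSE_AVAILABLE notification preceded
 *                this read, false for an unsolicited flush
 */
static enum read_outcome classify_read(int urb_status, int actual_length,
				       bool notified)
{
	assert(urb_status <= 0);

	/* A stall on an unsolicited read just means the queue was empty. */
	if (urb_status == -EPIPE && !notified)
		return READ_QUEUE_EMPTY;

	/* Any other failure, including a stall after a notification. */
	if (urb_status < 0)
		return READ_ERROR;

	/* Assumption for this sketch: zero bytes means nothing was queued. */
	if (actual_length == 0)
		return READ_QUEUE_EMPTY;

	return READ_DATA;
}

/* Errors are reported to userspace as -EIO, as described in the thread. */
static int errno_for_userspace(enum read_outcome outcome)
{
	return outcome == READ_ERROR ? -EIO : 0;
}
```

The notified flag stands in for whatever information the driver already
has about pending notifications, so no claim is made here about how that
state is actually stored.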

Thread overview: 7+ messages
2016-05-13 15:39 [PATCH] cdc-wdm: fix "out-of-sync" due to missing notifications Bjørn Mork
2016-05-13 16:59 ` Bjørn Mork
2016-05-17  9:13   ` Oliver Neukum
2016-05-17 19:24     ` Bjørn Mork
2016-05-17 21:49       ` Oliver Neukum
2016-05-17 23:39         ` Bjørn Mork
2016-05-18  8:12           ` Oliver Neukum [this message]
