From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx2.suse.de ([195.135.220.15]:53638 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750792AbcERIQD (ORCPT ); Wed, 18 May 2016 04:16:03 -0400
Message-ID: <1463559152.22748.10.camel@suse.com>
Subject: Re: [PATCH] cdc-wdm: fix "out-of-sync" due to missing notifications
From: Oliver Neukum
To: =?ISO-8859-1?Q?Bj=F8rn?= Mork
Cc: Greg Kroah-Hartman, linux-usb@vger.kernel.org, stable@vger.kernel.org
Date: Wed, 18 May 2016 10:12:32 +0200
In-Reply-To: <878tz81ch4.fsf@nemi.mork.no>
References: <1463153977-19771-1-git-send-email-bjorn@mork.no>
	<87mvntevwy.fsf@nemi.mork.no>
	<1463476438.19237.3.camel@suse.com>
	<87k2is1o8x.fsf@nemi.mork.no>
	<1463521746.21262.6.camel@suse.com>
	<878tz81ch4.fsf@nemi.mork.no>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: stable-owner@vger.kernel.org
List-ID:

On Wed, 2016-05-18 at 01:39 +0200, Bjørn Mork wrote:
> Oliver Neukum writes:
> > On Tue, 2016-05-17 at 21:24 +0200, Bjørn Mork wrote:
> >> Oliver Neukum writes:
> >>
> >> > On Fri, 2016-05-13 at 18:59 +0200, Bjørn Mork wrote:
> >> >> Bjørn Mork writes:
> >> >>
> >> >> > The driver enforces a strict one-to-one relationship between the
> >> >> > received RESPONSE_AVAILABLE notifications and messages read from
> >> >> > the device. At the same time, it will cancel the interrupt URB
> >> >> > when there is no client holding the character device open.
> >> >>
> >> >> Never mind. Forget it.
> >> >>
> >> >> This patch breaks other devices again. The immediate and unconditional
> >> >> reading makes them barf. I guess it can be worked around by delaying the
> >> >> flushing until at least one notification is received, but I obviously
> >> >> have to test this theory thoroughly on all devices I have.
> >> >
> >> > Hi,
> >> >
> >> > I think the best approach would be to keep the interrupt URB always
> >> > active.
> >> > I didn't do this to conserve bandwidth, but if it makes devices
> >> > work, it certainly would be the best option.
> >>
> >> Yes, I considered that. But this implies purging the device message
> >> queue without telling userspace that we did so. At least with the
> >> current driver design, which is based on a single limited size
> >> buffer. If the device queues a number of unsolicited messages between
> >> two userspace requests, then we really want all those unsolicited
> >> messages delivered to the userspace program on the second request.
> >
> > You might argue that if user space wants the data it should open the
> > device.
>
> Maybe. It's a variant of the current situation, where userspace must
> not close the device while a session is in progress.
>
> The issue here is that userspace (and the driver) knows nothing about
> what kind of messages the device decides to send, or when. So how can
> userspace know that it wants the data? It can't. It has to keep the
> device open just in case there is something interesting happening.

Data is produced. If it is not processed, it must eventually be dropped.
The only question is how soon.

>
> This is not the kind of semantics I'd like to present to any userspace
> developer. We present a character device as an abstraction of a
> hardware device. I believe a reasonable assumption from a userspace
> developer is that the driver forwards all messages it reads from the
> hardware to the character device. So either we don't read from hardware
> when the character device is closed, or we cache everything we read
> until the character device is open.

Well, no, we cannot meet such a guarantee unless we have flow control.
Why would it matter whether the kernel or the device drops data?

>
> >> And I do think the original bandwidth (and power) conservative approach
> >> is worth keeping too. There is no point in waking up these devices
> >> unless there actually is an interested userspace application.
> >
> > They can sleep just fine. I did not imply that runtime PM should
> > be disabled.
>
> Yes, which means that we cancel the URBs.. I haven't been able to
> reproduce it yet, but I think we might occasionally miss a notification
> during suspend/resume too. But this is timing sensitive, and device
> timing sensitive, so it's difficult to trigger on purpose.

System or runtime resume? Possibly we should just request a response
when we resume.

> For now I've ignored it. But I wouldn't be surprised if we end up
> having to do the same "flush queue" exercise on every resume too.

Yes.

> >> FWIW, my initial analysis of the problem with the patch was too quick
> >> and imprecise. The problem is simply the -EPIPE status we inevitably will
> >> hit when the queue is empty, as I should have anticipated. It will be
> >> returned to userspace translated to -EIO. I am currently testing a
> >> version taking care of that, and it seems to behave well so far. I'll
> >> submit it as soon as I am absolutely sure that it works on all WDM, QMI
> >> and MBIM devices I have. Might take some time, since I am running out
> >> of mini-PCIe and m.2 adapters..
> >
> > That looks a bit risky. Firstly, if you get -EPIPE after a notification
> > it is an error and must be reported as such, so you need an additional
> > state.
>
> Yes, -EPIPE should be reported if it occurs later when polling after a
> notification. But no additional state is needed. That info is already
> available.
>
> > And what do you do after -EPIPE? Do you clean up the stall
> > or not? And the fun really starts if you get a notification while
> > you clean the stall.
>
> No cleanup necessary/possible AFAICS: This is endpoint 0.

True. But they shouldn't stall in the first place. This is a rathole.

>
> > And are you sure all devices can cope with an unsolicited request?
>
> Nope. I am not sure about anything when it comes to USB device firmware
> ;)
>
> Broad testing is definitely necessary.
> But realistically: How can it
> possibly fail in other ways than returning 0 data bytes or stalling?
>
> Wait... Don't answer that. Yes, I know. Some device will do something
> completely wild. I'm just not sure that it is worth caring about...

As far as we can maintain a sensible behavior with a black list I agree.

> The firmware shall send ResponseAvailable notifications periodically,
> using any appropriate algorithm, to inform the host that there is data
> available in the reply buffer. The firmware is allowed to send
> ResponseAvailable notifications even if there is no data available,
> but this will obviously reduce overall performance."

That was the original reason to depend on the notifications.

> It remains to be seen if there are any devices which cannot cope with an
> unexpected GetEncapsulatedResponse.

As long as they can be somehow dealt with, I am open to all suggestions.

	Regards
		Oliver
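
For illustration only, the -EPIPE rule debated above can be modelled in user space. This is a rough sketch under stated assumptions, not the cdc-wdm code and not the patch under discussion: `FakeDevice`, `get_encapsulated_response()`, `read_response()`, `flush()` and the `limit` parameter are all hypothetical names. It assumes a stall during a speculative flush just means the queue is empty, while a stall on a read that follows a notification is reported as -EIO.

```python
import errno
from collections import deque


class FakeDevice:
    """Toy stand-in for a WDM device: a FIFO of queued messages (hypothetical)."""

    def __init__(self, messages=()):
        self.queue = deque(messages)

    def get_encapsulated_response(self):
        # Assumption: the control request stalls (-EPIPE) when nothing is queued.
        if not self.queue:
            return -errno.EPIPE, b""
        return 0, self.queue.popleft()


def read_response(dev, notified):
    """Read one message and return (status, data).

    A stall after a notification is a real error and is reported as -EIO.
    A stall on an unsolicited read is treated as "queue empty".
    """
    status, data = dev.get_encapsulated_response()
    if status == -errno.EPIPE:
        if notified:
            return -errno.EIO, b""
        return 0, b""
    return status, data


def flush(dev, limit=16):
    """Drain messages whose notifications may have been missed.

    Stops on the first empty read or error; 'limit' is an arbitrary bound
    so a misbehaving device cannot keep the loop running forever.
    """
    drained = []
    for _ in range(limit):
        status, data = read_response(dev, notified=False)
        if status != 0 or not data:
            break
        drained.append(data)
    return drained
```

In this model, `flush(FakeDevice([b"ind-1", b"ind-2"]))` hands back both queued messages and then stops quietly on the stall, whereas `read_response(dev, notified=True)` on an empty device returns -EIO. The only state the rule needs is whether the read was triggered by a notification. Devices that do "something completely wild" on an unexpected request are outside what the model covers.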