qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Hans de Goede <hdegoede@redhat.com>
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 04/12] usb: Add support for input pipelining
Date: Mon, 08 Oct 2012 16:50:32 +0200	[thread overview]
Message-ID: <5072E838.8080104@redhat.com> (raw)
In-Reply-To: <5072CC04.4010300@redhat.com>

Hi,

On 10/08/2012 02:50 PM, Gerd Hoffmann wrote:
> On 10/08/12 09:51, Hans de Goede wrote:
>> Currently we effectively only do pipelining for output endpoints, since
>> controllers will stop filling the queue for a packet with stop the queue
>> on a short read semantics enabled.
>
>> This causes large input transfers to get split into multiple packets, which
>> comes with a serious performance penalty.
>>
>> This patch makes it possible to do pipelining for input endpoints, by
>> re-combining packets which belong to 1 transfer at the guest device-driver
>> level into 1 large packet before sending them to the device, this is mostly
>> useful as a significant performance improvement for redirection of real USB
>> devices.
>
> This splitting and recombining is too much magic in the core layer for
> my taste.

I disagree, let me try to explain below.

> I think the core work flow should stay as-is.  Instead we should add
> more meta data to USBPacket (stop-queue-on-short-read bit, maybe
> interrupt-on-complete too).

We agree on the meta data :)

> Continue queuing packets even with
> stop-queue-on-short-read set.

This can only happen if we combine them first. We *must* ensure that
we do not read additional data after a short completion and add
that to a packet belonging to the previous short ended transfer, or
if we end up cancelling after the short completion, throwing away that
data, otherwise the guest and device can get out of sync. Which is exactly
what was happening with the uhci code, before I added the SPD tests there.

Another important reason for combining is that in most cases, we only
see short-not-ok flags because the device-driver in the guest has asked
for a single bulk in transfer which does not fit in a single td, so
it gets split, and to make sure that if it ends up short, the problem
of data from another part of the protocol (ie usb mass storage bulk only
transport data versus sense) ending up in tds belonging to the previous
transfer does not happen, the short-not-ok flag gets set on all
packets but the last one. Notice btw that the xhci controller has no
notion of short-not-ok, instead it simply allows for very large
bulk transfers to be submitted in one "atomic" entity.

So far to not cause this mixup of to which protocol phase in-data belongs,
we do not queue up packets to the device past a short-not-ok boundary,
but this means for serial <-> usb converters that instead of doing:

submit 1st 256 byte bulk in
submit 2nd 256 byte bulk in
<wait for completions>
<after a single transfer completion>
resubmit 256 byte bulk in

We do:
submit 1st 64/256 byte bulk in
<wait for completion>
<on full 64 byte completion>
submit 2nd 64/256 byte bulk in
<etc>

Meaning that for receiving 256 bytes we pay 4 times the round trip
time instead of 1 time, and since we do not submit further packets
after a short-not-ok packet, that we don't have a 2nd 256 byte
transfer queued up ready to receive more data while the 1st
one is being processed. Making reading from such a device highly
unreliable if there is any sort of load on the host.

The whole combining "magic" results in a single 256 byte packet, which
does not have its short-not-ok flag set, allowing true pipelining and
having another in transfer queued up to keep the device busy will the
1st one is being processed.

Re-combining the packets into a single transfer, is the *only* way to
ensure that a short completion cannot end up reading some data from
the next protocol phase.

One possible alternative to re-combining would be to pass the
short-not-ok flag along the chain all the way down into usbfs, but
that won't work since the USBFS_URB_SHORT_NOT_OK flag does not
work with XHCI controllers! There has even been discussion about
outright refusing packet submissions with that flag set in newer
Linux kernels, but that would cause regressions for apps which set
it when they don't really need it (and most don't). So since more
and more machines are using XHCI controllers, passing the
short-not-ok flag along is not a valid option.

So lets try to split this discussion into multiple questions:

1) Can we agree that re-combining input packets back into their original
transfers as done at the guest device driver level, is useful and given
the serial <-> usb converters case, even necessary to have ?


2) If we agree on 1), what is the right place to do them ?

To be able to properly combine packets into one larger transfer, an
uncombine them again later, some coordination is needed between the
entity doing the combining / uncombining and the hcd code:

2a) The combining entity needs to know when the hcd is done with
queuing up packets for an ep.

2b) When uncombining it is possible that less packets are actually
used / filled with data, then were combined into one transfer. When
this happens the hcd code will stop processing the queue after the
short packet, and wait for the guest to restart it. When the guest
restarts the queue it will be modified and the un-used packets will
be gone from it. Since the hcd code keeps its own queue, when
when uncombining the hcd code needs to be told to forget about
the unused packets.

Note that currently we have the same problem on a halt of the ep
queue already, and currently we expect the hcd to completely empty
the queue on a stall. I'm planning to change this code path over
to also let the core tell the hcd to forget about the packets it
had queued beyond the stall, as this seems something to do at the
core level rather then reproducing it in every hcd emulation
separately.


Given the necessary close interaction between the hcd code and
the combine / uncombine code + not wanting to reproduce the
same code in both host-linux.c and redirect.c I believe that
the usb core is the right place for this.

 > Maybe add a callback to notify USBDevice
> when the host controller is done filling the queue, so USBDevices can
> process all queued packets in one go (bottom half would work too
> though).

Changing the callback which I added for this to a bh is fine with me,
I think that both from a making clear what is going on when reading
the code, and from a having all the data hot in cache pov, the callback
is better though.

> Then expect USBDevices to get things right based on the
> USBPacket flags passed on (i.e. have host-linux use
> USBFS_URB_BULK_CONTINUATION as needed).

Note that the need for host-linux to use USBFS_URB_BULK_CONTINUATION +
SHORT_NOT_OK flags with older kernels, and to simply no longer split with
newer kernels, already exists! A single ehci td can span 20k, and we
split at 16k, so an input transfer ending short in the first 16k will
already cause the wrong thing to happen (the sense data of usb-ms gets read
as the next 4k), without input pipelining coming into play at all, and with
the xhci emulation we can already get much larger in transfers...

The reason why I disable input pipelining for host-linux for now in my
patchset is because it will make the problem worse, not because the
problem is new.

For the full details on all the magic needed to correctly handle
larger input transfers on usbdevfs with all possible different
kernel versions, please see:
https://github.com/libusbx/libusbx/commit/ede02ba91920f9be787a7f3cd006c5a4b92b5eab

Or just switch to libusb ...

> This way usb-host and usbredir will see what is really going on instead
> of having the usb core magically fixing up stuff for them.

I believe this is based on you thinking simply passing along a short-not-ok
flag all the way down into usbfs will be enough, but it definitely is not
enough, since the XHCI hardware has no notion of short-not-ok, the only
way to do input pipelinig with an XHCI attached device is to re-combine
the split packets into a single large input transfer. And the proper
place for such combining code is in the core IMHO.

> I think this
> will serve us better long-term.  Maybe you are seeing this "data *and*
> stall" issue (patch 1) exactly because of all this combining & splitting
> logic?

I'm not seeing the data and stall condition in practice at all, but while
designing all this I noticed that this can happen. a single ehci td
can be a multiple of the endpoints maxPacketSize and then the endpoint
can respond with maxPacketSize data, maxPacketSize data, stall at which
point we will have both data and a stall condition in one.

And the Linux ehci + usbfs + libusb + usbredir code all handle passing
along both data and a stall as a result for one packet / transfer,
so this is something which eventually we will need to support in
the qemu usb core too. For now I decided on the quick fix in usbredir
to cover the eventuality of this actually happening and added a
warning to see if this actually ever happens (which it seems not to
in my testing done sofar).

Regards,

Hans

  reply	other threads:[~2012-10-08 14:49 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-08  7:51 [Qemu-devel] [PATCH 00/12] RFC: usb: input pipelining support and other speedups Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 01/12] usb-redir: When a packet contains data on a stall, ignore the stall Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 02/12] usb-redir: Add support for 32 bits bulk packet length Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 03/12] usb-host-linux: Only enabling pipeling for output endpoints Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 04/12] usb: Add support for input pipelining Hans de Goede
2012-10-08 12:50   ` Gerd Hoffmann
2012-10-08 14:50     ` Hans de Goede [this message]
2012-10-09  9:21       ` Gerd Hoffmann
2012-10-09 10:40         ` Hans de Goede
2012-10-09 13:31           ` Gerd Hoffmann
2012-10-09 14:19             ` Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 05/12] uhci: Properly unmap packets on cancel / invalid pid Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 06/12] uhci: Move checks to continue queuing to uhci_fill_queue() Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 07/12] uhci: Add support for input queuing Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 08/12] ehci: Get rid of packet tbytes field Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 09/12] ehci: Set int flag on a short input packet Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 10/12] ehci: Add support for input queuing Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 11/12] ehci: Improve latency of interrupt delivery and async schedule scanning Hans de Goede
2012-10-08  7:51 ` [Qemu-devel] [PATCH 12/12] ehci: Speed up the timer of raising int from the async schedule Hans de Goede

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5072E838.8080104@redhat.com \
    --to=hdegoede@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).