qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] xhci migration breakage
@ 2017-03-30 12:23 Dr. David Alan Gilbert
  2017-03-30 14:10 ` Gerd Hoffmann
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-30 12:23 UTC (permalink / raw)
  To: kraxel; +Cc: qemu-devel

Hi Gerd,
  2.9 seems to have an xhci migration breakage, I've bisected it
down to:
commit ddb603ab6c981c1d67cb42266fc700c33e5b2d8f
Author: Gerd Hoffmann <kraxel@redhat.com>
Date:   Mon Jan 30 16:36:46 2017 +0100

    xhci: don't kick in xhci_submit and xhci_fire_ctl_transfer

I guess maybe there's a kick needed pre/post migrate?

(it's https://bugzilla.redhat.com/show_bug.cgi?id=1436616 )

The symptoms on the destination are that the qemu prints:

qemu-kvm: usb-msd: Bad signature 53425300

and the guest sees:
  xhci_hcd 0000:00:06.0: WARN Event TRB for slot 1 ep 0 with no TDs queued?
  xhci_hcd 0000:00:06.0: WARN Event TRB for slot 1 ep 2 with no TDs queued?
  xhci_hcd 0000:00:06.0: WARN Event TRB for slot 1 ep 3 with no TDs queued?

and never gets any more IO on the USB disk.

Any ideas?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] xhci migration breakage
  2017-03-30 12:23 [Qemu-devel] xhci migration breakage Dr. David Alan Gilbert
@ 2017-03-30 14:10 ` Gerd Hoffmann
  2017-03-30 17:42   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Gerd Hoffmann @ 2017-03-30 14:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel

  Hi,

> I guess maybe there's a kick needed pre/post migrate?

Sure, but it is there already (post_load arms kick_timer for active
endpoints), so it isn't *that* simple.

Looking at this I have no idea offhand, guess I have to schedule some
debugging for tomorrow.  Keep me posted in case you continue digging.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] xhci migration breakage
  2017-03-30 14:10 ` Gerd Hoffmann
@ 2017-03-30 17:42   ` Dr. David Alan Gilbert
  2017-03-31  5:45     ` Gerd Hoffmann
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-30 17:42 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

* Gerd Hoffmann (kraxel@redhat.com) wrote:
>   Hi,
> 
> > I guess maybe there's a kick needed pre/post migrate?
> 
> Sure, but it is there already (post_load arms kick_timer for active
> endpoints), so it isn't *that* simple.
> 
> Looking at this I have no idea offhand, guess I have to schedule some
> debugging for tomorrow.  Keep me posted in case you continue digging.

OK, I don't know any of this code, but my poking about so far is:

source
....
xhci: setup packet pid 0x69 addr 1 ep 1
20095@1490894271.183259:usb_xhci_ep_kick slotid 1, epid 4, streamid 0
xhci: set epctx: 35497080 state=1 dequeue=0000000230cf2f61
20095@1490894271.183284:usb_xhci_xfer_start 0x7f302931fef0: slotid 1, epid 4, streamid 0
xhci_submit(slotid=1,epid=4)
xhci: setup packet pid 0xe1 addr 1 ep 2
20095@1490894271.183404:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
xhci: set epctx: 35497060 state=1 dequeue=0000000230cf3331
20095@1490894271.183420:usb_xhci_xfer_start 0x7f302931fef0: slotid 1, epid 3, streamid 0
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 

(qemu) migra
migrate                 migrate_cancel          migrate_incoming        
migrate_set_cache_size  migrate_set_capability  migrate_set_downtime    
migrate_set_parameter   migrate_set_speed       migrate_start_postcopy  
(qemu) migrate_set_speed 1G
(qemu) migrate tcp:0:4444
usb-storage msd pre_save 0x7f3044627960 sig=0 tag=0
(qemu) q

dest:
(qemu) red_dispatcher_loadvm_commands: 
usb-storage msd post_load 0x7f45f510ee50 sig=0 tag=0

I added these as debug on the 'usb-storage' vmsd element; but
the signature/tag being 0 I assume they're not the culprit.

(qemu) c
(qemu) 20107@1490894287.479732:usb_xhci_ep_kick slotid 1, epid 1, streamid 0
xhci: set epctx: 35497020 state=1 dequeue=0000000035499231
20107@1490894287.479763:usb_xhci_xfer_start 0x7f45f556a800: slotid 1, epid 1, streamid 0
xhci: setup packet pid 0x69 addr 1 ep 0
20107@1490894287.479799:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
xhci: set epctx: 35497060 state=1 dequeue=0000000230cf3331
20107@1490894287.479812:usb_xhci_xfer_start 0x7f45f556a800: slotid 1, epid 3, streamid 0
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1
xhci: set epctx: 35497060 state=2 dequeue=0000000230cf3331
20107@1490894287.479837:usb_xhci_ep_kick slotid 1, epid 4, streamid 0
xhci: set epctx: 35497080 state=1 dequeue=0000000230cf2f61
20107@1490894287.479848:usb_xhci_xfer_start 0x7f45f556a800: slotid 1, epid 4, streamid 0
xhci_submit(slotid=1,epid=4)
xhci: setup packet pid 0xe1 addr 1 ep 2
qemu-system-x86_64: usb-msd: Bad signature 53425300

given the bottom byte is 0 - is anything supposed to poison these things when
they're finished with them?


The backtrace (where I added an assert) shows it is coming off the timer
(Note I think this is a different run from that above):
#3  0x00007feba72c91f2 in __GI___assert_fail (assertion=assertion@entry=0x7febabcc1f3a "0", file=file@entry=0x7febabcb1f90 "/root/1436616/qemu/hw/usb/dev-storage.c", line=line@entry=429, function=function@entry=0x7febabcb20f0 <__PRETTY_FUNCTION__.30660> "usb_msd_handle_data") at assert.c:101
#4  0x00007febabad00e1 in usb_msd_handle_data (dev=0x7febae693e50, p=0x7febaeaef808)
    at /root/1436616/qemu/hw/usb/dev-storage.c:429
#5  0x00007febabab2b58 in usb_handle_packet (dev=<optimized out>, p=p@entry=0x7febaeaef808)
    at /root/1436616/qemu/hw/usb/core.c:412
#6  0x00007febabacae27 in xhci_kick_epctx (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>)
    at /root/1436616/qemu/hw/usb/hcd-xhci.c:2103 <- right at the end of xhci_submit
#7  0x00007febabacae27 in xhci_kick_epctx (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>)
    at /root/1436616/qemu/hw/usb/hcd-xhci.c:2112 right at end of xhci_fire_transfer
#8  0x00007febabacae27 in xhci_kick_epctx (epctx=0x7febaec395a0, streamid=0)
    at /root/1436616/qemu/hw/usb/hcd-xhci.c:2240
#9  0x00007febabb4c8b9 in timerlist_run_timers (timer_list=0x7febad2ad790) at /root/1436616/qemu/qemu-timer.c:540
#10 0x00007febabb4cb20 in qemu_clock_run_all_timers (type=<optimized out>) at /root/1436616/qemu/qemu-timer.c:551
#11 0x00007febabb4cb20 in qemu_clock_run_all_timers () at /root/1436616/qemu/qemu-timer.c:665
#12 0x00007febabb4bae7 in main_loop_wait (nonblocking=<optimized out>) at /root/1436616/qemu/main-loop.c:516
#13 0x00007febab8855e2 in main () at /root/1436616/qemu/vl.c:1888
#14 0x00007febab8855e2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /root/1436616/qemu/vl.c:4630

Dave

> cheers,
>   Gerd
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] xhci migration breakage
  2017-03-30 17:42   ` Dr. David Alan Gilbert
@ 2017-03-31  5:45     ` Gerd Hoffmann
  2017-03-31  8:54       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Gerd Hoffmann @ 2017-03-31  5:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]

  Hi,

> dest:
> (qemu) red_dispatcher_loadvm_commands: 
> usb-storage msd post_load 0x7f45f510ee50 sig=0 tag=0
> 
> I added these as debug on the 'usb-storage' vmsd element; but
> the signature/tag being 0 I assume they're not the culprit.

Yep, no inflight transfer, so highly unlikely (also note that ehci works
fine according to bugzilla).

> #6  0x00007febabacae27 in xhci_kick_epctx (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>)
>     at /root/1436616/qemu/hw/usb/hcd-xhci.c:2103 <- right at the end of xhci_submit
> #7  0x00007febabacae27 in xhci_kick_epctx (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>)
>     at /root/1436616/qemu/hw/usb/hcd-xhci.c:2112 right at end of xhci_fire_transfer
> #8  0x00007febabacae27 in xhci_kick_epctx (epctx=0x7febaec395a0, streamid=0)
>     at /root/1436616/qemu/hw/usb/hcd-xhci.c:2240

Recursive xhci_kick_epctx calls, not good.

Does the attached patch (not tested yet) make any difference?

cheers,
  Gerd


[-- Attachment #2: 0001-xhci-set-kick_active-for-retried-xfers-too.patch --]
[-- Type: text/x-patch, Size: 2779 bytes --]

From 1b10b10445ab438c3fa1107f9962add3ae0716c6 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Fri, 31 Mar 2017 07:40:45 +0200
Subject: [PATCH] xhci: set kick_active for retried xfers too

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/usb/hcd-xhci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index f0af852..8958f95 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -2083,6 +2083,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
         return;
     }
 
+    epctx->kick_active++;
     if (epctx->retry) {
         XHCITransfer *xfer = epctx->retry;
 
@@ -2093,6 +2094,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
             mfindex = xhci_mfindex_get(xhci);
             xhci_check_intr_iso_kick(xhci, xfer, epctx, mfindex);
             if (xfer->running_retry) {
+                epctx->kick_active--;
                 return;
             }
             xfer->timed_xfer = 0;
@@ -2101,6 +2103,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
         if (xfer->iso_xfer) {
             /* retry iso transfer */
             if (xhci_setup_packet(xfer) < 0) {
+                epctx->kick_active--;
                 return;
             }
             usb_handle_packet(xfer->packet.ep->dev, &xfer->packet);
@@ -2109,10 +2112,12 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
         } else {
             /* retry nak'ed transfer */
             if (xhci_setup_packet(xfer) < 0) {
+                epctx->kick_active--;
                 return;
             }
             usb_handle_packet(xfer->packet.ep->dev, &xfer->packet);
             if (xfer->packet.status == USB_RET_NAK) {
+                epctx->kick_active--;
                 return;
             }
             xhci_try_complete_packet(xfer);
@@ -2126,6 +2131,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
 
     if (epctx->state == EP_HALTED) {
         DPRINTF("xhci: ep halted, not running schedule\n");
+        epctx->kick_active--;
         return;
     }
 
@@ -2134,6 +2140,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
         uint32_t err;
         stctx = xhci_find_stream(epctx, streamid, &err);
         if (stctx == NULL) {
+            epctx->kick_active--;
             return;
         }
         ring = &stctx->ring;
@@ -2145,7 +2152,6 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
     }
     assert(ring->dequeue != 0);
 
-    epctx->kick_active++;
     while (1) {
         length = xhci_ring_chain_length(xhci, ring);
         if (length <= 0) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] xhci migration breakage
  2017-03-31  5:45     ` Gerd Hoffmann
@ 2017-03-31  8:54       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2017-03-31  8:54 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

* Gerd Hoffmann (kraxel@redhat.com) wrote:
>   Hi,
> 
> > dest:
> > (qemu) red_dispatcher_loadvm_commands: 
> > usb-storage msd post_load 0x7f45f510ee50 sig=0 tag=0
> > 
> > I added these as debug on the 'usb-storage' vmsd element; but
> > the signature/tag being 0 I assume they're not the culprit.
> 
> Yep, no inflight transfer, so highly unlikely (also note that ehci works
> fine according to bugzilla).
> 
> > #6  0x00007febabacae27 in xhci_kick_epctx (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>)
> >     at /root/1436616/qemu/hw/usb/hcd-xhci.c:2103 <- right at the end of xhci_submit
> > #7  0x00007febabacae27 in xhci_kick_epctx (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>)
> >     at /root/1436616/qemu/hw/usb/hcd-xhci.c:2112 right at end of xhci_fire_transfer
> > #8  0x00007febabacae27 in xhci_kick_epctx (epctx=0x7febaec395a0, streamid=0)
> >     at /root/1436616/qemu/hw/usb/hcd-xhci.c:2240
> 
> Recursive xhci_kick_epctx calls, not good.
> 
> Does the attached patch (not tested yet) make any difference?

Nope.

Source:
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1
30724@1490950388.025424:usb_xhci_ep_kick slotid 1, epid 4, streamid 0
xhci: set epctx: 23fe22080 state=1 dequeue=0000000035460f41
30724@1490950388.025449:usb_xhci_xfer_start 0x7fbf642facc0: slotid 1, epid 4, streamid 0
xhci_submit(slotid=1,epid=4)
xhci: setup packet pid 0xe1 addr 1 ep 2
30724@1490950388.025649:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
xhci: set epctx: 23fe22060 state=1 dequeue=000000003550e421
30724@1490950388.025674:usb_xhci_xfer_start 0x7fbf642facc0: slotid 1, epid 3, streamid 0
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1
30724@1490950388.025790:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
xhci: set epctx: 23fe22060 state=1 dequeue=000000003550e431
30724@1490950388.025807:usb_xhci_xfer_start 0x7fbf642facc0: slotid 1, epid 3, streamid 0
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1
30724@1490950388.026700:usb_xhci_ep_kick slotid 1, epid 4, streamid 0
xhci: set epctx: 23fe22080 state=1 dequeue=0000000035460f51
30724@1490950388.026739:usb_xhci_xfer_start 0x7fbf680008f0: slotid 1, epid 4, streamid 0
xhci_submit(slotid=1,epid=4)
xhci: setup packet pid 0xe1 addr 1 ep 2
30724@1490950388.026910:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
xhci: set epctx: 23fe22060 state=1 dequeue=000000003550e441
30724@1490950388.026936:usb_xhci_xfer_start 0x7fbf680008f0: slotid 1, epid 3, streamid 0
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1

Dest:
QEMU 2.8.92 monitor - type 'help' for more information
(qemu) red_dispatcher_loadvm_commands: 
usb-storage msd post_load 0x7fbcde35def0 sig=0 tag=0
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 210.332000 ms, bitrate 1656610 bps (1.579866 Mbps) LOW BANDWIDTH
red_dispatcher_set_cursor_peer: 
inputs_connect: inputs channel client create

(qemu) c
(qemu) 30736@1490950422.138659:usb_xhci_ep_kick slotid 1, epid 1, streamid 0
xhci: set epctx: 23fe22020 state=1 dequeue=0000000230c87231
30736@1490950422.138695:usb_xhci_xfer_start 0x7fbcdd8cdee0: slotid 1, epid 1, streamid 0
xhci: setup packet pid 0x69 addr 1 ep 0
30736@1490950422.138731:usb_xhci_ep_kick slotid 1, epid 3, streamid 0
xhci: set epctx: 23fe22060 state=1 dequeue=000000003550e441
30736@1490950422.138746:usb_xhci_xfer_start 0x7fbcddaf8f20: slotid 1, epid 3, streamid 0
xhci_submit(slotid=1,epid=3)
xhci: setup packet pid 0x69 addr 1 ep 1
xhci: set epctx: 23fe22060 state=2 dequeue=000000003550e441
30736@1490950422.138769:usb_xhci_ep_kick slotid 1, epid 4, streamid 0
xhci: set epctx: 23fe22080 state=1 dequeue=0000000035460f51
30736@1490950422.138781:usb_xhci_xfer_start 0x7fbcdd76aff0: slotid 1, epid 4, streamid 0
xhci_submit(slotid=1,epid=4)
xhci: setup packet pid 0xe1 addr 1 ep 2
qemu-system-x86_64: usb-msd: Bad signature 53425300

Dave

> cheers,
>   Gerd
> 

> From 1b10b10445ab438c3fa1107f9962add3ae0716c6 Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Fri, 31 Mar 2017 07:40:45 +0200
> Subject: [PATCH] xhci: set kick_active for retried xfers too
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/usb/hcd-xhci.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index f0af852..8958f95 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -2083,6 +2083,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>          return;
>      }
>  
> +    epctx->kick_active++;
>      if (epctx->retry) {
>          XHCITransfer *xfer = epctx->retry;
>  
> @@ -2093,6 +2094,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>              mfindex = xhci_mfindex_get(xhci);
>              xhci_check_intr_iso_kick(xhci, xfer, epctx, mfindex);
>              if (xfer->running_retry) {
> +                epctx->kick_active--;
>                  return;
>              }
>              xfer->timed_xfer = 0;
> @@ -2101,6 +2103,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>          if (xfer->iso_xfer) {
>              /* retry iso transfer */
>              if (xhci_setup_packet(xfer) < 0) {
> +                epctx->kick_active--;
>                  return;
>              }
>              usb_handle_packet(xfer->packet.ep->dev, &xfer->packet);
> @@ -2109,10 +2112,12 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>          } else {
>              /* retry nak'ed transfer */
>              if (xhci_setup_packet(xfer) < 0) {
> +                epctx->kick_active--;
>                  return;
>              }
>              usb_handle_packet(xfer->packet.ep->dev, &xfer->packet);
>              if (xfer->packet.status == USB_RET_NAK) {
> +                epctx->kick_active--;
>                  return;
>              }
>              xhci_try_complete_packet(xfer);
> @@ -2126,6 +2131,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>  
>      if (epctx->state == EP_HALTED) {
>          DPRINTF("xhci: ep halted, not running schedule\n");
> +        epctx->kick_active--;
>          return;
>      }
>  
> @@ -2134,6 +2140,7 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>          uint32_t err;
>          stctx = xhci_find_stream(epctx, streamid, &err);
>          if (stctx == NULL) {
> +            epctx->kick_active--;
>              return;
>          }
>          ring = &stctx->ring;
> @@ -2145,7 +2152,6 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
>      }
>      assert(ring->dequeue != 0);
>  
> -    epctx->kick_active++;
>      while (1) {
>          length = xhci_ring_chain_length(xhci, ring);
>          if (length <= 0) {
> -- 
> 1.8.3.1
> 

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-03-31  8:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-30 12:23 [Qemu-devel] xhci migration breakage Dr. David Alan Gilbert
2017-03-30 14:10 ` Gerd Hoffmann
2017-03-30 17:42   ` Dr. David Alan Gilbert
2017-03-31  5:45     ` Gerd Hoffmann
2017-03-31  8:54       ` Dr. David Alan Gilbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).