qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Huth <thuth@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Cc: yong.huang@smartx.com
Subject: Re: [PATCH v2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration
Date: Mon, 24 Jun 2024 12:06:44 +0200	[thread overview]
Message-ID: <b625d08e-e65f-4da4-818d-bbc4e2015122@redhat.com> (raw)
In-Reply-To: <20240610170252.26516-1-pbonzini@redhat.com>

On 10/06/2024 19.02, Paolo Bonzini wrote:
> From: Hyman Huang <yong.huang@smartx.com>
> 
> For VMs configured with the USB CDROM device:
> 
> -drive file=/path/to/local/file,id=drive-usb-disk0,media=cdrom,readonly=on...
> -device usb-storage,drive=drive-usb-disk0,id=usb-disk0...
> 
> QEMU process may crash after live migration, to reproduce the issue,
> configure VM (Guest OS ubuntu 20.04 or 21.10) with the following XML:
> 
> <disk type='file' device='cdrom'>
>    <driver name='qemu' type='raw'/>
>    <source file='/path/to/share_fs/cdrom.iso'/>
>    <target dev='sda' bus='usb'/>
>    <readonly/>
>    <address type='usb' bus='0' port='2'/>
> </disk>
> <controller type='usb' index='0' model='piix3-uhci'/>
> 
> Do the live migration repeatedly, crash may happen after live migratoin,
> trace log at the source before live migration is as follows:
> 
> 324808@1711972823.521945:usb_uhci_frame_start nr 319
> 324808@1711972823.521978:usb_uhci_qh_load qh 0x35cb5400
> 324808@1711972823.521989:usb_uhci_qh_load qh 0x35cb5480
> 324808@1711972823.521997:usb_uhci_td_load qh 0x35cb5480, td 0x35cbe000, ctrl 0x0, token 0xffe07f69
> 324808@1711972823.522010:usb_uhci_td_nextqh qh 0x35cb5480, td 0x35cbe000
> 324808@1711972823.522022:usb_uhci_qh_load qh 0x35cb5680
> 324808@1711972823.522030:usb_uhci_td_load qh 0x35cb5680, td 0x75ac5180, ctrl 0x19800000, token 0x3c903e1
> 324808@1711972823.522045:usb_uhci_packet_add token 0x103e1, td 0x75ac5180
> 324808@1711972823.522056:usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef -> setup
> 324808@1711972823.522079:usb_msd_cmd_submit lun 0, tag 0x472, flags 0x00000080, len 10, data-len 8
> 324808@1711972823.522107:scsi_req_parsed target 0 lun 0 tag 1138 command 74 dir 1 length 8
> 324808@1711972823.522124:scsi_req_parsed_lba target 0 lun 0 tag 1138 command 74 lba 4096
> 324808@1711972823.522139:scsi_req_alloc target 0 lun 0 tag 1138
> 324808@1711972823.522169:scsi_req_continue target 0 lun 0 tag 1138
> 324808@1711972823.522181:scsi_req_data target 0 lun 0 tag 1138 len 8
> 324808@1711972823.522194:usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state setup -> complete
> 324808@1711972823.522209:usb_uhci_packet_complete_success token 0x103e1, td 0x75ac5180
> 324808@1711972823.522219:usb_uhci_packet_del token 0x103e1, td 0x75ac5180
> 324808@1711972823.522232:usb_uhci_td_complete qh 0x35cb5680, td 0x75ac5180
> 
> trace log at the destination after live migration is as follows:
> 
> 3286206@1711972823.951646:usb_uhci_frame_start nr 320
> 3286206@1711972823.951663:usb_uhci_qh_load qh 0x35cb5100
> 3286206@1711972823.951671:usb_uhci_qh_load qh 0x35cb5480
> 3286206@1711972823.951680:usb_uhci_td_load qh 0x35cb5480, td 0x35cbe000, ctrl 0x1000000, token 0xffe07f69
> 3286206@1711972823.951693:usb_uhci_td_nextqh qh 0x35cb5480, td 0x35cbe000
> 3286206@1711972823.951702:usb_uhci_qh_load qh 0x35cb5700
> 3286206@1711972823.951709:usb_uhci_td_load qh 0x35cb5700, td 0x75ac5240, ctrl 0x39800000, token 0xe08369
> 3286206@1711972823.951727:usb_uhci_queue_add token 0x8369
> 3286206@1711972823.951735:usb_uhci_packet_add token 0x8369, td 0x75ac5240
> 3286206@1711972823.951746:usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef -> setup
> 3286206@1711972823.951766:usb_msd_data_in 8/8 (scsi 8)
> 2024-04-01 12:00:24.665+0000: shutting down, reason=crashed
> 
> The backtrace reveals the following:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> 0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
> 312        movq    -8(%rsi,%rdx), %rcx
> [Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
> (gdb) bt
> 0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
> 1  memcpy (__len=8, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:34
> 2  iov_from_buf_full (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
> 3  iov_from_buf (bytes=8, buf=<optimized out>, offset=<optimized out>, iov_cnt=<optimized out>, iov=<optimized out>)
>     at /usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
> 4  usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=<optimized out>, bytes=bytes@entry=8) at ../hw/usb/core.c:636
> 5  usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:186
> 6  usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:496
> 7  usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/core.c:455
> 8  uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=<optimized out>,
>     int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
> 9  uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
> 10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1159
> 11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
> 12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
> 13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
> 14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
> 15 qemu_main_loop () at ../softmmu/runstate.c:739
> 16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:52
> (gdb) frame 5
> (gdb) p ((SCSIDiskReq *)s->req)->iov
> $1 = {iov_base = 0x0, iov_len = 0}
> (gdb) p/x s->req->tag
> $2 = 0x472
> 
> When designing the USB mass storage device model, QEMU places SCSI disk
> device as the backend of USB mass storage device. In addition, USB mass
> device driver in Guest OS conforms to the "Universal Serial Bus Mass
> Storage Class Bulk-Only Transport" specification in order to simulate
> the transform behavior between a USB controller and a USB mass device.
> The following shows the protocol hierarchy:
> 
>                        +----------------+
>   CDROM driver         |  scsi command  |        CDROM
>                        +----------------+
> 
>                     +-----------------------+
>   USB mass          | USB Mass Storage Class|    USB mass
>   storage driver    | Bulk-Only Transport   |    storage device
>                     +-----------------------+
> 
>                        +----------------+
>   USB Controller       |  USB Protocol  |        USB device
>                        +----------------+
> 
> In the USB protocol layer, between the USB controller and USB device, at
> least two USB packets will be transformed when guest OS send a
> read operation to USB mass storage device:
> 
> 1. The CBW packet, which will be delivered to the USB device's Bulk-Out
> endpoint. In order to simulate a read operation, the USB mass storage
> device parses the CBW and converts it to a SCSI command, which would be
> executed by CDROM(represented as SCSI disk in QEMU internally), and store
> the result data of the SCSI command in a buffer.
> 
> 2. The DATA-IN packet, which will be delivered from the USB device's
> Bulk-In endpoint(fetched directly from the preceding buffer) to the USB
> controller.
> 
> We consider UHCI to be the controller. The two packets mentioned above may
> have been processed by UHCI in two separate frame entries of the Frame List
> , and also described by two different TDs. Unlike the physical environment,
> a virtualized environment requires the QEMU to make sure that the result
> data of CBW is not lost and is delivered to the UHCI controller.
> 
> Currently, these types of SCSI requests are not migrated, so QEMU cannot
> ensure the result data of the IO operation is not lost if there are
> inflight emulated SCSI requests during the live migration.
> 
> Assume for the moment that the USB mass storage device is processing the
> CBW and storing the result data of the read operation to a buffre, live
> migration happens and moves the VM to the destination while not migrating
> the result data of the read operation.
> 
> After migration, when UHCI at the destination issues a DATA-IN request to
> the USB mass storage device, a crash happens because USB mass storage device
> fetches the result data and get nothing.
> 
> The scenario this patch addresses is this one.
> 
> Theoretically, any device that uses the SCSI disk as a back-end would be
> affected by this issue. In this case, it is the USB CDROM.
> 
> To fix it, inflight emulated SCSI request be migrated during live migration,
> similar to the DMA SCSI request.
> 
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> Message-ID: <878c8f093f3fc2f584b5c31cb2490d9f6a12131a.1716531409.git.yong.huang@smartx.com>
> [Do not bump migration version, introduce compat property instead. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   hw/core/machine.c   |  1 +
>   hw/scsi/scsi-disk.c | 24 +++++++++++++++++++++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index c93d2492443..655d75c21fc 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -36,6 +36,7 @@
>   
>   GlobalProperty hw_compat_9_0[] = {
>       {"arm-cpu", "backcompat-cntfrq", "true" },
> +    {"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
>       {"vfio-pci", "skip-vsc-check", "false" },
>   };

  Hi Paolo, hi Hyman,

this patch introduced a problem with device introspection on older machine 
types. Running "make check-qtest SPEED=slow" is now failing for older 
machine types.

Or if you want to reproduce it manually:

  $ ./qemu-system-x86_64 -M pc-q35-8.0 -monitor stdio
  QEMU 9.0.50 monitor - type 'help' for more information
  (qemu) device_add scsi-block,help
  Unexpected error in object_property_find_err() at
  ../../devel/qemu/qom/object.c:1357:
  can't apply global scsi-disk-base.migrate-emulated-scsi-request=false:
  Property 'scsi-block.migrate-emulated-scsi-request' not found
  Aborted (core dumped)

I think the problem is that the property is only added to certain SCSI 
devices via the DEFINE_SCSI_DISK_PROPERTIES macro, but "scsi-block" device 
does not use this macro and thus does not have this property. So when the 
compat code tries to set this property for "scsi-block" (which is also 
derived from "scsi-disk-base"), it fails, of course.

Should the "migrate-emulated-scsi-request" property maybe rather be added to 
the "scsi-disk-base" class instead? Or should hw_compat_9_0 rather list the 
devices that have the property instead of using the parent class? Could you 
please have a look?

  Thanks,
   Thomas



  reply	other threads:[~2024-06-24 10:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-10 17:02 [PATCH v2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration Paolo Bonzini
2024-06-24 10:06 ` Thomas Huth [this message]
2024-06-25  1:58   ` Yong Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b625d08e-e65f-4da4-818d-bbc4e2015122@redhat.com \
    --to=thuth@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yong.huang@smartx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).