All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Huth <thuth@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Cc: yong.huang@smartx.com
Subject: Re: [PATCH v2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration
Date: Mon, 24 Jun 2024 12:06:44 +0200	[thread overview]
Message-ID: <b625d08e-e65f-4da4-818d-bbc4e2015122@redhat.com> (raw)
In-Reply-To: <20240610170252.26516-1-pbonzini@redhat.com>

On 10/06/2024 19.02, Paolo Bonzini wrote:
> From: Hyman Huang <yong.huang@smartx.com>
> 
> For VMs configured with the USB CDROM device:
> 
> -drive file=/path/to/local/file,id=drive-usb-disk0,media=cdrom,readonly=on...
> -device usb-storage,drive=drive-usb-disk0,id=usb-disk0...
> 
> QEMU process may crash after live migration, to reproduce the issue,
> configure VM (Guest OS ubuntu 20.04 or 21.10) with the following XML:
> 
> <disk type='file' device='cdrom'>
>    <driver name='qemu' type='raw'/>
>    <source file='/path/to/share_fs/cdrom.iso'/>
>    <target dev='sda' bus='usb'/>
>    <readonly/>
>    <address type='usb' bus='0' port='2'/>
> </disk>
> <controller type='usb' index='0' model='piix3-uhci'/>
> 
> Do the live migration repeatedly, crash may happen after live migratoin,
> trace log at the source before live migration is as follows:
> 
> 324808@1711972823.521945:usb_uhci_frame_start nr 319
> 324808@1711972823.521978:usb_uhci_qh_load qh 0x35cb5400
> 324808@1711972823.521989:usb_uhci_qh_load qh 0x35cb5480
> 324808@1711972823.521997:usb_uhci_td_load qh 0x35cb5480, td 0x35cbe000, ctrl 0x0, token 0xffe07f69
> 324808@1711972823.522010:usb_uhci_td_nextqh qh 0x35cb5480, td 0x35cbe000
> 324808@1711972823.522022:usb_uhci_qh_load qh 0x35cb5680
> 324808@1711972823.522030:usb_uhci_td_load qh 0x35cb5680, td 0x75ac5180, ctrl 0x19800000, token 0x3c903e1
> 324808@1711972823.522045:usb_uhci_packet_add token 0x103e1, td 0x75ac5180
> 324808@1711972823.522056:usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef -> setup
> 324808@1711972823.522079:usb_msd_cmd_submit lun 0, tag 0x472, flags 0x00000080, len 10, data-len 8
> 324808@1711972823.522107:scsi_req_parsed target 0 lun 0 tag 1138 command 74 dir 1 length 8
> 324808@1711972823.522124:scsi_req_parsed_lba target 0 lun 0 tag 1138 command 74 lba 4096
> 324808@1711972823.522139:scsi_req_alloc target 0 lun 0 tag 1138
> 324808@1711972823.522169:scsi_req_continue target 0 lun 0 tag 1138
> 324808@1711972823.522181:scsi_req_data target 0 lun 0 tag 1138 len 8
> 324808@1711972823.522194:usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state setup -> complete
> 324808@1711972823.522209:usb_uhci_packet_complete_success token 0x103e1, td 0x75ac5180
> 324808@1711972823.522219:usb_uhci_packet_del token 0x103e1, td 0x75ac5180
> 324808@1711972823.522232:usb_uhci_td_complete qh 0x35cb5680, td 0x75ac5180
> 
> trace log at the destination after live migration is as follows:
> 
> 3286206@1711972823.951646:usb_uhci_frame_start nr 320
> 3286206@1711972823.951663:usb_uhci_qh_load qh 0x35cb5100
> 3286206@1711972823.951671:usb_uhci_qh_load qh 0x35cb5480
> 3286206@1711972823.951680:usb_uhci_td_load qh 0x35cb5480, td 0x35cbe000, ctrl 0x1000000, token 0xffe07f69
> 3286206@1711972823.951693:usb_uhci_td_nextqh qh 0x35cb5480, td 0x35cbe000
> 3286206@1711972823.951702:usb_uhci_qh_load qh 0x35cb5700
> 3286206@1711972823.951709:usb_uhci_td_load qh 0x35cb5700, td 0x75ac5240, ctrl 0x39800000, token 0xe08369
> 3286206@1711972823.951727:usb_uhci_queue_add token 0x8369
> 3286206@1711972823.951735:usb_uhci_packet_add token 0x8369, td 0x75ac5240
> 3286206@1711972823.951746:usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef -> setup
> 3286206@1711972823.951766:usb_msd_data_in 8/8 (scsi 8)
> 2024-04-01 12:00:24.665+0000: shutting down, reason=crashed
> 
> The backtrace reveals the following:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> 0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
> 312        movq    -8(%rsi,%rdx), %rcx
> [Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
> (gdb) bt
> 0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
> 1  memcpy (__len=8, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:34
> 2  iov_from_buf_full (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
> 3  iov_from_buf (bytes=8, buf=<optimized out>, offset=<optimized out>, iov_cnt=<optimized out>, iov=<optimized out>)
>     at /usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
> 4  usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=<optimized out>, bytes=bytes@entry=8) at ../hw/usb/core.c:636
> 5  usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:186
> 6  usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:496
> 7  usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/core.c:455
> 8  uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=<optimized out>,
>     int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
> 9  uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
> 10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1159
> 11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
> 12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
> 13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
> 14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
> 15 qemu_main_loop () at ../softmmu/runstate.c:739
> 16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:52
> (gdb) frame 5
> (gdb) p ((SCSIDiskReq *)s->req)->iov
> $1 = {iov_base = 0x0, iov_len = 0}
> (gdb) p/x s->req->tag
> $2 = 0x472
> 
> When designing the USB mass storage device model, QEMU places SCSI disk
> device as the backend of USB mass storage device. In addition, USB mass
> device driver in Guest OS conforms to the "Universal Serial Bus Mass
> Storage Class Bulk-Only Transport" specification in order to simulate
> the transform behavior between a USB controller and a USB mass device.
> The following shows the protocol hierarchy:
> 
>                        +----------------+
>   CDROM driver         |  scsi command  |        CDROM
>                        +----------------+
> 
>                     +-----------------------+
>   USB mass          | USB Mass Storage Class|    USB mass
>   storage driver    | Bulk-Only Transport   |    storage device
>                     +-----------------------+
> 
>                        +----------------+
>   USB Controller       |  USB Protocol  |        USB device
>                        +----------------+
> 
> In the USB protocol layer, between the USB controller and USB device, at
> least two USB packets will be transformed when guest OS send a
> read operation to USB mass storage device:
> 
> 1. The CBW packet, which will be delivered to the USB device's Bulk-Out
> endpoint. In order to simulate a read operation, the USB mass storage
> device parses the CBW and converts it to a SCSI command, which would be
> executed by CDROM(represented as SCSI disk in QEMU internally), and store
> the result data of the SCSI command in a buffer.
> 
> 2. The DATA-IN packet, which will be delivered from the USB device's
> Bulk-In endpoint(fetched directly from the preceding buffer) to the USB
> controller.
> 
> We consider UHCI to be the controller. The two packets mentioned above may
> have been processed by UHCI in two separate frame entries of the Frame List
> , and also described by two different TDs. Unlike the physical environment,
> a virtualized environment requires the QEMU to make sure that the result
> data of CBW is not lost and is delivered to the UHCI controller.
> 
> Currently, these types of SCSI requests are not migrated, so QEMU cannot
> ensure the result data of the IO operation is not lost if there are
> inflight emulated SCSI requests during the live migration.
> 
> Assume for the moment that the USB mass storage device is processing the
> CBW and storing the result data of the read operation to a buffre, live
> migration happens and moves the VM to the destination while not migrating
> the result data of the read operation.
> 
> After migration, when UHCI at the destination issues a DATA-IN request to
> the USB mass storage device, a crash happens because USB mass storage device
> fetches the result data and get nothing.
> 
> The scenario this patch addresses is this one.
> 
> Theoretically, any device that uses the SCSI disk as a back-end would be
> affected by this issue. In this case, it is the USB CDROM.
> 
> To fix it, inflight emulated SCSI request be migrated during live migration,
> similar to the DMA SCSI request.
> 
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> Message-ID: <878c8f093f3fc2f584b5c31cb2490d9f6a12131a.1716531409.git.yong.huang@smartx.com>
> [Do not bump migration version, introduce compat property instead. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   hw/core/machine.c   |  1 +
>   hw/scsi/scsi-disk.c | 24 +++++++++++++++++++++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index c93d2492443..655d75c21fc 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -36,6 +36,7 @@
>   
>   GlobalProperty hw_compat_9_0[] = {
>       {"arm-cpu", "backcompat-cntfrq", "true" },
> +    {"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
>       {"vfio-pci", "skip-vsc-check", "false" },
>   };

  Hi Paolo, hi Hyman,

this patch introduced a problem with device introspection on older machine 
types. Running "make check-qtest SPEED=slow" is now failing for older 
machine types.

Or if you want to reproduce it manually:

  $ ./qemu-system-x86_64 -M pc-q35-8.0 -monitor stdio
  QEMU 9.0.50 monitor - type 'help' for more information
  (qemu) device_add scsi-block,help
  Unexpected error in object_property_find_err() at
  ../../devel/qemu/qom/object.c:1357:
  can't apply global scsi-disk-base.migrate-emulated-scsi-request=false:
  Property 'scsi-block.migrate-emulated-scsi-request' not found
  Aborted (core dumped)

I think the problem is that the property is only added to certain SCSI 
devices via the DEFINE_SCSI_DISK_PROPERTIES macro, but "scsi-block" device 
does not use this macro and thus does not have this property. So when the 
compat code tries to set this property for "scsi-block" (which is also 
derived from "scsi-disk-base"), it fails, of course.

Should the "migrate-emulated-scsi-request" property maybe rather be added to 
the "scsi-disk-base" class instead? Or should hw_compat_9_0 rather list the 
devices that have the property instead of using the parent class? Could you 
please have a look?

  Thanks,
   Thomas



  reply	other threads:[~2024-06-24 10:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-10 17:02 [PATCH v2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration Paolo Bonzini
2024-06-24 10:06 ` Thomas Huth [this message]
2024-06-25  1:58   ` Yong Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b625d08e-e65f-4da4-818d-bbc4e2015122@redhat.com \
    --to=thuth@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yong.huang@smartx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.