From: Thomas Huth <thuth@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Cc: yong.huang@smartx.com
Subject: Re: [PATCH v2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration
Date: Mon, 24 Jun 2024 12:06:44 +0200 [thread overview]
Message-ID: <b625d08e-e65f-4da4-818d-bbc4e2015122@redhat.com> (raw)
In-Reply-To: <20240610170252.26516-1-pbonzini@redhat.com>
On 10/06/2024 19.02, Paolo Bonzini wrote:
> From: Hyman Huang <yong.huang@smartx.com>
>
> For VMs configured with the USB CDROM device:
>
> -drive file=/path/to/local/file,id=drive-usb-disk0,media=cdrom,readonly=on...
> -device usb-storage,drive=drive-usb-disk0,id=usb-disk0...
>
> QEMU process may crash after live migration, to reproduce the issue,
> configure VM (Guest OS ubuntu 20.04 or 21.10) with the following XML:
>
> <disk type='file' device='cdrom'>
> <driver name='qemu' type='raw'/>
> <source file='/path/to/share_fs/cdrom.iso'/>
> <target dev='sda' bus='usb'/>
> <readonly/>
> <address type='usb' bus='0' port='2'/>
> </disk>
> <controller type='usb' index='0' model='piix3-uhci'/>
>
> Do the live migration repeatedly, crash may happen after live migratoin,
> trace log at the source before live migration is as follows:
>
> 324808@1711972823.521945:usb_uhci_frame_start nr 319
> 324808@1711972823.521978:usb_uhci_qh_load qh 0x35cb5400
> 324808@1711972823.521989:usb_uhci_qh_load qh 0x35cb5480
> 324808@1711972823.521997:usb_uhci_td_load qh 0x35cb5480, td 0x35cbe000, ctrl 0x0, token 0xffe07f69
> 324808@1711972823.522010:usb_uhci_td_nextqh qh 0x35cb5480, td 0x35cbe000
> 324808@1711972823.522022:usb_uhci_qh_load qh 0x35cb5680
> 324808@1711972823.522030:usb_uhci_td_load qh 0x35cb5680, td 0x75ac5180, ctrl 0x19800000, token 0x3c903e1
> 324808@1711972823.522045:usb_uhci_packet_add token 0x103e1, td 0x75ac5180
> 324808@1711972823.522056:usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state undef -> setup
> 324808@1711972823.522079:usb_msd_cmd_submit lun 0, tag 0x472, flags 0x00000080, len 10, data-len 8
> 324808@1711972823.522107:scsi_req_parsed target 0 lun 0 tag 1138 command 74 dir 1 length 8
> 324808@1711972823.522124:scsi_req_parsed_lba target 0 lun 0 tag 1138 command 74 lba 4096
> 324808@1711972823.522139:scsi_req_alloc target 0 lun 0 tag 1138
> 324808@1711972823.522169:scsi_req_continue target 0 lun 0 tag 1138
> 324808@1711972823.522181:scsi_req_data target 0 lun 0 tag 1138 len 8
> 324808@1711972823.522194:usb_packet_state_change bus 0, port 2, ep 2, packet 0x559f9ba14b00, state setup -> complete
> 324808@1711972823.522209:usb_uhci_packet_complete_success token 0x103e1, td 0x75ac5180
> 324808@1711972823.522219:usb_uhci_packet_del token 0x103e1, td 0x75ac5180
> 324808@1711972823.522232:usb_uhci_td_complete qh 0x35cb5680, td 0x75ac5180
>
> trace log at the destination after live migration is as follows:
>
> 3286206@1711972823.951646:usb_uhci_frame_start nr 320
> 3286206@1711972823.951663:usb_uhci_qh_load qh 0x35cb5100
> 3286206@1711972823.951671:usb_uhci_qh_load qh 0x35cb5480
> 3286206@1711972823.951680:usb_uhci_td_load qh 0x35cb5480, td 0x35cbe000, ctrl 0x1000000, token 0xffe07f69
> 3286206@1711972823.951693:usb_uhci_td_nextqh qh 0x35cb5480, td 0x35cbe000
> 3286206@1711972823.951702:usb_uhci_qh_load qh 0x35cb5700
> 3286206@1711972823.951709:usb_uhci_td_load qh 0x35cb5700, td 0x75ac5240, ctrl 0x39800000, token 0xe08369
> 3286206@1711972823.951727:usb_uhci_queue_add token 0x8369
> 3286206@1711972823.951735:usb_uhci_packet_add token 0x8369, td 0x75ac5240
> 3286206@1711972823.951746:usb_packet_state_change bus 0, port 2, ep 1, packet 0x56066b2fb5a0, state undef -> setup
> 3286206@1711972823.951766:usb_msd_data_in 8/8 (scsi 8)
> 2024-04-01 12:00:24.665+0000: shutting down, reason=crashed
>
> The backtrace reveals the following:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> 0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
> 312 movq -8(%rsi,%rdx), %rcx
> [Current thread is 1 (Thread 0x7f0a9025fc00 (LWP 3286206))]
> (gdb) bt
> 0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
> 1 memcpy (__len=8, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:34
> 2 iov_from_buf_full (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, buf=0x0, bytes=bytes@entry=8) at ../util/iov.c:33
> 3 iov_from_buf (bytes=8, buf=<optimized out>, offset=<optimized out>, iov_cnt=<optimized out>, iov=<optimized out>)
> at /usr/src/debug/qemu-6-6.2.0-75.7.oe1.smartx.git.40.x86_64/include/qemu/iov.h:49
> 4 usb_packet_copy (p=p@entry=0x56066b2fb5a0, ptr=<optimized out>, bytes=bytes@entry=8) at ../hw/usb/core.c:636
> 5 usb_msd_copy_data (s=s@entry=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:186
> 6 usb_msd_handle_data (dev=0x56066c62c770, p=0x56066b2fb5a0) at ../hw/usb/dev-storage.c:496
> 7 usb_handle_packet (dev=0x56066c62c770, p=p@entry=0x56066b2fb5a0) at ../hw/usb/core.c:455
> 8 uhci_handle_td (s=s@entry=0x56066bd5f210, q=0x56066bb7fbd0, q@entry=0x0, qh_addr=qh_addr@entry=902518530, td=td@entry=0x7fffe6e788f0, td_addr=<optimized out>,
> int_mask=int_mask@entry=0x7fffe6e788e4) at ../hw/usb/hcd-uhci.c:885
> 9 uhci_process_frame (s=s@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1061
> 10 uhci_frame_timer (opaque=opaque@entry=0x56066bd5f210) at ../hw/usb/hcd-uhci.c:1159
> 11 timerlist_run_timers (timer_list=0x56066af26bd0) at ../util/qemu-timer.c:642
> 12 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:656
> 13 qemu_clock_run_all_timers () at ../util/qemu-timer.c:738
> 14 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:542
> 15 qemu_main_loop () at ../softmmu/runstate.c:739
> 16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:52
> (gdb) frame 5
> (gdb) p ((SCSIDiskReq *)s->req)->iov
> $1 = {iov_base = 0x0, iov_len = 0}
> (gdb) p/x s->req->tag
> $2 = 0x472
>
> When designing the USB mass storage device model, QEMU places SCSI disk
> device as the backend of USB mass storage device. In addition, USB mass
> device driver in Guest OS conforms to the "Universal Serial Bus Mass
> Storage Class Bulk-Only Transport" specification in order to simulate
> the transform behavior between a USB controller and a USB mass device.
> The following shows the protocol hierarchy:
>
> +----------------+
> CDROM driver | scsi command | CDROM
> +----------------+
>
> +-----------------------+
> USB mass | USB Mass Storage Class| USB mass
> storage driver | Bulk-Only Transport | storage device
> +-----------------------+
>
> +----------------+
> USB Controller | USB Protocol | USB device
> +----------------+
>
> In the USB protocol layer, between the USB controller and USB device, at
> least two USB packets will be transformed when guest OS send a
> read operation to USB mass storage device:
>
> 1. The CBW packet, which will be delivered to the USB device's Bulk-Out
> endpoint. In order to simulate a read operation, the USB mass storage
> device parses the CBW and converts it to a SCSI command, which would be
> executed by CDROM(represented as SCSI disk in QEMU internally), and store
> the result data of the SCSI command in a buffer.
>
> 2. The DATA-IN packet, which will be delivered from the USB device's
> Bulk-In endpoint(fetched directly from the preceding buffer) to the USB
> controller.
>
> We consider UHCI to be the controller. The two packets mentioned above may
> have been processed by UHCI in two separate frame entries of the Frame List
> , and also described by two different TDs. Unlike the physical environment,
> a virtualized environment requires the QEMU to make sure that the result
> data of CBW is not lost and is delivered to the UHCI controller.
>
> Currently, these types of SCSI requests are not migrated, so QEMU cannot
> ensure the result data of the IO operation is not lost if there are
> inflight emulated SCSI requests during the live migration.
>
> Assume for the moment that the USB mass storage device is processing the
> CBW and storing the result data of the read operation to a buffre, live
> migration happens and moves the VM to the destination while not migrating
> the result data of the read operation.
>
> After migration, when UHCI at the destination issues a DATA-IN request to
> the USB mass storage device, a crash happens because USB mass storage device
> fetches the result data and get nothing.
>
> The scenario this patch addresses is this one.
>
> Theoretically, any device that uses the SCSI disk as a back-end would be
> affected by this issue. In this case, it is the USB CDROM.
>
> To fix it, inflight emulated SCSI request be migrated during live migration,
> similar to the DMA SCSI request.
>
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> Message-ID: <878c8f093f3fc2f584b5c31cb2490d9f6a12131a.1716531409.git.yong.huang@smartx.com>
> [Do not bump migration version, introduce compat property instead. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> hw/core/machine.c | 1 +
> hw/scsi/scsi-disk.c | 24 +++++++++++++++++++++++-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index c93d2492443..655d75c21fc 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -36,6 +36,7 @@
>
> GlobalProperty hw_compat_9_0[] = {
> {"arm-cpu", "backcompat-cntfrq", "true" },
> + {"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
> {"vfio-pci", "skip-vsc-check", "false" },
> };
Hi Paolo, hi Hyman,
this patch introduced a problem with device introspection on older machine
types. Running "make check-qtest SPEED=slow" is now failing for older
machine types.
Or if you want to reproduce it manually:
$ ./qemu-system-x86_64 -M pc-q35-8.0 -monitor stdio
QEMU 9.0.50 monitor - type 'help' for more information
(qemu) device_add scsi-block,help
Unexpected error in object_property_find_err() at
../../devel/qemu/qom/object.c:1357:
can't apply global scsi-disk-base.migrate-emulated-scsi-request=false:
Property 'scsi-block.migrate-emulated-scsi-request' not found
Aborted (core dumped)
I think the problem is that the property is only added to certain SCSI
devices via the DEFINE_SCSI_DISK_PROPERTIES macro, but "scsi-block" device
does not use this macro and thus does not have this property. So when the
compat code tries to set this property for "scsi-block" (which is also
derived from "scsi-disk-base"), it fails, of course.
Should the "migrate-emulated-scsi-request" property maybe rather be added to
the "scsi-disk-base" class instead? Or should hw_compat_9_0 rather list the
devices that have the property instead of using the parent class? Could you
please have a look?
Thanks,
Thomas
next prev parent reply other threads:[~2024-06-24 10:07 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-10 17:02 [PATCH v2] scsi-disk: Fix crash for VM configured with USB CDROM after live migration Paolo Bonzini
2024-06-24 10:06 ` Thomas Huth [this message]
2024-06-25 1:58 ` Yong Huang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b625d08e-e65f-4da4-818d-bbc4e2015122@redhat.com \
--to=thuth@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yong.huang@smartx.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).