From: Jag Raman <jag.raman@oracle.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: elena.ufimtseva@oracle.com, fam@euphon.net, thuth@redhat.com,
john.g.johnson@oracle.com, ehabkost@redhat.com,
konrad.wilk@oracle.com, liran.alon@oracle.com, rth@twiddle.net,
quintela@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com,
ross.lagerwall@citrix.com, mst@redhat.com, kraxel@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
kanth.ghatraju@oracle.com, mreitz@redhat.com, kwolf@redhat.com,
dgilbert@redhat.com, marcandre.lureau@gmail.com
Subject: Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object
Date: Mon, 18 Nov 2019 10:42:19 -0500 [thread overview]
Message-ID: <456fe446-65f7-1769-6ea8-a63e2ca5d523@oracle.com> (raw)
In-Reply-To: <20191113171126.GL2445240@redhat.com>
On 11/13/2019 12:11 PM, Daniel P. Berrangé wrote:
> On Wed, Nov 13, 2019 at 11:32:09AM -0500, Jag Raman wrote:
>>
>>
>> On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:
>>> On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
>>>> Collect the VMSD from remote process on the source and save
>>>> it to the channel leading to the destination
>>>>
>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>>> ---
>>>> New patch in v4
>>>>
>>>> hw/proxy/qemu-proxy.c | 132 ++++++++++++++++++++++++++++++++++++++++++
>>>> include/hw/proxy/qemu-proxy.h | 2 +
>>>> include/io/mpqemu-link.h | 1 +
>>>> 3 files changed, 135 insertions(+)
>>>>
>>>> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
>>>> index 623a6c5..ce72e6a 100644
>>>> --- a/hw/proxy/qemu-proxy.c
>>>> +++ b/hw/proxy/qemu-proxy.c
>>>> @@ -52,6 +52,14 @@
>>>> #include "util/event_notifier-posix.c"
>>>> #include "hw/boards.h"
>>>> #include "include/qemu/log.h"
>>>> +#include "io/channel.h"
>>>> +#include "migration/qemu-file-types.h"
>>>> +#include "qapi/error.h"
>>>> +#include "io/channel-util.h"
>>>> +#include "migration/qemu-file-channel.h"
>>>> +#include "migration/qemu-file.h"
>>>> +#include "migration/migration.h"
>>>> +#include "migration/vmstate.h"
>>>> QEMUTimer *hb_timer;
>>>> static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
>>>> @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
>>>> static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
>>>> static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
>>>> +#define PAGE_SIZE getpagesize()
>>>> +uint8_t *mig_data;
>>>> +
>>>> static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
>>>> {
>>>> /* TODO: Add proper handler. */
>>>> @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
>>>> dev->mem_init = false;
>>>> }
>>>> +typedef struct {
>>>> + QEMUFile *rem;
>>>> + PCIProxyDev *dev;
>>>> +} proxy_mig_data;
>>>> +
>>>> +static void *proxy_mig_out(void *opaque)
>>>> +{
>>>> + proxy_mig_data *data = opaque;
>>>> + PCIProxyDev *dev = data->dev;
>>>> + uint8_t byte;
>>>> + uint64_t data_size = PAGE_SIZE;
>>>> +
>>>> + mig_data = g_malloc(data_size);
>>>> +
>>>> + while (true) {
>>>> + byte = qemu_get_byte(data->rem);
>>>
>>> There is a pretty large set of APIs hiding behind the qemu_get_byte
>>> call, which does not give me confidence that...
>>>
>>>> + mig_data[dev->migsize++] = byte;
>>>> + if (dev->migsize == data_size) {
>>>> + data_size += PAGE_SIZE;
>>>> + mig_data = g_realloc(mig_data, data_size);
>>>> + }
>>>> + }
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static int proxy_pre_save(void *opaque)
>>>> +{
>>>> + PCIProxyDev *pdev = opaque;
>>>> + proxy_mig_data *mig_data;
>>>> + QEMUFile *f_remote;
>>>> + MPQemuMsg msg = {0};
>>>> + QemuThread thread;
>>>> + Error *err = NULL;
>>>> + QIOChannel *ioc;
>>>> + uint64_t size;
>>>> + int fd[2];
>>>> +
>>>> + if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
>>>> + return -1;
>>>> + }
>>>> +
>>>> + ioc = qio_channel_new_fd(fd[0], &err);
>>>> + if (err) {
>>>> + error_report_err(err);
>>>> + return -1;
>>>> + }
>>>> +
>>>> + qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
>>>> +
>>>> + f_remote = qemu_fopen_channel_input(ioc);
>>>> +
>>>> + pdev->migsize = 0;
>>>> +
>>>> + mig_data = g_malloc0(sizeof(proxy_mig_data));
>>>> + mig_data->rem = f_remote;
>>>> + mig_data->dev = pdev;
>>>> +
>>>> + qemu_thread_create(&thread, "Proxy MIG_OUT", proxy_mig_out, mig_data,
>>>> + QEMU_THREAD_DETACHED);
>>>> +
>>>> + msg.cmd = START_MIG_OUT;
>>>> + msg.bytestream = 0;
>>>> + msg.num_fds = 2;
>>>> + msg.fds[0] = fd[1];
>>>> + msg.fds[1] = GET_REMOTE_WAIT;
>>>> +
>>>> + mpqemu_msg_send(pdev->mpqemu_link, &msg, pdev->mpqemu_link->com);
>>>> + size = wait_for_remote(msg.fds[1]);
>>>> + PUT_REMOTE_WAIT(msg.fds[1]);
>>>> +
>>>> + assert(size != ULLONG_MAX);
>>>> +
>>>> + /*
>>>> +     * migsize is being updated by a separate thread. Using volatile to
>>>> +     * instruct the compiler to fetch the value of this variable from
>>>> +     * memory during every read.
>>>> + */
>>>> + while (*((volatile uint64_t *)&pdev->migsize) < size) {
>>>> + }
>>>> +
>>>> + qemu_thread_cancel(&thread);
>>>
>>> ....this is a safe way to stop the thread executing without
>>> resulting in memory being leaked.
>>>
>>> In addition thread cancellation is asynchronous, so the thread
>>> may still be using the QEMUFile object while....
>>>
>>>> + qemu_fclose(f_remote);
>>
>> The above "wait_for_remote()" call waits for the remote process to
>> finish the migration, and returns the size of the VMSD.
>>
>> It should be safe to cancel the thread and close the file, once the
>> remote process is done sending the VMSD and we have read "size" bytes
>> from it, is it not?
>
> Ok, so the thread is doing
>
> while (true) {
> byte = qemu_get_byte(data->rem);
> ...do something with byte...
> }
>
> so when the thread is cancelled it is almost certainly in the
> qemu_get_byte() call. Since you say wait_for_remote() syncs
> with the end of migration, I'll presume there's no more data
> to be read but the file is still open.
>
> If we're using a blocking FD here we'll probably be stuck in
> read() when we're cancelled, and cancellation would probably
> be ok from looking at the current impl of QEMUFile / QIOChannel.
> If we're handling any error scenario though there could be a
> "Error *local_err" that needs freeing before cancellation.
>
> If the fclose is processed before cancellation takes effect
> on the target thread, though, we could have a race.
>
> 1. proxy_mig_out blocked in read from qemu_fill_buffer
>
> 2. main thread request async cancel
>
> 3. main thread calls qemu_fclose which closes the FD
> and free's the QEMUFile object
>
> 4. proxy_mig_out thread returns from read() with
> ret == 0 (EOF)
This wasn't happening. It would be convenient if it did.
When the file was closed by the main thread, the async thread was still
hung in qemu_fill_buffer() instead of returning 0 (EOF). That's the
reason we took the thread-cancellation route. We'd be glad to remove
qemu_thread_cancel().
>
> 5. proxy_mig_out thread calls qemu_file_set_error_obj
>      on a QEMUFile object free'd in (3). Use after free, oops.
>
> 6. ..async cancel request gets delivered....
>
> admittedly it is fairly unlikely for the async cancel
> to be delayed for so long that this sequence happens, but
> unexpected things can happen when we really don't want them.
Absolutely, we don't want to leave anything to chance.
>
> IMHO the safe way to deal with this would be a lock-step
> sequence between the threads
>
> 1. proxy_mig_out blocked in read from qemu_fill_buffer
>
> 2. main thread closes the FD with qemu_file_shutdown()
> closing both directions
Will give qemu_file_shutdown() a try.
Thank you!
--
Jag
>
> 3. proxy_mig_out returns from read with ret == 0 (EOF)
>
>    4. proxy_mig_out thread breaks out of its infinite loop
> due to EOF and exits
>
> 5. main thread calls pthread_join on proxy_mig_out
>
> 6. main thread calls qemu_fclose()
>
> this is easier to reason about the safety of than the cancel based
> approach IMHO.
>
> Regards,
> Daniel
>