From: Anthony Liguori <anthony@codemonkey.ws>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] net: delay peer host device delete
Date: Mon, 20 Sep 2010 13:14:12 -0500
Message-ID: <4C97A474.8040900@codemonkey.ws>
In-Reply-To: <20100920171439.GF29862@redhat.com>
On 09/20/2010 12:14 PM, Michael S. Tsirkin wrote:
> On Mon, Sep 20, 2010 at 11:56:56AM -0500, Anthony Liguori wrote:
>
>> On 09/20/2010 11:47 AM, Michael S. Tsirkin wrote:
>>
>>> On Mon, Sep 20, 2010 at 11:41:45AM -0500, Anthony Liguori wrote:
>>>
>>>> On 09/20/2010 11:30 AM, Michael S. Tsirkin wrote:
>>>>
>>>>> With -netdev, virtio devices present offload
>>>>> features to guest, depending on the backend used.
>>>>> Thus, removing host netdev peer while guest is
>>>>> active leads to guest-visible inconsistency and/or crashes.
>>>>> See e.g. https://bugzilla.redhat.com/show_bug.cgi?id=623735
>>>>>
>>>>> As a solution, while guest (NIC) peer device exists,
>>>>> we must prevent the host peer from being deleted.
>>>>>
>>>>> This patch does this by adding peer_deleted flag in nic state:
>>>>> if host device is going away while guest device
>>>>> is around, set this flag and keep host device around
>>>>> for as long as guest device exists.
>>>>>
>>>> Having an unclear life cycle really worries me.
>>>>
>>>> Wouldn't the more correct solution be to avoid removing the netdev
>>>> device until after the peer has successfully been removed?
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>>
>>> This is exactly what the patch does.
>>>
>> At the management layer instead of doing it magically in the backend.
>>
> The amount of pain this inflicts on management would be considerable.
> Hotplug commands were designed to be asynchronous
> (starts the process, does not wait for it to complete), maybe that
> was a mistake but we can not change semantics at will now.
>
> Add new commands, okay, but existing ones should work and get fixed
> if there's a bug.
>
But having commands that are impossible to use correctly is not very good.

Here's what makes sense to me:

1) async device remove + poll device status/removal notification +
   remove backend

   The management tool needs to determine when the device is gone and
   remove the backend.

2) sync device remove + remove backend

   Command does not return until device is removed.

3) async device and backend removal + poll device/backend removal +
   removal notification

   One command that removes the device and any associated backend. We need
   to indicate to the management layer when this operation is complete.

I think (2) is the most elegant but also the most difficult to implement
today. I think (1) is the least invasive to implement but has the most
management tool complexity. (3) is probably the best compromise in
terms of complexity and ease of implementation.

Just for comparison, your patch does:

4) async device removal + remove backend

   Here, remove backend may or may not cause removal, depending on whether
   device removal has happened. So it's really async removal, but it
   doesn't happen deterministically on its own. What happens if you call
   remove backend before starting async device removal? What if the guest
   never removes the device? What if a reset happens?

One advantage of (1) is that there are no tricky life cycle
considerations. If we did (3), we would have to think through what
happens if a guest doesn't respond to an unplug request.
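
To make the ordering in (1) concrete, here is a minimal sketch in C. It is
a toy model and not QEMU code: the Nic and Backend structs and every
function name below are made up for illustration, and the guest's
acknowledgement is simulated by a poll counter. The only point is that
backend removal is refused while the guest device exists, so the
management tool has to poll (or wait for a notification) first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a guest-visible NIC and the host backend it uses. */
typedef struct Backend Backend;

typedef struct Nic {
    bool unplug_requested;  /* async removal has been started */
    bool present;           /* guest has not yet released the device */
    Backend *peer;
} Nic;

struct Backend {
    bool present;
    Nic *peer;
};

/* Start asynchronous device removal; returns immediately. */
static void nic_del_async(Nic *nic)
{
    nic->unplug_requested = true;
}

/* Stand-in for the guest acknowledging the unplug request. */
static void guest_ack_unplug(Nic *nic)
{
    if (!nic->unplug_requested || !nic->present) {
        return;
    }
    nic->present = false;
    if (nic->peer) {
        nic->peer->peer = NULL;  /* detach from the backend */
        nic->peer = NULL;
    }
}

/* What the management tool polls. */
static bool nic_is_gone(const Nic *nic)
{
    return !nic->present;
}

/* Remove the backend; returns -1 while a guest device still uses it. */
static int backend_del(Backend *be)
{
    assert(be->present);  /* deleting twice is a caller bug in this model */
    if (be->peer != NULL) {
        return -1;
    }
    be->present = false;
    return 0;
}

/*
 * Option (1) as seen by the management tool: start the removal, poll
 * until the device is gone, and only then remove the backend.
 * guest_acks_after simulates the poll iteration on which the guest
 * responds; a value >= max_polls models a guest that never responds.
 */
static int mgmt_remove(Nic *nic, Backend *be, int max_polls,
                       int guest_acks_after)
{
    nic_del_async(nic);
    for (int i = 0; i < max_polls; i++) {
        if (i == guest_acks_after) {
            guest_ack_unplug(nic);
        }
        if (nic_is_gone(nic)) {
            return backend_del(be);
        }
    }
    return -1;  /* device still present; backend left untouched */
}
```

In this model a guest that never responds leaves both the device and the
backend in place and the caller gets an error back, so the failure is
visible to the management tool instead of being deferred inside the
backend.
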
Regards,
Anthony Liguori
>
>> IOW, if device_del returns and the device isn't actually deleted,
>> that's a bug and addressing it like this just means we'll trip over
>> it somewhere else.
>>
>> We'll have the same problem with drive_del.
>>
> Let's fix it there as well then.
>
>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>> net.c | 21 ++++++++++++++++++++-
>>>>> net.h | 1 +
>>>>> 2 files changed, 21 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/net.c b/net.c
>>>>> index 3d0fde7..10855d1 100644
>>>>> --- a/net.c
>>>>> +++ b/net.c
>>>>> @@ -286,12 +286,31 @@ void qemu_del_vlan_client(VLANClientState *vc)
>>>>> if (vc->vlan) {
>>>>> QTAILQ_REMOVE(&vc->vlan->clients, vc, next);
>>>>> } else {
>>>>> + /* Even if client will not be deleted yet, remove it from list so it
>>>>> + * does not appear in monitor. */
>>>>> + QTAILQ_REMOVE(&non_vlan_clients, vc, next);
>>>>> + /* Detect that guest-visible (NIC) peer is active, and delay deletion.
>>>>> + * */
>>>>> + if (vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) {
>>>>> + NICState *nic = DO_UPCAST(NICState, nc, vc->peer);
>>>>> + assert(!nic->peer_deleted);
>>>>> + nic->peer_deleted = true;
>>>>> + return;
>>>>> + }
>>>>> if (vc->send_queue) {
>>>>> qemu_del_net_queue(vc->send_queue);
>>>>> }
>>>>> - QTAILQ_REMOVE(&non_vlan_clients, vc, next);
>>>>> if (vc->peer) {
>>>>> vc->peer->peer = NULL;
>>>>> + /* If this is a guest-visible (NIC) device,
>>>>> + * and peer has already been removed from monitor,
>>>>> + * delete it here. */
>>>>> + if (vc->info->type == NET_CLIENT_TYPE_NIC) {
>>>>> + NICState *nic = DO_UPCAST(NICState, nc, vc);
>>>>> + if (nic->peer_deleted) {
>>>>> + qemu_del_vlan_client(vc->peer);
>>>>> + }
>>>>> + }
>>>>> }
>>>>> }
>>>>>
>>>>> diff --git a/net.h b/net.h
>>>>> index 518cf9c..44c31a9 100644
>>>>> --- a/net.h
>>>>> +++ b/net.h
>>>>> @@ -72,6 +72,7 @@ typedef struct NICState {
>>>>> VLANClientState nc;
>>>>> NICConf *conf;
>>>>> void *opaque;
>>>>> + bool peer_deleted;
>>>>> } NICState;
>>>>>
>>>>> struct VLANState {
>>>>>