Re: [Qemu-devel] [PATCH v3 2/2] vhost user: Add RARP injection for legacy guest


From: Jason Wang <jasowang@redhat.com>
To: Thibaut Collet <thibaut.collet@6wind.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 2/2] vhost user: Add RARP injection for legacy guest
Date: Fri, 26 Jun 2015 12:06:35 +0800
Message-ID: <558CCFCB.3000706@redhat.com>
In-Reply-To: <CABUUfwNNrvdcr1cp88mnqnHKH9L7f=yj6oRWOYAsAoB_T9w=CQ@mail.gmail.com>



On 06/25/2015 10:22 PM, Thibaut Collet wrote:
> On Thu, Jun 25, 2015 at 2:53 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Jun 25, 2015 at 01:01:29PM +0200, Thibaut Collet wrote:
>>> On Thu, Jun 25, 2015 at 11:59 AM, Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>>
>>>> On 06/24/2015 07:05 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Jun 24, 2015 at 04:31:15PM +0800, Jason Wang wrote:
>>>>>> On 06/23/2015 01:49 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Jun 23, 2015 at 10:12:17AM +0800, Jason Wang wrote:
>>>>>>>>> On 06/18/2015 11:16 PM, Thibaut Collet wrote:
>>>>>>>>>>> On Tue, Jun 16, 2015 at 10:05 AM, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>>> On 06/16/2015 03:24 PM, Thibaut Collet wrote:
>>>>>>>>>>>>>>> If my understanding is correct, on a resume operation, we have the
>>>>>>>>>>>>>>> following callback trace:
>>>>>>>>>>>>>>> 1. virtio_pci_restore function that calls all restore call back of
>>>>>>>>>>>>>>> virtio devices
>>>>>>>>>>>>>>> 2. virtnet_restore that calls try_fill_recv function for each virtual queues
>>>>>>>>>>>>>>> 3. try_fill_recv function kicks the virtual queue (through
>>>>>>>>>>>>>>> virtqueue_kick function)
>>>>>>>>>>>>> Yes, but this happens only after pm resume not migration. Migration is
>>>>>>>>>>>>> totally transparent to guest.
>>>>>>>>>>>>>
>>>>>>>>>>> Hi Jason,
>>>>>>>>>>>
>>>>>>>>>>> After a deeper look in the migration code of QEMU a resume event is
>>>>>>>>>>> always sent when the live migration is finished.
>>>>>>>>>>> On a live migration we have the following callback trace:
>>>>>>>>>>> 1. The VM on the new host is set to the state RUN_STATE_INMIGRATE, the
>>>>>>>>>>> autostart boolean to 1  and calls the qemu_start_incoming_migration
>>>>>>>>>>> function (see function main of vl.c)
>>>>>>>>>>> .....
>>>>>>>>>>> 2. call of process_incoming_migration function in
>>>>>>>>>>> migration/migration.c file whatever the way to do the live migration
>>>>>>>>>>> (tcp:, fd:, unix:, exec: ...)
>>>>>>>>>>> 3. call of process_incoming_migration_co function in migration/migration.c
>>>>>>>>>>> 4. call of vm_start function in vl.c (otherwise the migrated VM stay
>>>>>>>>>>> in the pause state, the autostart boolean is set to 1 by the main
>>>>>>>>>>> function in vl.c)
>>>>>>>>>>> 5. call of vm_start function that sets the VM to the RUN_STATE_RUNNING state.
>>>>>>>>>>> 6. call of qapi_event_send_resume function that sends a resume event to the VM
>>>>>>>>> AFAIK, this function sends resume event to qemu monitor not VM.
>>>>>>>>>
>>>>>>>>>>> So when a live migration is ended:
>>>>>>>>>>> 1. a resume event is sent to the guest
>>>>>>>>>>> 2. On the reception of this resume event the virtual queue are kicked
>>>>>>>>>>> by the guest
>>>>>>>>>>> 3. Backend vhost user catches this kick and can emit a RARP to guest
>>>>>>>>>>> that does not support GUEST_ANNOUNCE
>>>>>>>>>>>
>>>>>>>>>>> This solution, as solution based on detection of DRIVER_OK status
>>>>>>>>>>> suggested by Michael, allows backend to send the RARP to legacy guest
>>>>>>>>>>> without involving QEMU and add ioctl to vhost-user.
>>>>>>>>> A question here is did vhost-user code pass status to the backend? If
>>>>>>>>> not, how can userspace backend detect DRIVER_OK?
>>>>>>> Sorry, I must have been unclear.
>>>>>>> vhost core calls VHOST_NET_SET_BACKEND on DRIVER_OK.
>>>>>>> Unfortunately vhost user currently translates it to VHOST_USER_NONE.
>>>>>> Looks like VHOST_NET_SET_BACKEND was only used for tap backend.
>>>>>>
>>>>>>> As a work around, I think kicking ioeventfds once you get
>>>>>>> VHOST_NET_SET_BACKEND will work.
>>>>>> Maybe just a eventfd_set() in vhost_net_start(). But is this
>>>>>> "workaround" elegant enough to be documented? Is it better to do this
>>>>>> explicitly with a new feature?
>>>>> If you are going to do this anyway, there are a couple of other changes
>>>>> we should do, in particular, decide what we want to do with control vq.
>>>>>
>>>> If I understand correctly, you mean VIRTIO_NET_CTRL_MQ and
>>>> VIRTIO_NET_CTRL_GUEST_OFFLOADS? Looks like both of these were broken.
>>>> Need more thought, maybe new kinds of requests.
>>>>
>>>>
>>> Are there any objections to add VHOST_NET_SET_BACKEND support to vhost
>>> user with a patch like that:
>>>
>>>
>>>  hw/net/vhost_net.c     |    8 ++++++++
>>>  hw/virtio/vhost-user.c |   10 +++++++++-
>>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>> index 907e002..7a008c0 100644
>>> --- a/hw/net/vhost_net.c
>>> +++ b/hw/net/vhost_net.c
>>> @@ -234,6 +234,14 @@ static int vhost_net_start_one(struct vhost_net *net,
>>>                  goto fail;
>>>              }
>>>          }
>>> +    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
>>> +         file.fd = 0;
>>> +         for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
>>> +            const VhostOps *vhost_ops = net->dev.vhost_ops;
>>> +            int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND,
>>> +                                          &file);
>>> +            assert(r >= 0);
>>> +        }
>>>      }
>>>      return 0;
>>>  fail:
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index d6f2163..32c6bd9 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -41,6 +41,7 @@ typedef enum VhostUserRequest {
>>>      VHOST_USER_SET_VRING_KICK = 12,
>>>      VHOST_USER_SET_VRING_CALL = 13,
>>>      VHOST_USER_SET_VRING_ERR = 14,
>>> +    VHOST_USER_NET_SET_BACKEND = 15,
>>>      VHOST_USER_MAX
>>>  } VhostUserRequest;
>>>
>>> @@ -104,7 +105,8 @@ static unsigned long int
>>> ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
>>>      VHOST_GET_VRING_BASE,   /* VHOST_USER_GET_VRING_BASE */
>>>      VHOST_SET_VRING_KICK,   /* VHOST_USER_SET_VRING_KICK */
>>>      VHOST_SET_VRING_CALL,   /* VHOST_USER_SET_VRING_CALL */
>>> -    VHOST_SET_VRING_ERR     /* VHOST_USER_SET_VRING_ERR */
>>> +    VHOST_SET_VRING_ERR,    /* VHOST_USER_SET_VRING_ERR */
>>> +    VHOST_NET_SET_BACKEND   /* VHOST_USER_NET_SET_BACKEND */
>>>  };
>>>
>>>  static VhostUserRequest vhost_user_request_translate(unsigned long int request)
>>> @@ -287,6 +289,12 @@ static int vhost_user_call(struct vhost_dev *dev,
>>> unsigned long int request,
>>>              msg.u64 |= VHOST_USER_VRING_NOFD_MASK;
>>>          }
>>>          break;
>>> +
>>> +    case VHOST_NET_SET_BACKEND:
>>> +        memcpy(&msg.file, arg, sizeof(struct vhost_vring_state));
>>> +        msg.size = sizeof(m.state);
>>> +        break;
>>> +
>>>      default:
>>>          error_report("vhost-user trying to send unhandled ioctl");
>>>          return -1;
>>>
>>>
>>> This message will be sent when guest is ready and can be used by vhost
>>> user backend to send RARP to legacy guest.
>>>
>>> This solution avoids adding a new message and has no impact on the control vq.
>>
>> I think that you can't add messages to protocol unconditionally.
>> For example, snabbswitch simply crashes if it gets an unknown
>> message.
>>
>> Either this needs a new feature bit, or implement
>> [PATCH RFC] vhost-user: protocol extensions
>> making it safe to add new messages.
>>
>> --
>> MST
> I understand.
> Last idea before doing a RFC:
> Normally guest notifies vhost of new buffer onto a virtqueue by
> kicking the eventfd. This eventfd has been provided to vhost by QEMU.
> So when DRIVER_OK is received by QEMU, QEMU can kick the eventfd.
>
> A possible patch to do that is:
>  hw/net/vhost_net.c |    7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 907e002..fbc55e0 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -234,6 +234,13 @@ static int vhost_net_start_one(struct vhost_net *net,
>                  goto fail;
>              }
>          }
> +    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
> +         int idx;
> +
> +         for (idx = 0; idx < net->dev.nvqs; ++idx) {
> +            struct VirtQueue *vq = virtio_get_queue(dev, idx);
> +            event_notifier_set(virtio_queue_get_host_notifier(vq));
> +        }
>      }
>      return 0;
>  fail:
>
> Kicking this eventfd has no impact on QEMU or the guest (they do not
> poll it) and simply wakes up vhost so that it can send a RARP for a
> legacy guest.
>
> Regards.
>
> Thibaut.

This may work, but there are two issues:

- I believe we should document this in the spec. But it looks more like
a workaround: it uses an implicit method to signal a control message,
which often leads to ugly code on both sides. That is not elegant enough
to be documented in the spec.
- Consider that you may have 100 queues: the kick will then happen 100
times, and the backend needs to handle that case.
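
To make the second point concrete, here is a minimal sketch of what a
backend would have to do if QEMU kicked every queue on start. It is only
an illustration under stated assumptions: struct backend_dev, kick_all()
and backend_handle_kick() are hypothetical names, not QEMU or vhost-user
APIs, and the eventfds are created locally instead of being received via
VHOST_USER_SET_VRING_KICK. The only real calls are the Linux eventfd
ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_QUEUES 100

/* Hypothetical backend state; not a QEMU or vhost-user type. */
struct backend_dev {
    int  kick_fd[MAX_QUEUES]; /* one kick eventfd per virtqueue */
    int  nvqs;                /* number of valid entries in kick_fd */
    bool rarp_sent;           /* latch: announce only once per start */
    int  rarp_count;          /* how many RARPs would have been sent */
};

/* Close every eventfd opened so far. */
static void backend_close(struct backend_dev *d)
{
    for (int i = 0; i < d->nvqs; i++) {
        close(d->kick_fd[i]);
    }
    d->nvqs = 0;
}

/* Create nvqs non-blocking eventfds; returns 0 on success, -1 on error. */
static int backend_init(struct backend_dev *d, int nvqs)
{
    if (nvqs < 1 || nvqs > MAX_QUEUES) {
        return -1;
    }
    d->nvqs = 0;
    d->rarp_sent = false;
    d->rarp_count = 0;
    for (int i = 0; i < nvqs; i++) {
        int fd = eventfd(0, EFD_NONBLOCK);
        if (fd < 0) {
            backend_close(d);
            return -1;
        }
        d->kick_fd[i] = fd;
        d->nvqs = i + 1;
    }
    return 0;
}

/* Stand-in for the proposed QEMU change: signal every queue's eventfd. */
static void kick_all(struct backend_dev *d)
{
    for (int i = 0; i < d->nvqs; i++) {
        (void)eventfd_write(d->kick_fd[i], 1);
    }
}

/* Backend side: drain one kick and announce at most once. */
static void backend_handle_kick(struct backend_dev *d, int idx)
{
    eventfd_t value;

    assert(idx >= 0 && idx < d->nvqs);
    if (eventfd_read(d->kick_fd[idx], &value) < 0) {
        return; /* nothing pending on this queue */
    }
    if (!d->rarp_sent) {
        d->rarp_sent = true;
        d->rarp_count++; /* a real backend would inject the RARP here */
    }
    /* normal ring processing for queue idx would follow */
}
```

With 100 queues, kick_all() produces 100 wakeups and the latch is the
only thing that keeps the backend from sending 100 RARPs. The backend
would also need some way to reset the latch on the next start, which is
exactly the kind of implicit state I would rather not put in the spec.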

