qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Liuyongan <liuyongan@huawei.com>, Amos Kong <akong@redhat.com>,
	qemu-devel@nongnu.org, aliguori@amazon.com
Subject: Re: [Qemu-devel] [PATCH] virtio-net: announce self by guest
Date: Fri, 16 May 2014 13:02:51 +0800	[thread overview]
Message-ID: <53759BFB.2060104@redhat.com> (raw)
In-Reply-To: <20140515094520.GA16405@redhat.com>

On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
>>> Thanks, looks good.
>>> Some minor comments below,
>>>
>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
>>>> It's hard to track all mac addresses and their configurations (e.g
>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
>>> s/those informations/this information/
>>>
>>>> build proper garp packet after migration. The only possible solution
>>>> to this is let guest (who knew all configurations) to do this.
>>> s/knew/knows/
>>>
>>>> So, this patch introduces a new readonly config status bit of virtio-net,
>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
>>>> presence of its link through config update interrupt.When guest has
>>>> done the announcement, it should ack the notification through
>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
>>>> Linux guest).
>>>>
>>>> During load, a counter of announcing rounds were set so that the after
>>> s/were/is/
>>> s/the after/after/
>> Will correct those typos.
>>>> the vm is running it can trigger rounds of config interrupts to notify
>>>> the guest to build and send the correct garps.
>>>>
>>>> Tested with ping to guest with vlan during migration. Without the
>>>> patch, lots of the packets were lost after migration. With the patch,
>>>> could not notice packet loss after migration.
>>> below changelog should go after ---, until the ack list.
>>>
>> Ok.
>>>> Reference:
>>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
>>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
>>>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
>>>>
>>>> Changes from RFC v2:
>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
>>>> - compat self announce for 2.0 machine type
>>>>
>>>> Changes from RFC v1:
>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
>>>> - free announce timer during clean
>>>> - make announce work for non-vhost case
>>>>
>>>> Changes from V7:
>>>> - Instead of introducing a global method for each kind of nic, this
>>>>   version limits the changes to virtio-net itself.
>>>>
>>>> Cc: Liuyongan <liuyongan@huawei.com>
>>>> Cc: Amos Kong <akong@redhat.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/i386/pc.h           |    5 ++++
>>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>>>>  3 files changed, 69 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 940a7cf..98d59e9 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -25,6 +25,7 @@
>>>>  #include "monitor/monitor.h"
>>>>  
>>>>  #define VIRTIO_NET_VM_VERSION    11
>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>>>>  
>>>>  #define MAC_TABLE_ENTRIES    64
>>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
>>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
>>> in savevm.c
>>>
>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
>>> and reuse it.
>> Ok.
>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>  }
>>>>  
>>>> +static void virtio_net_announce_timer(void *opaque)
>>>> +{
>>>> +    VirtIONet *n = opaque;
>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>> +
>>>> +    if (n->announce &&
>>> I would make it > 0 here, just in case it becomes negative as a result
>>> of some bug.
>> Sure.
>>>> +        virtio_net_started(n, vdev->status) &&
>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
>>>> +
>>>> +        n->announce--;
>>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>> +        virtio_notify_config(vdev);
>>>> +    } else {
>>>> +        timer_del(n->announce_timer);
>>> why do this here?
>>>
>>>> +        n->announce = 0;
>>> why is this here?
>>>
>> If guest shuts down the device, there's no need to do the announcing.
> It's still weird.
> Either announce is 0 and then timer was not running
> or it's > 0 and then this won't trigger.

Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
question, we use QEMU_CLOCK_VIRTUAL while qemu is using
QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
whether this is safe. And if the link was down, it's better for us to
stop the announcing?

>
>>>> +    }
>>>> +}
>>>> +
>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>  {
>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>>>  
>>>>      virtio_net_vhost_status(n, status);
>>>>  
>>>> +    virtio_net_announce_timer(n);
>>>> +
>>> why do this here? why not right after we set announce counter?
>> The reasons are:
>>
>> - The counters were set in load, but the device is not running so we
>> could not inject the interrupt at that time.
> I see. This makes sense - but this isn't intuitive.
> Why don't we simply start timer with current time?
> Need to make sure it runs fine if time passes, but
> I think it's fine.

Not sure I get the point, I didn't see any differences except for an
extra timer fire.
>
>> - We can stop the progress when guest is shutting down the device.
> On shut down guest will reset device stopping timer - this seems enough.

Yes, I see.
>>>>      for (i = 0; i < n->max_queues; i++) {
>>>>          q = &n->vqs[i];
>>>>  
>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>      n->nobcast = 0;
>>>>      /* multiqueue is disabled by default */
>>>>      n->curr_queues = 1;
>>>> +    timer_del(n->announce_timer);
>>>> +    n->announce = 0;
>>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>  
>>>>      /* Flush any MAC and VLAN filter table state */
>>>>      n->mac_table.in_use = 0;
>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>>>>      return VIRTIO_NET_OK;
>>>>  }
>>>>  
>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>>>> +                                      struct iovec *iov, unsigned int iov_cnt)
>>>> +{
>>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
>>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>> +        if (n->announce) {
>>> I would make it > 0 here, just in case it becomes negative as a result
>>> of some bug.
>> Ok.
>>>> +            timer_mod(n->announce_timer,
>>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
>>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
>>> savevm.c has this code, and a comment:
>>>         /* delay 50ms, 150ms, 250ms, ... */
>>> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
>>>
>>> how about an API in include/migration/vmstate.h
>>>
>>> static inline
>>> int64_t self_announce_delay(int round)
>>> {
>>> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>>>         /* delay 50ms, 150ms, 250ms, ... */
>>> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>> }
>>>
>>> or something to this end.
>>>
>> Good idea, will do this.
>>>> +        }
>>>> +        return VIRTIO_NET_OK;
>>>> +    } else {
>>>> +        return VIRTIO_NET_ERR;
>>>> +    }
>>>> +}
>>>> +
>>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>>>                                  struct iovec *iov, unsigned int iov_cnt)
>>>>  {
>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
>>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
>>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>      }
>>>>  
>>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
>>> Well if virtio_net_handle_announce runs now it will start timer
>>> in the past, right?
>>> This seems wrong.
>> Not sure I get the case. When in virtio_net_load() the vm is not even
>> running so looks like virtio_net_handle_announce() could not run in the
>> same time.
> I see, this works because you decrement it when VM starts running.
> I think it's not a good idea to rely on this though,
> better do everything from timer, and handle
> the case of command arriving too early.
>

Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.

For the case of command arriving too early. Is there anything else we
need to do? Since we only start the next timer when we get ack command.

Thanks

  reply	other threads:[~2014-05-16  5:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-15  7:16 [Qemu-devel] [PATCH] virtio-net: announce self by guest Jason Wang
2014-05-15  8:28 ` Michael S. Tsirkin
2014-05-15  9:22   ` Jason Wang
2014-05-15  9:45     ` Michael S. Tsirkin
2014-05-16  5:02       ` Jason Wang [this message]
2014-05-18  9:04         ` Michael S. Tsirkin
2014-05-19  9:34           ` Jason Wang
2014-05-19 10:06             ` Michael S. Tsirkin
2014-05-20  6:06               ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53759BFB.2060104@redhat.com \
    --to=jasowang@redhat.com \
    --cc=akong@redhat.com \
    --cc=aliguori@amazon.com \
    --cc=liuyongan@huawei.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).