Re: [PATCH] vhost: Perform memory section dirty scans once per iteration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: qemu-devel@nongnu.org, Jason Wang <jasowang@redhat.com>,
	Si-Wei Liu <si-wei.liu@oracle.com>
Subject: Re: [PATCH] vhost: Perform memory section dirty scans once per iteration
Date: Tue, 3 Oct 2023 10:01:15 -0400	[thread overview]
Message-ID: <20231003095019-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20230927111428.15982-1-joao.m.martins@oracle.com>

On Wed, Sep 27, 2023 at 12:14:28PM +0100, Joao Martins wrote:
> On setups with one or more virtio-net devices with vhost on,
> dirty tracking iteration increases cost the bigger the number
> amount of queues are set up e.g. on idle guests migration the
> following is observed with virtio-net with vhost=on:
> 
> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> 1 queue -> 6.89%     [.] vhost_dev_sync_region.isra.13
> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> 
> With high memory rates the symptom is lack of convergence as soon
> as it has a vhost device with a sufficiently high number of queues,
> the sufficient number of vhost devices.
> 
> On every migration iteration (every 100msecs) it will redundantly
> query the *shared log* the number of queues configured with vhost
> that exist in the guest. For the virtqueue data, this is necessary,
> but not for the memory sections which are the same. So
> essentially we end up scanning the dirty log too often.
> 
> To fix that, select a vhost device responsible for scanning the
> log with regards to memory sections dirty tracking. It is selected
> when we enable the logger (during migration) and cleared when we
> disable the logger.
> 
> The real problem, however, is exactly that: a device per vhost worker/qp,
> when there should be a device representing a netdev (for N vhost workers).
> Given this problem exists for any Qemu these days, figured a simpler
> solution is better to increase stable tree's coverage; thus don't
> change the device model of sw vhost to fix this "over log scan" issue.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> I am not fully sure the heuristic captures the myriad of different vhost
> devices -- I think so. IIUC, the log is always shared, it's just whether
> it's qemu head memory or via /dev/shm when other processes want to
> access it.

Thanks for working on this.

I don't think this works like this because different types of different
vhost devices have different regions - see e.g. vhost_region_add_section
I am also not sure all devices are running at the same time - e.g.
some could be disconnected, and vhost_sync_dirty_bitmap takes this
into account.

But the idea is I think a good one - I just feel more refactoring is
needed.

We also have a FIXME:

static void vhost_log_sync_range(struct vhost_dev *dev,
                                 hwaddr first, hwaddr last)
{   
    int i;
    /* FIXME: this is N^2 in number of sections */
    for (i = 0; i < dev->n_mem_sections; ++i) {
        MemoryRegionSection *section = &dev->mem_sections[i];
        vhost_sync_dirty_bitmap(dev, section, first, last);
    }
}       

that it would be nice to address. Thanks!


> ---
>  hw/virtio/vhost.c | 44 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2f6ffb446b7..70646c2b533c 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -44,6 +44,7 @@
>  
>  static struct vhost_log *vhost_log;
>  static struct vhost_log *vhost_log_shm;
> +static struct vhost_dev *vhost_log_dev;
>  
>  static unsigned int used_memslots;
>  static QLIST_HEAD(, vhost_dev) vhost_devices =
> @@ -124,6 +125,21 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
>      }
>  }
>  
> +static bool vhost_log_dev_enabled(struct vhost_dev *dev)
> +{
> +    return dev == vhost_log_dev;
> +}
> +
> +static void vhost_log_set_dev(struct vhost_dev *dev)
> +{
> +    vhost_log_dev = dev;
> +}
> +
> +static bool vhost_log_dev_is_set(void)
> +{
> +    return vhost_log_dev != NULL;
> +}
> +
>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
>                                     MemoryRegionSection *section,
>                                     hwaddr first,
> @@ -141,13 +157,16 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
>      start_addr = MAX(first, start_addr);
>      end_addr = MIN(last, end_addr);
>  
> -    for (i = 0; i < dev->mem->nregions; ++i) {
> -        struct vhost_memory_region *reg = dev->mem->regions + i;
> -        vhost_dev_sync_region(dev, section, start_addr, end_addr,
> -                              reg->guest_phys_addr,
> -                              range_get_last(reg->guest_phys_addr,
> -                                             reg->memory_size));
> +    if (vhost_log_dev_enabled(dev)) {
> +        for (i = 0; i < dev->mem->nregions; ++i) {
> +            struct vhost_memory_region *reg = dev->mem->regions + i;
> +            vhost_dev_sync_region(dev, section, start_addr, end_addr,
> +                                  reg->guest_phys_addr,
> +                                  range_get_last(reg->guest_phys_addr,
> +                                                 reg->memory_size));
> +        }
>      }
> +
>      for (i = 0; i < dev->nvqs; ++i) {
>          struct vhost_virtqueue *vq = dev->vqs + i;
>  
> @@ -943,6 +962,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>              goto err_vq;
>          }
>      }
> +
> +    /*
> +     * During migration devices can't be removed, so we at log start
> +     * we select our vhost_device that will scan the memory sections
> +     * and skip for the others. This is possible because the log is shared
> +     * amongst all vhost devices.
> +     */
> +    if (enable_log && !vhost_log_dev_is_set()) {
> +        vhost_log_set_dev(dev);
> +    } else if (!enable_log) {
> +        vhost_log_set_dev(NULL);
> +    }
> +
>      return 0;
>  err_vq:
>      for (; i >= 0; --i) {
> -- 
> 2.39.3

next prev parent reply	other threads:[~2023-10-03 14:02 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-27 11:14 [PATCH] vhost: Perform memory section dirty scans once per iteration Joao Martins
2023-10-03 14:01 ` Michael S. Tsirkin [this message]
2023-10-06  8:58   ` Joao Martins
2023-10-06  9:48     ` Michael S. Tsirkin
2023-10-06 11:02       ` Joao Martins
2023-10-18  0:32       ` Si-Wei Liu
2023-10-18  5:55         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231003095019-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=qemu-devel@nongnu.org \
    --cc=si-wei.liu@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.