All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, groug@kaod.org,
	stefanha@gmail.com
Subject: Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
Date: Mon, 11 Dec 2017 11:03:00 +0000	[thread overview]
Message-ID: <20171211110259.GB2424@work-vm> (raw)
In-Reply-To: <20171211103755.2c95f86a@redhat.com>

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Fri, 8 Dec 2017 17:51:56 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Thu, 7 Dec 2017 18:17:51 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > vhost_verify_ring_mappings() were used to verify that
> > > > > rings are still accessible and related memory hasn't
> > > > > been moved after flatview is updated.
> > > > > 
> > > > > It were doing checks by mapping ring's GPA+len and
> > > > > checking that HVA hasn't changed with new memory map.
> > > > > To avoid maybe expensive mapping call, we were
> > > > > identifying address range that changed and were doing
> > > > > mapping only if ring were in changed range.
> > > > > 
> > > > > However it's no neccessary to perform ringi's GPA
> > > > > mapping as we already have it's current HVA and all
> > > > > we need is to verify that ring's GPA translates to
> > > > > the same HVA in updated flatview.
> > > > > 
> > > > > That could be done during time when we are building
> > > > > vhost memmap where vhost_update_mem_cb() already maps
> > > > > every RAM MemoryRegionSection to get section HVA. This
> > > > > fact can be used to check that ring belongs to the same
> > > > > MemoryRegion in the same place, like this:
> > > > > 
> > > > >   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
> > > > >   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> > > > > 
> > > > > Doing this would allow us to drop ~100LOC of code which
> > > > > figures out changed memory range and avoid calling not
> > > > > needed reverse vhost_memory_map(is_write=true) which is
> > > > > overkill for the task at hand.    
> > > > 
> > > > There are a few things about this I don't understand; however if
> > > > it's right I agree that it means we can kill off my comparison
> > > > code.
> > > > 
> > > > 
> > > > First, can I check what constraints 'verify_ring' needs to check;
> > > > if I'm understanding the original code it's checking that :
> > > >     a) If a queue falls within a region, that the whole queue can
> > > >        be mapped  
> > >          see vhost_verify_ring_part_mapping():
> > > 
> > >              p = vhost_memory_map(dev, part_addr, &l, 1);                                 
> > >              if (!p || l != part_size) 
> > >                   error_out
> > >              
> > >          1st: (!p) requested part_addr must be mapped
> > >               i.e. be a part of MemorySection in flatview
> > >          AND
> > >          2nd: (l != part_size) mapped range size must match what we asked for
> > >               i.e. mapped as continuous range so that
> > >                  [GPA, GPA + part_size) could directly correspond to [HVA, HVA + part_size)
> > >               and that's is possible only withing 1 MemorySection && 1 MeoryRegion
> > >               if I read address_space_map() correctly
> > >                      flatview_translate() -> GPA's MemorySection
> > >                      flatview_extend_translation() -> 1:1 GPA->HVA range size
> > >               
> > >          So answer in case of RAM memory region that we are interested in, would be:
> > >          queue falls within a MemorySection and whole queue fits in to it  
> > 
> > Yes, OK.
> > 
> > > >     b) And the start of the queue corresponds to where we thought
> > > >        it used to be (in GPA?)  
> > >          that cached at vq->desc queue HVA hasn't changed after flatview change
> > >             if (p != part)
> > >                error_out  
> > 
> > OK, so it's the HVA that's not changed based on taking the part_addr
> > which is GPA and checking the map?
> Yes, I think so.
> 
> > > >    so that doesn't have any constraint on the ordering of the regions
> > > >    or whether the region is the same size or anything.
> > > >   Also I don't think it would spot if there was a qeueue that had no
> > > >   region associated with it at all.
> > > > 
> > > > Now, comments on your code below:
> > > > 
> > > >   
> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > ---
> > > > > PS:
> > > > >    should be applied on top of David's series
> > > > > 
> > > > > ---
> > > > >  include/hw/virtio/vhost.h |   2 -
> > > > >  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
> > > > >  2 files changed, 57 insertions(+), 140 deletions(-)
> > > > > 
> > > > > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > > > > index 467dc77..fc4af5c 100644
> > > > > --- a/include/hw/virtio/vhost.h
> > > > > +++ b/include/hw/virtio/vhost.h
> > > > > @@ -68,8 +68,6 @@ struct vhost_dev {
> > > > >      uint64_t log_size;
> > > > >      Error *migration_blocker;
> > > > >      bool memory_changed;
> > > > > -    hwaddr mem_changed_start_addr;
> > > > > -    hwaddr mem_changed_end_addr;
> > > > >      const VhostOps *vhost_ops;
> > > > >      void *opaque;
> > > > >      struct vhost_log *log;
> > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > index 5b9a7d7..026bac3 100644
> > > > > --- a/hw/virtio/vhost.c
> > > > > +++ b/hw/virtio/vhost.c
> > > > > @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> > > > >      }
> > > > >  }
> > > > >  
> > > > > -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> > > > > -                                          void *part,
> > > > > -                                          uint64_t part_addr,
> > > > > -                                          uint64_t part_size,
> > > > > -                                          uint64_t start_addr,
> > > > > -                                          uint64_t size)
> > > > > +static int vhost_verify_ring_part_mapping(void *ring_hva,
> > > > > +                                          uint64_t ring_gpa,
> > > > > +                                          uint64_t ring_size,
> > > > > +                                          void *reg_hva,
> > > > > +                                          uint64_t reg_gpa,
> > > > > +                                          uint64_t reg_size)
> > > > >  {
> > > > > -    hwaddr l;
> > > > > -    void *p;
> > > > > -    int r = 0;
> > > > > +    uint64_t hva_ring_offset;
> > > > > +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> > > > > +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
> > > > >  
> > > > > -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> > > > > +    /* start check from the end so that the rest of checks
> > > > > +     * are done when whole ring is in merged sections range
> > > > > +     */
> > > > > +    if (ring_last < reg_last || ring_gpa > reg_last) {
> > > > >          return 0;
> > > > >      }    
> > > > 
> > > >   so does that mean if our region never grows to be as big as the ring
> > > > we wont spot the problem?  
> > > I think there is mistake here it should be:
> > >    ring_last < reg_gpa || ring_gpa > reg_last
> > > 
> > > so if ring is out side of continuous region, we do not care => no change  
> > 
> > OK, I don't see how that corresponds to your 'start check from the end'
> > comment - I thought it was doing something smarter to deal with this
> > being called from the _cb routine potentially before another part of the
> > range had been joined onto it.
> > In that case, we can just use ranges_overlap like the original
> > routine did.
> I suppose range check will do as well, the reason for check from the end
> was optimization to make less checks than ranges_overlap(),
> considering we are comparing against vhost_memory_region.

Have a look at the RFC v2 I've posted; I've reworked it a bit so
we call this code aftwards rather than from the _cb.

> But it seems like a bit an overdoing, since by the commit time
> flatview would contain 1:1 mapping of MemorySection:vhost_memory_region
> (modulo sections we are not interested in). It should be unlikely to 
> get 2 MemoryRegions allocated one after another and merge in vhost_memory_region()
> into one vhost_memory_region. Maybe we would be better off with just
> copying MemorySections into vhost_memory_region in vhost_update_mem_cb()
> and drop merging altogether.

Except I need the merging for the hugepage stuff that's coming in the
postcopy world.

> I'd even consider 'non deterministic' merging of 2 MemoryRegions harmful
> to migration, since vhost might see different memory map on target vs source.

I think that's come up before and it was decided it's not a problem
as long as the regions are still mapped; the only consistency required
is between the qemu and the vhost (either the kernel or the vhost-user);
it shouldn't be a guest visibile issue if I understand it correctly.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2017-12-11 11:03 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 1/7] memory: address_space_iterate Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 2/7] vhost: Move log_dirty check Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 3/7] vhost: New memory update functions Dr. David Alan Gilbert (git)
2017-11-30 15:48   ` Igor Mammedov
2017-12-05 18:25     ` Dr. David Alan Gilbert
2017-11-29 18:50 ` [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation Dr. David Alan Gilbert (git)
2017-11-30 11:27   ` Igor Mammedov
2017-12-06 20:09     ` Dr. David Alan Gilbert
2017-11-29 18:50 ` [Qemu-devel] [RFC 5/7] vhost: Compare new and old memory lists Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 6/7] vhost: Copy updated region data into device state Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 7/7] vhost: Remove vhost_set_memory and children Dr. David Alan Gilbert (git)
2017-11-30 11:22 ` [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Igor Mammedov
2017-11-30 12:08   ` Dr. David Alan Gilbert
2017-11-30 12:40     ` Igor Mammedov
2017-11-30 12:47       ` Dr. David Alan Gilbert
2017-11-30 12:58         ` Igor Mammedov
2017-11-30 13:06           ` Dr. David Alan Gilbert
2017-11-30 15:08             ` Igor Mammedov
2017-11-30 15:18               ` Dr. David Alan Gilbert
2017-11-30 15:32                 ` Igor Mammedov
2017-11-30 15:41                   ` Dr. David Alan Gilbert
2017-11-30 16:51               ` Greg Kurz
2017-12-01 10:02 ` Stefan Hajnoczi
2017-12-01 10:19   ` Dr. David Alan Gilbert
2017-12-01 14:22 ` [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap Igor Mammedov
2017-12-07 18:17   ` Dr. David Alan Gilbert
2017-12-08 14:42     ` Igor Mammedov
2017-12-08 17:51       ` Dr. David Alan Gilbert
2017-12-11  9:37         ` Igor Mammedov
2017-12-11 11:03           ` Dr. David Alan Gilbert [this message]
2017-12-11 13:45             ` Igor Mammedov
2017-12-11 15:43               ` Dr. David Alan Gilbert
2017-12-08 20:45   ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171211110259.GB2424@work-vm \
    --to=dgilbert@redhat.com \
    --cc=groug@kaod.org \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.