From: "Michael S. Tsirkin"
Date: Sun, 1 May 2011 13:29:41 +0300
To: Jan Kiszka
Cc: Alex Williamson, "qemu-devel@nongnu.org"
Subject: Re: [Qemu-devel] [PATCH] Fix phys memory client - pass guest physical address not region offset
Message-ID: <20110501102941.GG27816@redhat.com>
In-Reply-To: <4DBAE7C7.8010203@siemens.com>

On Fri, Apr 29, 2011 at 06:31:03PM +0200, Jan Kiszka wrote:
> On 2011-04-29 18:20, Alex Williamson wrote:
> > On Fri, 2011-04-29 at 18:07 +0200, Jan Kiszka wrote:
> >> On 2011-04-29 17:55, Alex Williamson wrote:
> >>> On Fri, 2011-04-29 at 17:45 +0200, Jan Kiszka wrote:
> >>>> On 2011-04-29 17:38, Alex Williamson wrote:
> >>>>> On Fri, 2011-04-29 at 17:29 +0200, Jan Kiszka wrote:
> >>>>>> On 2011-04-29 17:06, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, Apr 28, 2011 at 09:15:23PM -0600, Alex Williamson wrote:
> >>>>>>>> When we're trying to get a newly registered phys memory client updated
> >>>>>>>> with the current page mappings, we end up passing the region offset
> >>>>>>>> (a ram_addr_t) as the start address rather than the actual guest
> >>>>>>>> physical memory address (target_phys_addr_t). If your guest has less
> >>>>>>>> than 3.5G of memory, these are coincidentally the same thing. If
> >>>>>>
> >>>>>> I think this broke even with < 3.5G, as phys_offset also encodes the
> >>>>>> memory type while region_offset does not. So everything became RAM this
> >>>>>> way, and no MMIO was announced.
> >>>>>>
> >>>>>>>> there's more, the region offset for the memory above 4G starts over
> >>>>>>>> at 0, so the set_memory client will overwrite its lower memory entries.
> >>>>>>>>
> >>>>>>>> Instead, keep track of the guest physical address as we're walking the
> >>>>>>>> tables and pass that to the set_memory client.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Alex Williamson
> >>>>>>>
> >>>>>>> Acked-by: Michael S. Tsirkin
> >>>>>>>
> >>>>>>> Given all this, can you tell how much time it takes to hotplug a
> >>>>>>> device with, say, a 40G RAM guest?
> >>>>>>
> >>>>>> Why not collect pages of identical types and report them as one chunk
> >>>>>> once the type changes?
> >>>>>
> >>>>> Good idea, I'll see if I can code that up. I don't have a terribly
> >>>>> large system to test with, but with an 8G guest, it's surprisingly not
> >>>>> very noticeable. For vfio, I intend to only have one memory client, so
> >>>>> adding additional devices won't have to rescan everything.
> >>>>> The memory overhead of keeping the list that the memory client
> >>>>> creates is probably also low enough that it isn't worthwhile to tear
> >>>>> it all down if all the devices are removed. Thanks,
> >>>>
> >>>> What other clients register late? Do they need to know the whole memory
> >>>> layout?
> >>>>
> >>>> This full page table walk is likely a latency killer, as it happens under
> >>>> the global lock. Ugly.
> >>>
> >>> vhost and kvm are the only current users. kvm registers its client
> >>> early enough that there's no memory registered, so it doesn't really
> >>> need this replay through the page table walk. I'm not sure how vhost
> >>> works currently. I'm also looking at using this for vfio to register
> >>> pages for the iommu.
> >>
> >> Hmm, it looks like vhost is basically recreating the condensed, slotted
> >> memory layout from the per-page reports now. A bit inefficient,
> >> specifically as this happens per vhost device, no? And if vfio preferred
> >> a slotted format as well, you would end up copying vhost logic.
> >>
> >> That sounds to me like the qemu core should start tracking slots and
> >> report slot changes, not memory region registrations.
> >
> > I was thinking the same thing, but I think Michael is concerned that
> > we'll each need slightly different lists. This is also where kvm maps
> > to a fixed array of slots, which is known to blow up with too many
> > assigned devices. That needs to be fixed on both the kernel and qemu
> > side. Runtime overhead of the phys memory client is pretty minimal;
> > it's just the startup that thrashes set_memory.
>
> I'm not just concerned about the runtime overhead. This is code
> duplication. Even if the format of the lists differs, their structure
> should not: one entry per contiguous memory region, and some lists may
> track sparsely based on their interests.
>
> I'm sure the core could be taught to help the clients create and
> maintain such lists. We already have two types of users in tree, you
> are about to create another one, and Xen should have some need for it as
> well.
>
> Jan

Absolutely. There should be some common code to deal with slots.

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
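
For illustration only, here is a minimal sketch of the coalescing Jan
suggests: replay the page table walk to a late-registering client one
contiguous run of identically typed pages at a time, instead of one page
at a time. This is not the actual patch; the callback shape roughly
mirrors the set_memory() interface of CPUPhysMemoryClient at the time,
but visit_page(), flush_chunk(), mem_chunk and PHYS_TYPE_MASK are
invented names, and the type-bit encoding is an assumption.

/*
 * Sketch, not QEMU code: merge contiguous pages of identical type into
 * one set_memory()-style callback per run.
 */
#include <stdint.h>

typedef uint64_t target_phys_addr_t;   /* guest physical address            */
typedef uint64_t ram_addr_t;           /* phys_offset; low bits assumed to  */
                                        /* encode the memory type            */
#define TARGET_PAGE_SIZE 4096
#define PHYS_TYPE_MASK   ((ram_addr_t)TARGET_PAGE_SIZE - 1)

typedef void (*set_memory_fn)(void *opaque, target_phys_addr_t start_addr,
                              ram_addr_t size, ram_addr_t phys_offset);

struct mem_chunk {
    target_phys_addr_t start;   /* guest physical start of the run   */
    ram_addr_t size;            /* length of the run in bytes        */
    ram_addr_t phys_offset;     /* phys_offset of the first page     */
    int valid;
};

static void flush_chunk(struct mem_chunk *c, set_memory_fn cb, void *opaque)
{
    if (c->valid) {
        cb(opaque, c->start, c->size, c->phys_offset);
        c->valid = 0;
    }
}

/* Called once per mapped page during the page table walk. */
static void visit_page(struct mem_chunk *c, set_memory_fn cb, void *opaque,
                       target_phys_addr_t addr, ram_addr_t phys_offset)
{
    if (c->valid &&
        addr == c->start + c->size &&
        (phys_offset & PHYS_TYPE_MASK) == (c->phys_offset & PHYS_TYPE_MASK) &&
        (phys_offset & ~PHYS_TYPE_MASK) ==
            ((c->phys_offset & ~PHYS_TYPE_MASK) + c->size)) {
        /* Same type and contiguous on both sides: extend the run. */
        c->size += TARGET_PAGE_SIZE;
        return;
    }
    /* Type change or discontinuity: report the old run, start a new one. */
    flush_chunk(c, cb, opaque);
    c->start = addr;
    c->size = TARGET_PAGE_SIZE;
    c->phys_offset = phys_offset;
    c->valid = 1;
}

The caller would do one final flush_chunk() after the walk, so even a
40G guest should end up with one callback per contiguous region rather
than one per page, which is also the slotted shape vhost and vfio want.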