Date: Thu, 22 Dec 2011 09:24:44 -0200
From: Marcelo Tosatti
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
To: Peter Zijlstra
Cc: Andrea Arcangeli, kvm list, dipankar@in.ibm.com, Alexander Graf, qemu-devel Developers, Chris Wright, bharata@linux.vnet.ibm.com, Vaidyanathan S
Message-ID: <20111222112443.GB7893@amt.cnet>
In-Reply-To: <1322761231.4699.48.camel@twins>

On Thu, Dec 01, 2011 at 06:40:31PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-11-23 at 16:03 +0100, Andrea Arcangeli wrote:
>
> From what I gather, what you propose is to periodically unmap all user
> memory (or map it !r !w !x, which is effectively the same) and take the
> fault. This fault will establish a thread:page relation. One can just
> use that or involve some history as well. Once you have this thread:page
> relation set, you want to group them on the same node.
>
> There are several problems with that. Firstly, of course, the overhead:
> storing this thread:page relation set requires quite a lot of memory.
> Secondly, I'm not quite sure I see how that works for threads that share
> a working set. Suppose you have 4 threads and 2 working sets; how do you
> make sure to keep the 2 groups together? I don't think that's evident
> from the simple thread:page relation data [*]. Thirdly, I immensely
> dislike all these background scanner things; they make it very hard to
> account time to those who actually use it.

Picture yourself as the administrator of a virtualized host, with a given
workload of guests doing their tasks. All it takes is a high-level
understanding of what the ksm algorithm (merging pages with identical
content into the same physical RAM) and the khugepaged algorithm
(collapsing 4k pages into 2MB pages, which is good for the TLB) are
doing (and that should be documented), and to infer from that what is
happening. The same is valid for whoever is writing the management tools
that expose these statistics to the system administrator.