From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arch-owner@vger.kernel.org>
Received: from e31.co.us.ibm.com ([32.97.110.149]:40853 "EHLO
	e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965765AbXDBTzC (ORCPT
	<rfc822;linux-arch@vger.kernel.org>); Mon, 2 Apr 2007 15:55:02 -0400
Subject: Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL
From: Dave Hansen <hansendc@us.ibm.com>
In-Reply-To: <Pine.LNX.4.64.0704020851300.30394@schroedinger.engr.sgi.com>
References: <20070401071024.23757.4113.sendpatchset@schroedinger.engr.sgi.com>
	 <200704011246.52238.ak@suse.de>
	 <Pine.LNX.4.64.0704020832320.30394@schroedinger.engr.sgi.com>
	 <200704021744.39880.ak@suse.de>
	 <Pine.LNX.4.64.0704020851300.30394@schroedinger.engr.sgi.com>
Content-Type: text/plain
Date: Mon, 02 Apr 2007 12:54:56 -0700
Message-Id: <1175543696.22373.51.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-arch-owner@vger.kernel.org
To: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, Martin Bligh <mbligh@google.com>, linux-mm@kvack.org, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
List-ID: <linux-arch.vger.kernel.org>

On Mon, 2007-04-02 at 08:54 -0700, Christoph Lameter wrote:
> > BTW there is no guarantee the node size is a multiple of 128MB so
> > you likely need to handle the overlap case. Otherwise we can 
> > get cache corruptions
> 
> How does sparsemem handle that? 

It doesn't. :)

In practice, this situation never happens because no actual architecture
has node boundaries aligned to anything smaller than MAX_ORDER, and the
section size is at least MAX_ORDER.  If we *did* have this, then the page
allocator would already be broken for these nodes. ;)

So, this SPARSE_VIRTUAL does introduce a new dependency, which Andi
calculated above.  But, in reality, I don't think it's a big deal.  Just
to spell it out a bit more, if this:

	VMEMMAP_MAPPING_SIZE/sizeof(struct page) * PAGE_SIZE

(where VMEMMAP_MAPPING_SIZE is PMD_SIZE in your case) is any larger than
the granularity on which your NUMA nodes are divided, then you might
have a problem with mem_map for one NUMA node getting allocated on
another.  
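
To put numbers on that expression, here is a small userspace sketch (not
kernel code).  The values are assumptions for illustration: 4KB pages, a
2MB PMD_SIZE, and a 64-byte struct page, which is what makes the result
come out at the 128MB figure quoted above.  STRUCT_PAGE_SIZE and both
helper names are made up for this example.  The second helper uses the
"is the node granularity a multiple of the coverage" form of the test.

```c
#include <assert.h>

/* Assumed values, for illustration only. */
#define PAGE_SIZE            4096UL
#define PMD_SIZE             (2UL << 20)     /* 2MB */
#define VMEMMAP_MAPPING_SIZE PMD_SIZE
#define STRUCT_PAGE_SIZE     64UL            /* stands in for sizeof(struct page) */

/*
 * Bytes of physical memory described by the struct pages that fit in
 * one vmemmap mapping:
 *
 *	mapping_size / sizeof(struct page) * PAGE_SIZE
 */
static unsigned long vmemmap_coverage(unsigned long mapping_size,
				      unsigned long struct_page_size,
				      unsigned long page_size)
{
	assert(struct_page_size != 0);
	return mapping_size / struct_page_size * page_size;
}

/*
 * Nonzero if node boundaries placed at multiples of node_granularity
 * can fall inside the range covered by a single mapping, i.e. the
 * granularity is not a multiple of the coverage.
 */
static int mapping_may_straddle(unsigned long node_granularity,
				unsigned long coverage)
{
	assert(coverage != 0);
	return node_granularity % coverage != 0;
}
```

With the assumed values, vmemmap_coverage(VMEMMAP_MAPPING_SIZE,
STRUCT_PAGE_SIZE, PAGE_SIZE) is 128MB, so nodes divided on 64MB
boundaries could share a mapping while nodes divided on 128MB or 1GB
boundaries could not.  A different sizeof(struct page) changes the
figure.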

It might be worth a comment, or at least some kind of WARN_ON().
Perhaps we can stick something in online_page() to check if:

	page_to_nid(page) == page_to_nid(virt_to_page(page))
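
As a rough idea of what that comparison would catch, here is a toy
userspace model with two nodes.  Everything in it is hypothetical (the
toy_* names, the two-node layout, and the assumption that each mapping's
backing memory comes from the node that owns the first page it
describes); it only mimics the comparison and is not how the kernel
would implement it.

```c
/* Toy model, not kernel code: all names here are hypothetical. */
struct toy_layout {
	unsigned long node_boundary_pfn;  /* pfns below this are node 0, the rest node 1 */
	unsigned long pages_per_mapping;  /* struct pages held by one vmemmap mapping */
};

/* Stand-in for page_to_nid(page): the node the page itself belongs to. */
static int toy_page_nid(const struct toy_layout *l, unsigned long pfn)
{
	return pfn < l->node_boundary_pfn ? 0 : 1;
}

/*
 * Stand-in for page_to_nid(virt_to_page(page)): the node whose memory
 * backs the struct page.  Assumption: a mapping is allocated on the
 * node owning the first pfn that the mapping describes.
 */
static int toy_memmap_nid(const struct toy_layout *l, unsigned long pfn)
{
	unsigned long first_pfn = pfn - pfn % l->pages_per_mapping;

	return toy_page_nid(l, first_pfn);
}

/* Count the pages for which the proposed check would fire. */
static unsigned long toy_count_mismatches(const struct toy_layout *l,
					  unsigned long nr_pages)
{
	unsigned long pfn, mismatches = 0;

	for (pfn = 0; pfn < nr_pages; pfn++)
		if (toy_page_nid(l, pfn) != toy_memmap_nid(l, pfn))
			mismatches++;
	return mismatches;
}
```

In this model, with 32768 pages per mapping (128MB of 4KB pages), a node
boundary at pfn 32768 gives no mismatches, while a boundary at pfn 16384
leaves the second half of the first mapping describing node 1 pages from
node 0 memory.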

-- Dave