Date: Mon, 5 May 2008 11:04:43 -0500
From: Robin Holt
To: Linus Torvalds
Cc: Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ingo Molnar, Andi Kleen, Yinghai Lu, Andrew Morton, Yasunori Goto
Subject: Re: [rfc][patch 0/3] bootmem2: a memory block-oriented boot time allocator
Message-ID: <20080505160443.GG19717@sgi.com>
References: <20080505095938.326928514@symbol.fehenstaub.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 05, 2008 at 08:23:34AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 5 May 2008, Johannes Weiner wrote:
> >
> > here is a bootmem allocator replacement that uses one bitmap for all
> > available pages and works with a model of contiguous memory blocks
> > that reside on nodes instead of nodes only as the current allocator
> > does.
>
> Won't this have problems with huge non-contiguous areas?
>
> Some setups have traditionally had node memory separated in physical space
> by the high bits of the memory address, and using a single bitmap for such
> things would potentially be basically impossible - even with a single bit
> per page, the "span" of possible pages is potentially just too high, even
> if the nodes themselves don't have tons of memory, because the memory is
> just very spread out - and allocating the initial bitmap may not work
> reliably.
>
> Now, admittedly I don't know if we even support that kind of thing or if
> people really do things that way any more, so maybe it's not an issue.

The SGI sn2 architecture does.  Each DIMM bank is allocated a 16GB range of
physical addresses.  There are up to four banks per node.  The node number
is stuck into the higher portions of the address, giving a gap between nodes
of 256GB.  With a potential of 1024 nodes, you would have a very large array.

Additionally, on our upcoming UV systems, there will potentially be a hole
between the bulk of memory and a small amount addressable at the high end
of the address range (slightly short of 16TB), with the typical gap being on
the order of 15TB.

Thanks,
Robin Holt
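
To put rough numbers on the "very large array" above, here is a
back-of-the-envelope sketch of how big a single one-bit-per-page bitmap
would be if it had to cover the whole physical span.  Several inputs are
assumptions, not figures from this thread: the 256GB "gap" is read as the
base-to-base stride between nodes, the page sizes (16KB for sn2, 4KB for
UV) are guesses at typical configurations, and the UV span is rounded up
to a full 16TB.  The helper name bitmap_bytes is made up for illustration.

```python
GIB = 1 << 30
TIB = 1 << 40


def bitmap_bytes(span_bytes, page_size):
    """Bytes needed for one bit per page across the whole address span."""
    pages = span_bytes // page_size
    return (pages + 7) // 8  # round up to whole bytes


# sn2 figures taken from the mail above.
NODES = 1024
BANKS_PER_NODE = 4
BANK_RANGE = 16 * GIB
NODE_STRIDE = 256 * GIB      # assumption: "gap" read as base-to-base stride
SN2_PAGE_SIZE = 16 * 1024    # assumption: 16KB pages

# Physical span the bitmap must cover vs. the address range banks can occupy.
sn2_span = NODES * NODE_STRIDE
sn2_bank_range = NODES * BANKS_PER_NODE * BANK_RANGE
sn2_bitmap = bitmap_bytes(sn2_span, SN2_PAGE_SIZE)

# UV figures: the span is rounded up from "slightly short of 16TB".
UV_SPAN = 16 * TIB           # assumption: upper bound on the span
UV_PAGE_SIZE = 4 * 1024      # assumption: 4KB pages
uv_bitmap = bitmap_bytes(UV_SPAN, UV_PAGE_SIZE)

print("sn2 span:       %d TiB" % (sn2_span // TIB))
print("sn2 bank range: %d TiB" % (sn2_bank_range // TIB))
print("sn2 bitmap:     %d MiB" % (sn2_bitmap >> 20))
print("UV bitmap:      %d MiB" % (uv_bitmap >> 20))
```

Under these assumptions the bitmap size is driven by the span of the
address space rather than by the amount of memory actually installed,
which is the concern raised in the quoted text.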