Date: Tue, 15 Oct 2013 08:50:40 +0200
From: Ingo Molnar
To: "H. Peter Anvin"
Cc: Yinghai Lu, Tejun Heo, Zhang Yanfei, Zhang Yanfei, Toshi Kani, Ingo Molnar, Andrew Morton, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH part2 v2 0/8] Arrange hotpluggable memory as ZONE_MOVABLE
Message-ID: <20131015065040.GB24584@gmail.com>
In-Reply-To: <525C5727.7000603@zytor.com>

* H. Peter Anvin wrote:

> On 10/14/2013 01:37 PM, Yinghai Lu wrote:
> >>
> >> Optimizing NUMA boot just requires moving the heavy lifting to
> >> appropriate NUMA nodes. It doesn't require that early boot phase
> >> should strictly follow NUMA node boundaries.
> >
> > At end of day, I like to see all numa system (ram/cpu/pci) could have
> > non boot nodes to be hot-removed logically. with any boot command
> > line.
> >
>
> I don't think that is realistic without hardware support, simply because
> all it takes is a single page of kernel locked memory to prevent a page
> from being removed. The only realistic way around that, I believe, is
> to remove the identity-mapping in the kernel, but it still has all kinds
> of funnies involving devices and DMA.

We played with virtual kernel memory a decade ago and it's doable. The
only complication was DMA from the kernel stack, which (IIRC) only some
really broken old ISA drivers did. Those should be a distant memory in
terms of practical impact.

So if anyone can implement it using huge pages, with a really fast __va()
and __pa() implementation, then it might be possible. But that's pretty
major surgery on x86.

Thanks,

	Ingo