From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 5 Mar 2007 10:54:43 -0800 (PST)
From: Christoph Lameter
Subject: Merged Zone / Node in order to do containers etc easily?
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org

We have talked a bit in the last days about eventually getting rid of
either nodes or zones. If one would merge the node and zone structs,
this would be possible.

Actually the current kernel supports something like it if the following
config options are not set:

CONFIG_ZONE_DMA
CONFIG_ZONE_DMA32
CONFIG_HIGHMEM

In that case we only have a single zone per node but no support anymore
for DMA zones or highmem. We save the bits in page->flags that are
usually used to identify the zone.

For simplicity's sake let's just call these node / zone entities "zone".
Let's say we also have CONFIG_NUMA set. Then

A. We could add more "zones" via node hotplug.

B. We can identify the zones via a node number from user space and
   direct allocations to a specific "zone".

C. We can migrate memory between "zones".

D. We have an indication of how favorably these "zones" are to be used
   given their SLIT distance.

Let's call these "zones" that were generated during bootup "base zones".

Now we need some additional functionality. In particular we want to be
able to put some memory dynamically into containers and we need to find
a replacement for the DMA zones.

Let's create a new type of zone called "derived zones". These are based
on a base zone. An arbitrary number of MAX_ORDER blocks can be moved to
these and then they function like a regular "zone". They can be
dynamically created and deleted via the node hotplug interfaces.

So if we create a new container then we create a new zone and extract a
number of MAX_ORDER blocks from a base zone.
The zone functions like a base zone for the time that it exists, and
thus we have all the usual accounting for the zone and do not need to
add it separately. Reclaim will work as for base zones, etc. (This only
works if we have MAX_ORDER blocks available, thus we would need Mel's
defrag patches.)

Applications can be restricted to a container or containers by the
cpuset functionality. The built-in process migration in cpusets can move
applications. Processes can be manually moved through page migration.

If we need some DMA zones for a particular device then we can also
create a new zone and extract pages in a certain range from the base
zone. This could occur dynamically (but early during boot, so that the
low end pages in a zone have not been used yet) if we discover that
devices exist that need restricted memory pools. Moreover these zones
could be custom sized for the devices that are challenged in a
particular way. For example we could dynamically create a 2GB pool for
the strange SCSI device that can only reliably do DMA using a 31 bit
address.

That leaves HIGHMEM out in the cold so far, but HIGHMEM is not needed on
64 bit platforms as far as I can tell. Maybe HIGHMEM could also be some
sort of derived zone with memory taken from the base zone used as the
memmap and as bounce buffers etc?
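
To make the base zone / derived zone bookkeeping a bit more concrete,
here is a minimal user-space sketch. It is a toy model and not kernel
code: every name in it (toy_zone, toy_derive_zone, toy_zone_grow,
toy_zone_destroy, TOY_BLOCK_PAGES) is hypothetical, the block size of
1024 pages is an assumption, and real derived zones would have to move
actual page ranges rather than just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model. None of these names exist in the kernel. */
#define TOY_BLOCK_PAGES 1024UL  /* assumed pages per MAX_ORDER block */

struct toy_zone {
	int node_id;             /* node number visible to user space */
	struct toy_zone *base;   /* NULL for a base zone */
	unsigned long nr_blocks; /* MAX_ORDER blocks currently owned */
};

/* Total number of pages a zone currently owns. */
unsigned long toy_zone_pages(const struct toy_zone *z)
{
	return z->nr_blocks * TOY_BLOCK_PAGES;
}

/* Set up an empty derived zone on top of a base zone. */
void toy_derive_zone(struct toy_zone *derived, struct toy_zone *base,
		     int node_id)
{
	/* Derived zones may only be stacked on base zones in this model. */
	assert(base != NULL && base->base == NULL);
	derived->node_id = node_id;
	derived->base = base;
	derived->nr_blocks = 0;
}

/*
 * Move nr blocks from the base zone into the derived zone.
 * Returns 0 on success, -1 if the zone is not a derived zone or the
 * base zone does not have enough blocks left.
 */
int toy_zone_grow(struct toy_zone *derived, unsigned long nr)
{
	struct toy_zone *base = derived->base;

	if (base == NULL || base->nr_blocks < nr)
		return -1;
	base->nr_blocks -= nr;
	derived->nr_blocks += nr;
	return 0;
}

/* Delete a derived zone by handing all its blocks back to the base. */
void toy_zone_destroy(struct toy_zone *derived)
{
	if (derived->base != NULL)
		derived->base->nr_blocks += derived->nr_blocks;
	derived->nr_blocks = 0;
	derived->base = NULL;
}
```

In this model, creating a container would amount to a call to
toy_derive_zone() followed by toy_zone_grow(), and removing the
container would be toy_zone_destroy(), after which the base zone owns
all of its blocks again. The per-zone counter stands in for the "usual
accounting" that a derived zone would inherit for free.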