On Sat, 2005-02-19 at 17:03 +1100, Nigel Cunningham wrote:
> > The mem_maps are per-pgdat or per-node with discontig, but I have a
> > patch in the pipeline to take them out of there and make one for every
> > 128MB or 256MB, etc... area of memory (for memory hotplug). So, hanging
> > them off the pgdat or zone won't even work in that case, because even
> > the struct zone can have pretty sparse memory inside of it. I *think*
> > the table is the only way to go. But, that can wait until Monday. :)
>
> Okay. I'll just wait :>

I didn't realize at first that your patch already did 95% of what I
wanted. Pretty much all that has to be done is make checks in the
allocation loop to make sure that pages are present before allocating
the map space for them.

However, one use that I was thinking of would require map space for
even non-present pages: implementing something like pfn_valid() for
memory hotplug. We'd need a bit for every page that is possible in the
system, no matter what. So, I added a flag to the alloc routine to
allow both of these combinations.

We could also optimize the lookup side to assume that a NULL in the
top-level table implies all 0's for the second level. Then, have the
users of PAGE_UL_PTR() check for NULL results. That would save some
memory, and another cacheline during a lookup.

static inline unsigned long *PAGE_UL_PTR(dyn_pageflags_t *bitmap, int pagenum)
{
	unsigned long *map = bitmap[PAGENUMBER(pagenum)];

	if (unlikely(!map))
		return NULL;
	return map + PAGEINDEX(pagenum);
}

The other thing is to replace max_mapnr with something that is friendly
for memory hotplug. We have a way to represent the largest _possible_
physical memory with MAX_PHYSADDR_BITS. That can be used instead of
max_mapnr to figure out how big the table has to be. This patch isn't
merged yet, but it's where I got MAX_PHYSADDR_BITS.
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out/B-sparse-160-sparsemem-i386.patch

The only other question is how much memory this design can handle.
With 512 indexes in the top-level page (64-bit architectures), a 128k
maximum kmalloc() for the second level, and 4k pages, I think that's
2TB of memory. We can always use __alloc_pages() directly if we need
more than that. :)

Attached patch is untested, uncompiled, and would have to be applied
after the sparsemem patches I pointed to above, anyway. Does it look
OK, in concept?

-- Dave
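
For anyone who wants to check the 2TB estimate above, here is a rough
userspace sketch of the arithmetic. The function names are made up for
illustration and do not come from the patch; it only assumes one flag
bit per page, as described in the mail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch.
 * top_entries:  pointers in the top-level table (512 per 4k page on 64-bit)
 * second_bytes: size of each second-level bitmap chunk (128k kmalloc() limit)
 */

/* Number of pages that can be tracked with one bit per page. */
uint64_t bitmap_max_pages(uint64_t top_entries, uint64_t second_bytes)
{
	assert(top_entries > 0 && second_bytes > 0);
	return top_entries * second_bytes * 8;	/* 8 bits per byte */
}

/* Bytes of physical memory covered by that many pages. */
uint64_t bitmap_max_bytes(uint64_t top_entries, uint64_t second_bytes,
			  uint64_t page_bytes)
{
	return bitmap_max_pages(top_entries, second_bytes) * page_bytes;
}
```

With 512 entries, 128k chunks and 4k pages this gives 2^29 pages, or
2^41 bytes, which matches the 2TB figure.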