From: Andy Whitcroft <apw@shadowen.org>
To: schwidefsky@de.ibm.com
Cc: Dave Hansen <haveblue@us.ibm.com>,
"Luck, Tony" <tony.luck@intel.com>, Andi Kleen <ak@suse.de>,
Paul Mackerras <paulus@samba.org>, Andrew Morton <akpm@osdl.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Randy.Dunlap" <rdunlap@xenotime.net>,
"Protasevich, Natalie" <Natalie.Protasevich@unisys.com>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [PATCH] x86_64: Make NR_IRQS configurable in Kconfig
Date: Thu, 10 Aug 2006 15:40:07 +0100
Message-ID: <44DB4547.80007@shadowen.org>
In-Reply-To: <1155214538.14749.54.camel@localhost>
Martin Schwidefsky wrote:
> On Wed, 2006-08-09 at 11:25 -0700, Dave Hansen wrote:
>> Instead of:
>>
>> #define pfn_to_section_nr(pfn) ((pfn) >> PFN_SECTION_SHIFT)
>>
>> We could do:
>>
>> static inline unsigned long pfn_to_section_nr(unsigned long pfn)
>> {
>> return some_hash(pfn) % NR_OF_SECTION_SLOTS;
>> }
>>
>> This would, of course, still limit how _many_ sections can be
>> populated. But it would decouple the range of possible physical
>> addresses from the number of populated sections.
>>
>> Of course, it isn't quite that simple. You need to make sure that the
>> sparse code is free of any assumed link between section number and
>> physical address, and you need to handle things like hash collisions.
>> We'd probably also need to store the _actual_ physical address somewhere,
>> because we can't derive it from the section number any more.
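A minimal user-space sketch of the hashed pfn_to_section_nr() idea follows. It assumes open addressing with linear probing for collisions, and it stores the real section number in each slot, because the slot index no longer implies a physical address. NR_OF_SECTION_SLOTS and some_hash() come from the quoted text. The section size, the Fibonacci (multiplicative) hash used for some_hash(), and the slot structure and function names are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PFN_SECTION_SHIFT	8	/* hypothetical: 256 pages per section */
#define SECTION_SLOT_BITS	10
#define NR_OF_SECTION_SLOTS	(1UL << SECTION_SLOT_BITS)

/* Each slot records which real section it holds, because the slot
 * index no longer says anything about the physical address. */
struct section_slot {
	bool used;
	uint64_t real_section;		/* pfn >> PFN_SECTION_SHIFT */
};

static struct section_slot slots[NR_OF_SECTION_SLOTS];

/* Hypothetical some_hash(): Fibonacci hashing, keeping the top bits. */
static inline uint64_t some_hash(uint64_t sec)
{
	return (sec * 0x9E3779B97F4A7C15ULL) >> (64 - SECTION_SLOT_BITS);
}

/* Return the slot holding pfn's section, claiming a free one if
 * 'insert' is set. Collisions are resolved by linear probing; slots
 * are never freed, so an empty slot ends the search. Returns -1 if
 * the section is absent (lookup) or the table is full (insert). */
static long pfn_to_section_slot(uint64_t pfn, bool insert)
{
	uint64_t sec = pfn >> PFN_SECTION_SHIFT;
	uint64_t start = some_hash(sec) % NR_OF_SECTION_SLOTS;

	for (uint64_t i = 0; i < NR_OF_SECTION_SLOTS; i++) {
		uint64_t idx = (start + i) % NR_OF_SECTION_SLOTS;

		if (slots[idx].used && slots[idx].real_section == sec)
			return (long)idx;
		if (!slots[idx].used) {
			if (!insert)
				return -1;
			slots[idx].used = true;
			slots[idx].real_section = sec;
			return (long)idx;
		}
	}
	return -1;
}

/* The section's physical start must be read back from the slot. */
static inline uint64_t section_slot_to_pfn(long idx)
{
	return slots[idx].real_section << PFN_SECTION_SHIFT;
}
```

The probe loop and the extra stored field are exactly the costs raised in the reply below: a hash computation plus a possible collision walk on every lookup.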
>
> You have to deal with the hash collisions somehow, for example with a
> list of pages that have the same hash. And you have to calculate the
> hash value. Both hurt performance.
>
>> P.S. With sparsemem extreme, I think you can cover the entire 64 bits of
>> address space with a 4GB top-level table. If one more level of tables
>> were added, we'd be down to (I think) an 8MB table. So that might be an
>> option, too.
>
> On s390 we have to prepare for an address space that has a chunk of
> memory at the low end and another chunk with bit 63 set (i.e. at or
> above 2^63). So the mem_map array needs to cover the whole 64-bit
> address range. For sparsemem, we can choose the size of the mem_map
> sections and how many levels of indirection the lookup table should
> have. Some examples:
>
> 1) flat mem_map array: 2^52 entries, 56 bytes each.
> 2) mem_map sections with 256 entries / 14KB for each section,
> 1 indirection level, 2^44 indirection pointers, 128TB overhead
> 3) mem_map sections with 256 entries / 14KB for each section,
> 2 indirection levels, 2^22 indirection pointers for each level,
> 32MB for each indirection array, minimum 64MB overhead
> 4) mem_map sections with 256 entries / 14KB for each section,
> 3 indirection levels, 2^15/2^15/2^14 indirection pointers,
> 256K/256K/128K indirection arrays, minimum 640K overhead
> 5) mem_map sections with 1024 entries / 56KB for each section,
> 3 indirection levels, 2^14/2^14/2^14 indirection pointers,
> 128K/128K/128K indirection arrays, minimum 384KB overhead
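The overhead figures in the list above can be reproduced with a small C sketch. It assumes 4KB pages (so a pfn is 52 bits), a 56-byte struct page and 8-byte pointers, and it counts one array per level for the minimum. The helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4KB pages, as in the quoted figures */
#define STRUCT_PAGE_SZ	56	/* bytes per mem_map entry */
#define PTR_SZ		8	/* bytes per indirection pointer */

/* Size in bytes of one mem_map section holding 2^section_bits entries. */
static uint64_t section_bytes(unsigned section_bits)
{
	return ((uint64_t)1 << section_bits) * STRUCT_PAGE_SZ;
}

/* Minimum indirection overhead: one populated array per level, where
 * level i holds 2^level_bits[i] pointers. section_bits plus all level
 * bits must cover the 64 - PAGE_SHIFT = 52 pfn bits; returns 0 if not. */
static uint64_t min_overhead(unsigned section_bits,
			     const unsigned *level_bits, unsigned nlevels)
{
	unsigned total = section_bits;
	uint64_t bytes = 0;

	for (unsigned i = 0; i < nlevels; i++) {
		total += level_bits[i];
		bytes += ((uint64_t)1 << level_bits[i]) * PTR_SZ;
	}
	return total == 64 - PAGE_SHIFT ? bytes : 0;
}
```

For example, option 4 is `min_overhead(8, (unsigned[]){15, 15, 14}, 3)`, which gives 2 * 256K + 128K = 640K.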
>
> Two levels of indirection result in a large memory overhead. With 3
> levels the memory overhead is acceptable, but each lookup has to walk 3
> indirections, which adds CPU cycles to every mem_map access.
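To make the cost of the three-level walk concrete, here is a user-space sketch of option 5 (14/14/14 index bits over 1024-entry sections). The table names, the stand-in struct page and populate_section() are hypothetical and are not the kernel's sparsemem code. The point is the three dependent loads in pfn_to_page_3level().

```c
#include <stdint.h>
#include <stdlib.h>

/* Option 5 layout: pfn = [TOP 14 | MID 14 | LEAF 14 | SEC 10] = 52 bits. */
#define SEC_BITS	10
#define LEAF_BITS	14
#define MID_BITS	14
#define TOP_BITS	14

struct page { unsigned long flags; };	/* stand-in, not the real layout */

/* top_dir[i] -> mid array; mid[j] -> leaf array; leaf[k] -> section */
static struct page ***top_dir[1UL << TOP_BITS];

struct pfn_idx { unsigned long top, mid, leaf, off; };

static struct pfn_idx split_pfn(uint64_t pfn)
{
	struct pfn_idx x = {
		.top  = (pfn >> (SEC_BITS + LEAF_BITS + MID_BITS)) & ((1UL << TOP_BITS) - 1),
		.mid  = (pfn >> (SEC_BITS + LEAF_BITS)) & ((1UL << MID_BITS) - 1),
		.leaf = (pfn >> SEC_BITS) & ((1UL << LEAF_BITS) - 1),
		.off  = pfn & ((1UL << SEC_BITS) - 1),
	};
	return x;
}

/* Three dependent loads before the struct page itself can be touched. */
static struct page *pfn_to_page_3level(uint64_t pfn)
{
	struct pfn_idx x = split_pfn(pfn);
	struct page ***mid = top_dir[x.top];		/* load 1 */
	if (!mid)
		return NULL;
	struct page **leaf = mid[x.mid];		/* load 2 */
	if (!leaf)
		return NULL;
	struct page *section = leaf[x.leaf];		/* load 3 */
	return section ? section + x.off : NULL;
}

/* Allocate whatever tables are missing for pfn's section; 0 on success. */
static int populate_section(uint64_t pfn)
{
	struct pfn_idx x = split_pfn(pfn);
	struct page ***mid = top_dir[x.top];
	if (!mid && !(mid = top_dir[x.top] = calloc(1UL << MID_BITS, sizeof(*mid))))
		return -1;
	struct page **leaf = mid[x.mid];
	if (!leaf && !(leaf = mid[x.mid] = calloc(1UL << LEAF_BITS, sizeof(*leaf))))
		return -1;
	if (!leaf[x.leaf] &&
	    !(leaf[x.leaf] = calloc(1UL << SEC_BITS, sizeof(struct page))))
		return -1;
	return 0;
}
```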
>
> The alternative, a flat mem_map array in vmalloc space, is much more
> attractive. The array is 2^52 * 56 bytes, about 1.37% of the 64-bit
> virtual address space. Access doesn't change: it is still a plain array
> access, and the hardware caches it automatically. Simple,
> straightforward, no additional overhead. Only the setup of the kernel
> page tables for the mem_map vmalloc area needs some thought.
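The following is a sketch of the flat layout's arithmetic. It assumes 4KB pages and the 56-byte struct page quoted above. Here struct page is a padded stand-in, and flat_memmap_bytes(), flat_memmap_fraction() and flat_pfn_to_page() are hypothetical names. The size 2^52 * 56 / 2^64 reduces to 56/4096 = 7/512, roughly 1.37%, and a lookup is a single pointer addition.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define STRUCT_PAGE_SZ	56ULL

/* Stand-in with the quoted 56-byte size; not the real layout. */
struct page { unsigned char pad[STRUCT_PAGE_SZ]; };

/* Virtual bytes needed by a flat mem_map for a 64-bit physical space:
 * 2^(64 - PAGE_SHIFT) entries of STRUCT_PAGE_SZ bytes each. */
static uint64_t flat_memmap_bytes(void)
{
	return (1ULL << (64 - PAGE_SHIFT)) * STRUCT_PAGE_SZ;
}

/* Fraction of the 2^64-byte virtual space: 2^52 * 56 / 2^64 = 56/4096. */
static double flat_memmap_fraction(void)
{
	return (double)flat_memmap_bytes() / 18446744073709551616.0; /* 2^64 */
}

/* Plain array indexing: no table walk, no hashing. */
static inline struct page *flat_pfn_to_page(struct page *mem_map, uint64_t pfn)
{
	return mem_map + pfn;
}
```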
>
Well, you could do something more fun with the top of the address. You
don't need to keep the bytes in the same order, for instance. If this
really is a fair-sized chunk at the bottom and one at the top, then
taking the address and swapping the bytes like:

ABCDEFGH => BCDAEFGH

would be a pretty trivial bit of register wibbling (i.e. very quick), but
would probably mean that a single, flat, smaller sparsemem table would
cover all likely areas.
-apw
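Here is a sketch of the byte swap above in C. It assumes 64-bit physical addresses with A as the most significant byte. The helper names addr_swizzle() and addr_unswizzle() are hypothetical. The permutation leaves the low four bytes alone, so memory below 4GB maps to itself, while an address with only bit 63 set lands at bit 39. Assuming each chunk spans less than 4GB from its base, both chunks then fit in a table covering roughly the low 40 bits. Because the transformation is a pure byte permutation, it is reversible, so a section can still be turned back into a physical address.

```c
#include <stdint.h>

/* Hypothetical helper: ABCDEFGH -> BCDAEFGH (A = most significant byte).
 * Bytes E..H stay put, B..D move up one byte, A drops into D's old slot. */
static inline uint64_t addr_swizzle(uint64_t addr)
{
	uint64_t a   = addr >> 56;			/* byte A */
	uint64_t bcd = (addr >> 32) & 0xffffffULL;	/* bytes B, C, D */
	uint64_t low = addr & 0xffffffffULL;		/* bytes E..H */

	return (bcd << 40) | (a << 32) | low;
}

/* Inverse permutation: BCDAEFGH -> ABCDEFGH. */
static inline uint64_t addr_unswizzle(uint64_t s)
{
	uint64_t bcd = s >> 40;
	uint64_t a   = (s >> 32) & 0xffULL;
	uint64_t low = s & 0xffffffffULL;

	return (a << 56) | (bcd << 32) | low;
}
```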