From: Andy Whitcroft <apw@shadowen.org>
To: schwidefsky@de.ibm.com
Cc: Dave Hansen <haveblue@us.ibm.com>,
"Luck, Tony" <tony.luck@intel.com>, Andi Kleen <ak@suse.de>,
Paul Mackerras <paulus@samba.org>, Andrew Morton <akpm@osdl.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Randy.Dunlap" <rdunlap@xenotime.net>,
"Protasevich, Natalie" <Natalie.Protasevich@unisys.com>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [PATCH] x86_64: Make NR_IRQS configurable in Kconfig
Date: Thu, 10 Aug 2006 15:40:07 +0100 [thread overview]
Message-ID: <44DB4547.80007@shadowen.org> (raw)
In-Reply-To: <1155214538.14749.54.camel@localhost>
Martin Schwidefsky wrote:
> On Wed, 2006-08-09 at 11:25 -0700, Dave Hansen wrote:
>> Instead of:
>>
>> #define pfn_to_section_nr(pfn) ((pfn) >> PFN_SECTION_SHIFT)
>>
>> We could do:
>>
>> static inline unsigned long pfn_to_section_nr(unsigned long pfn)
>> {
>> return some_hash(pfn) % NR_OF_SECTION_SLOTS;
>> }
>>
>> This would, of course, still have limits on how _many_ sections can be
>> populated. But it would decouple the possible physical address ranges
>> from the number of populated sections.
>>
>> Of course, it isn't quite that simple. You need to make sure that the
>> sparse code is clean from all connections between section number and
>> physical address, as well as handling things like hash collisions. We'd
>> probably also need to store the _actual_ physical address somewhere
>> because we can't get it from the section number any more.
>
> You have to deal with the hash collisions somehow, for example with a
> list of pages that have the same hash. And you have to calculate the
> hash value. Both hurt performance.
>
>> P.S. With sparsemem extreme, I think you can cover an entire 64-bits of
>> address space with a 4GB top-level table. If one more level of tables
>> was added, we'd be down to (I think) an 8MB table. So, that might be an
>> option, too.
>
> On s390 we have to prepare for the situation of an address space that
> has a chunk of memory at the low end and another chunk with bit 2^63
> set. So the mem_map array needs to cover the whole 64-bit address range.
> For sparsemem, we can choose the size of the mem_map sections and the
> number of indirection levels in the lookup table. Some examples:
>
> 1) flat mem_map array: 2^52 entries, 56 bytes each.
> 2) mem_map sections with 256 entries / 14KB for each section,
> 1 indirection level, 2^44 indirection pointers, 128TB overhead
> 3) mem_map sections with 256 entries / 14KB for each section,
> 2 indirection levels, 2^22 indirection pointers for each level,
> 32MB for each indirection array, minimum 64MB overhead
> 4) mem_map sections with 256 entries / 14KB for each section,
> 3 indirection levels, 2^15/2^15/2^14 indirection pointers,
> 256K/256K/128K indirection arrays, minimum 640K overhead
> 5) mem_map sections with 1024 entries / 56KB for each section,
> 3 indirection levels, 2^14/2^14/2^14 indirection pointers,
> 128K/128K/128K indirection arrays, minimum 384KB overhead
>
> Two levels of indirection result in a large memory overhead. With
> three levels the memory overhead is ok, but each lookup has to walk
> three indirections, which adds cpu cycles to every access of the
> mem_map array.
>
> The alternative of a flat mem_map array in vmalloc space is much more
> attractive. The size of the array is 2^52 * 56 bytes, about 1.3% of
> the virtual address space. The access pattern doesn't change: an array
> gets accessed, and the accesses are automatically cached by the
> hardware. Simple, straightforward, no additional overhead. Only the
> setup of the kernel page tables for the mem_map vmalloc area needs
> some thought.
>
Well, you could do something more fun with the top of the address. You
don't need to keep the bytes in the same order, for instance. If there
really is a fair-size chunk at the bottom and one at the top, then
taking the address and swapping the bytes like:

	ABCDEFGH => BCDAEFGH

would be a pretty trivial bit of register wibbling (i.e. very quick),
but would probably mean a single flat, smaller sparsemem table could
cover all likely areas.
-apw
Thread overview: 7+ messages
[not found] <m1irl4ftya.fsf@ebiederm.dsl.xmission.com>
[not found] ` <20060807194159.f7c741b5.akpm@osdl.org>
[not found] ` <17624.7310.856480.704542@cargo.ozlabs.ibm.com>
2006-08-08 5:14 ` [PATCH] x86_64: Make NR_IRQS configurable in Kconfig Andi Kleen
2006-08-08 8:17 ` Martin Schwidefsky
2006-08-09 17:58 ` Luck, Tony
2006-08-09 18:25 ` Dave Hansen
2006-08-10 12:55 ` Martin Schwidefsky
2006-08-10 14:40 ` Andy Whitcroft [this message]
2006-08-10 14:53 ` Martin Schwidefsky