public inbox for linux-kernel@vger.kernel.org
From: Andy Whitcroft <apw@shadowen.org>
To: schwidefsky@de.ibm.com
Cc: Dave Hansen <haveblue@us.ibm.com>,
	"Luck, Tony" <tony.luck@intel.com>, Andi Kleen <ak@suse.de>,
	Paul Mackerras <paulus@samba.org>, Andrew Morton <akpm@osdl.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Randy.Dunlap" <rdunlap@xenotime.net>,
	"Protasevich, Natalie" <Natalie.Protasevich@unisys.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [PATCH] x86_64: Make NR_IRQS configurable in Kconfig
Date: Thu, 10 Aug 2006 15:40:07 +0100
Message-ID: <44DB4547.80007@shadowen.org>
In-Reply-To: <1155214538.14749.54.camel@localhost>

Martin Schwidefsky wrote:
> On Wed, 2006-08-09 at 11:25 -0700, Dave Hansen wrote: 
>> Instead of:
>>
>> #define pfn_to_section_nr(pfn) ((pfn) >> PFN_SECTION_SHIFT)
>>
>> We could do:
>>
>> static inline unsigned long pfn_to_section_nr(unsigned long pfn)
>> {
>> 	return some_hash(pfn) % NR_OF_SECTION_SLOTS;
>> }
>>
>> This would, of course, still have limits on how _many_ sections can be
>> populated.  But, it would decouple the physical address ranges that can
>> be used from the number of populated sections.
>>
>> Of course, it isn't quite that simple.  You need to make sure that the
>> sparse code is clean from all connections between section number and
>> physical address, as well as handling things like hash collisions.  We'd
>> probably also need to store the _actual_ physical address somewhere
>> because we can't get it from the section number any more.
> 
> You have to deal with the hash collisions somehow, for example with a
> list of pages that have the same hash. And you have to calculate the
> hash value. Both hurt performance.
> 
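
To make the trade-off above concrete, here is a minimal sketch of a hashed
section lookup with a collision chain. It is not the kernel's sparsemem code.
The names section_slot, section_hash, section_insert and pfn_to_section, the
slot count, the section shift and the multiplicative hash constant are all
assumptions chosen for illustration. Note that each slot has to store the real
section number, since the slot index no longer encodes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PFN_SECTION_SHIFT   8u     /* assumed: 256 pages per section */
#define NR_OF_SECTION_SLOTS 1024u  /* assumed number of hash slots */

/* Hypothetical per-section record. */
struct section_slot {
	uint64_t section_nr;        /* real section number, needed to resolve collisions */
	struct section_slot *next;  /* chain of sections that hash to the same slot */
	void *mem_map;              /* stand-in for the section's page array */
};

static struct section_slot *slots[NR_OF_SECTION_SLOTS];

/* Illustrative multiplicative hash; any reasonable hash would do. */
static unsigned int section_hash(uint64_t section_nr)
{
	uint64_t h = section_nr * 0x9E3779B97F4A7C15ull;

	return (unsigned int)(h >> 32) % NR_OF_SECTION_SLOTS;
}

/* Add a populated section at the head of its slot's chain. */
static void section_insert(struct section_slot *s)
{
	unsigned int i;

	assert(s != NULL);
	i = section_hash(s->section_nr);
	s->next = slots[i];
	slots[i] = s;
}

/* Hash the section number, then walk the chain until it matches. */
static struct section_slot *pfn_to_section(uint64_t pfn)
{
	uint64_t nr = pfn >> PFN_SECTION_SHIFT;

	for (struct section_slot *s = slots[section_hash(nr)]; s; s = s->next)
		if (s->section_nr == nr)
			return s;
	return NULL;  /* section not populated */
}
```

Both costs mentioned above show up here: the hash computation on every lookup,
and the chain walk whenever two sections share a slot.
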
>> P.S. With sparsemem extreme, I think you can cover an entire 64 bits of
>> address space with a 4GB top-level table.  If one more level of tables
>> was added, we'd be down to (I think) an 8MB table.  So, that might be an
>> option, too.
> 
> On s390 we have to prepare for the situation of an address space that
> has a chunk of memory at the low end and another chunk with bit 2^63
> set. So the mem_map array needs to cover the whole 64 bit address range.
> For sparsemem, we can choose the size of the mem_map sections and
> how many indirections the lookup table should have. Some examples:
> 
> 1) flat mem_map array: 2^52 entries, 56 bytes each.
> 2) mem_map sections with 256 entries / 14KB for each section,
>    1 indirection level, 2^44 indirection pointers, 128TB overhead
> 3) mem_map sections with 256 entries / 14KB for each section,
>    2 indirection levels, 2^22 indirection pointers for each level,
>    32MB for each indirection array, minimum 64MB overhead
> 4) mem_map sections with 256 entries / 14KB for each section,
>    3 indirection levels, 2^15/2^15/2^14 indirection pointers,
>    256KB/256KB/128KB indirection arrays, minimum 640KB overhead
> 5) mem_map sections with 1024 entries / 56KB for each section,
>    3 indirection levels, 2^14/2^14/2^14 indirection pointers,
>    128KB/128KB/128KB indirection arrays, minimum 384KB overhead
> 
> 2 levels of indirection result in a large memory overhead.
> For 3 levels of indirection the memory overhead is ok, but each lookup
> has to walk 3 indirections. This adds cpu cycles to every access to the
> mem_map array.
> 
> The alternative of a flat mem_map array in vmalloc space is much more
> attractive. The size of the array is 2^52 * 56 bytes, about 1.3% of the
> virtual address space. The access doesn't change, it is still a plain
> array access, and it gets cached automatically by the hardware.
> Simple, straightforward, no additional overhead. Only the setup of the
> kernel page tables for the mem_map vmalloc area needs some thought.
> 
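
The overhead figures in the list above can be re-derived with a few lines of
arithmetic. This is only a sketch under two assumptions that are implied but
not stated in the quoted text: 8-byte pointers, and 52 bits of page frame
number (64-bit addresses with 4KB pages). The helper names are made up for
this example.

```c
#include <assert.h>
#include <stdint.h>

#define PFN_BITS  52u  /* assumed: 64-bit addresses, 4KB pages */
#define PTR_BYTES 8u   /* assumed pointer size */

/* Bytes taken by one indirection array holding 2^bits pointers. */
static uint64_t indirection_array_bytes(unsigned int bits)
{
	assert(bits < 61);  /* keep the shift below from overflowing 64 bits */
	return (uint64_t)PTR_BYTES << bits;
}

/* Minimum overhead: one array has to exist at every level. */
static uint64_t min_overhead_bytes(const unsigned int *level_bits,
				   unsigned int levels)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < levels; i++)
		total += indirection_array_bytes(level_bits[i]);
	return total;
}

/* Do the in-section index bits plus all table index bits cover every pfn? */
static int covers_all_pfns(unsigned int section_bits,
			   const unsigned int *level_bits,
			   unsigned int levels)
{
	unsigned int sum = section_bits;

	for (unsigned int i = 0; i < levels; i++)
		sum += level_bits[i];
	return sum == PFN_BITS;
}
```

With these helpers, option 2 ({44} with 256-entry sections) comes out at
128TB, option 3 ({22, 22}) at 64MB, option 4 ({15, 15, 14}) at 640KB and
option 5 ({14, 14, 14} with 1024-entry sections) at 384KB, matching the
quoted numbers.
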

Well, you could do something more fun with the top of the address.  You
don't need to keep the bytes in the same order, for instance.  If this is
really a fair-sized chunk at the bottom and one at the top, then taking
the address and swapping the bytes like:

	ABCDEFGH => BCDAEFGH

would be a pretty trivial bit of register wibbling (i.e. very quick), but
would probably mean a single flat, smaller sparsemem table would cover
all likely areas.
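
A minimal sketch of that transformation, reading A..H as the eight bytes of a
64-bit address from most to least significant, so that the mapping is a
one-byte left rotation of the upper 32 bits. The function name swizzle_addr is
made up for this example, and nothing here is taken from existing sparsemem
code.

```c
#include <stdint.h>

/*
 * ABCDEFGH => BCDAEFGH: rotate the upper four bytes left by one byte
 * and leave the lower four bytes alone.
 */
static inline uint64_t swizzle_addr(uint64_t addr)
{
	uint32_t hi = (uint32_t)(addr >> 32);
	uint32_t lo = (uint32_t)addr;

	hi = (hi << 8) | (hi >> 24);
	return ((uint64_t)hi << 32) | lo;
}
```

Under this reading, an address with only bit 63 set (0x8000000000000000)
becomes 0x0000008000000000, so the chunk at the top of the address space lands
much closer to the chunk at the bottom, while addresses below 4GB are
unchanged.
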

-apw
