From: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: LinuxIA64 <linux-ia64@vger.kernel.org>, linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC/PATCH] pfn_valid() more generic : intro[0/2]
Date: Wed, 06 Oct 2004 07:33:52 +0000
Message-ID: <41639FE0.5060409@jp.fujitsu.com>
In-Reply-To: <B8E391BBE9FE384DAA4C5C003888BE6F0221CC82@scsmsx401.amr.corp.intel.com>
Hi,
Luck, Tony wrote:
>>ia64's ia64_pfn_valid() uses get_user() for checking whether a
>>page struct is available or not. I think this is an irregular
>>implementation and following patches
>>are a more generic replacement, careful_pfn_valid(). It uses 2
>>level table.
>
>
> It is odd ... but a somewhat convenient way to make check whether
> the page struct exists, while handling the fault if it is in an
> area of virtual mem_map that doesn't exist. I think that in practice
> we rarely call it with a pfn that generates a fault (except in error
> paths).
I understand it's a rare case.
Honestly, this patch is for the no-bitmap buddy allocator (which I posted before).
With the no-bitmap buddy allocator, pfn_valid() returns 0 in many cases
(because MAX_ORDER is 4GB).
So I decided to write an experimental pfn_valid() which doesn't cause a fault.
> How big will the pfn_validmap[] be for a very sparse physical space
> like SGI Altix? I'm not sure I see how PFN_VALID_MAPSHIFT is
> generated for each system.
>
PFN_VALID_MAPSHIFT can be overridden in each asm-xxx/page.h (or it can be in config.h).
I think each special architecture can find a suitable value, if it wants.
If Altix has XXX Tbytes for each node, a setting where 1 cache line (64 bytes = 32 entries)
covers each node's maximum size will be good.
1st level table:
With the current configuration, 1 Gbyte per 2-byte entry, so 8 Tbytes per 1 page (16 Kbyte pages).
2nd level table:
8 bytes per entry. Entries are coalesced with each other as much as possible.
If the memory layout is like a bee's nest, careful_pfn_valid() will need a great amount
of memory and cannot work well because of the searching.
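The 1st level numbers above can be checked with a small calculation. This is only a sketch of the arithmetic, not code from the patch: the SKETCH_* names and the helper function are hypothetical, and the only inputs taken from this mail are the 2-byte entry size and the 1 Gbyte covered by each entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the two values below come from the mail. */
#define SKETCH_ENTRY_BYTES   2ULL          /* size of one 1st level entry */
#define SKETCH_ENTRY_COVERS  (1ULL << 30)  /* 1 Gbyte of memory per entry */

/* Assumption: a 2-byte entry would be stored as a 16-bit integer. */
static_assert(sizeof(uint16_t) == SKETCH_ENTRY_BYTES,
              "a 1st level entry is assumed to be 2 bytes");

/* Bytes of physical memory covered by a 1st level table of table_bytes bytes. */
static uint64_t sketch_level1_coverage(uint64_t table_bytes)
{
    return (table_bytes / SKETCH_ENTRY_BYTES) * SKETCH_ENTRY_COVERS;
}
```

With these values, one 16 Kbyte page holds 8192 entries and covers 8 Tbytes, and one 64-byte cache line holds 32 entries and covers 32 Gbytes. An architecture that picks a larger PFN_VALID_MAPSHIFT would cover more memory per cache line.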
BTW, how sparse is SGI Altix?
> Why do we need a loop when looking in the 2nd level? Can't the
> entry from the 1st level point us to the right place?
>
Consider this case.
a 1st level entry covers 0x1000 - 0x2000
[valid range ] 0x1000 - 0x1100
               0x1200 - 0x1500
               0x1600 - 0x2000
pfn_valid(0x1501)
-> by 1st level, we get 0x1000-0x1100
   into loop  0x1200-0x1500
              0x1600-        returns 0.
Walking the 2nd level table lets us reduce the size of the 1st level table.
I'd rather avoid a cache miss than avoid a small walk.
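The walk in the case above could look roughly like the following. This is a minimal sketch and not the code in the patch: all sketch_* names, the struct layout, the shift value, and the half-open [start, end) convention are assumptions made for illustration. It assumes each 2-byte 1st level entry holds the index of the first 2nd level range to check for its chunk, and that the 2nd level ranges are sorted and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value: one 1st level entry covers 0x1000 pfns, as in the example. */
#define SKETCH_MAPSHIFT 12

struct sketch_range {   /* 8 bytes: one coalesced valid range */
    uint32_t start;     /* first valid pfn */
    uint32_t end;       /* one past the last valid pfn (assumed half-open) */
};

struct sketch_map {
    const uint16_t *level1;            /* index of the first range to check */
    size_t nr_level1;
    const struct sketch_range *level2; /* sorted by start, non-overlapping */
    size_t nr_level2;
};

/* Returns 1 if pfn lies inside a valid range, 0 otherwise. Never faults. */
static int sketch_pfn_valid(const struct sketch_map *map, uint32_t pfn)
{
    size_t chunk, i;

    assert(map != NULL);
    chunk = pfn >> SKETCH_MAPSHIFT;
    if (chunk >= map->nr_level1)
        return 0;
    /* Short walk through the 2nd level, starting where the 1st level points. */
    for (i = map->level1[chunk]; i < map->nr_level2; i++) {
        if (pfn < map->level2[i].start)
            return 0;   /* sorted, so no later range can contain pfn */
        if (pfn < map->level2[i].end)
            return 1;
    }
    return 0;
}

/* The layout from the example: three valid ranges inside 0x1000 - 0x2000. */
static const uint16_t sketch_level1[] = { 0, 0 };
static const struct sketch_range sketch_level2[] = {
    { 0x1000, 0x1100 }, { 0x1200, 0x1500 }, { 0x1600, 0x2000 },
};
static const struct sketch_map sketch_example = {
    sketch_level1, 2, sketch_level2, 3,
};
```

In this sketch, sketch_pfn_valid(&sketch_example, 0x1501) skips the first two ranges, sees that 0x1501 is below the start of the third one, and returns 0. Only one 2-byte 1st level entry is needed for the whole chunk, no matter how many holes it contains.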
- Kame