Re: [patch] mtrr: fix issues with large addresses

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: "Andreas Herrmann" <andreas.herrmann3@amd.com>
Cc: "Andi Kleen" <ak@suse.de>,
	linux-kernel@vger.kernel.org, discuss@x86-64.org,
	"Richard Gooch" <rgooch@safe-mbox.com>
Subject: Re: [patch] mtrr: fix issues with large addresses
Date: Tue, 06 Feb 2007 10:54:23 -0700	[thread overview]
Message-ID: <m1sldjtbn4.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20070206160828.GH8665@alberich.amd.com> (Andreas Herrmann's message of "Tue, 6 Feb 2007 17:08:28 +0100")

"Andreas Herrmann" <andreas.herrmann3@amd.com> writes:

> On Mon, Feb 05, 2007 at 05:26:12PM -0700, ebiederm@xmission.com wrote:
>> "Andreas Herrmann" <andreas.herrmann3@amd.com> writes:
>> > mtrr: fix issues with large addresses
>> >
>> > Fixes some issues with /proc/mtrr interface:
>> > o If physical address size crosses the 44 bit boundary
>> >   size_or_mask is evaluated wrong
>> > o size_and_mask limits physical base
>> >   address for an MTRR to be less than 44 bit
>> > o added check to restrict base address to 36 bit on i386
>> 
>> The limit is per cpu not per architecture.  So if you run a
>> cpu that can run in 64bit mode in 32bit mode the limit
>> is not 36 bits.  Even PAE in 32bit mode doesn't have that limit.
>> 
>
> Good point.
>
> I totally ignored that on 64 bit cpus in legacy mode
> - PAE-paging means up to 52 physical address bits respectively
> "physical address size of the underlying implementation"
> - for non-PAE-paging with PSE enabled we have 40 bits for AMD and
> with PSE36 36 bits for Intel

For non PAE-paging you have 32bits.

The real question is how many physical bits does the cpu implemented
for bus transactions.  The latest Intel (non-ia64) cpus are at 40 bits
I believe, and the cutting edge AMD cpus are moving to 48bits.  The
Intel bus protocol supports 50 bits physical at least in it's ia64
incarnation so I believe some of the chipsets may actually support
that.

> What a mess.
> (Hope anyone knows for sure which paging methods are relevant for
> Linux if compiled for i386 and w/o CONFIG_X86_64?)

PAE does get used on i386, but except that it is a related phenomenon
it isn't relevant.

> (Seems that in my mind this legacy stuff is still tied to 36 and 32
> bits :(

Basically Intel had 36 bits implemented for so long it just got
stuck in everyones heads.  Forget that.  It is a question of
how many physical address bits the cpu uses when generating bus
cycles.  Nothing about the mode the cpu is running in matters.

>> > diff --git a/arch/i386/kernel/cpu/mtrr/generic.c
>> > b/arch/i386/kernel/cpu/mtrr/generic.c
>> > index f77fc53..aa21d15 100644
>> > --- a/arch/i386/kernel/cpu/mtrr/generic.c
>> > +++ b/arch/i386/kernel/cpu/mtrr/generic.c
>> > @@ -172,7 +172,7 @@ int generic_get_free_region(unsigned long base, unsigned
>> > long size, int replace_
>> >  static void generic_get_mtrr(unsigned int reg, unsigned long *base,
>> >  			     unsigned long *size, mtrr_type *type)
>> >  {
>> > -	unsigned int mask_lo, mask_hi, base_lo, base_hi;
>> > +	unsigned long mask_lo, mask_hi, base_lo, base_hi;
>> 
>> Why?  Given the low and the high I am assuming these are all implicitly
>> 32bit quantities.  unsigned int is fine.
>
> It is not, please refer to the function body, e.g.
>
> 	*base = base_hi << (32 - PAGE_SHIFT) | base_lo >> PAGE_SHIFT;
>
> All leading 20 bits of "unsigned int" base_hi are snipped away. Thus
> limiting base to use 44 bit instead of 52 bit in 64 bit mode. An
> option would have been to use a type cast while shifting.
>
> (Hmm, having your first remark in mind I have to admit that my fix is
> mainly focused on 64 bit mode not on 64 bit cpu running in 32 bit ...)

Yes.  So base needs to be come a u64. 

So base = ((base_hi << 32) | base_lo) >> PAGE_SHIFT.

I see where the 44bit limit comes in.  Do you actually have boxes
with > 16TB?

Regardless it looks like base and possibly size needs to become
a u64.  At which time the extra >> PAGE_SHIFT could be meaningless.
Either that or because base and size need to be sized in something like
megabytes.

I suspect making it a u64 sized in bytes will get the job done and
result in simpler code.


>> > diff --git a/arch/i386/kernel/cpu/mtrr/if.c b/arch/i386/kernel/cpu/mtrr/if.c
>> > index 5ae1705..3abc3f1 100644
>> > --- a/arch/i386/kernel/cpu/mtrr/if.c
>> > +++ b/arch/i386/kernel/cpu/mtrr/if.c
>> > @@ -137,6 +137,10 @@ mtrr_write(struct file *file, const char __user *buf,
>> > size_t len, loff_t * ppos)
>> >  	for (i = 0; i < MTRR_NUM_TYPES; ++i) {
>> >  		if (strcmp(ptr, mtrr_strings[i]))
>> >  			continue;
>> > +#ifndef CONFIG_X86_64
>> > +		if (base > 0xfffffffffULL)
>> > +			return -EINVAL;
>> > +#endif
>> 
>> That is just silly.  If the cpu is running in long mode or should
>> not affect this capability.
>
> Yes, that check is wrong -- due to my wrong assumption.
> Base can use 52 bits not just 36 bits in 32 bit mode on 64 bit cpu.
>
> My intention was to avoid the silent truncation of base address
> in the following lines:
>
> 		base >>= PAGE_SHIFT;
> 		size >>= PAGE_SHIFT;
> 		err =
> 		    mtrr_add_page((unsigned long) base, ...
>
> where base is cut at bit 44 due to the type cast.
>
> The user doing
> 	#> echo 0x100000000000 size=0x0815000 type=uncachable >/proc/mtrr
> will end up with a new MTRR pair having PhysBase==0x0. (At least this
> will give him some time to get a coffee when his system reboots after
> the crash.)
>
> So it seems that some more stuff needs to be fixed in the mtrr code.
> All unsigned long base addresses used in this code implicitly restrict
> the address to 44 bit (taking the PAGE_SHIFT into account).
>
> So I could do one of the following:
> (1) prepare new patch omitting this silly hunk (-> old behaviour)
> (2) check for 44 bit address size instead of 36 bit address size to
> reflect the implicit truncation (-> avoid silent truncation)
> (3) fix all mtrr code to be able to use up to 52 bit width physical
> addresses instead of 44 bit ones if running in 32 bit mode on 64 bit
> cpus.
>
> I prefer to do (2).
> (IMHO those who have the need for n>44 bit width base address in an MTRR
> should stick to 64 bit mode.)

I prefer (3).  Since the code is shared between 32 and 64bit mode it
should behave the same in both.  I know there are people who regularly
test 32bit kernels on boxes with 128 cpus and 128MB of ram.

People sometimes want crazy things and since it just a matter of changing
the type it should be no real work to get the code to work in 32bit mode.

>> > diff --git a/arch/i386/kernel/cpu/mtrr/main.c
> b/arch/i386/kernel/cpu/mtrr/main.c
>> > index 16bb7ea..0acfb6a 100644
>> > --- a/arch/i386/kernel/cpu/mtrr/main.c
>> > +++ b/arch/i386/kernel/cpu/mtrr/main.c
>> > @@ -50,7 +50,7 @@ u32 num_var_ranges = 0;
>> >  unsigned int *usage_table;
>> >  static DEFINE_MUTEX(mtrr_mutex);
>> >  
>> > -u32 size_or_mask, size_and_mask;
>> > +u64 size_or_mask, size_and_mask;
>> >  
>> >  static struct mtrr_ops * mtrr_ops[X86_VENDOR_NUM] = {};
>> >  
>> > @@ -662,8 +662,8 @@ void __init mtrr_bp_init(void)
>> >  			     boot_cpu_data.x86_mask == 0x4))
>> >  				phys_addr = 36;
>> >  
>> > -			size_or_mask = ~((1 << (phys_addr - PAGE_SHIFT)) - 1);
>> > -			size_and_mask = ~size_or_mask & 0xfff00000;
>> > + size_or_mask = ~((1ULL << (phys_addr - PAGE_SHIFT)) - 1);
>> > +			size_and_mask = ~size_or_mask & 0xfffff00000ULL;
>> 
>> Don't you want to make this hard coded mask 0xfffffffffff00000ULL?
>> 
>
> No.
>
> (That mask is used for values already right shifted by PAGE_SHIFT. So
> at least the upper three nibbles should be zero. And I recommend to
> zero out also all bits betwenn 52 and 63.)
>
> AFAIK size_and_mask is used to mask the address bits above 32 bits
> in PhysBase and PhysMask.
> 0xfffff00000 << PAGE_SHIFT will mask the upper bits of PhysBase and
> PhysMask up to the 52nd bit (counted from 1).
>
> And 52 bit physical address size is exactly what both AMD and Intel
> specify as the "architectural limit" in their programmer manuals for
> 64 bit mode.

Ok. I had earlier missed the PAGE_SHIFT shenanigans the code is playing.

Eric

next prev parent reply	other threads:[~2007-02-06 17:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-02-05 17:19 [patch] mtrr: fix issues with large addresses Andreas Herrmann
2007-02-05 22:50 ` Siddha, Suresh B
2007-02-06  7:53   ` Andi Kleen
2007-02-06  9:31     ` [discuss] " Jan Beulich
2007-02-06  9:45       ` Andi Kleen
2007-02-06  9:53         ` Jan Beulich
2007-02-06 10:54           ` Andi Kleen
2007-02-06 16:25             ` Andreas Herrmann
2007-02-06 16:16       ` Andreas Herrmann
2007-02-06  0:26 ` Eric W. Biederman
2007-02-06 16:08   ` Andreas Herrmann
2007-02-06 17:54     ` Eric W. Biederman [this message]
2007-02-06 18:42       ` [discuss] " Andreas Herrmann
2007-02-06 19:08         ` Eric W. Biederman
2007-02-06 19:25           ` Joerg Roedel
2007-02-06 19:44             ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1sldjtbn4.fsf@ebiederm.dsl.xmission.com \
    --to=ebiederm@xmission.com \
    --cc=ak@suse.de \
    --cc=andreas.herrmann3@amd.com \
    --cc=discuss@x86-64.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rgooch@safe-mbox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox