* remap allocator for per-CPU memory
From: Jan Beulich @ 2009-05-12 15:30 UTC (permalink / raw)
To: Ingo Molnar, tj; +Cc: linux-kernel
Didn't the addition of this allocator introduce another case that needs special
treatment in pageattr.c? Since large pages are used for mapping the allocated
memory, but only part of the initially allocated large pages are actually
retained, there's now the potential for TLB aliases with different cache
attributes for those parts of these pages that get passed back through
free_bootmem().
Jan
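For illustration only (this sketch is not from the thread): the layout Jan describes can be modelled in user-space C. It assumes a 2MB large page whose first `unit_size` bytes are retained for one CPU's per-CPU data, with the rest handed back to the page allocator. The struct and function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE 4096ULL                 /* 4k page */
#define LARGE_PAGE_SIZE (2ULL * 1024 * 1024)    /* assumed 2MB PMD mapping */

/* Hypothetical record of one CPU's remapped large page. */
struct remap_unit {
    uint64_t phys_base;  /* physical start, large-page aligned */
    uint64_t unit_size;  /* bytes retained for per-CPU data */
};

/* Physical end of the retained part, rounded up to a 4k boundary. */
static uint64_t retained_end(const struct remap_unit *u)
{
    assert(u->unit_size <= LARGE_PAGE_SIZE);
    uint64_t kept = (u->unit_size + SMALL_PAGE_SIZE - 1) & ~(SMALL_PAGE_SIZE - 1);
    return u->phys_base + kept;
}

/* Number of 4k pages handed back (e.g. through free_bootmem()). */
static uint64_t freed_tail_pages(const struct remap_unit *u)
{
    return (u->phys_base + LARGE_PAGE_SIZE - retained_end(u)) / SMALL_PAGE_SIZE;
}

/*
 * Nonzero if paddr lies in the returned tail. Such a page is still covered
 * by the large-page mapping and is also reachable through its own 4k
 * mapping, so changing attributes on the 4k mapping alone creates an alias
 * with different attributes.
 */
static int is_aliased_tail(const struct remap_unit *u, uint64_t paddr)
{
    return paddr >= retained_end(u) && paddr < u->phys_base + LARGE_PAGE_SIZE;
}
```

With a 64k unit, for example, 496 of the 512 small pages fall into the aliased tail, which is why the returned part is the concern here rather than the retained part.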
* Re: remap allocator for per-CPU memory
From: Tejun Heo @ 2009-05-12 15:43 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ingo Molnar, linux-kernel
Jan Beulich wrote:
> Didn't the addition of this allocator introduce another case that
> needs special treatment in pageattr.c? Since large pages are used
> for mapping the allocated memory, but only part of the initially
> allocated large pages are actually retained, there's now the
> potential for TLB aliases with different cache attributes for those
> parts of these pages that get passed back through free_bootmem().
Hmmm.... yes, the large page mapping and the returned part of it would
alias each other. What changes should be made for it? Dunno much
about how pageattr works.
Thanks.
--
tejun
* Re: remap allocator for per-CPU memory
From: Tejun Heo @ 2009-05-13 10:09 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ingo Molnar, linux-kernel
Tejun Heo wrote:
> Jan Beulich wrote:
>> Didn't the addition of this allocator introduce another case that
>> needs special treatment in pageattr.c? Since large pages are used
>> for mapping the allocated memory, but only part of the initially
>> allocated large pages are actually retained, there's now the
>> potential for TLB aliases with different cache attributes for those
>> parts of these pages that get passed back through free_bootmem().
>
> Hmmm.... yes, the large page mapping and the returned part of it would
> alias each other. What changes should be made for it? Dunno much
> about how pageattr works.
Okay, just glanced over the pageattr code. I don't think we need any
special provisions for this as long as the TLB is fine with having
overlapping PMD and PTE mappings with different attributes (please
note that these two mappings aren't occupying the same linear
addresses - they're aliases). This is allowed, right?
Thanks.
--
tejun
* Re: remap allocator for per-CPU memory
From: Andi Kleen @ 2009-05-13 11:05 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jan Beulich, Ingo Molnar, linux-kernel
Tejun Heo <teheo@novell.com> writes:
>
> Okay, just glanced over the pageattr code. I don't think we need any
> special provisions for this as long as the TLB is fine with having
> overlapping PMD and PTE mappings with different attributes (please
> note that these two mappings aren't occupying the same linear
> addresses - they're aliases). This is allowed, right?
Nope.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: remap allocator for per-CPU memory
From: Tejun Heo @ 2009-05-13 11:13 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jan Beulich, Ingo Molnar, linux-kernel
Andi Kleen wrote:
> Tejun Heo <teheo@novell.com> writes:
>> Okay, just glanced over the pageattr code. I don't think we need any
>> special provisions for this as long as the TLB is fine with having
>> overlapping PMD and PTE mappings with different attributes (please
>> note that these two mappings aren't occupying the same linear
>> addresses - they're aliases). This is allowed, right?
>
> Nope.
Yeah, I'm going through the manual now and can't find anything which
allows such behavior. I haven't been able to find anything which
describes what happens between large page and 4k page aliases.
Aieee... I'll dig through the manual a bit more and see whether this
can be worked around somehow. :-(
Thanks.
--
tejun
* Re: remap allocator for per-CPU memory
From: Tejun Heo @ 2009-05-13 11:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jan Beulich, Ingo Molnar, linux-kernel
Tejun Heo wrote:
> Andi Kleen wrote:
>> Tejun Heo <teheo@novell.com> writes:
>>> Okay, just glanced over the pageattr code. I don't think we need any
>>> special provisions for this as long as the TLB is fine with having
>>> overlapping PMD and PTE mappings with different attributes (please
>>> note that these two mappings aren't occupying the same linear
>>> addresses - they're aliases). This is allowed, right?
>> Nope.
>
> Yeah, I'm going through the manual now and can't find anything which
> allows such behavior. I haven't been able to find anything which
> describes what happens between large page and 4k page aliases.
> Aieee... I'll dig through the manual a bit more and see whether this
> can be worked around somehow. :-(
Looks like we're screwed.
I couldn't find anything explicitly prohibiting PMD/PTE aliases w/
different attributes although there are plenty of warnings and don'ts
against giving different attributes to the same linear addresses. At
any rate, it definitely looks way too dangerous to depend on.
And, set_memory_*() is basically allowed on any memory allocated via
get_free_page(), so... we're between a rock and a hard place. Looks like
remapping partially using large pages is a no-go.
Any ideas?
Thanks.
--
tejun
* Re: remap allocator for per-CPU memory
From: Andi Kleen @ 2009-05-13 11:46 UTC (permalink / raw)
To: Tejun Heo; +Cc: Andi Kleen, Jan Beulich, Ingo Molnar, linux-kernel
> I couldn't find anything explicitly prohibiting PMD/PTE aliases w/
> different attributes although there are plenty of warnings and don'ts
> against giving different attributes to the same linear addresses. At
> any rate, it definitely looks way too dangerous to depend on.
It is. We've had data corruption because of this in the past.
Worse, it's very subtle data corruption that takes a long time
to track down. So yes, it's definitely dangerous.
>
> And, set_memory_*() is basically allowed on any memory allocated via
> get_free_page(), so... we're between a rock and a hard place. Looks like
The x86-64 kernel has the text mapping alias which had similar problems.
That was avoided by special casing this.
> remapping partially using large pages is a no-go.
I'm not sure it was ever a good idea, because most CPUs have far fewer
large-page TLB entries than small-page entries.
(In some cases the difference is dramatic; a few older cores
only had something like four 2MB/4MB entries.)
Unless it's really a lot of memory you're talking about here
it's probably better to use small pages for such specific
purposes.
The other issue is that with 1GB pages used in the direct mapping
the problem generally gets much worse. Although luckily it only
hits because there is some dumb kernel code which always forces
smaller pages for GB 0 and 4.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: remap allocator for per-CPU memory
From: Jan Beulich @ 2009-05-13 12:51 UTC (permalink / raw)
To: Tejun Heo; +Cc: Ingo Molnar, Andi Kleen, linux-kernel
>>> Tejun Heo <teheo@novell.com> 13.05.09 13:29 >>>
>Tejun Heo wrote:
>> Andi Kleen wrote:
>>> Tejun Heo <teheo@novell.com> writes:
>>>> Okay, just glanced over the pageattr code. I don't think we need any
>>>> special provisions for this as long as the TLB is fine with having
>>>> overlapping PMD and PTE mappings with different attributes (please
>>>> note that these two mappings aren't occupying the same linear
>>>> addresses - they're aliases). This is allowed, right?
>>> Nope.
>>
>> Yeah, I'm going through the manual now and can't find anything which
>> allows such behavior. I haven't been able to find anything which
>> describes what happens between large page and 4k page aliases.
>> Aieee... I'll dig through the manual a bit more and see whether this
>> can be worked around somehow. :-(
>
>Looks like we're screwed.
>
>I couldn't find anything explicitly prohibiting PMD/PTE aliases w/
>different attributes although there are plenty of warnings and don'ts
>against giving different attributes to the same linear addresses. At
>any rate, it definitely looks way too dangerous to depend on.
>
>And, set_memory_*() is basically allowed on any memory allocated via
>get_free_page(), so... we're between a rock and a hard place. Looks like
>remapping partially using large pages is a no-go.
>
>Any ideas?
I think the only two alternatives are
(a) don't use large pages in the first place here, or
(b) teach the pageattr code to handle the per-CPU virtual area similarly to
the kernel space for x86-64 (though it's going to be a little more complicated
since there's no pre-determined relation between the virtual and physical
addresses - the necessary lookup might become expensive on systems with
very many [possible] CPUs).
Jan
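For illustration only (not part of the original message): the lookup cost mentioned under (b) can be sketched as a linear scan. Because there is no fixed relation between virtual and physical addresses, the sketch assumes a hypothetical table holding the physical base of each possible CPU's large page and checks every entry. The table and the function name are assumptions, not existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE (2ULL * 1024 * 1024)    /* assumed 2MB PMD mapping */

/*
 * Return the index of the CPU whose remapped large page contains paddr,
 * or -1 if no such page exists. phys_base[] is a hypothetical table with
 * one large-page-aligned physical base per possible CPU, in no particular
 * order, so every lookup costs O(nr_cpus).
 */
static int find_remap_cpu(const uint64_t *phys_base, size_t nr_cpus,
                          uint64_t paddr)
{
    assert(phys_base != NULL);
    for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
        uint64_t start = phys_base[cpu];

        if (paddr >= start && paddr - start < LARGE_PAGE_SIZE)
            return (int)cpu;
    }
    return -1;
}
```

A sorted table or a per-page flag would make the check cheaper; the linear scan is shown only because it makes the scaling with the number of possible CPUs obvious.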
* Re: remap allocator for per-CPU memory
From: Tejun Heo @ 2009-05-13 13:29 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ingo Molnar, Andi Kleen, linux-kernel
Hello,
Jan Beulich wrote:
> I think the only two alternatives are
>
> (a) don't use large pages in the first place here, or
Seems like the first candidate at the moment.
> (b) teach the pageattr code to handle the per-CPU virtual area similarly to
> the kernel space for x86-64 (though it's going to be a little more complicated
> since there's no pre-determined relation between the virtual and physical
> addresses - the necessary lookup might become expensive on systems with
> very many [possible] CPUs).
Can you elaborate on this a bit? Let's say there's a quick way to match
whether the page is part of the remapped large page, what can pageattr
do differently then? Applying the same attribute to both mappings?
Failing or filtering set_memory_*()?
Thanks.
--
tejun
* Re: remap allocator for per-CPU memory
From: Jan Beulich @ 2009-05-13 13:44 UTC (permalink / raw)
To: Tejun Heo; +Cc: Ingo Molnar, Andi Kleen, linux-kernel
>>> Tejun Heo <teheo@novell.com> 13.05.09 15:29 >>>
>> (b) teach the pageattr code to handle the per-CPU virtual area similarly to
>> the kernel space for x86-64 (though it's going to be a little more complicated
>> since there's no pre-determined relation between the virtual and physical
>> addresses - the necessary lookup might become expensive on systems with
>> very many [possible] CPUs).
>
>Can you elaborate on this a bit? Let's say there's a quick way to match
>whether the page is part of the remapped large page, what can pageattr
>do differently then? Applying the same attribute to both mappings?
>Failing or filtering set_memory_*()?
It would have to split the page. Perhaps there wouldn't be a need to apply the
new attribute to the page(s) that is(are) in the process of getting its(their)
attribute(s) changed; instead, just don't re-establish a 4k mapping for those
pages that aren't part of the per-CPU space.
And of course, the request should fail when it targets one of the pages that
are actually part of the per-CPU space -- but would be a BUG() anyway.
Jan
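For illustration only (not part of the original message): the policy outlined above can be sketched in user-space C. It assumes a 2MB large page whose first `unit_size` bytes are live per-CPU memory. All names are hypothetical. The `present[]` array merely stands in for the 512 small-page entries that a split would create.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE     4096ULL
#define LARGE_PAGE_SIZE     (2ULL * 1024 * 1024)
#define PTES_PER_LARGE_PAGE (LARGE_PAGE_SIZE / SMALL_PAGE_SIZE)   /* 512 */

enum remap_action {
    REMAP_NOT_INVOLVED,  /* page is outside the remapped large page */
    REMAP_REJECT,        /* page is live per-CPU memory: request must fail */
    REMAP_SPLIT,         /* page is in the freed tail: split the large page */
};

/* Decide what an attribute change on physical page paddr would require. */
static enum remap_action classify_request(uint64_t phys_base,
                                          uint64_t unit_size, uint64_t paddr)
{
    assert(unit_size <= LARGE_PAGE_SIZE);
    /* Retained bytes, rounded up to whole 4k pages. */
    uint64_t kept = (unit_size + SMALL_PAGE_SIZE - 1) & ~(SMALL_PAGE_SIZE - 1);

    if (paddr < phys_base || paddr >= phys_base + LARGE_PAGE_SIZE)
        return REMAP_NOT_INVOLVED;
    if (paddr - phys_base < kept)
        return REMAP_REJECT;
    return REMAP_SPLIT;
}

/*
 * Plan the split: keep small-page entries only for pages that belong to
 * the per-CPU space, and leave the freed tail unmapped in the remap area
 * so that no alias remains. Returns the number of entries kept.
 */
static size_t plan_split(uint64_t unit_size,
                         unsigned char present[PTES_PER_LARGE_PAGE])
{
    size_t kept = (size_t)((unit_size + SMALL_PAGE_SIZE - 1) / SMALL_PAGE_SIZE);

    for (size_t i = 0; i < PTES_PER_LARGE_PAGE; i++)
        present[i] = i < kept;
    return kept;
}
```

Dropping the tail entries entirely, rather than mirroring the new attribute into them, matches the suggestion that the remapped alias need not be re-established for pages outside the per-CPU space.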
* Re: remap allocator for per-CPU memory
From: Tejun Heo @ 2009-05-13 13:53 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ingo Molnar, Andi Kleen, linux-kernel
Hello, Jan.
Jan Beulich wrote:
>>>> Tejun Heo <teheo@novell.com> 13.05.09 15:29 >>>
>>> (b) teach the pageattr code to handle the per-CPU virtual area similarly to
>>> the kernel space for x86-64 (though it's going to be a little more complicated
>>> since there's no pre-determined relation between the virtual and physical
>>> addresses - the necessary lookup might become expensive on systems with
>>> very many [possible] CPUs).
>> Can you elaborate on this a bit? Let's say there's a quick way to match
>> whether the page is part of the remapped large page, what can pageattr
>> do differently then? Applying the same attribute to both mappings?
>> Failing or filtering set_memory_*()?
>
> It would have to split the page. Perhaps there wouldn't be a need to
> apply the new attribute to the page(s) that is(are) in the process
> of getting its(their) attribute(s) changed; instead, just don't
> re-establish a 4k mapping for those pages that aren't part of the
> per-CPU space.
Ah... right. Splitting the remapped area should do the trick, so now
the question is whether it would be worth all the trouble or whether we should
just forget about remapping and use 4k on NUMA machines. I don't have
much clue here. Andi seems to think there's no reason to bother with
PMD mappings. Ingo, what do you think?
Thanks.
--
tejun