* change_page_attr() and global_flush_tlb()
@ 2007-04-05 9:32 Jan Beulich
2007-04-05 12:43 ` [discuss] " Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2007-04-05 9:32 UTC (permalink / raw)
To: linux-kernel, discuss
Looking at both the i386 and x86-64 implementations I fail to understand why
there is an explicit requirement on calling global_flush_tlb() after
change_page_attr(), yet actual TLB flushing will not normally happen (on i386
it will happen if the CPU doesn't support clflush, but if it does, or on x86-64,
the flushing depends on the list of deferred pages being non-empty, which
can only happen when a large page gets re-combined). Is it assumed that
the callers additionally call tlb_flush_all() (I think none of them do)?
Further, change_page_attr()'s reference counting in a split large page's
page table appears to imply that attributes are only changed from or back to
the reference attributes, but not from one kind of non-default ones to the
same or another set of non-default ones (otherwise the reference count
will never again drop to zero), and also not from default to default (i.e. the
caller trying to revert attributes to normal not knowing what state they are
currently in) - this would BUG() if the large page was already reverted, or
screw the reference count otherwise. Is all of this intentional? I think it
will need to be changed as a prerequisite to supporting on-the-fly attribute
changes in the SMP alternatives code, which was requested as a follow-up
to the tightening of the CONFIG_DEBUG_RODATA effects.
Finally, at least for the kernel image range it would seem to me that it might
be beneficial to recombine mappings into large ones even when the
attributes are not at their default anymore, but consistent across an entire
2Mb/4Mb range (i.e. after write-protecting .text). At the same time I wonder,
though, whether it wouldn't be safer to remove execute permission from
anything but .text along with write-protecting read-only regions under
CONFIG_DEBUG_RODATA.
Thanks for any insight,
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [discuss] change_page_attr() and global_flush_tlb()
2007-04-05 9:32 change_page_attr() and global_flush_tlb() Jan Beulich
@ 2007-04-05 12:43 ` Andi Kleen
2007-04-05 13:52 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2007-04-05 12:43 UTC (permalink / raw)
To: discuss; +Cc: Jan Beulich, linux-kernel
On Thursday 05 April 2007 11:32:49 Jan Beulich wrote:
> Looking at both the i386 and x86-64 implementations I fail to understand why
> there is an explicit requirement on calling global_flush_tlb() after
> change_page_attr(), yet actual TLB flushing will not normally happen (on i386
> it will happen if the CPU doesn't support clflush, but if it does, or on x86-64,
> the flushing depends on the list of deferred pages being non-empty, which
> can only happen when a large page gets re-combined). Is it assumed that
> the callers additionally call tlb_flush_all() (I think none of them do)?
Not sure I understand the question? global_flush_tlb is perhaps a little
misnamed, but it only flushes the pages changed in change_page_attr.
This works because it uses INVLPG which should ignore the G bits,
so not additional global flush is needed.
Linus wanted it done this two step way because he was worried about too
many IPIs. I guess it doesn't make too much difference though because near all
callers only change single pages. The flush could be probably folded
back into c_p_a().
BTW we know the sequence for doing this is not quite as recommended
by Intel (TODO item) but afaik the TLB flushing works.
> Further, change_page_attr()'s reference counting in a split large page's
> page table appears to imply that attributes are only changed from or back to
> the reference attributes, but not from one kind of non-default ones to the
> same or another set of non-default ones (otherwise the reference count
> will never again drop to zero), > and also not from default to default (i.e. the
> caller trying to revert attributes to normal not knowing what state they are
> currently in) - this would BUG() if the large page was already reverted, or
> screw the reference count otherwise. Is all of this intentional? I think it
> will need to be changed as a prerequisite to supporting on-the-fly attribute
> changes in the SMP alternatives code, which was requested as a follow-up
> to the tightening of the CONFIG_DEBUG_RODATA effects.
The reference count is just to count pages that have a non default attribute
in the PMD range so that we know when to revert to a large page.
Originally attribute was only the caching attribute, but later changed
to include NX and RW for slab (but that area was always a bit hackish
and might have some bugs)
For non default to another non default changes the count should not change.
If it doesn't work this way that would be a bug.
> Finally, at least for the kernel image range it would seem to me that it might
> be beneficial to recombine mappings into large ones even when the
> attributes are not at their default anymore, but consistent across an entire
> 2Mb/4Mb range (i.e. after write-protecting .text). At the same time I wonder,
> though, whether it wouldn't be safer to remove execute permission from
> anything but .text along with write-protecting read-only regions under
> CONFIG_DEBUG_RODATA.
Yes I guess that would be a useful optimization. Just getting
the reference counting right with that might be tricky.
At least the i386 code will probably change significantly soon as I clean
up the GB page patches, which require some restructuring in c_p_a().
-Andi
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [discuss] change_page_attr() and global_flush_tlb()
2007-04-05 12:43 ` [discuss] " Andi Kleen
@ 2007-04-05 13:52 ` Jan Beulich
2007-04-05 14:09 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2007-04-05 13:52 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, discuss
>>> Andi Kleen <ak@suse.de> 05.04.07 14:43 >>>
>On Thursday 05 April 2007 11:32:49 Jan Beulich wrote:
>> Looking at both the i386 and x86-64 implementations I fail to understand why
>> there is an explicit requirement on calling global_flush_tlb() after
>> change_page_attr(), yet actual TLB flushing will not normally happen (on i386
>> it will happen if the CPU doesn't support clflush, but if it does, or on x86-64,
>> the flushing depends on the list of deferred pages being non-empty, which
>> can only happen when a large page gets re-combined). Is it assumed that
>> the callers additionally call tlb_flush_all() (I think none of them do)?
>
>Not sure I understand the question? global_flush_tlb is perhaps a little
>misnamed, but it only flushes the pages changed in change_page_attr.
>This works because it uses INVLPG which should ignore the G bits,
>so not additional global flush is needed.
That is the point - I don't see this invlpg. If you look at x86-64's
global_flush_tlb(), then you will note that it passes the list of pages grabbed
from deferred_pages. If that list has no entries, no single __flush_tlb_one
will be called, and the only place entries are added to this list is in
save_page(), which in turn only gets called if page_private(kpte_page) is
zero (i.e. the page was just reverted back to a big one).
This is also hardened by the fact that the flushing and the freeing of pages
happens walking the same list, hence only pages being freed get flushed.
Am I missing something here?
>> Further, change_page_attr()'s reference counting in a split large page's
>> page table appears to imply that attributes are only changed from or back to
>> the reference attributes, but not from one kind of non-default ones to the
>> same or another set of non-default ones (otherwise the reference count
>> will never again drop to zero), > and also not from default to default (i.e. the
>> caller trying to revert attributes to normal not knowing what state they are
>> currently in) - this would BUG() if the large page was already reverted, or
>> screw the reference count otherwise. Is all of this intentional? I think it
>> will need to be changed as a prerequisite to supporting on-the-fly attribute
>> changes in the SMP alternatives code, which was requested as a follow-up
>> to the tightening of the CONFIG_DEBUG_RODATA effects.
>
>The reference count is just to count pages that have a non default attribute
>in the PMD range so that we know when to revert to a large page.
>
>For non default to another non default changes the count should not change.
That is what it should be, but it also gets bumped when a page already had
non-default attributes, because the increment just depends on
pgprot_val(prot) != pgprot_val(ref_prot) (but specifically not on the
attributes the page had before the change).
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [discuss] change_page_attr() and global_flush_tlb()
2007-04-05 13:52 ` Jan Beulich
@ 2007-04-05 14:09 ` Andi Kleen
0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2007-04-05 14:09 UTC (permalink / raw)
To: discuss; +Cc: Jan Beulich, linux-kernel
On Thursday 05 April 2007 15:52:47 Jan Beulich wrote:
> That is the point - I don't see this invlpg. If you look at x86-64's
> global_flush_tlb(), then you will note that it passes the list of pages grabbed
> from deferred_pages. If that list has no entries, no single __flush_tlb_one
> will be called, and the only place entries are added to this list is in
> save_page(), which in turn only gets called if page_private(kpte_page) is
> zero (i.e. the page was just reverted back to a big one).
You're right. This got broken when the code was changed to invlpg
from global flush at some point. It has to flush in all cases,
but only wbinvd when the caching attributes changed. Will fix.
> >> Further, change_page_attr()'s reference counting in a split large page's
> >> page table appears to imply that attributes are only changed from or back to
> >> the reference attributes, but not from one kind of non-default ones to the
> >> same or another set of non-default ones (otherwise the reference count
> >> will never again drop to zero), > and also not from default to default (i.e. the
> >> caller trying to revert attributes to normal not knowing what state they are
> >> currently in) - this would BUG() if the large page was already reverted, or
> >> screw the reference count otherwise. Is all of this intentional? I think it
> >> will need to be changed as a prerequisite to supporting on-the-fly attribute
> >> changes in the SMP alternatives code, which was requested as a follow-up
> >> to the tightening of the CONFIG_DEBUG_RODATA effects.
> >
> >The reference count is just to count pages that have a non default attribute
> >in the PMD range so that we know when to revert to a large page.
> >
> >For non default to another non default changes the count should not change.
>
> That is what it should be, but it also gets bumped when a page already had
> non-default attributes, because the increment just depends on
> pgprot_val(prot) != pgprot_val(ref_prot) (but specifically not on the
> attributes the page had before the change).
I see.
-Andi
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2007-04-05 14:09 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-05 9:32 change_page_attr() and global_flush_tlb() Jan Beulich
2007-04-05 12:43 ` [discuss] " Andi Kleen
2007-04-05 13:52 ` Jan Beulich
2007-04-05 14:09 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox