* Re: PAT support for i386 and x86_64
@ 2007-08-07 12:34 Daniel J Blueman
2007-08-07 14:10 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Daniel J Blueman @ 2007-08-07 12:34 UTC (permalink / raw)
To: Andi Kleen; +Cc: Cédric Augonnet, Linux Kernel
On 7 Aug, 09:40, Andi Kleen <a...@firstfloor.org> wrote:
> On Mon, Aug 06, 2007 at 10:03:15PM -0400, Cédric Augonnet wrote:
> > Hi all,
>
> > For quite a while now, there as been numerous attempt to introduce support for
> > Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
> > have some support for this feature. Such a proposition popping up periodically,
> > perhaps it would be an opportunity to fix that lack for once.
>
> The trouble is you need to avoid conflicting attributes, otherwise
> you risk cache corruption. This means the direct mapping needs to be fixed
> up and the kernel needs to keep track of the ranges to prevent conflicts.
The general case of drivers setting up mappings to PCI memory space is
usually done consistently, thus no virtual pages with different PAT
indexes referring to the same physical page would exist here.
What cases did you have in mind?
> Also when there is already a MTRR it might not work due to the complicated
> rules of MTRR<->PAT interaction.
Not so complicated - stronger ordering always takes precedence (for safety).
>From current Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, section 10.5.2:
"[...] If there is an overlap of page-level and MTRR caching controls,
the mechanism that prevents caching has precedence. In cases where
there is a overlap in the assignment of the write-back and
write-through caching policies to a page and a region of memory, the
write-through policy takes precedence. The write-combining policy
(which can only be assigned through an MTRR or the PAT) takes
precedence over either write-through or write-back."
If (eg) a broken BIOS marks PCI memory space as UC, we gain nothing
like today. Most modern systems (in the server domain) I've seen don't
do this, as Windows' use of PAT would be broken too.
> Then there are old CPU errata that need to be handled etc.
I don't see a problem with having CONFIG_PAT depend on P4/Athlon or
newer to avoid such workarounds for now. Better to have something
there which works in (only) 80% of the cases, than nothing at all.
> There are also some other issues.
What like?
> You didn't solve all that at all. If it was as simple as your patch
> we would have long done it already.
Perhaps this feature can be marked as WIP to allow this to move
forward while corner-cases are worked out. Adding such a go/no-go
barrier will hamper progress we do see, as it has done already for
years.
The PCI ordering config option years ago was a similar case, since
various drivers didn't issue wmb()s until they were fixed up.
Daniel
--
Daniel J Blueman
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: PAT support for i386 and x86_64
2007-08-07 12:34 PAT support for i386 and x86_64 Daniel J Blueman
@ 2007-08-07 14:10 ` Andi Kleen
2007-08-07 15:26 ` Daniel J Blueman
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-08-07 14:10 UTC (permalink / raw)
To: Daniel J Blueman; +Cc: Andi Kleen, Cédric Augonnet, Linux Kernel
> What cases did you have in mind?
Mainly mapping memory which is rather tricky.
> From current Intel 64 and IA-32 Architectures Software Developer's Manual
> Volume 3A: System Programming Guide, section 10.5.2:
If you look at the detailed table it's not that simple.
> > Then there are old CPU errata that need to be handled etc.
>
> I don't see a problem with having CONFIG_PAT depend on P4/Athlon or
> newer to avoid such workarounds for now. Better to have something
> there which works in (only) 80% of the cases, than nothing at all.
Agreed, but the patch doesn't do that.
> > You didn't solve all that at all. If it was as simple as your patch
> > we would have long done it already.
>
> Perhaps this feature can be marked as WIP to allow this to move
> forward while corner-cases are worked out. Adding such a go/no-go
We're not going to have potentially cache corrupting features
as WIP. The last time one such bug took months to track down.
> barrier will hamper progress we do see, as it has done already for
> years.
Because it's not simple and nobody has done it properly yet.
> The PCI ordering config option years ago was a similar case, since
> various drivers didn't issue wmb()s until they were fixed up.
wmb alone is not enough for PCI ordering.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: PAT support for i386 and x86_64
2007-08-07 14:10 ` Andi Kleen
@ 2007-08-07 15:26 ` Daniel J Blueman
2007-08-07 15:33 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Daniel J Blueman @ 2007-08-07 15:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: Cédric Augonnet, Linux Kernel
On 07/08/07, Andi Kleen <andi@firstfloor.org> wrote:
> > What cases did you have in mind?
>
> Mainly mapping memory which is rather tricky.
Yes, I'm talking about memory-mapping PCI memory regions; however I
can't see other immediate uses. Then maybe it's acceptable to document
that - for now - we need to hand out mappings which have consistent
PAT indexes, which would be natural in all cases I've seen with
processes writing/bursting to registers in PCI memory space.
> > From current Intel 64 and IA-32 Architectures Software Developer's Manual
> > Volume 3A: System Programming Guide, section 10.5.2:
>
> If you look at the detailed table it's not that simple.
Ignoring the N/A PPro and PII table, the various combinations of MTRR
and PAT entries (table 10-7) again show expected safe behaviour. We
don't worry about WP MTRRs, since BIOSes and Linux don't set that, and
we are practically only interested in WC and UC access, which we get
irrespective of the MTRR type covering that region.
A number of vendors (graphics, HPC etc) implement and use their own
PAT mechanisms, so defining only WC and UC covers all the cases.
> > > Then there are old CPU errata that need to be handled etc.
> >
> > I don't see a problem with having CONFIG_PAT depend on P4/Athlon or
> > newer to avoid such workarounds for now. Better to have something
> > there which works in (only) 80% of the cases, than nothing at all.
>
> Agreed, but the patch doesn't do that.
Noted for later.
> > > You didn't solve all that at all. If it was as simple as your patch
> > > we would have long done it already.
> >
> > Perhaps this feature can be marked as WIP to allow this to move
> > forward while corner-cases are worked out. Adding such a go/no-go
>
> We're not going to have potentially cache corrupting features
> as WIP. The last time one such bug took months to track down.
A driver that hands out mappings to the same PCI memory address with
different access ordering seems broken by design (due to PCI BARs
being strictly prefetchable or not). Again vendors implement their own
PAT mechanics which are consistent about access types.
(Intel IA32 ASDM 3A 10-47) "a WC page must never be aliased to a
cacheable page because WC writes may not check the processor caches."
This is not talking about corrupting the cache generally, but data
hitting a PCI target twice by (ie) a broken driver.
> > barrier will hamper progress we do see, as it has done already for
> > years.
>
> Because it's not simple and nobody has done it properly yet.
Providing only UC and WC types and documenting aliasing problems is
what is required here.
> > The PCI ordering config option years ago was a similar case, since
> > various drivers didn't issue wmb()s until they were fixed up.
>
> wmb alone is not enough for PCI ordering.
(not the point in question)
> -Andi
--
Daniel J Blueman
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: PAT support for i386 and x86_64
2007-08-07 15:26 ` Daniel J Blueman
@ 2007-08-07 15:33 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2007-08-07 15:33 UTC (permalink / raw)
To: Daniel J Blueman; +Cc: Andi Kleen, Cédric Augonnet, Linux Kernel
On Tue, Aug 07, 2007 at 04:26:06PM +0100, Daniel J Blueman wrote:
> (Intel IA32 ASDM 3A 10-47) "a WC page must never be aliased to a
> cacheable page because WC writes may not check the processor caches."
Yes that is what we're trying to avoid, but it is far more complicated
than you think.
> This is not talking about corrupting the cache generally, but data
> hitting a PCI target twice by (ie) a broken driver.
This can end up with data corruption, machine checks or
a broken device. All happened.
> > > barrier will hamper progress we do see, as it has done already for
> > > years.
> >
> > Because it's not simple and nobody has done it properly yet.
>
> Providing only UC and WC types and documenting aliasing problems is
> what is required here.
At least the direct mapping needs to be changed too and any mapping
comming from other sources (/sys/bus/pci, /dev/mem etc.)
Otherwise you get the dangerous illegal aliases.
The chances of a driver getting this right even when documented
are very small.
BTW i had some prototype patches to address some of this,
but they need much more work to handle all this properly.
Just reprogramming the PAT registers is really the smallest issue.
The IA64 port has done some of the work as an example, unfortunately they
use a strange memory mapping scheme that cannot be easily moved
over. iirc Bjorn identified over 10 potential sources of cache aliases
that all needed to be addressed in the end.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 0/2] PAT support for i386 and x86_64
@ 2007-08-07 2:03 Cédric Augonnet
2007-08-07 8:30 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Cédric Augonnet @ 2007-08-07 2:03 UTC (permalink / raw)
To: Linux Kernel Mailing List, Andrew Morton, Andi Kleen
Cc: loic, brice, cedric.augonnet
Hi all,
For quite a while now, there as been numerous attempt to introduce support for
Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
have some support for this feature. Such a proposition popping up periodically,
perhaps it would be an opportunity to fix that lack for once.
PAT actually makes life much easier for people needing to use write-combining
(WC) attribute. Indeed the only solution yet available is MTRRs, which are not
as flexible as PAT is.
Indeed, even if they both allow to set memory types on memory regions, MTRRs
are intended to be used on a physical memory range, while PAT makes possible
to set such memory type based on a linear address mapping. The actual problem
with MTRR is that the BIOS usually covers physical RAM with write-back (WB),
and that MTRRs should not be overlapped.
Using write-combining feature is sometimes crucial for performance of driver
for which writes operation on the bus are critical. High-speed networking
drivers such as myri10ge would take benefit of using PAT instead of MTRRs.
Video drivers using frame-buffers could also avoid using MTRRs that way.
PAT6 can be a candidate for that purpose. Not only for backward compatibility
reasons, but also as various errata concerning PAT would end up by having the
PAT bit ignored, therefore if we use PAT6, when such an error occurs we would
have an uncached area, which means it would at worst not be effective, thus
resolving the issues raised in http://lkml.org/lkml/2004/4/13/102 .
*******
Back to 2.4.20, Terence Ripperda already submitted such a proposal for PAT
support in
- http://lkml.org/lkml/2003/5/20/131
himself refering to
- http://www.ussg.iu.edu/hypermail/linux/kernel/0303.1/0606.html
In 2005, Eric W. Biederman submitted another similar patch :
- http://lkml.org/lkml/2005/8/29/226
More recently, there have been numerous threads dealing with MTRR issues,
some of them suggesting that there should actually be some support for PAT.
See for instance :
- http://lkml.org/lkml/2007/6/6/333
*******
I therefore propose here a set of 2 patches to add some support for
write-combining using PAT :
[PATCH 1/2] - [PAT] Setting write combining on PAT6 at boot time
[PATCH 2/2] - [PAT] Support for write combining in pci_mmap_page_range
Kind regards,
Cedric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] PAT support for i386 and x86_64
2007-08-07 2:03 [PATCH 0/2] " Cédric Augonnet
@ 2007-08-07 8:30 ` Andi Kleen
2007-08-08 8:26 ` Loic Prylli
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-08-07 8:30 UTC (permalink / raw)
To: Cédric Augonnet
Cc: Linux Kernel Mailing List, Andrew Morton, Andi Kleen, loic, brice,
cedric.augonnet
On Mon, Aug 06, 2007 at 10:03:15PM -0400, Cédric Augonnet wrote:
> Hi all,
>
> For quite a while now, there as been numerous attempt to introduce support for
> Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
> have some support for this feature. Such a proposition popping up periodically,
> perhaps it would be an opportunity to fix that lack for once.
The trouble is you need to avoid conflicting attributes, otherwise
you risk cache corruption. This means the direct mapping needs to be fixed
up and the kernel needs to keep track of the ranges to prevent conflicts.
Also when there is already a MTRR it might not work due to the complicated
rules of MTRR<->PAT interaction.
Then there are old CPU errata that need to be handled etc.
There are also some other issues.
You didn't solve all that at all. If it was as simple as your patch
we would have long done it already.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: PAT support for i386 and x86_64
2007-08-07 8:30 ` Andi Kleen
@ 2007-08-08 8:26 ` Loic Prylli
2007-08-08 10:14 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Loic Prylli @ 2007-08-08 8:26 UTC (permalink / raw)
To: Andi Kleen
Cc: Cédric Augonnet, Linux Kernel Mailing List, Andrew Morton,
brice, cedric.augonnet
On 8/7/2007 4:30 AM, Andi Kleen wrote:
> On Mon, Aug 06, 2007 at 10:03:15PM -0400, Cédric Augonnet wrote:
>
>> Hi all,
>>
>> For quite a while now, there as been numerous attempt to introduce support for
>> Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
>> have some support for this feature. Such a proposition popping up periodically,
>> perhaps it would be an opportunity to fix that lack for once.
>>
>
> The trouble is you need to avoid conflicting attributes, otherwise
> you risk cache corruption. This means the direct mapping needs to be fixed
> up and the kernel needs to keep track of the ranges to prevent conflicts.
>
I don't see why we have to worry about cache corruption in the case at
hand. Write-combining is needed to map io (typically pci-mem regions)
which are never mapped cachable anywhere, including in the linear map.
If somebody for some reason needs to play with special attributes on
regular RAM for which inconsistent aliasing could be a problem:
- please explain why that consistency issue is mentioned in the context
of write-combining/PAT: the problem already potentially exists through
the use of the _PAGE_PCD attribute, and having an extra WC choice should
not make the problem worse or better (note that with the initial patch
that WC/PAT combination is only exploited in pci_mmap_page_range() which
rightfully doesn't seem to care about cachable attribute consistency
anyway).
> Also when there is already a MTRR it might not work due to the complicated
> rules of MTRR<->PAT interaction.
>
Some PAT<->MTRR cases are messy, but getting a WC type through PAT seems
to straightforwardly take precedence over any MTRR type, doesn't it?
> Then there are old CPU errata that need to be handled etc.
>
We mentioned that point in the introduction of the patch. We have looked
at the documented PAT erratas that exists for the Pentium-II,
Pentium-III, Pentium-M, some early pentium-IV processors. While there
are minor variations for the description of the bug depending on the
processor, they all fit into the following description: "under some
circumstances the PAT bit might be ignored". The patch purposefully puts
write-combining at PAT6 so if the conditions are there for the errata
to trigger, PAT2 (UC-) will be selected by the processor and the
corresponding accesses will simply be uncacheable instead of being
write-combining, which doesn't affect correctness.
We would certainly appreciate having any other erratas we missed
mentioned here for reference.
> There are also some other issues.
>
Introducing something like ioremap_wc() or a "sfence" wmb_wc() was
excluded from the initial patch on purpose. It would be the logical next
step (but involves possible API driver extensions), so the proposed
patch was limited to making use of the new WC attribute by really
handling the write_combine argument of pci_mmap_page_range(). That
seemed to generate enough objections to start with.
There is at least one mostly cosmetic problem in the patch in
pci_mmap_page_range() where huge pages should not be a concern here.
> You didn't solve all that at all. If it was as simple as your patch
> we would have long done it already.
I am sorry, but after this and other messages on the list, I still don't
understand why a simple approach hasn't been made available already:
- the attribute consistency issue seems independant of using PAT to
create WC mappings (in particular the possibility of mixing by accident
WC and UC aliases has always existed, having broken driver make such
aliases through a new PAT-based attribute combination hardly changes
anything, same for mixing uncachable and cachable aliases, new WC/PAT
combination doesn't change anything).
Sure there would some logical follow-up after a simple patch (like
providing a proper ioremap() interface, and maybe a new kind of barrier,
and handling the PAT bit properly in arch/x86/mm/pageattr.c, but those
follow-ups were purposefully excluded, and none seems very complex either).
So if there is any remaining issue on doing something "as simple as this
patch", please clarify it.
Best regards,
Loic
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: PAT support for i386 and x86_64
2007-08-08 8:26 ` Loic Prylli
@ 2007-08-08 10:14 ` Andi Kleen
2007-08-08 15:52 ` Cédric Augonnet
[not found] ` <f56c1ba00708080842i1eaa00kef5071d9a1f01f04@mail.gmail.com>
0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2007-08-08 10:14 UTC (permalink / raw)
To: Loic Prylli
Cc: Andi Kleen, Cédric Augonnet, Linux Kernel Mailing List,
Andrew Morton, brice, cedric.augonnet
> I don't see why we have to worry about cache corruption in the case at
> hand. Write-combining is needed to map io (typically pci-mem regions)
> which are never mapped cachable anywhere, including in the linear map.
If we WC them using PAT then there would be a UC<->WC conflict with
the direct mapping and possibly others. That's already undefined
and not allowed.
After some very bad experiences in the past I'm not going to take
chances on this.
We really need to keep all possible mappings synchronized.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: PAT support for i386 and x86_64
2007-08-08 10:14 ` Andi Kleen
@ 2007-08-08 15:52 ` Cédric Augonnet
[not found] ` <f56c1ba00708080842i1eaa00kef5071d9a1f01f04@mail.gmail.com>
1 sibling, 0 replies; 8+ messages in thread
From: Cédric Augonnet @ 2007-08-08 15:52 UTC (permalink / raw)
To: Andi Kleen
Cc: Loic Prylli, Linux Kernel Mailing List, Andrew Morton, brice,
cedric.augonnet
[Apologize for the double-post, messed up with my mailer... ]
2007/8/8, Andi Kleen <andi@firstfloor.org>:
> > I don't see why we have to worry about cache corruption in the case at
> > hand. Write-combining is needed to map io (typically pci-mem regions)
> > which are never mapped cachable anywhere, including in the linear map.
>
> If we WC them using PAT then there would be a UC<->WC conflict with
> the direct mapping and possibly others. That's already undefined
> and not allowed.
>
> After some very bad experiences in the past I'm not going to take
> chances on this.
>
> We really need to keep all possible mappings synchronized.
>
> -Andi
>
>
Hi,
First thanks for your reactions.
Andi, what i don't understand is, if we only put a minimal patch, like only
modifying the register, perhaps we may avoid having all vendors doing
their own recipe, possibly all messing the one the other. As you said ,
changing this register is absolutely not the actual issue with PAT, but don't
you think such a first step is needed to avoid conflicts since people _do_
already set that register from their driver.
I agree there is work for a comprehensive support of PAT, especially when
dealing with ioremap, but even though we only handle a smallish portion
of the problem yet, perhaps it's time to at least modify that register ?
Of course there is no problem for that being dependant on a CONFIG_PAT.
Kind regards,
Cédric
^ permalink raw reply [flat|nested] 8+ messages in thread[parent not found: <f56c1ba00708080842i1eaa00kef5071d9a1f01f04@mail.gmail.com>]
* Re: PAT support for i386 and x86_64
[not found] ` <f56c1ba00708080842i1eaa00kef5071d9a1f01f04@mail.gmail.com>
@ 2007-08-08 16:00 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2007-08-08 16:00 UTC (permalink / raw)
To: Cédric Augonnet
Cc: Andi Kleen, Loic Prylli, Linux Kernel Mailing List, Andrew Morton,
brice, cedric.augonnet
On Wed, Aug 08, 2007 at 11:42:17AM -0400, Cédric Augonnet wrote:
> changing this register is absolutely not the actual issue with PAT, but
> don't
> you think such a first step is needed to avoid conflicts since people _do_
> already set that register from their driver.
Which drivers do that? Ok, I know the ATI 3d driver does, but if that
crashes the users know who is to blame.
No, I don't think we should merge a potentially data corrupting "first
step.". Even first steps have to meet some standards.
> Of course there is no problem for that being dependant on a CONFIG_PAT.
I don't think a CONFIG_DATA_CORRUPTION is a good idea.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-08-08 16:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-07 12:34 PAT support for i386 and x86_64 Daniel J Blueman
2007-08-07 14:10 ` Andi Kleen
2007-08-07 15:26 ` Daniel J Blueman
2007-08-07 15:33 ` Andi Kleen
-- strict thread matches above, loose matches on Subject: below --
2007-08-07 2:03 [PATCH 0/2] " Cédric Augonnet
2007-08-07 8:30 ` Andi Kleen
2007-08-08 8:26 ` Loic Prylli
2007-08-08 10:14 ` Andi Kleen
2007-08-08 15:52 ` Cédric Augonnet
[not found] ` <f56c1ba00708080842i1eaa00kef5071d9a1f01f04@mail.gmail.com>
2007-08-08 16:00 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox