From: "Thomas Hellström" <thellstrom@vmware.com>
To: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Linux kernel mailing list <linux-kernel@vger.kernel.org>,
"Siddha, Suresh B" <suresh.b.siddha@intel.com>
Subject: Re: 2.6.29 pat issue
Date: Fri, 06 Feb 2009 10:51:26 +0100
Message-ID: <498C081E.80100@vmware.com>
In-Reply-To: <1233875311.4286.127.camel@localhost.localdomain>
Pallipadi, Venkatesh wrote:
> On Thu, 2009-02-05 at 13:32 -0800, Thomas Hellstrom wrote:
>
>> Pallipadi, Venkatesh wrote:
>>
>>> Only place where vm_pgoff is getting set for a PFNMAP vma is in
>>> remap_pfn_range() which maps the entire range. vm_insert_pfn() which may
>>> have sparsely populated ranges does not set vm_pgoff. What interface are
>>> you using to map discontig pages, where you are seeing these errors?
>>>
>>>
>> Since vm_pgoff can be nonzero upon every call to a device driver's mmap
>> method (It corresponds to the @offset parameter, page shifted, given by
>> the user's mmap call), _Any_ VM_PFNMAP vma can practically be assumed to
>> be linear by is_linear_pfn_mapping(), and that's an invalid assumption.
>>
>> In this particular case, we set VM_PFNMAP explicitly in the mmap method
>> and use fault() and vm_insert_pfn() to populate the vmas with PTEs
>> pointing to private memory pages or io-space depending on where the data
>> is currently located. The member vma->vm_pgoff is, as mentioned, set by
>> the user-space mmap call, indicating what part of the device address
>> space needs to be mapped.
>>
>> So in the end, we're hitting the WARN_ON_ONCE(1) near line 637 in
>> arch/x86/mm/pat.c. We should never have ended up in reserve_pfn_range()
>> in the first place.
>>
>>
>
> OK. Now I understand how you are seeing that warning. I am not sure what
> the simple way around this is. There are no bits available in vm_flags that
> we can use to identify linear_pfn_mapping. I don't think you have any
> way around in the driver other than using pgoff, in order to do
> vm_insert_pfn.
> One possible way is to overload some existing flag + PFNMAP to mean
> linear pfn map. Will send a patch for this as an RFC soon.
>
Thanks, Venki. There are a couple of other issues as well. This wasn't
the root cause of the problem. Please look at the mail I just sent out.
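
To make the vm_pgoff point concrete, here is a minimal userspace model,
not kernel code. It assumes the linear-mapping check amounts to "VM_PFNMAP
is set and vm_pgoff is nonzero", which is how the discussion above
describes it. All model_* names and the flag value are made up for
illustration.

```c
#include <stdbool.h>

/* Stand-in for the kernel's VM_PFNMAP; the value is illustrative only. */
#define MODEL_VM_PFNMAP 0x1UL

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
	unsigned long vm_pgoff;	/* mmap offset, in pages */
};

/*
 * Assumed shape of the heuristic under discussion: a PFNMAP vma with a
 * nonzero vm_pgoff is taken to be a linear (remap_pfn_range) mapping.
 */
static bool model_is_linear_pfn_mapping(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_PFNMAP) && vma->vm_pgoff != 0;
}

/*
 * What a driver mmap method that sets VM_PFNMAP itself and populates PTEs
 * later from fault() leaves behind: vm_pgoff is just the user's mmap
 * offset, page shifted. Nothing has been mapped linearly.
 */
static struct model_vma model_driver_mmap(unsigned long user_offset,
					  unsigned int page_shift)
{
	struct model_vma vma = {
		.vm_flags = MODEL_VM_PFNMAP,
		.vm_pgoff = user_offset >> page_shift,
	};
	return vma;
}
```

With a user offset of 0x10000 and 4 KiB pages, the model vma gets
vm_pgoff = 0x10 and is classified as linear even though it is populated
page by page. With offset 0 the same vma is classified as non-linear, so
the outcome depends only on user-supplied data.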
>
>>> The result of not having the caching attribute right can be bad enough
>>> to hang/crash the system. So, having this only in debug is not
>>> enough, IMO. The kernel has to enforce that UC and WC caching types are
>>> consistent at all times. And we also have to keep the identity map and
>>> other mappings that may be present for that address consistent.
>>>
>> Indeed, it's crucial to keep the mappings consistent, but failure to do
>> so is a kernel driver bug, it should never be the result of invalid user
>> data.
>>
>> There are other more common kernel bugs that can be even worse and hang
>> / crash the system. For example using uninitialized spinlocks, writing
>> to kfreed memory etc. There is code in the kernel to detect these as
>> well, but this code is behind debug defines.
>>
>> IMHO checking each vm_insert_pfn() for caching attribute correctness is
>> not something that should be enabled by default, due to the CPU
>> overhead. Production drivers should never violate this.
>>
>>
>
> It is not a question of a single production driver. There are many
> variables here. Different drivers can be mapping the same region. There
> can be mapping from /dev/mem. There are also kernel identity and text
> mappings. So, any change of cacheability by one driver has to make sure
> it is not stepping over some other users of that pte. Kernel has to make
> sure different things co-exist in a sane way.
>
Yes, I understand the need for this check now.
> There is an alternative to checking this in each vm_insert_pfn, as long
> as mappings are going to be contiguous (even though they may be inserted
> individually). As in include/linux/io-mapping.h, we can have a
> create_mapping which reserves the entire space, and individual map and
> unmap, which don't have to check. Maybe we need a new API for your
> use case though...
>
I think when the issues in the previous mail are fixed, this will in the
end reduce to a possible performance problem when doing vm_insert_pfn()
into a contiguous range. A create_mapping API could be a way around this.
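
As a rough sketch of that idea, again a userspace model and not a proposed
kernel interface: the caching-attribute reservation is done once for the
whole range at create time, and each later per-page insert only checks
that the pfn lies inside the reserved range. All model_* names are
hypothetical, and the reservation itself is stubbed out.

```c
#include <stdbool.h>

/* Hypothetical caching types for the model. */
enum model_cache { MODEL_WB, MODEL_UC, MODEL_WC };

/* Hypothetical handle: one reservation for [base_pfn, base_pfn + nr_pages). */
struct model_mapping {
	unsigned long base_pfn;
	unsigned long nr_pages;
	enum model_cache type;
	bool reserved;
};

/* Counts the expensive attribute checks so the saving is visible. */
static unsigned long model_attr_checks;

/* Stub for the costly range reservation; the model always succeeds. */
static bool model_reserve_range(unsigned long base_pfn,
				unsigned long nr_pages,
				enum model_cache type)
{
	(void)base_pfn;
	(void)nr_pages;
	(void)type;
	model_attr_checks++;
	return true;
}

/* Reserve the whole range once, up front. */
static bool model_create_mapping(struct model_mapping *m,
				 unsigned long base_pfn,
				 unsigned long nr_pages,
				 enum model_cache type)
{
	m->reserved = false;
	if (nr_pages == 0 || !model_reserve_range(base_pfn, nr_pages, type))
		return false;
	m->base_pfn = base_pfn;
	m->nr_pages = nr_pages;
	m->type = type;
	m->reserved = true;
	return true;
}

/* Per-page insert: a bounds check only, no attribute lookup. */
static bool model_insert_pfn(const struct model_mapping *m, unsigned long pfn)
{
	return m->reserved && pfn >= m->base_pfn &&
	       pfn - m->base_pfn < m->nr_pages;
}
```

In this model, creating a 16-page mapping and then inserting any number of
pages inside it performs exactly one attribute check, and an insert
outside the reserved range is rejected.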
Thanks,
Thomas
> Thanks,
> Venki
>
>