Re: [PATCH]: Handling spurious page fault for hugetlb region for 2.6.14-rc4-git5

From: Rohit Seth <rohit.seth@intel.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [PATCH]: Handling spurious page fault for hugetlb region for 2.6.14-rc4-git5
Date: Wed, 19 Oct 2005 11:47:27 -0700
Message-ID: <1129747647.339.78.camel@akash.sc.intel.com>
In-Reply-To: <Pine.LNX.4.61.0510191551180.7586@goblin.wat.veritas.com>

On Wed, 2005-10-19 at 16:23 +0100, Hugh Dickins wrote:

> I thought that the CPU never caches !present entries in the TLB?
> Or is that true of i386 (and x86_64), but untrue of ia64?

IA-64 can prefetch any entry from the VHPT (last level page table)
irrespective of its value.  You are right that i386 and x86_64 do not
cache !present entries.  Though the OS is supposed to handle those
faults if they happen.

> Or do you have some new model or errata on some CPU where it's true?

No errata here.

> Or, final ghastly possibility ;), am I simply altogether wrong?
> 

You are asking the right questions here.

> > Meaning, unless this entry is purged or displaced, for virtual address V
> 
> When you say "purged", is that what we elsewhere call "flushed"
> in relation to the TLB, or something else?
> 

I should use "flush" to be consistent.

> > CPU will generate the page fault (as the P bit is not set and assuming
> > this fault has the highest precedence).
> > 
> > Kernel updates the *pte so that it now maps the hugepage at virtual
> > address V to physical address P.  
> > 
> > Later when the user process make a reference to V, because of stale TLB
> > entry, the processor gets PAGE_FAULT.
> 
> You seem to be saying that strictly, we ought to flush TLB even when we
> make a page present where none was before, but that the likelihood of it
> being needed is so low, and the overhead of TLB flush so high, and the
> existing code almost everywhere recovering safely from this condition,
> that the most effective thing to do is just fix up the hugetlb case.
> Is that correct?
> 

Yes.  At least for the architectures that can cache any translation in
their TLBs.  IA-64 is again a good example here.  It flushes the entry
only at fault time, so that the next access picks up the updated entry
(for the cases where the fault was caused by a stale TLB entry).
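
To make the sequence concrete, here is a minimal user-space sketch of it
in C.  It is a toy model, not kernel code: every name in it (toy_pte,
toy_tlb_prefetch, toy_handle_fault and so on) is hypothetical, and the
"TLB" is just an array that may hold a stale copy of a page-table entry.
It assumes the hardware may prefetch a !present entry, that the kernel
does not flush when it makes a page present, and that the fault path
drops the faulting entry and returns early when the pte is already
present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Toy page-table entry: a present bit and a frame number (hypothetical). */
struct toy_pte { bool present; unsigned long pfn; };

/* Toy TLB slot: a cached copy of a PTE, which may be stale. */
struct toy_tlb_slot { bool valid; struct toy_pte cached; };

struct toy_pte page_table[NPAGES];
struct toy_tlb_slot tlb[NPAGES];

enum toy_fault_result { TOY_FAULT_SPURIOUS, TOY_FAULT_MAPPED };

/* "Hardware": prefetch caches whatever the PTE holds, even if !present. */
void toy_tlb_prefetch(size_t vpn)
{
    assert(vpn < NPAGES);
    tlb[vpn].cached = page_table[vpn];
    tlb[vpn].valid = true;
}

/* "Hardware": returns false (a fault) if the translation used is !present. */
bool toy_access(size_t vpn, unsigned long *pfn)
{
    assert(vpn < NPAGES);
    if (!tlb[vpn].valid) {
        /* TLB miss: walk the page table; cache only present entries. */
        if (!page_table[vpn].present)
            return false;
        tlb[vpn].cached = page_table[vpn];
        tlb[vpn].valid = true;
    }
    if (!tlb[vpn].cached.present)
        return false;           /* stale !present entry still faults */
    *pfn = tlb[vpn].cached.pfn;
    return true;
}

/* "Kernel": make the page present WITHOUT flushing the TLB. */
void toy_set_pte(size_t vpn, unsigned long pfn)
{
    assert(vpn < NPAGES);
    page_table[vpn].present = true;
    page_table[vpn].pfn = pfn;
}

/* "Kernel": drop the faulting entry, then treat an already-present pte
 * as a spurious fault instead of mapping the page a second time. */
enum toy_fault_result toy_handle_fault(size_t vpn, unsigned long new_pfn)
{
    assert(vpn < NPAGES);
    tlb[vpn].valid = false;             /* flush at fault time */
    if (page_table[vpn].present)
        return TOY_FAULT_SPURIOUS;      /* nothing to do, just retry */
    toy_set_pte(vpn, new_pfn);
    return TOY_FAULT_MAPPED;
}

/* Walk through the scenario; returns the frame number seen on retry,
 * or a negative value if a step did not behave as described. */
long toy_demo(void)
{
    unsigned long pfn = 0;
    size_t vpn = 1;

    toy_tlb_prefetch(vpn);              /* TLB caches a !present entry */
    toy_set_pte(vpn, 42);               /* page becomes present, no flush */
    if (toy_access(vpn, &pfn))
        return -1;                      /* expected a fault here */
    if (toy_handle_fault(vpn, 99) != TOY_FAULT_SPURIOUS)
        return -2;
    if (!toy_access(vpn, &pfn))
        return -3;                      /* retry should now succeed */
    return (long)pfn;
}
```

In this model the second argument to toy_handle_fault() is ignored on
the spurious path, which is the point: the handler must notice that the
pte is already present rather than try to instantiate the page again.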

> > > Has this problem been observed in testing?
> > 
> > Yes. On IA-64.
> 
> But not on i386 or x86_64.
> 

No.

> Same series of doubts as with !present entries in the TLB; but after
> looking at the ia64 fault handler, that does seem to have stuff about
> speculative loads, so I'm guessing i386 and x86_64 prefetch does not
> cause faults (modulo errata), but ia64 does.
> 

Those speculative loads on IA-64 (really more like advanced loads that
the compiler generates in anticipation that they will be helpful) are
different from the prefetches that HW does for TLBs.

HW speculative loads never generate any fault.

Whereas prefetched TLB entries on i386, x86_64 or IA-64 can cause a
fault if they are not flushed after updates.

> Once I started to understand this thread, I thought you were quite
> wrong to be changing hugetlb fault handling, thought I'd find several
> other places which would need fixing too e.g. kmap_atomic, remap_pfn_range.
> 
> But no, I've found no others.  Either miraculously, or by good design,
> all the kernel misfaults should be seamlessly handled by the lazy vmalloc
> path (on i386 anyway: I don't know what happens for ia64 there), and the
> userspace misfaults by handle_pte_fault's pte_present check.  I think.
> 

Good OS design :-)  Though on IA-64 there was recently a similar issue
for the vmalloc area that got fixed in low-level arch-specific code.

-rohit
