public inbox for linux-ia64@vger.kernel.org
From: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
To: linux-ia64@vger.kernel.org
Subject: Re: Read *pgd again in vhpt_miss handler
Date: Fri, 28 Apr 2006 07:53:19 +0000	[thread overview]
Message-ID: <4451C9EF.9060807@bull.net> (raw)
In-Reply-To: <444F79CA.7060804@bull.net>

Christoph Lameter wrote:
> On Thu, 27 Apr 2006, Zoltan Menyhart wrote:
> 
> 
>>I wanted to use the mm semaphore => no need to walk again the
>>pgd ... pte chain.
> 
> 
> The pgd ... pte chain does not change even without mmap until 
> the usage of the memory area ceases.

It is about un-mapping a zone while another thread faults
on an address belonging to the same zone.

We have got a

	rx = ... -> pgd[i] -> pud[j] -> pmd[k] -> pte[l]

chain to walk in the VHPT miss handler.
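The walk above can be modelled in userspace as a minimal sketch;
the structure names, sizes and the "walk()" / "demo_walk()" helpers
are illustrative stand-ins, not the actual ia64 structures or the
real VHPT miss handler code:

```c
#include <stddef.h>

/* Hypothetical model of the pgd -> pud -> pmd -> pte chain;
 * ENTRIES and the struct layouts are illustrative only. */
#define ENTRIES 4

typedef unsigned long pte_t;

struct pmd { pte_t      pte[ENTRIES]; };
struct pud { struct pmd *pmd[ENTRIES]; };
struct pgd { struct pud *pud[ENTRIES]; };

/* Each "->" is an independent memory fetch: between fetching
 * the address of the next-level page and dereferencing it,
 * another CPU may free and re-use that page. */
static pte_t *walk(struct pgd *pgd, int i, int j, int k)
{
	struct pud *pud = pgd->pud[i];	/* fetch 1: pud page address */
	if (!pud)
		return NULL;
	struct pmd *pmd = pud->pmd[j];	/* fetch 2: pmd page address */
	if (!pmd)
		return NULL;
	return &pmd->pte[k];		/* fetch 3: the pte itself */
}

/* Build a one-path table tree and walk it end to end. */
static pte_t demo_walk(void)
{
	static struct pmd pmd = { .pte = { 0x1000, 0, 0, 0 } };
	static struct pud pud;
	static struct pgd pgd;
	pte_t *p;

	pud.pmd[0] = &pmd;
	pgd.pud[0] = &pud;
	p = walk(&pgd, 0, 0, 0);
	return p ? *p : 0;
}
```

The point of the sketch is that each level is a separate fetch:
the handler can be interrupted between any two of them while still
holding a stale pointer in a register.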

At some point in this walk, we hold in a register the physical
address of the next page in the chain.

Before we can fetch the next item in the chain, an unpredictably
long time can pass.

In the mean time:
- "free_pgtables()" kills the page we are about to touch.
- Someone re-uses the same page for something else.

As we still hold the same physical address, we fetch an item
from a page that is no longer ours.

Even though this race window is small, it does exist.

The probability of hitting this bug grows on a NUMA machine
with many CPUs.

I can accept that the VHPT miss handler cannot be protected by
locks; it is the other end that should use some "careful
un-mapping" in order to avoid race conditions.

Here is what I'm working on:

PTE, PMD and PUD page usage perfectly fits into the RCU approach:

1. The VHPT miss handler is protected by "rcu_read_lock_bh()".
   Not a single instruction is added: the required semantics
   are provided by the fact that interrupts are off.

2. "free_pgtables()" keeps working as today for the non multi-
   threaded applications.

3. "free_pgtables()" and its subroutines do not actually free
   the PTE, PMD and PUD pages for multi-threaded applications.
   These pages will be set free via a "call_rcu_bh()"-activated
   service.

(Perhaps, the weaker protection "rcu_read_lock()" - "call_rcu()"
will be enough...)
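The deferred-freeing idea can be illustrated with a toy userspace
model; "rcu_defer()" and "rcu_run_callbacks()" below are stand-ins
for "call_rcu_bh()" and the RCU grace-period machinery, not real
kernel APIs, and the whole thing is an assumption-laden sketch of
the scheme, not the actual patch:

```c
#include <stdlib.h>

/* Toy model of call_rcu()-style deferred freeing: instead of
 * freeing a pte/pmd/pud page immediately, queue a callback. */
struct rcu_cb {
	struct rcu_cb *next;
	void (*func)(void *);
	void *arg;
};

static struct rcu_cb *pending;
static int freed_pages;

/* Queue a page for freeing; a walker that started before the
 * unmap can still safely read the page meanwhile. */
static void rcu_defer(void (*func)(void *), void *arg)
{
	struct rcu_cb *cb = malloc(sizeof(*cb));

	cb->func = func;
	cb->arg = arg;
	cb->next = pending;
	pending = cb;
}

static void free_table_page(void *page)
{
	free(page);
	freed_pages++;
}

/* Stands in for "a grace period has elapsed": every walker that
 * could hold a stale pointer to a queued page has finished, so
 * the batched callbacks may now run. */
static void rcu_run_callbacks(void)
{
	while (pending) {
		struct rcu_cb *cb = pending;

		pending = cb->next;
		cb->func(cb->arg);
		free(cb);
	}
}

/* The free_pgtables() path defers two table pages, then the
 * "grace period" expires and both are batch-freed. */
static int demo(void)
{
	rcu_defer(free_table_page, malloc(4096));
	rcu_defer(free_table_page, malloc(4096));
	rcu_run_callbacks();
	return freed_pages;
}
```

Batching the callbacks is also where the potential efficiency gain
over per-page "pte_free()" calls would come from.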

Please note that:
- The life span of the PTE, PMD and PUD pages is rather long:
  they are freed when the usage of the memory area ceases,
  provided no other map (using the same PTE, PMD and PUD pages)
  is valid.
- The number of the PTE, PMD and PUD pages is much smaller
  than that of the leaf pages.
Therefore freeing them is not really performance critical.
As the "call_rcu_bh()"-activated freeing service will do batch
processing, there is a chance that freeing the PTE, PMD and PUD
pages in this way will be more efficient than today's
"pte_free()" etc. services.

Regards,

Zoltan



Thread overview: 5+ messages
2006-04-26 13:46 Read *pgd again in vhpt_miss handler Zoltan Menyhart
2006-04-26 15:00 ` Chen, Kenneth W
2006-04-27 11:04 ` Zoltan Menyhart
2006-04-28  1:23 ` Christoph Lameter
2006-04-28  7:53 ` Zoltan Menyhart [this message]
