Re: page fault fastpath: Increasing SMP scalability by introducing

public inbox for linux-ia64@vger.kernel.org

* Re: page fault fastpath: Increasing SMP scalability by introducing
       [not found]   ` <2tCiw-8pK-1@gated-at.bofh.it>
@ 2004-08-15 23:53     ` Andi Kleen
  2004-08-15 23:55       ` Christoph Lameter
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2004-08-15 23:53 UTC
  To: Christoph Lameter; +Cc: linux-ia64, linux-kernel

Christoph Lameter <clameter@sgi.com> writes:

> On Sun, 15 Aug 2004, David S. Miller wrote:
>
>>
>> Is the read lock in the VMA semaphore enough to let you do
>> the pgd/pmd walking without the page_table_lock?
>> I think it is, but just checking.
>
> That would be great.... May I change the page_table lock to
> be a read write spinlock instead?

That's probably not a good idea. r/w locks are extremely slow on
some architectures, including ia64.

Just profile cat /proc/net/tcp on a machine with a lot of memory
and you'll notice.

-Andi


* Re: page fault fastpath: Increasing SMP scalability by introducing
  2004-08-15 23:53     ` page fault fastpath: Increasing SMP scalability by introducing Andi Kleen
@ 2004-08-15 23:55       ` Christoph Lameter
  2004-08-16  0:12         ` page fault fastpath: Increasing SMP scalability by introducing pte locks? Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Lameter @ 2004-08-15 23:55 UTC
  To: Andi Kleen; +Cc: Christoph Lameter, linux-ia64, linux-kernel

On Mon, 16 Aug 2004, Andi Kleen wrote:

> Christoph Lameter <clameter@sgi.com> writes:
>
> > On Sun, 15 Aug 2004, David S. Miller wrote:
> >
> >>
> >> Is the read lock in the VMA semaphore enough to let you do
> >> the pgd/pmd walking without the page_table_lock?
> >> I think it is, but just checking.
> >
> > That would be great.... May I change the page_table lock to
> > be a read write spinlock instead?
>
> That's probably not a good idea. r/w locks are extremely slow on
> some architectures, including ia64.

I was thinking about a read-write spinlock, not a read-write
semaphore. Look at include/asm-ia64/spinlock.h.
The spinlock and rw spinlock implementations there are almost
the same. Are you sure about this?

* Re: page fault fastpath: Increasing SMP scalability by introducing pte locks?
  2004-08-15 23:55       ` Christoph Lameter
@ 2004-08-16  0:12         ` Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2004-08-16  0:12 UTC
  To: Christoph Lameter; +Cc: Christoph Lameter, linux-ia64, linux-kernel

On Sun, Aug 15, 2004 at 04:55:57PM -0700, Christoph Lameter wrote:
> On Mon, 16 Aug 2004, Andi Kleen wrote:
> 
> > Christoph Lameter <clameter@sgi.com> writes:
> >
> > > On Sun, 15 Aug 2004, David S. Miller wrote:
> > >
> > >>
> > >> Is the read lock in the VMA semaphore enough to let you do
> > >> the pgd/pmd walking without the page_table_lock?
> > >> I think it is, but just checking.
> > >
> > > That would be great.... May I change the page_table lock to
> > > be a read write spinlock instead?
> >
> > That's probably not a good idea. r/w locks are extremely slow on
> > some architectures, including ia64.
> 
> I was thinking about a read-write spinlock, not a read-write
> semaphore. Look at include/asm-ia64/spinlock.h.

I was also talking about rw spinlocks.

> The implementations are almost the same. Are you sure
> about this?

Yes. Try the cat /proc/net/tcp test. It will take >100k read locks
for the TCP listen hash table, and on bigger ppc64 and ia64 machines this
can take nearly a second of system time.

-Andi
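
A note on the cost being discussed: even an uncontended read lock has to
perform an atomic read-modify-write on the shared lock word, so the
cacheline holding the lock moves between CPUs on every acquisition. The
following is a minimal userspace sketch in C11 of that behaviour. It is
not the kernel's rwlock_t nor the ia64 implementation; rw_spin_t and all
function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reader-writer spinlock (illustration only).
 * count > 0: number of readers, count == -1: writer, count == 0: free. */
typedef struct {
    atomic_int count;
} rw_spin_t;

static inline void rw_spin_init(rw_spin_t *l)
{
    atomic_init(&l->count, 0);
}

static inline void read_lock(rw_spin_t *l)
{
    for (;;) {
        int c = atomic_load_explicit(&l->count, memory_order_relaxed);
        /* Readers still write the shared word: this CAS is the cost. */
        if (c >= 0 &&
            atomic_compare_exchange_weak_explicit(&l->count, &c, c + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

static inline void read_unlock(rw_spin_t *l)
{
    int prev = atomic_fetch_sub_explicit(&l->count, 1, memory_order_release);
    assert(prev > 0);   /* must have been held by at least one reader */
    (void)prev;
}

/* Succeeds only when there are no readers and no writer. */
static inline bool write_trylock(rw_spin_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->count, &expected, -1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void write_unlock(rw_spin_t *l)
{
    atomic_store_explicit(&l->count, 0, memory_order_release);
}
```

Readers do not exclude each other here, but each read_lock()/read_unlock()
pair still updates the same word, which is why taking a very large number
of read locks in a loop can add up on a big SMP machine.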

* Re: page fault fastpath: Increasing SMP scalability by introducing pte locks?
  2004-08-16  3:29           ` Christoph Lameter
  2004-08-16  6:59             ` Ray Bryant
@ 2004-08-16 14:39             ` William Lee Irwin III
  1 sibling, 0 replies; 5+ messages in thread
From: William Lee Irwin III @ 2004-08-16 14:39 UTC
  To: Christoph Lameter; +Cc: David S. Miller, linux-ia64, linux-kernel

On Sun, 15 Aug 2004, David S. Miller wrote:
>> munmap() can destroy pmd and pte tables.  Somehow we have
>> to protect against that, and currently that is having the
>> VMA semaphore held for reading, see free_pgtables().

On Sun, Aug 15, 2004 at 08:29:11PM -0700, Christoph Lameter wrote:
> It looks to me like the code takes care to provide the correct
> sequencing so that the integrity of pgd, pmd and pte links is
> guaranteed from the viewpoint of the MMU in the CPUs. munmap is there to
> protect one kernel thread messing with the addresses of these entities
> that might be stored in another thread's register.
> Therefore it is safe to walk the chain only holding the semaphore read
> lock?

Detached pagetables are assumed to be freeable after a TLB flush IPI.
Previously holding ->page_table_lock would prevent the shootdowns of
links to the pagetable page from executing concurrently with
modifications to the pagetable page. Disabling interrupts or otherwise
inhibiting the progress of the IPI'ing cpu is needed to prevent
dereferencing freed pagetables and incorrect accounting based on
contents of about-to-be-freed pagetables. Reference counting pagetable
pages may help here, where the final put would be responsible for
unaccounting the various things in the pagetable page.

-- wli
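
The reference-counting idea above can be illustrated with a small
userspace sketch in C11: whoever drops the last reference to a pagetable
page undoes its accounting and frees it, so a walker holding a reference
never dereferences freed memory. This is only a sketch under assumptions;
struct pt_page, accounted_ptes and the pt_page_*() helpers are
hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical global accounting of entries held in pagetable pages. */
static atomic_long accounted_ptes;

/* Hypothetical stand-in for a pagetable page. */
struct pt_page {
    atomic_int refs;   /* number of holders, starts at 1 */
    long nr_ptes;      /* entries accounted against this page */
};

static inline struct pt_page *pt_page_alloc(long nr_ptes)
{
    struct pt_page *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    atomic_init(&p->refs, 1);
    p->nr_ptes = nr_ptes;
    atomic_fetch_add(&accounted_ptes, nr_ptes);
    return p;
}

/* Take an extra reference; the caller must already hold one. */
static inline void pt_page_get(struct pt_page *p)
{
    int prev = atomic_fetch_add_explicit(&p->refs, 1, memory_order_relaxed);
    assert(prev > 0);
    (void)prev;
}

/* Drop a reference. The final put unaccounts and frees the page.
 * Returns true if this call freed the page. */
static inline bool pt_page_put(struct pt_page *p)
{
    if (atomic_fetch_sub_explicit(&p->refs, 1, memory_order_acq_rel) != 1)
        return false;
    atomic_fetch_sub(&accounted_ptes, p->nr_ptes);
    free(p);
    return true;
}
```

In this sketch a concurrent walker would call pt_page_get() before
looking inside the page and pt_page_put() afterwards; how the walker
safely obtains that first reference is exactly the open problem in the
thread and is not solved here.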

* Re: page fault fastpath: Increasing SMP scalability by introducing pte locks?
  2004-08-16 15:18               ` Christoph Lameter
@ 2004-08-16 16:18                 ` William Lee Irwin III
  0 siblings, 0 replies; 5+ messages in thread
From: William Lee Irwin III @ 2004-08-16 16:18 UTC
  To: Christoph Lameter; +Cc: Ray Bryant, David S. Miller, linux-ia64, linux-kernel

On Mon, 16 Aug 2004, Ray Bryant wrote:
>> Something else to worry about here is mm->rss.  Previously, this was updated
>> only with the page_table_lock held, so concurrent increments were not a
>> problem.  rss may need to be converted to an atomic_t if you use pte_locks.
>> It may be that an approximate value for rss is good enough, but I'm not sure
>> how to bound the error that could be introduced by a couple of hundred
>> processors handling page faults in parallel and updating rss without locking
>> it or making it an atomic_t.

On Mon, Aug 16, 2004 at 08:18:11AM -0700, Christoph Lameter wrote:
> Correct. There are a number of issues that may have to be addressed but
> first we need to agree on a general idea how to proceed.

I'd favor a per-cpu counter so the cacheline doesn't bounce.


-- wli
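
For comparison with a single atomic_t, a per-cpu counter can be sketched
in userspace C11 as one cacheline-sized slot per CPU: updates touch only
the local slot, and a reader sums all slots. The names rss_counter,
rss_add() and rss_read(), the slot count of 4 and the 64-byte cacheline
size are assumptions made for this sketch; the kernel's own per-CPU
facilities are not used here.

```c
#include <assert.h>
#include <stdatomic.h>

#define RSS_NR_SLOTS  4    /* assumed number of CPUs for the sketch */
#define RSS_CACHELINE 64   /* assumed cacheline size in bytes */

/* One slot per CPU, aligned so that slots do not share a cacheline. */
struct rss_slot {
    _Alignas(RSS_CACHELINE) atomic_long count;
};

struct rss_counter {
    struct rss_slot slot[RSS_NR_SLOTS];
};

static inline void rss_init(struct rss_counter *c)
{
    for (int i = 0; i < RSS_NR_SLOTS; i++)
        atomic_init(&c->slot[i].count, 0);
}

/* Update only the caller's slot; no other CPU's cacheline is written. */
static inline void rss_add(struct rss_counter *c, int cpu, long delta)
{
    atomic_fetch_add_explicit(&c->slot[cpu].count, delta,
                              memory_order_relaxed);
}

/* Sum all slots. The result is approximate while updates are in flight. */
static inline long rss_read(struct rss_counter *c)
{
    long sum = 0;
    for (int i = 0; i < RSS_NR_SLOTS; i++)
        sum += atomic_load_explicit(&c->slot[i].count,
                                    memory_order_relaxed);
    return sum;
}
```

The trade-off is that reads become a loop over all slots and are only
exact when no updates are running, which matches the "approximate value
for rss" question raised above; a slot may also go negative if a page is
charged on one CPU and uncharged on another, so only the sum is
meaningful.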

Thread overview: 5+ messages
     [not found] <2ttIr-2e4-17@gated-at.bofh.it>
     [not found] ` <2tzE4-6sw-25@gated-at.bofh.it>
     [not found]   ` <2tCiw-8pK-1@gated-at.bofh.it>
2004-08-15 23:53     ` page fault fastpath: Increasing SMP scalability by introducing Andi Kleen
2004-08-15 23:55       ` Christoph Lameter
2004-08-16  0:12         ` page fault fastpath: Increasing SMP scalability by introducing pte locks? Andi Kleen
2004-08-15 13:50 page fault fastpath: Increasing SMP scalability by introducing pte Christoph Lameter
2004-08-15 20:09 ` page fault fastpath: Increasing SMP scalability by introducing David S. Miller
2004-08-15 22:58   ` Christoph Lameter
2004-08-15 23:58     ` David S. Miller
2004-08-16  0:11       ` Christoph Lameter
2004-08-16  1:56         ` David S. Miller
2004-08-16  3:29           ` Christoph Lameter
2004-08-16  6:59             ` Ray Bryant
2004-08-16 15:18               ` Christoph Lameter
2004-08-16 16:18                 ` page fault fastpath: Increasing SMP scalability by introducing pte locks? William Lee Irwin III
2004-08-16 14:39             ` William Lee Irwin III
