From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>, Benjamin LaHaise <bcrl@kvack.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] use local_t for page statistics
Date: Sat, 07 Jan 2006 14:48:22 +1100
Message-ID: <43BF3A06.10502@yahoo.com.au>
In-Reply-To: <200601070425.24810.ak@suse.de>

Andi Kleen wrote:
> On Saturday 07 January 2006 04:19, Nick Piggin wrote:
> 
>>Andi Kleen wrote:
>>
>>>On Saturday 07 January 2006 03:52, Nick Piggin wrote:
>>>
>>>
>>>
>>>>No. On many load/store architectures there is no good way to do local_t,
>>>>so something like ppc32 or ia64 just uses all atomic operations for
>>>
>>>
>>>well, they're just broken and need to be fixed to not do that.
>>>
>>
>>How?
> 
> 
> If anything use the 3x duplicated data setup, not atomic operations.
> 

At a 3x cache footprint cost? (and probably more than 3x for icache, though
I haven't checked) And I think hardware trends are against us. (Also, does
it have race issues with nested interrupts that Andrew noticed?)

> 
>>>Also I bet with some tricks a seqlock like setup could be made to work.
>>>
>>
>>I asked you how before. If you can come up with a way then it indeed
>>might be a good solution... 
> 
> 
> I'll try to work something up.
> 

Cool, I'd be interested to see.

> 
>>The problem I see with seqlock is that it 
>>is only fast in the read path. That path is not the issue here.
> 
> 
> The common case - not getting interrupted would be fast.
> 

The problem is that you can never do the final store without risking a
race with an interrupt, because it is not a read path.

The closest thing I can see to a seqlock would be ll/sc operations, at
which point you're back to atomic ops.

> 
>>>>local_t, and ppc64 uses 3 counters per-cpu thus tripling the cache
>>>>footprint.
>>>
>>>
>>>and ppc64 has big caches so this also shouldn't be a problem.
>>>
>>
>>Well it is even less of a problem for them now, by about 1/3.
>>
>>Performance-wise there is really no benefit for even i386 or x86-64
>>to move to local_t now either so I don't see what the fuss is about.
> 
> 
> Actually P4 doesn't like CLI/STI. For AMD and P-M it's not that much an issue,
> but NetBurst really doesn't like it.
> 

Yes, it was worth over a second of real time and ~ 7% total kernel
time on kbuild on a P4.

(git: a74609fafa2e5cc31d558012abaaa55ec9ad9da4)

AMD and PM I didn't test, but the improvement might still be noticeable,
if much smaller.

-- 
SUSE Labs, Novell Inc.


