* Locking comment on shrink_caches()
@ 2001-09-25 17:49 Marcelo Tosatti
2001-09-25 19:57 ` David S. Miller
0 siblings, 1 reply; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 17:49 UTC (permalink / raw)
To: Andrea Arcangeli, Linus Torvalds; +Cc: lkml
Andrea,
Do you really need to do this ?
if (unlikely(!spin_trylock(&pagecache_lock))) {
	/* we hold the page lock so the page cannot go away from under us */
	spin_unlock(&pagemap_lru_lock);
	spin_lock(&pagecache_lock);
	spin_lock(&pagemap_lru_lock);
}
Have you actually seen bad hold times of pagecache_lock by
shrink_caches() ?
It's just that I prefer clear locking without those "tricks" (easier to
understand and harder to miss subtle details).
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-25 19:57 ` David S. Miller
@ 2001-09-25 18:40 ` Marcelo Tosatti
2001-09-25 20:15 ` David S. Miller
2001-09-25 20:40 ` Josh MacDonald
2001-09-25 21:57 ` Andrea Arcangeli
1 sibling, 2 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 18:40 UTC (permalink / raw)
To: David S. Miller; +Cc: andrea, torvalds, linux-kernel
On Tue, 25 Sep 2001, David S. Miller wrote:
> From: Marcelo Tosatti <marcelo@conectiva.com.br>
> Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
>
> Do you really need to do this ?
>
> if (unlikely(!spin_trylock(&pagecache_lock))) {
> /* we hold the page lock so the page cannot go away from under us */
> spin_unlock(&pagemap_lru_lock);
>
> spin_lock(&pagecache_lock);
> spin_lock(&pagemap_lru_lock);
> }
>
> Have you actually seen bad hold times of pagecache_lock by
> shrink_caches() ?
>
> Marcelo, this is needed because of the spin lock ordering rules.
> The pagecache_lock must be obtained before the pagemap_lru_lock
> or else deadlock is possible. The spin_trylock is an optimization.
No, it is not.
We can simply take the pagecache_lock and the pagemap_lru_lock at the
beginning of the cleaning function. page_launder() used to do that.
That's why I asked Andrea if there were long hold times in shrink_caches().
* Re: Locking comment on shrink_caches()
2001-09-25 20:15 ` David S. Miller
@ 2001-09-25 19:02 ` Marcelo Tosatti
2001-09-25 20:29 ` David S. Miller
2001-09-25 20:24 ` Rik van Riel
2001-09-25 22:01 ` Andrea Arcangeli
2 siblings, 1 reply; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 19:02 UTC (permalink / raw)
To: David S. Miller; +Cc: andrea, torvalds, linux-kernel
On Tue, 25 Sep 2001, David S. Miller wrote:
> From: Marcelo Tosatti <marcelo@conectiva.com.br>
> Date: Tue, 25 Sep 2001 15:40:23 -0300 (BRT)
>
> We can simply lock the pagecachelock and the pagemap_lru_lock at the
> beginning of the cleaning function. page_launder() use to do that.
>
> Thats why I asked Andrea if there was long hold times by shrink_caches().
>
> Ok, I see.
>
> I do think it's silly to hold the pagecache_lock during pure scanning
> activities of shrink_caches().
It may well be, but I would like to see some lockmeter results which show
that _shrink_cache()_ itself is a problem. :)
> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.
Btw, is that one going into 2.5 for sure? (the per-address-space lock).
* Re: Locking comment on shrink_caches()
2001-09-25 20:40 ` Josh MacDonald
@ 2001-09-25 19:25 ` Marcelo Tosatti
0 siblings, 0 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 19:25 UTC (permalink / raw)
To: Josh MacDonald; +Cc: linux-kernel
On Tue, 25 Sep 2001, Josh MacDonald wrote:
> Quoting Marcelo Tosatti (marcelo@conectiva.com.br):
> >
> >
> > On Tue, 25 Sep 2001, David S. Miller wrote:
> >
> > > From: Marcelo Tosatti <marcelo@conectiva.com.br>
> > > Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
> > >
> > > Do you really need to do this ?
> > >
> > > if (unlikely(!spin_trylock(&pagecache_lock))) {
> > > /* we hold the page lock so the page cannot go away from under us */
> > > spin_unlock(&pagemap_lru_lock);
> > >
> > > spin_lock(&pagecache_lock);
> > > spin_lock(&pagemap_lru_lock);
> > > }
> > >
> > > Have you actually seen bad hold times of pagecache_lock by
> > > shrink_caches() ?
> > >
> > > Marcelo, this is needed because of the spin lock ordering rules.
> > > The pagecache_lock must be obtained before the pagemap_lru_lock
> > > or else deadlock is possible. The spin_trylock is an optimization.
> >
> > No, it is not.
> >
> > We can simply lock the pagecachelock and the pagemap_lru_lock at the
> > beginning of the cleaning function. page_launder() use to do that.
>
> Since your main concern seems to be simplicity, the code can remain
> the way it is and be far more readable with, e.g.,
>
> /* Acquire lock1 while holding lock2--reverse order. */
> #define spin_reverse_lock(lock1,lock2) \
> if (unlikely(!spin_trylock(&lock1))) { \
> spin_unlock(&lock2); \
> spin_lock(&lock1); \
> spin_lock(&lock2); \
> }
>
> You can't argue for simple in favor of increasing lock contention,
> but you can keep it readable.
Making the code readable is different from making it logically simple.
I've already seen pretty subtle races in the VM which survived for a long
time (e.g. the latest race, which Hugh and I found in
add_to_swap_cache/try_to_swap_out, had been there since early 2.4), so I
prefer to make the code as simple as possible.
If there really are long hold times in shrink_cache(), then I agree to keep
the current snippet of code to avoid that.
* Re: Locking comment on shrink_caches()
2001-09-25 17:49 Locking comment on shrink_caches() Marcelo Tosatti
@ 2001-09-25 19:57 ` David S. Miller
2001-09-25 18:40 ` Marcelo Tosatti
2001-09-25 21:57 ` Andrea Arcangeli
0 siblings, 2 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-25 19:57 UTC (permalink / raw)
To: marcelo; +Cc: andrea, torvalds, linux-kernel
From: Marcelo Tosatti <marcelo@conectiva.com.br>
Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
Do you really need to do this ?
if (unlikely(!spin_trylock(&pagecache_lock))) {
	/* we hold the page lock so the page cannot go away from under us */
	spin_unlock(&pagemap_lru_lock);
	spin_lock(&pagecache_lock);
	spin_lock(&pagemap_lru_lock);
}
Have you actually seen bad hold times of pagecache_lock by
shrink_caches() ?
Marcelo, this is needed because of the spin lock ordering rules.
The pagecache_lock must be obtained before the pagemap_lru_lock
or else deadlock is possible. The spin_trylock is an optimization.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 18:40 ` Marcelo Tosatti
@ 2001-09-25 20:15 ` David S. Miller
2001-09-25 19:02 ` Marcelo Tosatti
` (2 more replies)
2001-09-25 20:40 ` Josh MacDonald
1 sibling, 3 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-25 20:15 UTC (permalink / raw)
To: marcelo; +Cc: andrea, torvalds, linux-kernel
From: Marcelo Tosatti <marcelo@conectiva.com.br>
Date: Tue, 25 Sep 2001 15:40:23 -0300 (BRT)
We can simply lock the pagecachelock and the pagemap_lru_lock at the
beginning of the cleaning function. page_launder() use to do that.
Thats why I asked Andrea if there was long hold times by shrink_caches().
Ok, I see.
I do think it's silly to hold the pagecache_lock during pure scanning
activities of shrink_caches().
It is known that pagecache_lock is the biggest scalability issue on
large SMP systems, and thus the page cache locking patches Ingo and
myself did.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 20:15 ` David S. Miller
2001-09-25 19:02 ` Marcelo Tosatti
@ 2001-09-25 20:24 ` Rik van Riel
2001-09-25 20:28 ` David S. Miller
[not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
2001-09-25 22:01 ` Andrea Arcangeli
2 siblings, 2 replies; 67+ messages in thread
From: Rik van Riel @ 2001-09-25 20:24 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel
On Tue, 25 Sep 2001, David S. Miller wrote:
> It is known that pagecache_lock is the biggest scalability issue
> on large SMP systems, and thus the page cache locking patches
> Ingo and myself did.
Interesting, most lockmeter data dumps I've seen here
indicate the locks in fs/buffer.c as the big problem
and have pagecache_lock down in the noise.
Or were you measuring loads which are mostly read-only ?
regards,
Rik
--
IA64: a worthy successor to the i860.
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/
* Re: Locking comment on shrink_caches()
2001-09-25 20:24 ` Rik van Riel
@ 2001-09-25 20:28 ` David S. Miller
2001-09-25 21:05 ` Andrew Morton
[not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
1 sibling, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 20:28 UTC (permalink / raw)
To: riel; +Cc: marcelo, andrea, torvalds, linux-kernel
From: Rik van Riel <riel@conectiva.com.br>
Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
Or were you measuring loads which are mostly read-only ?
When Kanoj Sarcar was back at SGI testing 32 processor Origin
MIPS systems, pagecache_lock was at the top.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 19:02 ` Marcelo Tosatti
@ 2001-09-25 20:29 ` David S. Miller
2001-09-25 21:00 ` Benjamin LaHaise
0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 20:29 UTC (permalink / raw)
To: marcelo; +Cc: andrea, torvalds, linux-kernel
From: Marcelo Tosatti <marcelo@conectiva.com.br>
Date: Tue, 25 Sep 2001 16:02:29 -0300 (BRT)
> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.
Btw, is that one going into 2.5 for sure? (the per-address-space lock).
Well, there are two things happening in that patch. Per-hash chain
locks for the page cache itself, and the lock added to the address
space for that page list.
Linus has indicated it will go into 2.5.x, yes.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 18:40 ` Marcelo Tosatti
2001-09-25 20:15 ` David S. Miller
@ 2001-09-25 20:40 ` Josh MacDonald
2001-09-25 19:25 ` Marcelo Tosatti
1 sibling, 1 reply; 67+ messages in thread
From: Josh MacDonald @ 2001-09-25 20:40 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-kernel
Quoting Marcelo Tosatti (marcelo@conectiva.com.br):
>
>
> On Tue, 25 Sep 2001, David S. Miller wrote:
>
> > From: Marcelo Tosatti <marcelo@conectiva.com.br>
> > Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
> >
> > Do you really need to do this ?
> >
> > if (unlikely(!spin_trylock(&pagecache_lock))) {
> > /* we hold the page lock so the page cannot go away from under us */
> > spin_unlock(&pagemap_lru_lock);
> >
> > spin_lock(&pagecache_lock);
> > spin_lock(&pagemap_lru_lock);
> > }
> >
> > Have you actually seen bad hold times of pagecache_lock by
> > shrink_caches() ?
> >
> > Marcelo, this is needed because of the spin lock ordering rules.
> > The pagecache_lock must be obtained before the pagemap_lru_lock
> > or else deadlock is possible. The spin_trylock is an optimization.
>
> No, it is not.
>
> We can simply lock the pagecachelock and the pagemap_lru_lock at the
> beginning of the cleaning function. page_launder() use to do that.
Since your main concern seems to be simplicity, the code can remain
the way it is and be far more readable with, e.g.,
/* Acquire lock1 while holding lock2--reverse order. */
#define spin_reverse_lock(lock1, lock2)			\
	if (unlikely(!spin_trylock(&lock1))) {		\
		spin_unlock(&lock2);			\
		spin_lock(&lock1);			\
		spin_lock(&lock2);			\
	}
You can't argue for simplicity in favor of increasing lock contention,
but you can keep it readable.
-josh
* Re: Locking comment on shrink_caches()
2001-09-25 20:29 ` David S. Miller
@ 2001-09-25 21:00 ` Benjamin LaHaise
2001-09-25 21:55 ` David S. Miller
2001-09-25 22:03 ` Andrea Arcangeli
0 siblings, 2 replies; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-25 21:00 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel
On Tue, Sep 25, 2001 at 01:29:05PM -0700, David S. Miller wrote:
> Well, there are two things happening in that patch. Per-hash chain
> locks for the page cache itself, and the lock added to the address
> space for that page list.
Last time I looked, those patches made the already ugly vm locking
even worse. I'd rather try to use some of the rcu techniques for
page cache lookup, and per-page locking for page cache removal
which will lead to *cleaner* code as well as a much more scalable
kernel.
Keep in mind that just because a lock is on someone's hitlist doesn't
mean that it is for the right reasons. Look at the io_request_lock
that is held around the bounce buffer copies in the scsi midlayer.
*shudder*
-ben
* Re: Locking comment on shrink_caches()
2001-09-25 20:28 ` David S. Miller
@ 2001-09-25 21:05 ` Andrew Morton
2001-09-25 21:48 ` David S. Miller
0 siblings, 1 reply; 67+ messages in thread
From: Andrew Morton @ 2001-09-25 21:05 UTC (permalink / raw)
To: David S. Miller; +Cc: riel, marcelo, andrea, torvalds, linux-kernel
"David S. Miller" wrote:
>
> From: Rik van Riel <riel@conectiva.com.br>
> Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
>
> Or were you measuring loads which are mostly read-only ?
>
> When Kanoj Sarcar was back at SGI testing 32 processor Origin
> MIPS systems, pagecache_lock was at the top.
But when I asked kumon to test it on his 8-way Xeon,
page_cache_lock contention proved to be insignificant.
Seems to only be a NUMA thing.
* Re: Locking comment on shrink_caches()
2001-09-25 21:05 ` Andrew Morton
@ 2001-09-25 21:48 ` David S. Miller
0 siblings, 0 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-25 21:48 UTC (permalink / raw)
To: akpm; +Cc: riel, marcelo, andrea, torvalds, linux-kernel, mingo
From: Andrew Morton <akpm@zip.com.au>
Date: Tue, 25 Sep 2001 14:05:04 -0700
"David S. Miller" wrote:
> When Kanoj Sarcar was back at SGI testing 32 processor Origin
> MIPS systems, pagecache_lock was at the top.
But when I asked kumon to test it on his 8-way Xeon,
page_cache_lock contention proved to be insignificant.
Seems to only be a NUMA thing.
I doubt it is only a NUMA thing. I say this because, for TUX web
benchmarks that tended to hold most of the resident set in memory, the
page cache locking changes were measured to improve performance
significantly on SMP x86 systems.
Ingo would be able to comment further.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 21:00 ` Benjamin LaHaise
@ 2001-09-25 21:55 ` David S. Miller
2001-09-25 22:16 ` Benjamin LaHaise
2001-09-25 22:03 ` Andrea Arcangeli
1 sibling, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 21:55 UTC (permalink / raw)
To: bcrl; +Cc: marcelo, andrea, torvalds, linux-kernel
From: Benjamin LaHaise <bcrl@redhat.com>
Date: Tue, 25 Sep 2001 17:00:55 -0400
Last time I looked, those patches made the already ugly vm locking
even worse. I'd rather try to use some of the rcu techniques for
page cache lookup, and per-page locking for page cache removal
which will lead to *cleaner* code as well as a much more scalable
kernel.
I'm willing to investigate using RCU. However, per-hashchain locking
is a well-proven technique (in the networking code in particular), which
is why that was the method employed. At the time the patch was
implemented, the RCU stuff was not fully formulated.
Please note that the problem is lock cachelines in dirty exclusive
state, not a "lock held for long time" issue.
Keep in mind that just because a lock is on someone's hitlist doesn't
mean that it is for the right reasons. Look at the io_request_lock
that is held around the bounce buffer copies in the scsi midlayer.
*shudder*
I agree. But to my understanding, and after having studied the
pagecache lock usage, it was minimally used and not used in any places
unnecessarily as per the io_request_lock example you are stating.
In fact, the pagecache_lock is mostly held for extremely short periods
of time.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 19:57 ` David S. Miller
2001-09-25 18:40 ` Marcelo Tosatti
@ 2001-09-25 21:57 ` Andrea Arcangeli
1 sibling, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 21:57 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel
On Tue, Sep 25, 2001 at 12:57:58PM -0700, David S. Miller wrote:
> or else deadlock is possible. The spin_trylock is an optimization.
Indeed.
Andrea
* Re: Locking comment on shrink_caches()
2001-09-25 20:15 ` David S. Miller
2001-09-25 19:02 ` Marcelo Tosatti
2001-09-25 20:24 ` Rik van Riel
@ 2001-09-25 22:01 ` Andrea Arcangeli
2001-09-25 22:03 ` David S. Miller
2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:01 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel
On Tue, Sep 25, 2001 at 01:15:28PM -0700, David S. Miller wrote:
> I do think it's silly to hold the pagecache_lock during pure scanning
> activities of shrink_caches().
Indeed again.
> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.
yes.
IMHO, if we held the pagecache lock all the time while shrinking
the cache, then we could kill the lru lock in the first place.
Andrea
* Re: Locking comment on shrink_caches()
2001-09-25 22:01 ` Andrea Arcangeli
@ 2001-09-25 22:03 ` David S. Miller
2001-09-25 22:59 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:03 UTC (permalink / raw)
To: andrea; +Cc: marcelo, torvalds, linux-kernel
From: Andrea Arcangeli <andrea@suse.de>
Date: Wed, 26 Sep 2001 00:01:02 +0200
IMHO if we would hold the pagecache lock all the time while shrinking
the cache, then we could kill the lru lock in first place.
And actually in the pagecache locking patches, doing such a thing
would be impossible :-) since each page needs to grab a different
lock (because the hash chain is potentially different).
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 21:00 ` Benjamin LaHaise
2001-09-25 21:55 ` David S. Miller
@ 2001-09-25 22:03 ` Andrea Arcangeli
1 sibling, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:03 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: David S. Miller, marcelo, torvalds, linux-kernel
On Tue, Sep 25, 2001 at 05:00:55PM -0400, Benjamin LaHaise wrote:
> even worse. I'd rather try to use some of the rcu techniques for
> page cache lookup, and per-page locking for page cache removal
> which will lead to *cleaner* code as well as a much more scalable
I don't think RCU fits there; truncation and releasing must be
extremely efficient too.
Andrea
* Re: Locking comment on shrink_caches()
2001-09-25 21:55 ` David S. Miller
@ 2001-09-25 22:16 ` Benjamin LaHaise
2001-09-25 22:28 ` David S. Miller
0 siblings, 1 reply; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-25 22:16 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel
On Tue, Sep 25, 2001 at 02:55:47PM -0700, David S. Miller wrote:
> I'm willing to investigate using RCU. However, per hashchain locking
> is a much proven technique (inside the networking in particular) which
> is why that was the method employed. At the time the patch was
> implemented, the RCU stuff was not fully formulated.
*nod*
> Please note that the problem is lock cachelines in dirty exclusive
> state, not a "lock held for long time" issue.
Ahh, that's a cpu bug -- one my athlons don't suffer from.
> I agree. But to my understanding, and after having studied the
> pagecache lock usage, it was minimally used and not used in any places
> unnecessarily as per the io_request_lock example you are stating.
>
> In fact, the pagecache_lock is mostly held for extremely short periods
> of time.
True, and that is why I would like to see more of the research that
justifies these changes, as well as comparisons with alternate techniques
before any of these patches make it into the base tree. Even before that,
we need to clean up the code first.
-ben
* Re: Locking comment on shrink_caches()
[not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
@ 2001-09-25 22:26 ` David S. Miller
2001-09-26 17:42 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:26 UTC (permalink / raw)
To: gerrit; +Cc: riel, marcelo, andrea, torvalds, linux-kernel
From: Gerrit Huizenga <gerrit@us.ibm.com>
Date: Tue, 25 Sep 2001 15:15:13 PDT
I'm very curious as to what workloads are showing pagecache_lock as
a bottleneck. We haven't noticed this particular bottleneck in most
of the workloads we are running. Is there a good workload that shows
this type of load?
Again, I defer to Ingo for specifics, but essentially something like
specweb99 where the whole dataset fits in memory.
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 22:16 ` Benjamin LaHaise
@ 2001-09-25 22:28 ` David S. Miller
2001-09-26 16:40 ` Alan Cox
0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:28 UTC (permalink / raw)
To: bcrl; +Cc: marcelo, andrea, torvalds, linux-kernel
From: Benjamin LaHaise <bcrl@redhat.com>
Date: Tue, 25 Sep 2001 18:16:43 -0400
> Please note that the problem is lock cachelines in dirty exclusive
> state, not a "lock held for long time" issue.
Ahh, that's a cpu bug -- one my athlons don't suffer from.
Your Athlons may handle exclusive cache line acquisition more
efficiently (due to memory subsystem performance) but it still
does cost something.
True, and that is why I would like to see more of the research that
justifies these changes, as well as comparisons with alternate techniques
before any of these patches make it into the base tree. Even before that,
we need to clean up the code first.
As an aside, I actually think the per-hashchain version of the
pagecache locking is cleaner conceptually. The reason is that
it makes it more clear that we are locking the "identity of page X"
instead of "the page cache".
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-25 22:03 ` David S. Miller
@ 2001-09-25 22:59 ` Andrea Arcangeli
0 siblings, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:59 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel
On Tue, Sep 25, 2001 at 03:03:28PM -0700, David S. Miller wrote:
> From: Andrea Arcangeli <andrea@suse.de>
> Date: Wed, 26 Sep 2001 00:01:02 +0200
>
> IMHO if we would hold the pagecache lock all the time while shrinking
> the cache, then we could kill the lru lock in first place.
>
> And actually in the pagecache locking patches, doing such a thing
> would be impossible :-) since each page needs to grab a different
A good further point too :). It would be an option only for mainline.
Andrea
* Re: Locking comment on shrink_caches()
@ 2001-09-26 5:04 Dipankar Sarma
2001-09-26 5:31 ` Andrew Morton
0 siblings, 1 reply; 67+ messages in thread
From: Dipankar Sarma @ 2001-09-26 5:04 UTC (permalink / raw)
To: davem; +Cc: marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel, hawkes
In article <20010925.132816.52117370.davem@redhat.com> David S. Miller wrote:
> From: Rik van Riel <riel@conectiva.com.br>
> Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
>
> Or were you measuring loads which are mostly read-only ?
> When Kanoj Sarcar was back at SGI testing 32 processor Origin
> MIPS systems, pagecache_lock was at the top.
John Hawkes from SGI had published some AIM7 numbers that showed
pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
half the CPU cycles were spent waiting for pagecache_lock. The
thread is at -
http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
Thanks
Dipankar
--
Dipankar Sarma <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
* Re: Locking comment on shrink_caches()
2001-09-26 5:04 Dipankar Sarma
@ 2001-09-26 5:31 ` Andrew Morton
2001-09-26 6:57 ` David S. Miller
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Andrew Morton @ 2001-09-26 5:31 UTC (permalink / raw)
To: dipankar
Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
hawkes
Dipankar Sarma wrote:
>
> In article <20010925.132816.52117370.davem@redhat.com> David S. Miller wrote:
> > From: Rik van Riel <riel@conectiva.com.br>
> > Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
> >
> > Or were you measuring loads which are mostly read-only ?
>
> > When Kanoj Sarcar was back at SGI testing 32 processor Origin
> > MIPS systems, pagecache_lock was at the top.
>
> John Hawkes from SGI had published some AIM7 numbers that showed
> pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
> half the CPU cycles were spent on waiting for pagecache_lock. The
> thread is at -
>
> http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
>
That's NUMA hardware. The per-hashqueue locking change made
a big improvement on that hardware. But when it was used on
Intel hardware it made no measurable difference at all.
Sorry, but the patch adds complexity, and unless a significant
throughput benefit can be demonstrated on less exotic hardware,
why use it?
Here are kumon's test results from March, with and without
the hashed lock patch:
-------- Original Message --------
Subject: Re: [Fwd: Re: [Lse-tech] AIM7 scaling, pagecache_lock, multiqueue scheduler]
Date: Thu, 15 Mar 2001 18:03:55 +0900
From: kumon@flab.fujitsu.co.jp
Reply-To: kumon@flab.fujitsu.co.jp
To: Andrew Morton <andrewm@uow.edu.au>
CC: kumon@flab.fujitsu.co.jp, ahirai@flab.fujitsu.co.jp,John Hawkes <hawkes@engr.sgi.com>,kumon@flab.fujitsu.co.jp
In-Reply-To: <3AB032B3.87940521@uow.edu.au>,<3AB0089B.CF3496D2@uow.edu.au><200103150234.LAA28075@asami.proc><3AB032B3.87940521@uow.edu.au>
OK, the followings are a result of our brief measurement with WebBench
(mindcraft type) of 2.4.2 and 2.4.2+pcl .
Workload: WebBench 3.0 (static get)
Machine: Profusion 8way 550MHz/1MB cache 1GB mem.
Server: Apache 1.3.9-8 (w/ SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
obtained from RedHat.
Clients: 32 clients each has 2 requesting threads.
The following numbers are requests per second.
           2.4.2   2.4.2+pcl   ratio
-------------------------------------
1SMP       1,603     1,584     0.99
2(1+1)SMP  2,443     2,437     1.00
4(1+3)SMP  4,420     4,426     1.00
8(4+4)SMP  5,381     5,400     1.00
#No idle time observed in the 1 to 4 SMP runs.
#Only 8 SMP cases shows cpu-idle time, but it is about 2.1-2.8% of the
#total CPU time.
Note: The load on the two buses of the Profusion system isn't balanced,
because the number of CPUs on each bus is unbalanced.
Summary:
From the above brief test, the (+pcl) patch doesn't show a measurable
performance gain.
-
* Re: Locking comment on shrink_caches()
2001-09-26 5:31 ` Andrew Morton
@ 2001-09-26 6:57 ` David S. Miller
2001-09-26 7:08 ` Dipankar Sarma
2001-09-26 16:52 ` John Hawkes
2 siblings, 0 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-26 6:57 UTC (permalink / raw)
To: akpm; +Cc: dipankar, marcelo, riel, andrea, torvalds, linux-kernel, hawkes
From: Andrew Morton <akpm@zip.com.au>
Date: Tue, 25 Sep 2001 22:31:32 -0700
Here are kumon's test results from March, with and without
the hashed lock patch:
Please elaborate on what the WebBench 3.0 "static get" runs were
really doing.
Was this test composed of multiple accesses to the same or a small set
of files? If so, that is indeed the case where the page cache locking
patches won't help at all.
The more diversified the set of files being accessed, the greater the
gain from the locking changes. You have to give the CPUs at
least a chance of accessing different hash chains :-)
Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
2001-09-26 5:31 ` Andrew Morton
2001-09-26 6:57 ` David S. Miller
@ 2001-09-26 7:08 ` Dipankar Sarma
2001-09-26 16:52 ` John Hawkes
2 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-09-26 7:08 UTC (permalink / raw)
To: Andrew Morton
Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
anton, jdoelle
On Tue, Sep 25, 2001 at 10:31:32PM -0700, Andrew Morton wrote:
> Dipankar Sarma wrote:
> >
> > John Hawkes from SGI had published some AIM7 numbers that showed
> > pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
> > half the CPU cycles were spent on waiting for pagecache_lock. The
> > thread is at -
> >
> > http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
> >
>
> That's NUMA hardware. The per-hashqueue locking change made
> a big improvement on that hardware. But when it was used on
> Intel hardware it made no measurable difference at all.
>
> Sorry, but the patch adds compexity and unless a significant
> throughput benefit can be demonstrated on less exotic hardware,
> why use it?
I agree that on NUMA systems, contention and lock wait times
degenerate non-linearly, thereby skewing the actual impact.
IIRC, there were discussions on lse-tech about pagecache_lock and
dbench numbers published by Juergen Doelle (on 8way Intel) and
Anton Blanchard on 16way PPC. Perhaps they can shed some light on this.
Thanks
Dipankar
--
Dipankar Sarma <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
* Re: Locking comment on shrink_caches()
2001-09-25 22:28 ` David S. Miller
@ 2001-09-26 16:40 ` Alan Cox
2001-09-26 17:25 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: Alan Cox @ 2001-09-26 16:40 UTC (permalink / raw)
To: David S. Miller; +Cc: bcrl, marcelo, andrea, torvalds, linux-kernel
> Ahh, that's a cpu bug -- one my athlons don't suffer from.
>
> Your Athlons may handle exclusive cache line acquisition more
> efficiently (due to memory subsystem performance) but it still
> does cost something.
On an exclusive line on Athlon, a lock cycle is near enough free; it's
just an ordering constraint. Since the line is in E state, no other bus
master can hold a copy in cache, so the atomicity is there. Ditto for newer
Intel processors.
* Re: Locking comment on shrink_caches()
2001-09-26 5:31 ` Andrew Morton
2001-09-26 6:57 ` David S. Miller
2001-09-26 7:08 ` Dipankar Sarma
@ 2001-09-26 16:52 ` John Hawkes
2 siblings, 0 replies; 67+ messages in thread
From: John Hawkes @ 2001-09-26 16:52 UTC (permalink / raw)
To: Andrew Morton, dipankar
Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
hawkes
From: "Andrew Morton" <akpm@zip.com.au>
> > John Hawkes from SGI had published some AIM7 numbers that showed
> > pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
> > half the CPU cycles were spent on waiting for pagecache_lock. The
> > thread is at -
> >
> > http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
> >
>
> That's NUMA hardware. The per-hashqueue locking change made
> a big improvement on that hardware. But when it was used on
> Intel hardware it made no measurable difference at all.
More specifically, that was on SGI Origin2000 32p mips64 ccNUMA
hardware. The pagecache_lock bottleneck is substantially less on SGI
Itanium ccNUMA hardware running those AIM7 workloads. I'm seeing
moderately significant contention on the Big Kernel Lock, mostly from
sys_lseek() and ext2_get_block().
John Hawkes
hawkes@sgi.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 16:40 ` Alan Cox
@ 2001-09-26 17:25 ` Linus Torvalds
2001-09-26 17:40 ` Alan Cox
` (4 more replies)
0 siblings, 5 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 17:25 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1697 bytes --]
On Wed, 26 Sep 2001, Alan Cox wrote:
> >
> > Your Athlons may handle exclusive cache line acquisition more
> > efficiently (due to memory subsystem performance) but it still
> > does cost something.
>
> On an exclusive line on Athlon a lock cycle is near enough free, its
> just an ordering constraint. Since the line is in E state no other bus
> master can hold a copy in cache so the atomicity is there. Ditto for newer
> Intel processors
You misunderstood the problem, I think: when the line moves from one CPU
to the other (the exclusive state moves along with it), that is
_expensive_.
Even when you have a backside bus (or cache pushout content snooping) to
allow the cacheline to move directly from one CPU to the other without
having to go through memory, that's a really expensive thing to do.
So re-acquiring the lock on the same CPU is pretty much free (18 cycles for
Intel, if I remember correctly, and that's _entirely_ due to the pipeline
flush to ensure in-order execution around it).
[ Oh, just for interest I checked my P4, which has a much longer pipeline:
the cost of an exclusive locked access is a whopping 104 cycles. But we
already knew that the first-generation P4 does badly on many things.
Just reading the cycle counter is apparently around 80 cycles on a P4,
it's 32 cycles on a PIII. Looks like that also stalls the pipeline or
something. But cpuid is _really_ horrible. Test out the attached
program.
PIII:
nothing: 32 cycles
locked add: 50 cycles
cpuid: 170 cycles
P4:
nothing: 80 cycles
locked add: 184 cycles
cpuid: 652 cycles
Remember: these are for the already-exclusive-cache cases. ]
What are the athlon numbers?
Linus
[-- Attachment #2: Type: TEXT/PLAIN, Size: 612 bytes --]
#include <stdio.h>

#define rdtsc(low) \
__asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
#define TIME(x,y) \
min = 100000; \
for (i = 0; i < 1000; i++) { \
unsigned long start,end; \
rdtsc(start); \
x; \
rdtsc(end); \
end -= start; \
if (end < min) \
min = end; \
} \
printf(y ": %lu cycles\n", min);
#define LOCK asm volatile("lock ; addl $0,0(%esp)")
#define CPUID asm volatile("cpuid": : :"ax", "dx", "cx", "bx")
int main()
{
unsigned long min;
int i;
TIME(/* */, "nothing");
TIME(LOCK, "locked add");
TIME(CPUID, "cpuid");
return 0;
}
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:25 ` Linus Torvalds
@ 2001-09-26 17:40 ` Alan Cox
2001-09-26 17:44 ` Linus Torvalds
` (2 more replies)
2001-09-26 17:43 ` Richard Gooch
` (3 subsequent siblings)
4 siblings, 3 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-26 17:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel
> PIII:
> nothing: 32 cycles
> locked add: 50 cycles
> cpuid: 170 cycles
>
> P4:
> nothing: 80 cycles
> locked add: 184 cycles
> cpuid: 652 cycles
Original core Athlon (step 2 and earlier)
nothing: 11 cycles
locked add: 22 cycles
cpuid: 67 cycles
generic Athlon is
nothing: 11 cycles
locked add: 11 cycles
cpuid: 64 cycles
I don't currently have a palomino core to test
Wait for AMD to publish graphs of CPUid performance for PIV versus Athlon 8)
Alan
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-25 22:26 ` David S. Miller
@ 2001-09-26 17:42 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2001-09-26 17:42 UTC (permalink / raw)
To: David S. Miller
Cc: gerrit, riel, marcelo, Andrea Arcangeli, Linus Torvalds,
linux-kernel
On Tue, 25 Sep 2001, David S. Miller wrote:
> I'm very curious as to what workloads are showing pagecache_lock as
> a bottleneck. We haven't noticed this particular bottleneck in most
> of the workloads we are running. Is there a good workload that shows
> this type of load?
>
> Again, I defer to Ingo for specifics, but essentially something
> like specweb99 where the whole dataset fits in memory.
it was SPECweb99 tests done in 32 GB RAM, 8 CPUs, where the pagecache was
nearly 30 GB big. We saw visible pagecache_lock contention on such
systems. Due to TUX's use of zerocopy, page lookups happen at a much
larger frequency and they are not intermixed with memory copies - in
contrast with workloads like dbench.
Ingo
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:25 ` Linus Torvalds
2001-09-26 17:40 ` Alan Cox
@ 2001-09-26 17:43 ` Richard Gooch
2001-09-26 18:24 ` Benjamin LaHaise
2001-09-26 17:45 ` Dave Jones
` (2 subsequent siblings)
4 siblings, 1 reply; 67+ messages in thread
From: Richard Gooch @ 2001-09-26 17:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel
Linus Torvalds writes:
> This message is in MIME format. The first part should be readable text,
> while the remaining parts are likely unreadable without MIME-aware tools.
> Send mail to mime@docserver.cac.washington.edu for more info.
Yuk! MIME! I thought you hated it too?
> PIII:
> nothing: 32 cycles
> locked add: 50 cycles
> cpuid: 170 cycles
>
> P4:
> nothing: 80 cycles
> locked add: 184 cycles
> cpuid: 652 cycles
>
> Remember: these are for the already-exclusive-cache cases. ]
>
> What are the athlon numbers?
Athlon 850 MHz:
nothing: 11 cycles
locked add: 12 cycles
cpuid: 64 cycles
BTW: your code had horrible control-M's on each line. So the compiler
choked (with a less-than-helpful error message). Of course, cat t.c
showed nothing amiss. Fortunately emacs doesn't hide information.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:40 ` Alan Cox
@ 2001-09-26 17:44 ` Linus Torvalds
2001-09-26 18:01 ` Benjamin LaHaise
2001-09-26 18:01 ` Dave Jones
2001-09-26 20:20 ` Vojtech Pavlik
2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 17:44 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel
On Wed, 26 Sep 2001, Alan Cox wrote:
> > PIII:
> > nothing: 32 cycles
> > locked add: 50 cycles
> > cpuid: 170 cycles
> >
> > P4:
> > nothing: 80 cycles
> > locked add: 184 cycles
> > cpuid: 652 cycles
>
>
> Original core Athlon (step 2 and earlier)
> nothing: 11 cycles
> locked add: 22 cycles
> cpuid: 67 cycles
>
> generic Athlon:
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
Do you have an actual SMP Athlon to test? I'd love to see if that "locked
add" thing is really SMP-safe - it may be that it's the old "AMD turned
off the 'lock' prefix synchronization because it doesn't matter in UP".
They used to have a bit to do that..
That said, it _can_ be real even on SMP. There's no reason why a memory
barrier would have to be as heavy as it is on some machines (even the P4
looks positively _fast_ compared to most older machines that did memory
barriers on the bus and took hundreds of much slower cycles to do it).
> Wait for AMD to publish graphs of CPUid performance for PIV versus Athlon 8)
The sad thing is, I think Intel used to suggest that people use "cpuid" as
the thing to serialize the cores. So people may actually be _using_ it for
something like semaphores. I remember that Ingo or somebody suggested we'd
use it for the Linux "mb()" macro - I _much_ prefer the saner locked zero
add into the stack, and the prediction that Intel would be more likely to
optimize for "add" than for "cpuid" certainly ended up being surprisingly
true on the P4.
Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:25 ` Linus Torvalds
2001-09-26 17:40 ` Alan Cox
2001-09-26 17:43 ` Richard Gooch
@ 2001-09-26 17:45 ` Dave Jones
2001-09-26 17:50 ` Alan Cox
2001-09-26 23:26 ` David S. Miller
4 siblings, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 17:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel
On Wed, 26 Sep 2001, Linus Torvalds wrote:
> What are the athlon numbers?
nothing: 11 cycles
locked add: 11 cycles
cpuid: 63 cycles
(cpuid varies between 63->68 here)
regards,
Dave.
--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:25 ` Linus Torvalds
` (2 preceding siblings ...)
2001-09-26 17:45 ` Dave Jones
@ 2001-09-26 17:50 ` Alan Cox
2001-09-26 17:59 ` Dave Jones
2001-09-26 18:59 ` George Greer
2001-09-26 23:26 ` David S. Miller
4 siblings, 2 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-26 17:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel
and for completeness
VIA Cyrix CIII (original generation 0.18u)
nothing: 28 cycles
locked add: 29 cycles
cpuid: 72 cycles
Pentium Pro
nothing: 33 cycles
locked add: 51 cycles
cpuid: 98 cycles
(base comparison - pure in order machine)
IDT winchip
nothing: 17 cycles
locked add: 20 cycles
cpuid: 33 cycles
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:50 ` Alan Cox
@ 2001-09-26 17:59 ` Dave Jones
2001-09-26 18:07 ` Alan Cox
` (2 more replies)
2001-09-26 18:59 ` George Greer
1 sibling, 3 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 17:59 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
linux-kernel
On Wed, 26 Sep 2001, Alan Cox wrote:
> VIA Cyrix CIII (original generation 0.18u)
>
> nothing: 28 cycles
> locked add: 29 cycles
> cpuid: 72 cycles
Interesting. From a newer C3..
nothing: 30 cycles
locked add: 31 cycles
cpuid: 79 cycles
Only slightly worse, but I'd not expected this.
This was from a 866MHz part too, whereas you have a 533 iirc ?
regards,
Dave.
--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:40 ` Alan Cox
2001-09-26 17:44 ` Linus Torvalds
@ 2001-09-26 18:01 ` Dave Jones
2001-09-26 20:20 ` Vojtech Pavlik
2 siblings, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 18:01 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
linux-kernel
On Wed, 26 Sep 2001, Alan Cox wrote:
> Original core Athlon (step 2 and earlier)
>
> nothing: 11 cycles
> locked add: 22 cycles
> cpuid: 67 cycles
>
> I don't currently have a palomino core to test
Exactly the same as the original core.
nothing: 11 cycles
locked add: 11 cycles
cpuid: 67 cycles
(cpuid varies 63->68)
regards,
Dave.
--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:44 ` Linus Torvalds
@ 2001-09-26 18:01 ` Benjamin LaHaise
0 siblings, 0 replies; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-26 18:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, David S. Miller, marcelo, andrea, linux-kernel
On Wed, Sep 26, 2001 at 10:44:14AM -0700, Linus Torvalds wrote:
> Do you have an actual SMP Athlon to test? I'd love to see if that "locked
> add" thing is really SMP-safe - it may be that it's the old "AMD turned
> off the 'lock' prefix synchronization because it doesn't matter in UP".
> They used to have a bit to do that..
Same, my dual reports:
[bcrl@toomuch ~]$ ./a.out
nothing: 11 cycles
locked add: 11 cycles
cpuid: 68 cycles
Which is pretty good.
> That said, it _can_ be real even on SMP. There's no reason why a memory
> barrier would have to be as heavy as it is on some machines (even the P4
> looks positively _fast_ compared to most older machines that did memory
> barriers on the bus and took hundreds of much slower cycles to do it).
I had discussions with a few people from intel about the p4 having much
improved locking performance, including the ability to speculatively
execute locked instructions. How much of that is enabled in the current
cores is another question entirely (gotta love microcode patches).
-ben
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:59 ` Dave Jones
@ 2001-09-26 18:07 ` Alan Cox
2001-09-26 18:09 ` Padraig Brady
2001-09-26 18:24 ` Linus Torvalds
2 siblings, 0 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-26 18:07 UTC (permalink / raw)
To: Dave Jones
Cc: Alan Cox, Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
linux-kernel
> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
>
> Only slightly worse, but I'd not expected this.
> This was from a 866MHz part too, whereas you have a 533 iirc ?
The 0.13u part has a couple more pipeline steps, I believe.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:59 ` Dave Jones
2001-09-26 18:07 ` Alan Cox
@ 2001-09-26 18:09 ` Padraig Brady
2001-09-26 18:22 ` Dave Jones
2001-09-26 18:24 ` Linus Torvalds
2 siblings, 1 reply; 67+ messages in thread
From: Padraig Brady @ 2001-09-26 18:09 UTC (permalink / raw)
To: Dave Jones; +Cc: Alan Cox, linux-kernel
Dave Jones wrote:
>On Wed, 26 Sep 2001, Alan Cox wrote:
>
>>VIA Cyrix CIII (original generation 0.18u)
>>
>>nothing: 28 cycles
>>locked add: 29 cycles
>>cpuid: 72 cycles
>>
>
>Interesting. From a newer C3..
>
>nothing: 30 cycles
>locked add: 31 cycles
>cpuid: 79 cycles
>
>Only slightly worse, but I'd not expected this.
>This was from a 866MHz part too, whereas you have a 533 iirc ?
>
>regards,
>
>Dave.
>
Interesting, does the original CIII have a TSC? Would that affect the
timings Alan got?
The following table may be of use to people:
(All these S370)
------------------------------------------------------------------------------
core       size    name           code  Notes
------------------------------------------------------------------------------
samuel     0.18µm  Via Cyrix III  C5    128K L1, 0K L2 cache. FPU doesn't
                                        run @ full clock speed.
samuel II  0.15µm  Via C3         C5B   667MHz CIIIs in Dabs are C3's
                                        (128K L1, 64K L2 cache), MMX/3D
                                        now, FPU @ full clock speed.
mathew     0.15µm  Via C3         C5B   mobile samuel II with integrated
                                        north bridge & 2D/3D graphics (1.6v).
ezra       0.13µm  Via C3         C5C   debut @ 850MHz, rising to 1GHz
                                        quickly (1.35v).
nehemiah   0.13µm  Via C4         C5X   debut @ 1.2GHz (128K L1, 256K L2
                                        cache), SSE.
esther     0.10µm  Via C4         C5Y   ?
------------------------------------------------------------------------------
C3 availability details:
667  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  6-12W  Mar 2001
733  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  6-12W  May 2001
733  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  1+ W   May 2001 (e series)
750  100 / 133       1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  6-12W  May 2001
800  100 / 133       1.5  Socket 370  L1: 128kB, L2: 64kB  0.13µ  7-12W  May 2001 (ezra)
------------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 18:09 ` Padraig Brady
@ 2001-09-26 18:22 ` Dave Jones
0 siblings, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 18:22 UTC (permalink / raw)
To: Padraig Brady; +Cc: Alan Cox, linux-kernel
On Wed, 26 Sep 2001, Padraig Brady wrote:
> Interesting, does the original CIII have a TSC?
Yes.
regards,
Dave.
--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:43 ` Richard Gooch
@ 2001-09-26 18:24 ` Benjamin LaHaise
2001-09-26 18:48 ` Richard Gooch
0 siblings, 1 reply; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-26 18:24 UTC (permalink / raw)
To: Richard Gooch; +Cc: linux-kernel
On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
> BTW: your code had horrible control-M's on each line. So the compiler
> choked (with a less-than-helpful error message). Of course, cat t.c
> showed nothing amiss. Fortunately emacs doesn't hide information.
You must be using some kind of broken MUA -- neither mutt nor pine
resulted in anything with a trace of 0x0d in it.
-ben
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:59 ` Dave Jones
2001-09-26 18:07 ` Alan Cox
2001-09-26 18:09 ` Padraig Brady
@ 2001-09-26 18:24 ` Linus Torvalds
2001-09-26 18:40 ` Dave Jones
2001-09-26 19:04 ` Locking comment on shrink_caches() George Greer
2 siblings, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 18:24 UTC (permalink / raw)
To: Dave Jones; +Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel
On Wed, 26 Sep 2001, Dave Jones wrote:
> On Wed, 26 Sep 2001, Alan Cox wrote:
>
> > VIA Cyrix CIII (original generation 0.18u)
> >
> > nothing: 28 cycles
> > locked add: 29 cycles
> > cpuid: 72 cycles
>
> Interesting. From a newer C3..
>
> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
>
> Only slightly worse, but I'd not expected this.
That difference can easily be explained by the compiler and options.
You should use "gcc -O2" at least, in order to avoid having gcc do
unnecessary spills to memory in between the timings. And there may be some
versions of gcc that end up spilling even then.
Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 18:24 ` Linus Torvalds
@ 2001-09-26 18:40 ` Dave Jones
2001-09-26 19:12 ` Linus Torvalds
2001-09-26 19:04 ` Locking comment on shrink_caches() George Greer
1 sibling, 1 reply; 67+ messages in thread
From: Dave Jones @ 2001-09-26 18:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel
On Wed, 26 Sep 2001, Linus Torvalds wrote:
> > > cpuid: 72 cycles
> > cpuid: 79 cycles
> > Only slightly worse, but I'd not expected this.
> That difference can easily be explained by the compiler and options.
Actually repeated runs of the test on that box show it deviating by up
to 10 cycles, making it match the results that Alan posted.
-O2 made no difference, these deviations still occur. They seem more
prominent on the C3 than other boxes I've tried, even with the same
compiler toolchain.
regards,
Dave.
--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 18:24 ` Benjamin LaHaise
@ 2001-09-26 18:48 ` Richard Gooch
2001-09-26 18:58 ` Davide Libenzi
0 siblings, 1 reply; 67+ messages in thread
From: Richard Gooch @ 2001-09-26 18:48 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: linux-kernel
Benjamin LaHaise writes:
> On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
> > BTW: your code had horrible control-M's on each line. So the compiler
> > choked (with a less-than-helpful error message). Of course, cat t.c
> > showed nothing amiss. Fortunately emacs doesn't hide information.
>
> You must be using some kind of broken MUA -- neither mutt nor pine
> resulted in anything with a trace of 0x0d in it.
My MUA doesn't know about MIME at all (part of the reason I hate
MIME). I save the message to a file and run uudeview 0.5pl13.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 18:48 ` Richard Gooch
@ 2001-09-26 18:58 ` Davide Libenzi
0 siblings, 0 replies; 67+ messages in thread
From: Davide Libenzi @ 2001-09-26 18:58 UTC (permalink / raw)
To: Richard Gooch; +Cc: linux-kernel, Benjamin LaHaise
On 26-Sep-2001 Richard Gooch wrote:
> Benjamin LaHaise writes:
>> On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
>> > BTW: your code had horrible control-M's on each line. So the compiler
>> > choked (with a less-than-helpful error message). Of course, cat t.c
>> > showed nothing amiss. Fortunately emacs doesn't hide information.
>>
>> You must be using some kind of broken MUA -- neither mutt nor pine
>> resulted in anything with a trace of 0x0d in it.
>
> My MUA doesn't know about MIME at all (part of the reason I hate
> MIME). I save the message to a file and run uudeview 0.5pl13.
Maybe the file you save is in RFC format ( \r\n ) and uudeview does not trim it.
- Davide
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:50 ` Alan Cox
2001-09-26 17:59 ` Dave Jones
@ 2001-09-26 18:59 ` George Greer
1 sibling, 0 replies; 67+ messages in thread
From: George Greer @ 2001-09-26 18:59 UTC (permalink / raw)
To: linux-kernel
On Wed, 26 Sep 2001, Alan Cox wrote:
>and for completeness
>
>VIA Cyrix CIII (original generation 0.18u)
>
>nothing: 28 cycles
>locked add: 29 cycles
>cpuid: 72 cycles
>
>Pentium Pro
>
>nothing: 33 cycles
>locked add: 51 cycles
>cpuid: 98 cycles
>
>(base comparison - pure in order machine)
>
>IDT winchip
>
>nothing: 17 cycles
>locked add: 20 cycles
>cpuid: 33 cycles
2x Pentium MMX 233MHz
nothing: 14 cycles
locked add: 59 cycles
cpuid: 31 cycles
2x Pentium 133MHz
nothing: 14 cycles
locked add: 76 cycles
cpuid: 31 cycles
cpuid is oddly fast.
--
George Greer, greerga@m-l.org | Genius may have its limitations, but stupidity
http://www.m-l.org/~greerga/ | is not thus handicapped. -- Elbert Hubbard
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 18:24 ` Linus Torvalds
2001-09-26 18:40 ` Dave Jones
@ 2001-09-26 19:04 ` George Greer
1 sibling, 0 replies; 67+ messages in thread
From: George Greer @ 2001-09-26 19:04 UTC (permalink / raw)
To: linux-kernel
On Wed, 26 Sep 2001, Linus Torvalds wrote:
>
>On Wed, 26 Sep 2001, Dave Jones wrote:
>> On Wed, 26 Sep 2001, Alan Cox wrote:
>>
>> > VIA Cyrix CIII (original generation 0.18u)
>> >
>> > nothing: 28 cycles
>> > locked add: 29 cycles
>> > cpuid: 72 cycles
>>
>> Interesting. From a newer C3..
>>
>> nothing: 30 cycles
>> locked add: 31 cycles
>> cpuid: 79 cycles
>>
>> Only slightly worse, but I'd not expected this.
>
>That difference can easily be explained by the compiler and options.
>
>You should use "gcc -O2" at least, in order to avoid having gcc do
>unnecessary spills to memory in between the timings. And there may be some
>versions of gcc that end up spilling even then.
Nice big difference in 'locked add' seen here.
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85)
2x Pentium 233/MMX
-O0 -O2
nothing: 15 cycles nothing: 14 cycles
locked add: 60 cycles locked add: 32 cycles
cpuid: 33 cycles cpuid: 32 cycles
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85)
2x Pentium 133
-O0 -O2
nothing: 14 cycles nothing: 13 cycles
locked add: 76 cycles locked add: 25 cycles
cpuid: 31 cycles cpuid: 30 cycles
--
George Greer, greerga@m-l.org | Genius may have its limitations, but stupidity
http://www.m-l.org/~greerga/ | is not thus handicapped. -- Elbert Hubbard
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 18:40 ` Dave Jones
@ 2001-09-26 19:12 ` Linus Torvalds
2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady
0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 19:12 UTC (permalink / raw)
To: linux-kernel
In article <Pine.LNX.4.30.0109262036480.8655-100000@Appserv.suse.de>,
Dave Jones <davej@suse.de> wrote:
>On Wed, 26 Sep 2001, Linus Torvalds wrote:
>
>> > > cpuid: 72 cycles
>> > cpuid: 79 cycles
>> > Only slightly worse, but I'd not expected this.
>> That difference can easily be explained by the compiler and options.
>
>Actually repeated runs of the test on that box show it deviating by up
>to 10 cycles, making it match the results that Alan posted.
>-O2 made no difference, these deviations still occur. They seem more
>prominent on the C3 than other boxes I've tried, even with the same
>compiler toolchain.
Does the C3 do any kind of frequency shifting?
For example, on a transmeta CPU, the TSC will run at a constant
"nominal" speed (the highest the CPU can go), although the real CPU
speed will depend on the load of the machine and temperature etc. So on
a crusoe CPU you'll see varying speeds (and it depends on the speed
grade, because that in turn depends on how many longrun steps are being
actively used).
For example, on a mostly idle machine I get
torvalds@kiwi:~ > ./a.out
nothing: 54 cycles
locked add: 54 cycles
cpuid: 91 cycles
while if I have another window that does an endless loop to keep the CPU
busy, the _real_ frequency of the CPU scales up, and the machine
basically becomes faster:
torvalds@kiwi:~ > ./a.out
nothing: 36 cycles
locked add: 36 cycles
cpuid: 54 cycles
(The reason why the "nothing" TSC read is expensive on crusoe is because
of the scaling of the TSC - rdtsc literally has to do a floating point
multiply-add to scale the clock to the right "nominal" frequency. Of
course, "expensive" is still a lot less than the inexplicable 80 cycles
on a P4).
(That's a 600MHz part going down to 400MHz in idle, btw)
On a 633MHz part (I don't actually have access to any of the high speed
grades ;) it ends up being
fast:
nothing: 39 cycles
locked add: 40 cycles
cpuid: 68 cycles
slow:
nothing: 82 cycles
locked add: 84 cycles
cpuid: 122 cycles
which corresponds to a 633MHz part going down to 300MHz in idle.
And of course, you can get pretty much anything in between, depending on
what the load is...
Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:40 ` Alan Cox
2001-09-26 17:44 ` Linus Torvalds
2001-09-26 18:01 ` Dave Jones
@ 2001-09-26 20:20 ` Vojtech Pavlik
2001-09-26 20:24 ` Vojtech Pavlik
2 siblings, 1 reply; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-26 20:20 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
linux-kernel
On Wed, Sep 26, 2001 at 06:40:15PM +0100, Alan Cox wrote:
> > PIII:
> > nothing: 32 cycles
> > locked add: 50 cycles
> > cpuid: 170 cycles
> >
> > P4:
> > nothing: 80 cycles
> > locked add: 184 cycles
> > cpuid: 652 cycles
>
>
> Original core Athlon (step 2 and earlier)
>
> nothing: 11 cycles
> locked add: 22 cycles
> cpuid: 67 cycles
>
> generic Athlon is
>
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
slower:
nothing: 11 cycles
locked add: 11 cycles
cpuid: 87 cycles
--
Vojtech Pavlik
SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 20:20 ` Vojtech Pavlik
@ 2001-09-26 20:24 ` Vojtech Pavlik
0 siblings, 0 replies; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-26 20:24 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
linux-kernel
On Wed, Sep 26, 2001 at 10:20:21PM +0200, Vojtech Pavlik wrote:
> > generic Athlon is
> >
> > nothing: 11 cycles
> > locked add: 11 cycles
> > cpuid: 64 cycles
>
> Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
> slower:
>
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
Oops, this is indeed just a difference in compiler options.
--
Vojtech Pavlik
SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 17:25 ` Linus Torvalds
` (3 preceding siblings ...)
2001-09-26 17:50 ` Alan Cox
@ 2001-09-26 23:26 ` David S. Miller
2001-09-27 12:10 ` Alan Cox
4 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-26 23:26 UTC (permalink / raw)
To: torvalds; +Cc: alan, bcrl, marcelo, andrea, linux-kernel
From: Linus Torvalds <torvalds@transmeta.com>
Date: Wed, 26 Sep 2001 10:25:18 -0700 (PDT)
On Wed, 26 Sep 2001, Alan Cox wrote:
> >
> > Your Athlons may handle exclusive cache line acquisition more
> > efficiently (due to memory subsystem performance) but it still
> > does cost something.
>
> On an exclusive line on Athlon a lock cycle is near enough free, its
> just an ordering constraint. Since the line is in E state no other bus
> master can hold a copy in cache so the atomicity is there. Ditto for newer
> Intel processors
You misunderstood the problem, I think: when the line moves from one CPU
to the other (the exclusive state moves along with it), that is
_expensive_.
Yes, this was my intended point. Please see my quoted text above and
note the "exclusive cache line acquisition" with emphasis on the word
"acquisition" meaning you don't have the cache line in E state yet.
Franks a lot,
David S. Miller
davem@redhat.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
[not found] ` <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>
@ 2001-09-27 1:34 ` Vojtech Pavlik
0 siblings, 0 replies; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-27 1:34 UTC (permalink / raw)
To: Trond Eivind Glomsrød; +Cc: linux-kernel
On Thu, Sep 27, 2001 at 03:29:27AM +0200, Trond Eivind Glomsrød wrote:
> Vojtech Pavlik <vojtech@suse.cz> writes:
>
> > On Wed, Sep 26, 2001 at 10:20:21PM +0200, Vojtech Pavlik wrote:
> >
> > > > generic Athlon is
> > > >
> > > > nothing: 11 cycles
> > > > locked add: 11 cycles
> > > > cpuid: 64 cycles
> > >
> > > Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
> > > slower:
> > >
> > > nothing: 11 cycles
> > > locked add: 11 cycles
> > > cpuid: 87 cycles
> >
> > Oops, this is indeed just a difference in compiler options.
>
> No, it's not:
>
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
> [teg@xyzzy teg]$
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
Interesting: Try while true; do t; done and watch it change between 64
and 87 every 2.5 seconds ... :)
--
Vojtech Pavlik
SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-26 23:26 ` David S. Miller
@ 2001-09-27 12:10 ` Alan Cox
2001-09-27 15:38 ` Linus Torvalds
2001-09-27 19:41 ` David S. Miller
0 siblings, 2 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-27 12:10 UTC (permalink / raw)
To: David S. Miller; +Cc: torvalds, alan, bcrl, marcelo, andrea, linux-kernel
> Yes, this was my intended point. Please see my quoted text above and
> note the "exclusive cache line acquisition" with emphasis on the word
> "acquisition" meaning you don't have the cache line in E state yet.
See prefetching - the CPU prefetching will hide some of the effect and
the spin_lock_prefetch() macro does wonders for the rest.
Alan
^ permalink raw reply [flat|nested] 67+ messages in thread
* CPU frequency shifting "problems"
2001-09-26 19:12 ` Linus Torvalds
@ 2001-09-27 12:22 ` Padraig Brady
2001-09-27 12:44 ` Dave Jones
2001-09-27 23:23 ` Linus Torvalds
0 siblings, 2 replies; 67+ messages in thread
From: Padraig Brady @ 2001-09-27 12:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
Linus Torvalds wrote:
>In article <Pine.LNX.4.30.0109262036480.8655-100000@Appserv.suse.de>,
>Dave Jones <davej@suse.de> wrote:
>
>>On Wed, 26 Sep 2001, Linus Torvalds wrote:
>>
>>>>>cpuid: 72 cycles
>>>>>
>>>>cpuid: 79 cycles
>>>>Only slightly worse, but I'd not expected this.
>>>>
>>>That difference can easily be explained by the compiler and options.
>>>
>>Actually repeated runs of the test on that box show it deviating by up
>>to 10 cycles, making it match the results that Alan posted.
>>-O2 made no difference, these deviations still occur. They seem more
>>prominent on the C3 than other boxes I've tried, even with the same
>>compiler toolchain.
>>
>
>Does the C3 do any kind of frequency shifting?
>
Not automatic, but you can set the multiplier dynamically by setting the
msr.
Russell King has been working on an arch independent framework for this
kind of thing and support for the C3 has recently been added by Dave Jones.
The code is available @:
cvs -d :pserver:cvs@pubcvs.arm.linux.org.uk:/mnt/src/cvsroot login
cvs -d :pserver:cvs@pubcvs.arm.linux.org.uk:/mnt/src/cvsroot co cpufreq
>
>For example, on a transmeta CPU, the TSC will run at a constant
>"nominal" speed (the highest the CPU can go), although the real CPU
>speed will depend on the load of the machine and temperature etc.
>
As does the P4 from what I understand. So a question..
What are the software dependencies on this auto/manual frequency shifting?
The code referenced above scales jiffies appropriately when a manual
frequency change is requested. I'm not sure about the possible consequences
of this; for example, could races be introduced with the various busy-loop
locking, etc.?
A quick check for the use of jiffies in the kernel:
[padraig@pixelbeat linux]$ find -name "*.[ch]" | xargs grep jiffies | wc -l
3992
Also, with the automatic shifting of the Transmeta/P4, won't this
invalidate the jiffies value? And how does this affect the RTLinux guys
(and realtime software in general)?
cheers,
Padraig.
> So on
>a crusoe CPU you'll see varying speeds (and it depends on the speed
>grade, because that in turn depends on how many longrun steps are being
>actively used).
>
>For example, on a mostly idle machine I get
>
> torvalds@kiwi:~ > ./a.out
> nothing: 54 cycles
> locked add: 54 cycles
> cpuid: 91 cycles
>
>while if I have another window that does an endless loop to keep the CPU
>busy, the _real_ frequency of the CPU scales up, and the machine
>basically becomes faster:
>
> torvalds@kiwi:~ > ./a.out
> nothing: 36 cycles
> locked add: 36 cycles
> cpuid: 54 cycles
>
>(The reason why the "nothing" TSC read is expensive on crusoe is because
>of the scaling of the TSC - rdtsc literally has to do a floating point
>multiply-add to scale the clock to the right "nominal" frequency. Of
>course, "expensive" is still a lot less than the inexplicable 80 cycles
>on a P4).
>
>(That's a 600MHz part
>going down to 400MHz in idle, btw)
>
>On a 633MHz part (I don't actually have access to any of the high speed
>grades ;) it ends up being
>
>fast:
> nothing: 39 cycles
> locked add: 40 cycles
> cpuid: 68 cycles
>
>slow:
> nothing: 82 cycles
> locked add: 84 cycles
> cpuid: 122 cycles
>
>which corresponds to a 633MHz part going down to 300MHz in idle.
>
>And of course, you can get pretty much anything in between, depending on
>what the load is...
>
> Linus
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady
@ 2001-09-27 12:44 ` Dave Jones
2001-09-27 23:23 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-27 12:44 UTC (permalink / raw)
To: Padraig Brady; +Cc: Linux Kernel Mailing List
On Thu, 27 Sep 2001, Padraig Brady wrote:
> >Does the C3 do any kind of frequency shifting?
> Not automatic, but you can set the multiplier dynamically by setting the
> msr.
> Russell King has been working on an arch independent framework for this
> kind of thing and support for the C3 has recently been added by Dave Jones.
If you're going to try this out on a C3 btw, heed the warning at the
top of the code :) This still needs quite a bit of work.
I just need to find the time to sit down and finish it.
(The x86 bits are all that's preventing Russell from saying
"This is ready", IIRC, so I should get that finished at some point soon.)
I'd like to add Transmeta Longrun support to it too, but that can
come later, when I get access to one.
regards,
Dave.
--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-27 12:10 ` Alan Cox
@ 2001-09-27 15:38 ` Linus Torvalds
2001-09-27 17:44 ` Ingo Molnar
2001-09-27 19:41 ` David S. Miller
1 sibling, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-27 15:38 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel
On Thu, 27 Sep 2001, Alan Cox wrote:
>
> > Yes, this was my intended point. Please see my quoted text above and
> > note the "exclusive cache line acquisition" with emphasis on the word
> > "acquisition" meaning you don't have the cache line in E state yet.
>
> See prefetching - the CPU prefetching will hide some of the effect and
> the spin_lock_prefetch() macro does wonders for the rest.
prefetching and friends won't do _anything_ for the case of a cache line
bouncing back and forth between CPU's.
In fact, it can easily make things _worse_, simply by having bouncing
happen even more (you bounce it into the CPU for the prefetch, another CPU
bounces it back, and you bounce it in again for the actual lock).
And this isn't at all unlikely if you have a lock that is accessed a _lot_
but held only for short times.
Now, I'm not convinced that pagecache_lock is _that_ critical yet, but is
it one of the top ones? Definitely.
Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-27 15:38 ` Linus Torvalds
@ 2001-09-27 17:44 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2001-09-27 17:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, Andrea Arcangeli,
linux-kernel
On Thu, 27 Sep 2001, Linus Torvalds wrote:
> prefetching and friends won't do _anything_ for the case of a cache
> line bouncing back and forth between CPU's.
yep. that is exactly what was happening with pagecache_lock, while an
8-way system served 300+ MB/sec worth of SPECweb99 HTTP content in 1500
byte packets. Under that kind of workload the pagecache is used
read-mostly, and due to zerocopy (and Linux's hyper-scalable networking
code) there isn't much left that pollutes caches and/or inhibits raw
performance in any way. pagecache_lock was the top non-conceptual
cacheline-miss offender in instruction-level profiles of such workloads.
Does it show up on a dual PIII with 128 MB RAM? Probably not as strongly.
Are there other offenders under other kinds of workloads that have a
bigger effect than pagecache_lock? Probably yes - but this does not
justify ignoring the effects of pagecache_lock.
(to be precise there was another offender - timerlist_lock, we've fixed it
before fixing pagecache_lock, and posted a patch for that one too. It's
available under http://redhat.com/~mingo/scalable-timers/. I know no other
scalability offenders for read-mostly pagecache & network-intensive
workloads for the time being.)
Ingo
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-27 12:10 ` Alan Cox
2001-09-27 15:38 ` Linus Torvalds
@ 2001-09-27 19:41 ` David S. Miller
2001-09-27 22:59 ` Alan Cox
1 sibling, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-27 19:41 UTC (permalink / raw)
To: alan; +Cc: torvalds, bcrl, marcelo, andrea, linux-kernel
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Thu, 27 Sep 2001 13:10:49 +0100 (BST)
See prefetching - the CPU prefetching will hide some of the effect and
the spin_lock_prefetch() macro does wonders for the rest.
Well, if prefetching can do it faster than avoiding the transaction
altogether, I'm game :-)
Franks a lot,
David S. Miller
davem@redhat.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
2001-09-27 19:41 ` David S. Miller
@ 2001-09-27 22:59 ` Alan Cox
0 siblings, 0 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-27 22:59 UTC (permalink / raw)
To: David S. Miller; +Cc: alan, torvalds, bcrl, marcelo, andrea, linux-kernel
> See prefetching - the CPU prefetching will hide some of the effect and
> the spin_lock_prefetch() macro does wonders for the rest.
>
> Well, if prefetching can do it faster than avoiding the transaction
> altogether, I'm game :-)
That would depend on the cost of avoidance, the amount of contention and
the distance ahead you can fetch. Avoiding it is also rather more portable,
so I suspect you win.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady
2001-09-27 12:44 ` Dave Jones
@ 2001-09-27 23:23 ` Linus Torvalds
2001-09-28 0:55 ` Alan Cox
2001-09-28 8:55 ` Jamie Lokier
1 sibling, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-27 23:23 UTC (permalink / raw)
To: Padraig Brady; +Cc: linux-kernel
On Thu, 27 Sep 2001, Padraig Brady wrote:
>
> >
> >For example, on a transmeta CPU, the TSC will run at a constant
> >"nominal" speed (the highest the CPU can go), although the real CPU
> >speed will depend on the load of the machine and temperature etc.
>
> As does the P4 from what I understand.
That might explain why the P4 "rdtsc" is so slow.
> So a question..
> What are the software dependencies on this auto/manual frequency shifting?
None. At least not as long as the CPU _does_ do it automatically, and the
TSC appears to run at a constant speed even if the CPU does not.
For example, the Intel "SpeedStep" CPU's are completely broken under
Linux, and real-time will advance at different speeds in DC and AC modes,
because Intel actually changes the frequency of the TSC _and_ they don't
document how to figure out that it changed.
With a CPU that does make the TSC appear constant-frequency, the fact that
the CPU itself can go faster/slower doesn't matter - from a kernel
perspective that's pretty much equivalent to the different speeds you get
from cache miss behaviour etc.
Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-27 23:23 ` Linus Torvalds
@ 2001-09-28 0:55 ` Alan Cox
2001-09-28 2:12 ` Stefan Smietanowski
2001-09-28 8:55 ` Jamie Lokier
1 sibling, 1 reply; 67+ messages in thread
From: Alan Cox @ 2001-09-28 0:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Padraig Brady, linux-kernel
> For example, the Intel "SpeedStep" CPU's are completely broken under
> Linux, and real-time will advance at different speeds in DC and AC modes,
> because Intel actually changes the frequency of the TSC _and_ they don't
> document how to figure out that it changed.
The change is APM or ACPI initiated. Intel won't tell anyone anything
useful, but Microsoft have published some of the required Intel-confidential
information, which helps a bit.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-28 0:55 ` Alan Cox
@ 2001-09-28 2:12 ` Stefan Smietanowski
0 siblings, 0 replies; 67+ messages in thread
From: Stefan Smietanowski @ 2001-09-28 2:12 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
Hey.
>>For example, the Intel "SpeedStep" CPU's are completely broken under
>>Linux, and real-time will advance at different speeds in DC and AC modes,
>>because Intel actually changes the frequency of the TSC _and_ they don't
>>document how to figure out that it changed.
>
> The change is APM or ACPI initiated. Intel won't tell anyone anything
> useful but Microsoft have published some of the required intel confidential
> information which helps a bit
Did you just say that Microsoft actually went and did something right
for a change? As in publishing specs I mean.
*Stands in awe*
:)
// Stefan
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-27 23:23 ` Linus Torvalds
2001-09-28 0:55 ` Alan Cox
@ 2001-09-28 8:55 ` Jamie Lokier
2001-09-28 16:11 ` Linus Torvalds
1 sibling, 1 reply; 67+ messages in thread
From: Jamie Lokier @ 2001-09-28 8:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Padraig Brady, linux-kernel
Linus Torvalds wrote:
> With a CPU that does make the TSC appear constant-frequency, the fact that
> the CPU itself can go faster/slower doesn't matter - from a kernel
> perspective that's pretty much equivalent to the different speeds you get
> from cache miss behaviour etc.
On a Transmeta chip, does the TSC clock advance _exactly_ uniformly, or
is there a cumulative error due to speed changes?
I'll clarify. I imagine that the internal clocks are driven by PLLs,
DLLs or something similar. Unless multiple oscillators are used, this
means that speed switching is gradual, over several hundred or many more
clock cycles.
You said that Crusoe does a floating point op to scale the TSC value.
Now suppose I have a 600MHz Crusoe. I calibrate the clock and it comes
out as 600.01MHz.
I can now use `rdtsc' to measure time in userspace, rather more
accurately than gettimeofday(). (In fact I have worked with programs
that do this, for network traffic injection.). I can do this over a
period of minutes, expecting the clock to match "wall clock" time
reasonably accurately.
Suppose the CPU clock speed changes. Can I be confident that
600.01*10^6 (+/- small tolerance) cycles will still be counted per
second, or is there a cumulative error due to the gradual clock speed
change and the floating-point scale factor not integrating the gradual
change precisely?
(One hardware implementation that doesn't have this problem is to run a
small counter, say 3 or 4 bits, at the nominal clock speed all the time,
and have the slower core sample that. But it may use a little more
power, and your note about FP scaling tells me you don't do that).
thanks,
-- Jamie
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-28 8:55 ` Jamie Lokier
@ 2001-09-28 16:11 ` Linus Torvalds
2001-09-28 20:29 ` Eric W. Biederman
0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-28 16:11 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Padraig Brady, linux-kernel
On Fri, 28 Sep 2001, Jamie Lokier wrote:
>
> On a Transmeta chip, does the TSC clock advance _exactly_ uniformly, or
> is there a cumulative error due to speed changes?
>
> I'll clarify. I imagine that the internal clocks are driven by PLLs,
> DLLs or something similar. Unless multiple oscillators are used, this
> means that speed switching is gradual, over several hundred or many more
> clock cycles.
Basically, there's the "slow" timer, and the fast one. The slow one always
runs, and the fast one gives the precision but runs at CPU speed.
So yes, there are multiple oscillators, and no, they should not drift on
frequency shifting, because the slow and constant one is used to scale the
fast one. So no cumulative errors.
HOWEVER, anybody who believes that TSC is a "truly accurate clock" will be
sadly mistaken on any machine. Even PLL's drift over time, and as
mentioned, Intel already broke the "you can use TSC as wall time" in their
SpeedStep implementation. Who knows what their future CPU's will do..
> I can now use `rdtsc' to measure time in userspace, rather more
> accurately than gettimeofday(). (In fact I have worked with programs
> that do this, for network traffic injection.). I can do this over a
> period of minutes, expecting the clock to match "wall clock" time
> reasonably accurately.
It will work on Crusoe.
> (One hardware implementation that doesn't have this problem is to run a
> small counter, say 3 or 4 bits, at the nominal clock speed all the time,
> and have the slower core sample that. But it may use a little more
> power, and your note about FP scaling tells me you don't do that).
We do that, but the other way around. The thing is, the "nominal clock
speed" doesn't even _exist_ when running normally.
What does exist is the bus clock (well, a multiple of it, but you get the
idea), and that one is stable. I bet PCI devices don't like to be randomly
driven at frequencies "somewhere between 12 and 33MHz" depending on load ;)
But because the stable frequency is the _slow_ one, you can't just scale
that up (well, you could - you could just run your cycle counter at 66MHz
all the time, but then you couldn't measure smaller intervals, and people
would be really disappointed). So you need the scaling of the fast one..
Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-28 16:11 ` Linus Torvalds
@ 2001-09-28 20:29 ` Eric W. Biederman
2001-09-28 22:24 ` Jamie Lokier
0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2001-09-28 20:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jamie Lokier, Padraig Brady, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> What does exist is the bus clock (well, a multiple of it, but you get the
> idea), and that one is stable. I bet PCI devices don't like to be randomly
> driven at frequencies "somewhere between 12 and 33MHz" depending on load ;)
I doubt they would like it but it is perfectly legal (PCI spec..) to
vary the PCI clock depending upon load.
Eric
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems"
2001-09-28 20:29 ` Eric W. Biederman
@ 2001-09-28 22:24 ` Jamie Lokier
0 siblings, 0 replies; 67+ messages in thread
From: Jamie Lokier @ 2001-09-28 22:24 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linus Torvalds, Padraig Brady, linux-kernel
Eric W. Biederman wrote:
> > What does exist is the bus clock (well, a multiple of it, but you get the
> > idea), and that one is stable. I bet PCI devices don't like to be randomly
> > driven at frequencies "somewhere between 12 and 33MHz" depending on load ;)
>
> I doubt they would like it but it is perfectly legal (PCI spec..) to
> vary the pci clock, depending upon load.
Yes it is. Also, the PCI clock is frequency modulated to reduce
electrical interference. (Or on a more cynical note, to pass the
official emissions tests ;-)
However it's common practice to PLL to the PCI clock, for clock
distribution on a board, so varying the frequency must be done in a
strictly constrained fashion.
-- Jamie
^ permalink raw reply [flat|nested] 67+ messages in thread
end of thread, other threads:[~2001-09-28 22:25 UTC | newest]
Thread overview: 67+ messages
-- links below jump to the message on this page --
2001-09-25 17:49 Locking comment on shrink_caches() Marcelo Tosatti
2001-09-25 19:57 ` David S. Miller
2001-09-25 18:40 ` Marcelo Tosatti
2001-09-25 20:15 ` David S. Miller
2001-09-25 19:02 ` Marcelo Tosatti
2001-09-25 20:29 ` David S. Miller
2001-09-25 21:00 ` Benjamin LaHaise
2001-09-25 21:55 ` David S. Miller
2001-09-25 22:16 ` Benjamin LaHaise
2001-09-25 22:28 ` David S. Miller
2001-09-26 16:40 ` Alan Cox
2001-09-26 17:25 ` Linus Torvalds
2001-09-26 17:40 ` Alan Cox
2001-09-26 17:44 ` Linus Torvalds
2001-09-26 18:01 ` Benjamin LaHaise
2001-09-26 18:01 ` Dave Jones
2001-09-26 20:20 ` Vojtech Pavlik
2001-09-26 20:24 ` Vojtech Pavlik
2001-09-26 17:43 ` Richard Gooch
2001-09-26 18:24 ` Benjamin LaHaise
2001-09-26 18:48 ` Richard Gooch
2001-09-26 18:58 ` Davide Libenzi
2001-09-26 17:45 ` Dave Jones
2001-09-26 17:50 ` Alan Cox
2001-09-26 17:59 ` Dave Jones
2001-09-26 18:07 ` Alan Cox
2001-09-26 18:09 ` Padraig Brady
2001-09-26 18:22 ` Dave Jones
2001-09-26 18:24 ` Linus Torvalds
2001-09-26 18:40 ` Dave Jones
2001-09-26 19:12 ` Linus Torvalds
2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady
2001-09-27 12:44 ` Dave Jones
2001-09-27 23:23 ` Linus Torvalds
2001-09-28 0:55 ` Alan Cox
2001-09-28 2:12 ` Stefan Smietanowski
2001-09-28 8:55 ` Jamie Lokier
2001-09-28 16:11 ` Linus Torvalds
2001-09-28 20:29 ` Eric W. Biederman
2001-09-28 22:24 ` Jamie Lokier
2001-09-26 19:04 ` Locking comment on shrink_caches() George Greer
2001-09-26 18:59 ` George Greer
2001-09-26 23:26 ` David S. Miller
2001-09-27 12:10 ` Alan Cox
2001-09-27 15:38 ` Linus Torvalds
2001-09-27 17:44 ` Ingo Molnar
2001-09-27 19:41 ` David S. Miller
2001-09-27 22:59 ` Alan Cox
2001-09-25 22:03 ` Andrea Arcangeli
2001-09-25 20:24 ` Rik van Riel
2001-09-25 20:28 ` David S. Miller
2001-09-25 21:05 ` Andrew Morton
2001-09-25 21:48 ` David S. Miller
[not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
2001-09-25 22:26 ` David S. Miller
2001-09-26 17:42 ` Ingo Molnar
2001-09-25 22:01 ` Andrea Arcangeli
2001-09-25 22:03 ` David S. Miller
2001-09-25 22:59 ` Andrea Arcangeli
2001-09-25 20:40 ` Josh MacDonald
2001-09-25 19:25 ` Marcelo Tosatti
2001-09-25 21:57 ` Andrea Arcangeli
-- strict thread matches above, loose matches on Subject: below --
2001-09-26 5:04 Dipankar Sarma
2001-09-26 5:31 ` Andrew Morton
2001-09-26 6:57 ` David S. Miller
2001-09-26 7:08 ` Dipankar Sarma
2001-09-26 16:52 ` John Hawkes
[not found] <fa.cbgmt3v.192gc8r@ifi.uio.no>
[not found] ` <fa.cd0mtbv.1aigc0v@ifi.uio.no>
[not found] ` <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>
2001-09-27 1:34 ` Vojtech Pavlik