public inbox for linux-kernel@vger.kernel.org
* Locking comment on shrink_caches()
@ 2001-09-25 17:49 Marcelo Tosatti
  2001-09-25 19:57 ` David S. Miller
  0 siblings, 1 reply; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 17:49 UTC (permalink / raw)
  To: Andrea Arcangeli, Linus Torvalds; +Cc: lkml



Andrea, 


Do you really need to do this ? 

                if (unlikely(!spin_trylock(&pagecache_lock))) {
                        /* we hold the page lock so the page cannot go away from under us */
                        spin_unlock(&pagemap_lru_lock);

                        spin_lock(&pagecache_lock);
                        spin_lock(&pagemap_lru_lock);
                }

Have you actually seen bad hold times of pagecache_lock by
shrink_caches() ? 

It's just that I prefer clear locking without those "tricks" (easier to
understand and harder to miss subtle details).


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 19:57 ` David S. Miller
@ 2001-09-25 18:40   ` Marcelo Tosatti
  2001-09-25 20:15     ` David S. Miller
  2001-09-25 20:40     ` Josh MacDonald
  2001-09-25 21:57   ` Andrea Arcangeli
  1 sibling, 2 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 18:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: andrea, torvalds, linux-kernel



On Tue, 25 Sep 2001, David S. Miller wrote:

>    From: Marcelo Tosatti <marcelo@conectiva.com.br>
>    Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
>    
>    Do you really need to do this ? 
>    
>                    if (unlikely(!spin_trylock(&pagecache_lock))) {
>                            /* we hold the page lock so the page cannot go away from under us */
>                            spin_unlock(&pagemap_lru_lock);
>    
>                            spin_lock(&pagecache_lock);
>                            spin_lock(&pagemap_lru_lock);
>                    }
>    
>    Have you actually seen bad hold times of pagecache_lock by
>    shrink_caches() ? 
> 
> Marcelo, this is needed because of the spin lock ordering rules.
> The pagecache_lock must be obtained before the pagemap_lru_lock
> or else deadlock is possible.  The spin_trylock is an optimization.

No, it is not.

We can simply lock the pagecache_lock and the pagemap_lru_lock at the
beginning of the cleaning function. page_launder() used to do that.

That's why I asked Andrea if there were long hold times in shrink_caches().


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:15     ` David S. Miller
@ 2001-09-25 19:02       ` Marcelo Tosatti
  2001-09-25 20:29         ` David S. Miller
  2001-09-25 20:24       ` Rik van Riel
  2001-09-25 22:01       ` Andrea Arcangeli
  2 siblings, 1 reply; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 19:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: andrea, torvalds, linux-kernel



On Tue, 25 Sep 2001, David S. Miller wrote:

>    From: Marcelo Tosatti <marcelo@conectiva.com.br>
>    Date: Tue, 25 Sep 2001 15:40:23 -0300 (BRT)
>    
>    We can simply lock the pagecache_lock and the pagemap_lru_lock at the
>    beginning of the cleaning function. page_launder() used to do that.
>    
>    That's why I asked Andrea if there were long hold times in shrink_caches().
>    
> Ok, I see.
> 
> I do think it's silly to hold the pagecache_lock during pure scanning
> activities of shrink_caches().

It may well be, but I would like to see some lockmeter results which show
that _shrink_cache()_ itself is a problem. :)

> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.

Btw, is that one going into 2.5 for sure? (the per-address-space lock). 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:40     ` Josh MacDonald
@ 2001-09-25 19:25       ` Marcelo Tosatti
  0 siblings, 0 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 19:25 UTC (permalink / raw)
  To: Josh MacDonald; +Cc: linux-kernel



On Tue, 25 Sep 2001, Josh MacDonald wrote:

> Quoting Marcelo Tosatti (marcelo@conectiva.com.br):
> > 
> > 
> > On Tue, 25 Sep 2001, David S. Miller wrote:
> > 
> > >    From: Marcelo Tosatti <marcelo@conectiva.com.br>
> > >    Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
> > >    
> > >    Do you really need to do this ? 
> > >    
> > >                    if (unlikely(!spin_trylock(&pagecache_lock))) {
> > >                            /* we hold the page lock so the page cannot go away from under us */
> > >                            spin_unlock(&pagemap_lru_lock);
> > >    
> > >                            spin_lock(&pagecache_lock);
> > >                            spin_lock(&pagemap_lru_lock);
> > >                    }
> > >    
> > >    Have you actually seen bad hold times of pagecache_lock by
> > >    shrink_caches() ? 
> > > 
> > > Marcelo, this is needed because of the spin lock ordering rules.
> > > The pagecache_lock must be obtained before the pagemap_lru_lock
> > > or else deadlock is possible.  The spin_trylock is an optimization.
> > 
> > No, it is not.
> > 
> > We can simply lock the pagecache_lock and the pagemap_lru_lock at the
> > beginning of the cleaning function. page_launder() used to do that.
> 
> Since your main concern seems to be simplicity, the code can remain
> the way it is and be far more readable with, e.g.,
> 
> /* Acquire lock1 while holding lock2 -- reverse order. */
> #define spin_reverse_lock(lock1, lock2)            \
>     do {                                           \
>         if (unlikely(!spin_trylock(&lock1))) {     \
>             spin_unlock(&lock2);                   \
>             spin_lock(&lock1);                     \
>             spin_lock(&lock2);                     \
>         }                                          \
>     } while (0)
> 
> You can't argue for simplicity at the cost of increased lock contention,
> but you can keep it readable.

Making the code readable is different from making it logically simple.

I've already seen pretty subtle races in the VM that lived on for a long
time (e.g. the latest race which Hugh and I found in
add_to_swap_cache/try_to_swap_out, which had been there since 2.4.early), so I
prefer to make the code as simple as possible.

If there really are long hold times in shrink_cache(), then I agree to keep
the current snippet of code to avoid that.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 17:49 Locking comment on shrink_caches() Marcelo Tosatti
@ 2001-09-25 19:57 ` David S. Miller
  2001-09-25 18:40   ` Marcelo Tosatti
  2001-09-25 21:57   ` Andrea Arcangeli
  0 siblings, 2 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-25 19:57 UTC (permalink / raw)
  To: marcelo; +Cc: andrea, torvalds, linux-kernel

   From: Marcelo Tosatti <marcelo@conectiva.com.br>
   Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
   
   Do you really need to do this ? 
   
                   if (unlikely(!spin_trylock(&pagecache_lock))) {
                           /* we hold the page lock so the page cannot go away from under us */
                           spin_unlock(&pagemap_lru_lock);
   
                           spin_lock(&pagecache_lock);
                           spin_lock(&pagemap_lru_lock);
                   }
   
   Have you actually seen bad hold times of pagecache_lock by
   shrink_caches() ? 

Marcelo, this is needed because of the spin lock ordering rules.
The pagecache_lock must be obtained before the pagemap_lru_lock
or else deadlock is possible.  The spin_trylock is an optimization.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 18:40   ` Marcelo Tosatti
@ 2001-09-25 20:15     ` David S. Miller
  2001-09-25 19:02       ` Marcelo Tosatti
                         ` (2 more replies)
  2001-09-25 20:40     ` Josh MacDonald
  1 sibling, 3 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-25 20:15 UTC (permalink / raw)
  To: marcelo; +Cc: andrea, torvalds, linux-kernel

   From: Marcelo Tosatti <marcelo@conectiva.com.br>
   Date: Tue, 25 Sep 2001 15:40:23 -0300 (BRT)
   
   We can simply lock the pagecache_lock and the pagemap_lru_lock at the
   beginning of the cleaning function. page_launder() used to do that.
   
   That's why I asked Andrea if there were long hold times in shrink_caches().
   
Ok, I see.

I do think it's silly to hold the pagecache_lock during pure scanning
activities of shrink_caches().

It is known that pagecache_lock is the biggest scalability issue on
large SMP systems, hence the page cache locking patches Ingo and
I did.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:15     ` David S. Miller
  2001-09-25 19:02       ` Marcelo Tosatti
@ 2001-09-25 20:24       ` Rik van Riel
  2001-09-25 20:28         ` David S. Miller
       [not found]         ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
  2001-09-25 22:01       ` Andrea Arcangeli
  2 siblings, 2 replies; 67+ messages in thread
From: Rik van Riel @ 2001-09-25 20:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel

On Tue, 25 Sep 2001, David S. Miller wrote:

> It is known that pagecache_lock is the biggest scalability issue
> on large SMP systems, and thus the page cache locking patches
> Ingo and myself did.

Interesting, most lockmeter data dumps I've seen here
indicate the locks in fs/buffer.c as the big problem
and have pagecache_lock down in the noise.

Or were you measuring loads which are mostly read-only ?

regards,

Rik
--
IA64: a worthy successor to the i860.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:24       ` Rik van Riel
@ 2001-09-25 20:28         ` David S. Miller
  2001-09-25 21:05           ` Andrew Morton
       [not found]         ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
  1 sibling, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 20:28 UTC (permalink / raw)
  To: riel; +Cc: marcelo, andrea, torvalds, linux-kernel

   From: Rik van Riel <riel@conectiva.com.br>
   Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
   
   Or were you measuring loads which are mostly read-only ?

When Kanoj Sarcar was back at SGI testing 32 processor Origin
MIPS systems, pagecache_lock was at the top.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 19:02       ` Marcelo Tosatti
@ 2001-09-25 20:29         ` David S. Miller
  2001-09-25 21:00           ` Benjamin LaHaise
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 20:29 UTC (permalink / raw)
  To: marcelo; +Cc: andrea, torvalds, linux-kernel

   From: Marcelo Tosatti <marcelo@conectiva.com.br>
   Date: Tue, 25 Sep 2001 16:02:29 -0300 (BRT)
   
   > It is known that pagecache_lock is the biggest scalability issue on
   > large SMP systems, and thus the page cache locking patches Ingo and
   > myself did.
   
   Btw, is that one going into 2.5 for sure? (the per-address-space lock). 
   
Well, there are two things happening in that patch: per-hash-chain
locks for the page cache itself, and the lock added to the address
space for that page list.

Linus has indicated it will go into 2.5.x, yes.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 18:40   ` Marcelo Tosatti
  2001-09-25 20:15     ` David S. Miller
@ 2001-09-25 20:40     ` Josh MacDonald
  2001-09-25 19:25       ` Marcelo Tosatti
  1 sibling, 1 reply; 67+ messages in thread
From: Josh MacDonald @ 2001-09-25 20:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

Quoting Marcelo Tosatti (marcelo@conectiva.com.br):
> 
> 
> On Tue, 25 Sep 2001, David S. Miller wrote:
> 
> >    From: Marcelo Tosatti <marcelo@conectiva.com.br>
> >    Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
> >    
> >    Do you really need to do this ? 
> >    
> >                    if (unlikely(!spin_trylock(&pagecache_lock))) {
> >                            /* we hold the page lock so the page cannot go away from under us */
> >                            spin_unlock(&pagemap_lru_lock);
> >    
> >                            spin_lock(&pagecache_lock);
> >                            spin_lock(&pagemap_lru_lock);
> >                    }
> >    
> >    Have you actually seen bad hold times of pagecache_lock by
> >    shrink_caches() ? 
> > 
> > Marcelo, this is needed because of the spin lock ordering rules.
> > The pagecache_lock must be obtained before the pagemap_lru_lock
> > or else deadlock is possible.  The spin_trylock is an optimization.
> 
> No, it is not.
> 
> We can simply lock the pagecache_lock and the pagemap_lru_lock at the
> beginning of the cleaning function. page_launder() used to do that.

Since your main concern seems to be simplicity, the code can remain
the way it is and be far more readable with, e.g.,

/* Acquire lock1 while holding lock2 -- reverse order. */
#define spin_reverse_lock(lock1, lock2)        \
    do {                                       \
        if (unlikely(!spin_trylock(&lock1))) { \
            spin_unlock(&lock2);               \
            spin_lock(&lock1);                 \
            spin_lock(&lock2);                 \
        }                                      \
    } while (0)

You can't argue for simplicity at the cost of increased lock contention,
but you can keep it readable.

-josh

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:29         ` David S. Miller
@ 2001-09-25 21:00           ` Benjamin LaHaise
  2001-09-25 21:55             ` David S. Miller
  2001-09-25 22:03             ` Andrea Arcangeli
  0 siblings, 2 replies; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-25 21:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 01:29:05PM -0700, David S. Miller wrote:
> Well, there are two things happening in that patch.  Per-hash chain
> locks for the page cache itself, and the lock added to the address
> space for that page list.

Last time I looked, those patches made the already ugly VM locking
even worse.  I'd rather try to use some of the RCU techniques for
page cache lookup, and per-page locking for page cache removal,
which would lead to *cleaner* code as well as a much more scalable
kernel.

Keep in mind that just because a lock is on someone's hitlist doesn't 
mean that it is for the right reasons.  Look at the io_request_lock 
that is held around the bounce buffer copies in the scsi midlayer.  
*shudder*

		-ben

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:28         ` David S. Miller
@ 2001-09-25 21:05           ` Andrew Morton
  2001-09-25 21:48             ` David S. Miller
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Morton @ 2001-09-25 21:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: riel, marcelo, andrea, torvalds, linux-kernel

"David S. Miller" wrote:
> 
>    From: Rik van Riel <riel@conectiva.com.br>
>    Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
> 
>    Or were you measuring loads which are mostly read-only ?
> 
> When Kanoj Sarcar was back at SGI testing 32 processor Origin
> MIPS systems, pagecache_lock was at the top.

But when I asked kumon to test it on his 8-way Xeon,
page_cache_lock contention proved to be insignificant.

Seems to only be a NUMA thing.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 21:05           ` Andrew Morton
@ 2001-09-25 21:48             ` David S. Miller
  0 siblings, 0 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-25 21:48 UTC (permalink / raw)
  To: akpm; +Cc: riel, marcelo, andrea, torvalds, linux-kernel, mingo

   From: Andrew Morton <akpm@zip.com.au>
   Date: Tue, 25 Sep 2001 14:05:04 -0700

   "David S. Miller" wrote:
   > When Kanoj Sarcar was back at SGI testing 32 processor Origin
   > MIPS systems, pagecache_lock was at the top.
   
   But when I asked kumon to test it on his 8-way Xeon,
   page_cache_lock contention proved to be insignificant.
   
   Seems to only be a NUMA thing.
   
I doubt it is only a NUMA thing.  I say this because for TUX web
benchmarks that tended to hold most of the resident set in memory, the
page cache locking changes were measured to improve performance
significantly on SMP x86 systems.

Ingo would be able to comment further.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 21:00           ` Benjamin LaHaise
@ 2001-09-25 21:55             ` David S. Miller
  2001-09-25 22:16               ` Benjamin LaHaise
  2001-09-25 22:03             ` Andrea Arcangeli
  1 sibling, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 21:55 UTC (permalink / raw)
  To: bcrl; +Cc: marcelo, andrea, torvalds, linux-kernel

   From: Benjamin LaHaise <bcrl@redhat.com>
   Date: Tue, 25 Sep 2001 17:00:55 -0400
   
   Last time I looked, those patches made the already ugly vm locking 
   even worse.  I'd rather try to use some of the rcu techniques for 
   page cache lookup, and per-page locking for page cache removal 
   which will lead to *cleaner* code as well as a much more scalable 
   kernel.
   
I'm willing to investigate using RCU.  However, per-hashchain locking
is a well-proven technique (in the networking code in particular), which
is why that was the method employed.  At the time the patch was
implemented, the RCU stuff was not fully formulated.

Please note that the problem is lock cachelines in dirty exclusive
state, not a "lock held for long time" issue.

   Keep in mind that just because a lock is on someone's hitlist doesn't 
   mean that it is for the right reasons.  Look at the io_request_lock 
   that is held around the bounce buffer copies in the scsi midlayer.  
   *shudder*

I agree.  But to my understanding, and after having studied the
pagecache lock usage, it was minimally used and not held anywhere
unnecessarily, unlike the io_request_lock example you cite.

In fact, the pagecache_lock is mostly held for extremely short periods
of time.

Franks a lot,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 19:57 ` David S. Miller
  2001-09-25 18:40   ` Marcelo Tosatti
@ 2001-09-25 21:57   ` Andrea Arcangeli
  1 sibling, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 21:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 12:57:58PM -0700, David S. Miller wrote:
> or else deadlock is possible.  The spin_trylock is an optimization.

Indeed.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 20:15     ` David S. Miller
  2001-09-25 19:02       ` Marcelo Tosatti
  2001-09-25 20:24       ` Rik van Riel
@ 2001-09-25 22:01       ` Andrea Arcangeli
  2001-09-25 22:03         ` David S. Miller
  2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 01:15:28PM -0700, David S. Miller wrote:
> I do think it's silly to hold the pagecache_lock during pure scanning
> activities of shrink_caches().

Indeed again.

> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.

yes.

IMHO if we held the pagecache lock all the time while shrinking
the cache, then we could kill the lru lock in the first place.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 22:01       ` Andrea Arcangeli
@ 2001-09-25 22:03         ` David S. Miller
  2001-09-25 22:59           ` Andrea Arcangeli
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:03 UTC (permalink / raw)
  To: andrea; +Cc: marcelo, torvalds, linux-kernel

   From: Andrea Arcangeli <andrea@suse.de>
   Date: Wed, 26 Sep 2001 00:01:02 +0200
   
   IMHO if we held the pagecache lock all the time while shrinking
   the cache, then we could kill the lru lock in the first place.

And actually in the pagecache locking patches, doing such a thing
would be impossible :-) since each page needs to grab a different
lock (because the hash chain is potentially different).

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 21:00           ` Benjamin LaHaise
  2001-09-25 21:55             ` David S. Miller
@ 2001-09-25 22:03             ` Andrea Arcangeli
  1 sibling, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:03 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 05:00:55PM -0400, Benjamin LaHaise wrote:
> even worse.  I'd rather try to use some of the rcu techniques for 
> page cache lookup, and per-page locking for page cache removal 
> which will lead to *cleaner* code as well as a much more scalable 

I don't think RCU fits there; truncation and releasing must be
extremely efficient too.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 21:55             ` David S. Miller
@ 2001-09-25 22:16               ` Benjamin LaHaise
  2001-09-25 22:28                 ` David S. Miller
  0 siblings, 1 reply; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-25 22:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 02:55:47PM -0700, David S. Miller wrote:
> I'm willing to investigate using RCU.  However, per hashchain locking
> is a much proven technique (inside the networking in particular) which
> is why that was the method employed.  At the time the patch was
> implemented, the RCU stuff was not fully formulated.

*nod*

> Please note that the problem is lock cachelines in dirty exclusive
> state, not a "lock held for long time" issue.

Ahh, that's a cpu bug -- one my athlons don't suffer from.

> I agree.  But to my understanding, and after having studied the
> pagecache lock usage, it was minimally used and not used in any places
> unnecessarily as per the io_request_lock example you are stating.
> 
> In fact, the pagecache_lock is mostly held for extremely short periods
> of time.

True, and that is why I would like to see more of the research that 
justifies these changes, as well as comparisons with alternate techniques 
before any of these patches make it into the base tree.  Even before that, 
we need to clean up the code first.

		-ben

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
       [not found]         ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
@ 2001-09-25 22:26           ` David S. Miller
  2001-09-26 17:42             ` Ingo Molnar
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:26 UTC (permalink / raw)
  To: gerrit; +Cc: riel, marcelo, andrea, torvalds, linux-kernel

   From: Gerrit Huizenga <gerrit@us.ibm.com>
   Date: Tue, 25 Sep 2001 15:15:13 PDT

   I'm very curious as to what workloads are showing pagecache_lock as
   a bottleneck.  We haven't noticed this particular bottleneck in most
   of the workloads we are running.  Is there a good workload that shows
   this type of load?
   
Again, I defer to Ingo for specifics, but essentially something like
specweb99 where the whole dataset fits in memory.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 22:16               ` Benjamin LaHaise
@ 2001-09-25 22:28                 ` David S. Miller
  2001-09-26 16:40                   ` Alan Cox
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:28 UTC (permalink / raw)
  To: bcrl; +Cc: marcelo, andrea, torvalds, linux-kernel

   From: Benjamin LaHaise <bcrl@redhat.com>
   Date: Tue, 25 Sep 2001 18:16:43 -0400

   > Please note that the problem is lock cachelines in dirty exclusive
   > state, not a "lock held for long time" issue.
   
   Ahh, that's a cpu bug -- one my athlons don't suffer from.
   
Your Athlons may handle exclusive cache line acquisition more
efficiently (due to memory subsystem performance) but it still
does cost something.

   True, and that is why I would like to see more of the research that 
   justifies these changes, as well as comparisons with alternate techniques 
   before any of these patches make it into the base tree.  Even before that, 
   we need to clean up the code first.
   
As an aside, I actually think the per-hashchain version of the
pagecache locking is cleaner conceptually.  The reason is that
it makes it more clear that we are locking the "identity of page X"
instead of "the page cache".

Franks a lot,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 22:03         ` David S. Miller
@ 2001-09-25 22:59           ` Andrea Arcangeli
  0 siblings, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 03:03:28PM -0700, David S. Miller wrote:
>    From: Andrea Arcangeli <andrea@suse.de>
>    Date: Wed, 26 Sep 2001 00:01:02 +0200
>    
>    IMHO if we held the pagecache lock all the time while shrinking
>    the cache, then we could kill the lru lock in the first place.
> 
> And actually in the pagecache locking patches, doing such a thing
> would be impossible :-) since each page needs to grab a different

Good further point too :), it would be an option only for mainline.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
@ 2001-09-26  5:04 Dipankar Sarma
  2001-09-26  5:31 ` Andrew Morton
  0 siblings, 1 reply; 67+ messages in thread
From: Dipankar Sarma @ 2001-09-26  5:04 UTC (permalink / raw)
  To: davem; +Cc: marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel, hawkes

In article <20010925.132816.52117370.davem@redhat.com> David S. Miller wrote:
>    From: Rik van Riel <riel@conectiva.com.br>
>    Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
>    
>    Or were you measuring loads which are mostly read-only ?

> When Kanoj Sarcar was back at SGI testing 32 processor Origin
> MIPS systems, pagecache_lock was at the top.

John Hawkes from SGI had published some AIM7 numbers that showed
pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
half the CPU cycles were spent on waiting for pagecache_lock. The
thread is at -

http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26  5:04 Dipankar Sarma
@ 2001-09-26  5:31 ` Andrew Morton
  2001-09-26  6:57   ` David S. Miller
                     ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Andrew Morton @ 2001-09-26  5:31 UTC (permalink / raw)
  To: dipankar
  Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
	hawkes

Dipankar Sarma wrote:
> 
> In article <20010925.132816.52117370.davem@redhat.com> David S. Miller wrote:
> >    From: Rik van Riel <riel@conectiva.com.br>
> >    Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
> >
> >    Or were you measuring loads which are mostly read-only ?
> 
> > When Kanoj Sarcar was back at SGI testing 32 processor Origin
> > MIPS systems, pagecache_lock was at the top.
> 
> John Hawkes from SGI had published some AIM7 numbers that showed
> pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
> half the CPU cycles were spent on waiting for pagecache_lock. The
> thread is at -
> 
> http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
> 

That's NUMA hardware.   The per-hashqueue locking change made
a big improvement on that hardware.  But when it was used on
Intel hardware it made no measurable difference at all.

Sorry, but the patch adds complexity, and unless a significant
throughput benefit can be demonstrated on less exotic hardware,
why use it?

Here are kumon's test results from March, with and without
the hashed lock patch:



-------- Original Message --------
Subject: Re: [Fwd: Re: [Lse-tech] AIM7 scaling, pagecache_lock, multiqueue scheduler]
Date: Thu, 15 Mar 2001 18:03:55 +0900
From: kumon@flab.fujitsu.co.jp
Reply-To: kumon@flab.fujitsu.co.jp
To: Andrew Morton <andrewm@uow.edu.au>
CC: kumon@flab.fujitsu.co.jp, ahirai@flab.fujitsu.co.jp,John Hawkes <hawkes@engr.sgi.com>,kumon@flab.fujitsu.co.jp
In-Reply-To: <3AB032B3.87940521@uow.edu.au>,<3AB0089B.CF3496D2@uow.edu.au><200103150234.LAA28075@asami.proc><3AB032B3.87940521@uow.edu.au>

OK, the following are the results of our brief measurement with WebBench
(Mindcraft type) of 2.4.2 and 2.4.2+pcl.

Workload: WebBench 3.0 (static get)
Machine: Profusion 8way 550MHz/1MB cache 1GB mem.
Server: Apache 1.3.9-8 (w/ SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
	obtained from RedHat.
Clients: 32 clients each has 2 requesting threads.

The following number is Request per sec.

		2.4.2	2.4.2+pcl	ratio
-------------------------------------
1SMP    	1,603	1,584	0.99
2(1+1)SMP	2,443	2,437	1.00
4(1+3)SMP	4,420	4,426	1.00
8(4+4)SMP	5,381	5,400	1.00

#No idle time observed in the 1 to 4 SMP runs.
#Only the 8-way SMP case shows CPU idle time, but it is about 2.1-2.8% of the
#total CPU time.

Note: The load on the two buses of the Profusion system isn't balanced,
because the number of CPUs on each bus is unbalanced.

Summary:
 From the above brief test, the (+pcl) patch doesn't show a measurable
performance gain.

-

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26  5:31 ` Andrew Morton
@ 2001-09-26  6:57   ` David S. Miller
  2001-09-26  7:08   ` Dipankar Sarma
  2001-09-26 16:52   ` John Hawkes
  2 siblings, 0 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-26  6:57 UTC (permalink / raw)
  To: akpm; +Cc: dipankar, marcelo, riel, andrea, torvalds, linux-kernel, hawkes

   From: Andrew Morton <akpm@zip.com.au>
   Date: Tue, 25 Sep 2001 22:31:32 -0700
   
   Here are kumon's test results from March, with and without
   the hashed lock patch:

Please elaborate on what the WebBench 3.0 static GETs were
really doing.

Was this test composed of multiple accesses to the same or a small set
of files?  If so, that is indeed the case where the page cache locking
patches won't help at all.

The more diversified the set of files being accessed, the greater the
gain from the locking changes.  You have to give the CPUs at least
a chance of accessing different hash chains :-)

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26  5:31 ` Andrew Morton
  2001-09-26  6:57   ` David S. Miller
@ 2001-09-26  7:08   ` Dipankar Sarma
  2001-09-26 16:52   ` John Hawkes
  2 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-09-26  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
	anton, jdoelle

On Tue, Sep 25, 2001 at 10:31:32PM -0700, Andrew Morton wrote:
> Dipankar Sarma wrote:
> > 
> > John Hawkes from SGI had published some AIM7 numbers that showed
> > pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
> > half the CPU cycles were spent on waiting for pagecache_lock. The
> > thread is at -
> > 
> > http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
> > 
> 
> That's NUMA hardware.   The per-hashqueue locking change made
> a big improvement on that hardware.  But when it was used on
> Intel hardware it made no measurable difference at all.
> 
> Sorry, but the patch adds complexity and unless a significant
> throughput benefit can be demonstrated on less exotic hardware,
> why use it?

I agree that on NUMA systems, contention and lock wait times
degenerate non-linearly thereby skewing the actual impact.

IIRC, there were discussions on lse-tech about pagecache_lock and 
dbench numbers published by Juergen Doelle (on 8way Intel) and 
Anton Blanchard on 16way PPC. Perhaps they can shed some light on this.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 22:28                 ` David S. Miller
@ 2001-09-26 16:40                   ` Alan Cox
  2001-09-26 17:25                     ` Linus Torvalds
  0 siblings, 1 reply; 67+ messages in thread
From: Alan Cox @ 2001-09-26 16:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: bcrl, marcelo, andrea, torvalds, linux-kernel

>    Ahh, that's a cpu bug -- one my athlons don't suffer from.
>    
> Your Athlons may handle exclusive cache line acquisition more
> efficiently (due to memory subsystem performance) but it still
> does cost something.

On an exclusive line on Athlon a lock cycle is near enough free; it's
just an ordering constraint. Since the line is in E state no other bus
master can hold a copy in cache, so the atomicity is there. Ditto for newer
Intel processors.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26  5:31 ` Andrew Morton
  2001-09-26  6:57   ` David S. Miller
  2001-09-26  7:08   ` Dipankar Sarma
@ 2001-09-26 16:52   ` John Hawkes
  2 siblings, 0 replies; 67+ messages in thread
From: John Hawkes @ 2001-09-26 16:52 UTC (permalink / raw)
  To: Andrew Morton, dipankar
  Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
	hawkes

From: "Andrew Morton" <akpm@zip.com.au>
> > John Hawkes from SGI had published some AIM7 numbers that showed
> > pagecache_lock to be a bottleneck above 4 processors. At 32 processors,
> > half the CPU cycles were spent on waiting for pagecache_lock. The
> > thread is at -
> >
> > http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
> >
>
> That's NUMA hardware.   The per-hashqueue locking change made
> a big improvement on that hardware.  But when it was used on
> Intel hardware it made no measurable difference at all.

More specifically, that was on SGI Origin2000 32p mips64 ccNUMA
hardware.  The pagecache_lock bottleneck is substantially smaller on SGI
Itanium ccNUMA hardware running those AIM7 workloads.  I'm seeing
moderately significant contention on the Big Kernel Lock, mostly from
sys_lseek() and ext2_get_block().

John Hawkes
hawkes@sgi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 16:40                   ` Alan Cox
@ 2001-09-26 17:25                     ` Linus Torvalds
  2001-09-26 17:40                       ` Alan Cox
                                         ` (4 more replies)
  0 siblings, 5 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 17:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1697 bytes --]


On Wed, 26 Sep 2001, Alan Cox wrote:
> >
> > Your Athlons may handle exclusive cache line acquisition more
> > efficiently (due to memory subsystem performance) but it still
> > does cost something.
>
> On an exclusive line on Athlon a lock cycle is near enough free, its
> just an ordering constraint. Since the line is in E state no other bus
> master can hold a copy in cache so the atomicity is there. Ditto for newer
> Intel processors

You misunderstood the problem, I think: when the line moves from one CPU
to the other (the exclusive state moves along with it), that is
_expensive_.

Even when you have a backside bus (or cache pushout content snooping)  to
allow the cacheline to move directly from one CPU to the other without
having to go through memory, that's a really expensive thing to do.

So re-acquiring the lock on the same CPU is pretty much free (18 cycles for
Intel, if I remember correctly, and that's _entirely_ due to the pipeline
flush to ensure in-order execution around it).

[ Oh, just for interest I checked my P4, which has a much longer pipeline:
  the cost of an exclusive locked access is a whopping 104 cycles. But we
  already knew that the first-generation P4 does badly on many things.

  Just reading the cycle counter is apparently around 80 cycles on a P4,
  it's 32 cycles on a PIII. Looks like that also stalls the pipeline or
  something. But cpuid is _really_ horrible. Test out the attached
  program.

	PIII:
		nothing: 32 cycles
		locked add: 50 cycles
		cpuid: 170 cycles

	P4:
		nothing: 80 cycles
		locked add: 184 cycles
		cpuid: 652 cycles

   Remember: these are for the already-exclusive-cache cases. ]

What are the athlon numbers?

		Linus

[-- Attachment #2: Type: TEXT/PLAIN, Size: 612 bytes --]

#include <stdio.h>

#define rdtsc(low) \
   __asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")

#define TIME(x,y) \
	min = 100000;						\
	for (i = 0; i < 1000; i++) {				\
		unsigned long start,end;			\
		rdtsc(start);					\
		x;						\
		rdtsc(end);					\
		end -= start;					\
		if (end < min)					\
			min = end;				\
	}							\
	printf(y ": %lu cycles\n", min);

#define LOCK	asm volatile("lock ; addl $0,0(%esp)")
#define CPUID	asm volatile("cpuid": : :"ax", "dx", "cx", "bx")

int main(void)
{
	unsigned long min;
	int i;

	TIME(/* */, "nothing");
	TIME(LOCK, "locked add");
	TIME(CPUID, "cpuid");
	return 0;
}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:25                     ` Linus Torvalds
@ 2001-09-26 17:40                       ` Alan Cox
  2001-09-26 17:44                         ` Linus Torvalds
                                           ` (2 more replies)
  2001-09-26 17:43                       ` Richard Gooch
                                         ` (3 subsequent siblings)
  4 siblings, 3 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-26 17:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

> 	PIII:
> 		nothing: 32 cycles
> 		locked add: 50 cycles
> 		cpuid: 170 cycles
> 
> 	P4:
> 		nothing: 80 cycles
> 		locked add: 184 cycles
> 		cpuid: 652 cycles


Original core Athlon (step 2 and earlier)

nothing: 11 cycles
locked add: 22 cycles
cpuid: 67 cycles

generic Athlon is

nothing: 11 cycles
locked add: 11 cycles
cpuid: 64 cycles


I don't currently have a palomino core to test

Wait for AMD to publish graphs of CPUid performance for PIV versus Athlon 8)


Alan

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-25 22:26           ` David S. Miller
@ 2001-09-26 17:42             ` Ingo Molnar
  0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2001-09-26 17:42 UTC (permalink / raw)
  To: David S. Miller
  Cc: gerrit, riel, marcelo, Andrea Arcangeli, Linus Torvalds,
	linux-kernel


On Tue, 25 Sep 2001, David S. Miller wrote:

>    I'm very curious as to what workloads are showing pagecache_lock as
>    a bottleneck.  We haven't noticed this particular bottleneck in most
>    of the workloads we are running.  Is there a good workload that shows
>    this type of load?
>
> Again, I defer to Ingo for specifics, but essentially something
> like specweb99 where the whole dataset fits in memory.

it was SPECweb99 tests done in 32 GB RAM, 8 CPUs, where the pagecache was
nearly 30 GB big. We saw visible pagecache_lock contention on such
systems. Due to TUX's use of zerocopy, page lookups happen at a much
larger frequency and they are not intermixed with memory copies - in
contrast with workloads like dbench.

	Ingo


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:25                     ` Linus Torvalds
  2001-09-26 17:40                       ` Alan Cox
@ 2001-09-26 17:43                       ` Richard Gooch
  2001-09-26 18:24                         ` Benjamin LaHaise
  2001-09-26 17:45                       ` Dave Jones
                                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Richard Gooch @ 2001-09-26 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

Linus Torvalds writes:
>   This message is in MIME format.  The first part should be readable text,
>   while the remaining parts are likely unreadable without MIME-aware tools.
>   Send mail to mime@docserver.cac.washington.edu for more info.

Yuk! MIME! I thought you hated it too?

> 	PIII:
> 		nothing: 32 cycles
> 		locked add: 50 cycles
> 		cpuid: 170 cycles
> 
> 	P4:
> 		nothing: 80 cycles
> 		locked add: 184 cycles
> 		cpuid: 652 cycles
> 
>    Remember: these are for the already-exclusive-cache cases. ]
> 
> What are the athlon numbers?

Athalon 850 MHz:
nothing: 11 cycles
locked add: 12 cycles
cpuid: 64 cycles

BTW: your code had horrible control-M's on each line. So the compiler
choked (with a less-than-helpful error message). Of course, cat t.c
showed nothing amiss. Fortunately emacs doesn't hide information.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:40                       ` Alan Cox
@ 2001-09-26 17:44                         ` Linus Torvalds
  2001-09-26 18:01                           ` Benjamin LaHaise
  2001-09-26 18:01                         ` Dave Jones
  2001-09-26 20:20                         ` Vojtech Pavlik
  2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 17:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel


On Wed, 26 Sep 2001, Alan Cox wrote:

> > 	PIII:
> > 		nothing: 32 cycles
> > 		locked add: 50 cycles
> > 		cpuid: 170 cycles
> >
> > 	P4:
> > 		nothing: 80 cycles
> > 		locked add: 184 cycles
> > 		cpuid: 652 cycles
>
>
> Original core Athlon (step 2 and earlier)
>		nothing: 11 cycles
>		locked add: 22 cycles
>		cpuid: 67 cycles
>
> generic Athlon:
>		nothing: 11 cycles
>		locked add: 11 cycles
>		cpuid: 64 cycles

Do you have an actual SMP Athlon to test? I'd love to see if that "locked
add" thing is really SMP-safe - it may be that it's the old "AMD turned
off the 'lock' prefix synchronization because it doesn't matter in UP".
They used to have a bit to do that..

That said, it _can_ be real even on SMP. There's no reason why a memory
barrier would have to be as heavy as it is on some machines (even the P4
looks positively _fast_ compared to most older machines that did memory
barriers on the bus and took hundreds of much slower cycles to do it).

> Wait for AMD to publish graphs of CPUid performance for PIV versus Athlon 8)

The sad thing is, I think Intel used to suggest that people use "cpuid" as
the thing to serialize the cores. So people may actually be _using_ it for
something like semaphores. I remember that Ingo or somebody suggested we'd
use it for the Linux "mb()" macro - I _much_ prefer the saner locked zero
add into the stack, and the prediction that Intel would be more likely to
optimize for "add" than for "cpuid" certainly ended up being surprisingly
true on the P4.

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:25                     ` Linus Torvalds
  2001-09-26 17:40                       ` Alan Cox
  2001-09-26 17:43                       ` Richard Gooch
@ 2001-09-26 17:45                       ` Dave Jones
  2001-09-26 17:50                       ` Alan Cox
  2001-09-26 23:26                       ` David S. Miller
  4 siblings, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 17:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Linus Torvalds wrote:

> What are the athlon numbers?

nothing: 11 cycles
locked add: 11 cycles
cpuid: 63 cycles

(cpuid varies between 63->68 here)

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:25                     ` Linus Torvalds
                                         ` (2 preceding siblings ...)
  2001-09-26 17:45                       ` Dave Jones
@ 2001-09-26 17:50                       ` Alan Cox
  2001-09-26 17:59                         ` Dave Jones
  2001-09-26 18:59                         ` George Greer
  2001-09-26 23:26                       ` David S. Miller
  4 siblings, 2 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-26 17:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

and for completeness

VIA Cyrix CIII (original generation 0.18u)

nothing: 28 cycles
locked add: 29 cycles
cpuid: 72 cycles

Pentium Pro

nothing: 33 cycles
locked add: 51 cycles
cpuid: 98 cycles

(base comparison - pure in order machine)

IDT winchip

nothing: 17 cycles
locked add: 20 cycles
cpuid: 33 cycles


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:50                       ` Alan Cox
@ 2001-09-26 17:59                         ` Dave Jones
  2001-09-26 18:07                           ` Alan Cox
                                             ` (2 more replies)
  2001-09-26 18:59                         ` George Greer
  1 sibling, 3 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 17:59 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
	linux-kernel

On Wed, 26 Sep 2001, Alan Cox wrote:

> VIA Cyrix CIII (original generation 0.18u)
>
> nothing: 28 cycles
> locked add: 29 cycles
> cpuid: 72 cycles

Interesting. From a newer C3..

nothing: 30 cycles
locked add: 31 cycles
cpuid: 79 cycles

Only slightly worse, but I'd not expected this.
This was from an 866MHz part too, whereas you have a 533MHz one, IIRC?

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:40                       ` Alan Cox
  2001-09-26 17:44                         ` Linus Torvalds
@ 2001-09-26 18:01                         ` Dave Jones
  2001-09-26 20:20                         ` Vojtech Pavlik
  2 siblings, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 18:01 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
	linux-kernel

On Wed, 26 Sep 2001, Alan Cox wrote:

> Original core Athlon (step 2 and earlier)
>
> nothing: 11 cycles
> locked add: 22 cycles
> cpuid: 67 cycles
>
> I don't currently have a palomino core to test

Exactly the same as the original core.

nothing: 11 cycles
locked add: 11 cycles
cpuid: 67 cycles

(cpuid varies 63->68)

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:44                         ` Linus Torvalds
@ 2001-09-26 18:01                           ` Benjamin LaHaise
  0 siblings, 0 replies; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-26 18:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, David S. Miller, marcelo, andrea, linux-kernel

On Wed, Sep 26, 2001 at 10:44:14AM -0700, Linus Torvalds wrote:
> Do you have an actual SMP Athlon to test? I'd love to see if that "locked
> add" thing is really SMP-safe - it may be that it's the old "AMD turned
> off the 'lock' prefix synchronization because it doesn't matter in UP".
> They used to have a bit to do that..

Same, my dual reports:

	[bcrl@toomuch ~]$ ./a.out 
	nothing: 11 cycles
	locked add: 11 cycles
	cpuid: 68 cycles

Which is pretty good.

> That said, it _can_ be real even on SMP. There's no reason why a memory
> barrier would have to be as heavy as it is on some machines (even the P4
> looks positively _fast_ compared to most older machines that did memory
> barriers on the bus and took hundreds of much slower cycles to do it).

I had discussions with a few people from intel about the p4 having much 
improved locking performance, including the ability to speculatively 
execute locked instructions.  How much of that is enabled in the current 
cores is another question entirely (gotta love microcode patches).

		-ben

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:59                         ` Dave Jones
@ 2001-09-26 18:07                           ` Alan Cox
  2001-09-26 18:09                           ` Padraig Brady
  2001-09-26 18:24                           ` Linus Torvalds
  2 siblings, 0 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-26 18:07 UTC (permalink / raw)
  To: Dave Jones
  Cc: Alan Cox, Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
	linux-kernel

> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
> 
> Only slightly worse, but I'd not expected this.
> This was from a 866MHz part too, whereas you have a 533 iirc ?

The 0.13u part has a couple more pipeline steps I believe

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:59                         ` Dave Jones
  2001-09-26 18:07                           ` Alan Cox
@ 2001-09-26 18:09                           ` Padraig Brady
  2001-09-26 18:22                             ` Dave Jones
  2001-09-26 18:24                           ` Linus Torvalds
  2 siblings, 1 reply; 67+ messages in thread
From: Padraig Brady @ 2001-09-26 18:09 UTC (permalink / raw)
  To: Dave Jones; +Cc: Alan Cox, linux-kernel

Dave Jones wrote:

>On Wed, 26 Sep 2001, Alan Cox wrote:
>
>>VIA Cyrix CIII (original generation 0.18u)
>>
>>nothing: 28 cycles
>>locked add: 29 cycles
>>cpuid: 72 cycles
>>
>
>Interesting. From a newer C3..
>
>nothing: 30 cycles
>locked add: 31 cycles
>cpuid: 79 cycles
>
>Only slightly worse, but I'd not expected this.
>This was from a 866MHz part too, whereas you have a 533 iirc ?
>
>regards,
>
>Dave.
>
Interesting, does the original CIII have a TSC? Would that affect the
timings Alan got?

The following table may be of use to people:

(All these are Socket 370)
-----------------------------------------------------------------------------
core       size    name           code   Notes
-----------------------------------------------------------------------------
samuel     0.18µm  Via Cyrix III  (C5)   128K L1, 0K L2 cache. FPU doesn't run @ full clock speed.
samuel II  0.15µm  Via C3         (C5B)  667MHz CIII in Dabs are C3's (128K L1, 64K L2 cache), (MMX/3D now!), FPU @ full clock speed.
mathew     0.15µm  Via C3         (C5B)  mobile samuel II with integrated north bridge & 2D/3D graphics. (1.6v)
ezra       0.13µm  Via C3         (C5C)  Debut @ 850MHz rising to 1GHz quickly (1.35v)
nehemiah   0.13µm  Via C4         (C5X)  Debut @ 1.2GHz (128K L1, 256K L2 cache) (SSE)
esther     0.10µm  Via C4         (C5Y)  ?

----------------------

C3 availability details:

667  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  6-12W  Mar 2001
733  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  6-12W  May 2001
733  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  1+ W   May 2001 (e series)
750  100 / 133       1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ  6-12W  May 2001
800  100 / 133       1.5  Socket 370  L1: 128kB, L2: 64kB  0.13µ  7-12W  May 2001 (ezra)

----------------------



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 18:09                           ` Padraig Brady
@ 2001-09-26 18:22                             ` Dave Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-26 18:22 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Alan Cox, linux-kernel

On Wed, 26 Sep 2001, Padraig Brady wrote:

> Interesting, does the origonal CIII have a TSC?

Yes.

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:43                       ` Richard Gooch
@ 2001-09-26 18:24                         ` Benjamin LaHaise
  2001-09-26 18:48                           ` Richard Gooch
  0 siblings, 1 reply; 67+ messages in thread
From: Benjamin LaHaise @ 2001-09-26 18:24 UTC (permalink / raw)
  To: Richard Gooch; +Cc: linux-kernel

On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
> BTW: your code had horrible control-M's on each line. So the compiler
> choked (with a less-than-helpful error message). Of course, cat t.c
> showed nothing amiss. Fortunately emacs doesn't hide information.

You must be using some kind of broken MUA -- neither mutt nor pine 
resulted in anything with a trace of 0x0d in it.

		-ben

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:59                         ` Dave Jones
  2001-09-26 18:07                           ` Alan Cox
  2001-09-26 18:09                           ` Padraig Brady
@ 2001-09-26 18:24                           ` Linus Torvalds
  2001-09-26 18:40                             ` Dave Jones
  2001-09-26 19:04                             ` Locking comment on shrink_caches() George Greer
  2 siblings, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 18:24 UTC (permalink / raw)
  To: Dave Jones; +Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel


On Wed, 26 Sep 2001, Dave Jones wrote:
> On Wed, 26 Sep 2001, Alan Cox wrote:
>
> > VIA Cyrix CIII (original generation 0.18u)
> >
> > nothing: 28 cycles
> > locked add: 29 cycles
> > cpuid: 72 cycles
>
> Interesting. From a newer C3..
>
> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
>
> Only slightly worse, but I'd not expected this.

That difference can easily be explained by the compiler and options.

You should use "gcc -O2" at least, in order to avoid having gcc do
unnecessary spills to memory in between the timings. And there may be some
versions of gcc that end up spilling even then.

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 18:24                           ` Linus Torvalds
@ 2001-09-26 18:40                             ` Dave Jones
  2001-09-26 19:12                               ` Linus Torvalds
  2001-09-26 19:04                             ` Locking comment on shrink_caches() George Greer
  1 sibling, 1 reply; 67+ messages in thread
From: Dave Jones @ 2001-09-26 18:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Linus Torvalds wrote:

> > > cpuid: 72 cycles
> > cpuid: 79 cycles
> > Only slightly worse, but I'd not expected this.
> That difference can easily be explained by the compiler and options.

Actually repeated runs of the test on that box show it deviating by up
to 10 cycles, making it match the results that Alan posted.
-O2 made no difference, these deviations still occur. They seem more
prominent on the C3 than other boxes I've tried, even with the same
compiler toolchain.

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 18:24                         ` Benjamin LaHaise
@ 2001-09-26 18:48                           ` Richard Gooch
  2001-09-26 18:58                             ` Davide Libenzi
  0 siblings, 1 reply; 67+ messages in thread
From: Richard Gooch @ 2001-09-26 18:48 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel

Benjamin LaHaise writes:
> On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
> > BTW: your code had horrible control-M's on each line. So the compiler
> > choked (with a less-than-helpful error message). Of course, cat t.c
> > showed nothing amiss. Fortunately emacs doesn't hide information.
> 
> You must be using some kind of broken MUA -- neither mutt nor pine 
> resulted in anything with a trace of 0x0d in it.

My MUA doesn't know about MIME at all (part of the reason I hate
MIME). I save the message to a file and run uudeview 0.5pl13.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 18:48                           ` Richard Gooch
@ 2001-09-26 18:58                             ` Davide Libenzi
  0 siblings, 0 replies; 67+ messages in thread
From: Davide Libenzi @ 2001-09-26 18:58 UTC (permalink / raw)
  To: Richard Gooch; +Cc: linux-kernel, Benjamin LaHaise


On 26-Sep-2001 Richard Gooch wrote:
> Benjamin LaHaise writes:
>> On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
>> > BTW: your code had horrible control-M's on each line. So the compiler
>> > choked (with a less-than-helpful error message). Of course, cat t.c
>> > showed nothing amiss. Fortunately emacs doesn't hide information.
>> 
>> You must be using some kind of broken MUA -- neither mutt nor pine 
>> resulted in anything with a trace of 0x0d in it.
> 
> My MUA doesn't know about MIME at all (part of the reason I hate
> MIME). I save the message to a file and run uudeview 0.5pl13.

Maybe the file you save is in RFC 822 format (\r\n line endings) and uudeview does not trim it.



- Davide


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:50                       ` Alan Cox
  2001-09-26 17:59                         ` Dave Jones
@ 2001-09-26 18:59                         ` George Greer
  1 sibling, 0 replies; 67+ messages in thread
From: George Greer @ 2001-09-26 18:59 UTC (permalink / raw)
  To: linux-kernel

On Wed, 26 Sep 2001, Alan Cox wrote:

>and for completeness
>
>VIA Cyrix CIII (original generation 0.18u)
>
>nothing: 28 cycles
>locked add: 29 cycles
>cpuid: 72 cycles
>
>Pentium Pro
>
>nothing: 33 cycles
>locked add: 51 cycles
>cpuid: 98 cycles
>
>(base comparison - pure in order machine)
>
>IDT winchip
>
>nothing: 17 cycles
>locked add: 20 cycles
>cpuid: 33 cycles

2x Pentium MMX 233MHz

nothing: 14 cycles
locked add: 59 cycles
cpuid: 31 cycles

2x Pentium 133MHz

nothing: 14 cycles
locked add: 76 cycles
cpuid: 31 cycles

cpuid is oddly fast.

-- 
George Greer, greerga@m-l.org | Genius may have its limitations, but stupidity
http://www.m-l.org/~greerga/  | is not thus handicapped. -- Elbert Hubbard


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 18:24                           ` Linus Torvalds
  2001-09-26 18:40                             ` Dave Jones
@ 2001-09-26 19:04                             ` George Greer
  1 sibling, 0 replies; 67+ messages in thread
From: George Greer @ 2001-09-26 19:04 UTC (permalink / raw)
  To: linux-kernel

On Wed, 26 Sep 2001, Linus Torvalds wrote:

>
>On Wed, 26 Sep 2001, Dave Jones wrote:
>> On Wed, 26 Sep 2001, Alan Cox wrote:
>>
>> > VIA Cyrix CIII (original generation 0.18u)
>> >
>> > nothing: 28 cycles
>> > locked add: 29 cycles
>> > cpuid: 72 cycles
>>
>> Interesting. From a newer C3..
>>
>> nothing: 30 cycles
>> locked add: 31 cycles
>> cpuid: 79 cycles
>>
>> Only slightly worse, but I'd not expected this.
>
>That difference can easily be explained by the compiler and options.
>
>You should use "gcc -O2" at least, in order to avoid having gcc do
>unnecessary spills to memory in between the timings. And there may be some
>versions of gcc that en dup spilling even then.

Nice big difference in 'locked add' seen here.

gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85)
2x Pentium 233/MMX

-O0				-O2
nothing: 15 cycles		nothing: 14 cycles
locked add: 60 cycles		locked add: 32 cycles
cpuid: 33 cycles		cpuid: 32 cycles


gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85)
2x Pentium 133

-O0				-O2
nothing: 14 cycles		nothing: 13 cycles
locked add: 76 cycles		locked add: 25 cycles
cpuid: 31 cycles		cpuid: 30 cycles

-- 
George Greer, greerga@m-l.org | Genius may have its limitations, but stupidity
http://www.m-l.org/~greerga/  | is not thus handicapped. -- Elbert Hubbard


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 18:40                             ` Dave Jones
@ 2001-09-26 19:12                               ` Linus Torvalds
  2001-09-27 12:22                                 ` CPU frequency shifting "problems" Padraig Brady
  0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 19:12 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.30.0109262036480.8655-100000@Appserv.suse.de>,
Dave Jones  <davej@suse.de> wrote:
>On Wed, 26 Sep 2001, Linus Torvalds wrote:
>
>> > > cpuid: 72 cycles
>> > cpuid: 79 cycles
>> > Only slightly worse, but I'd not expected this.
>> That difference can easily be explained by the compiler and options.
>
>Actually repeated runs of the test on that box show it deviating by up
>to 10 cycles, making it match the results that Alan posted.
>-O2 made no difference, these deviations still occur. They seem more
>prominent on the C3 than other boxes I've tried, even with the same
>compiler toolchain.

Does the C3 do any kind of frequency shifting?

For example, on a transmeta CPU, the TSC will run at a constant
"nominal" speed (the highest the CPU can go), although the real CPU
speed will depend on the load of the machine and temperature etc.  So on
a crusoe CPU you'll see varying speeds (and it depends on the speed
grade, because that in turn depends on how many longrun steps are being
actively used). 

For example, on a mostly idle machine I get

	torvalds@kiwi:~ > ./a.out 
	nothing: 54 cycles
	locked add: 54 cycles
	cpuid: 91 cycles

while if I have another window that does an endless loop to keep the CPU
busy, the _real_ frequency of the CPU scales up, and the machine
basically becomes faster:

	torvalds@kiwi:~ > ./a.out 
	nothing: 36 cycles
	locked add: 36 cycles
	cpuid: 54 cycles

(The reason why the "nothing" TSC read is expensive on crusoe is because
of the scaling of the TSC - rdtsc literally has to do a floating point
multiply-add to scale the clock to the right "nominal" frequency.  Of
course, "expensive" is still a lot less than the inexplicable 80 cycles
on a P4). 

(That's a 600MHz part going down to 400MHz in idle, btw)

On a 633MHz part (I don't actually have access to any of the high speed
grades ;) it ends up being 

fast:
	nothing: 39 cycles
	locked add: 40 cycles
	cpuid: 68 cycles

slow: 
	nothing: 82 cycles
	locked add: 84 cycles
	cpuid: 122 cycles

which corresponds to a 633MHz part going down to 300MHz in idle.

And of course, you can get pretty much anything in between, depending on
what the load is...

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:40                       ` Alan Cox
  2001-09-26 17:44                         ` Linus Torvalds
  2001-09-26 18:01                         ` Dave Jones
@ 2001-09-26 20:20                         ` Vojtech Pavlik
  2001-09-26 20:24                           ` Vojtech Pavlik
  2 siblings, 1 reply; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-26 20:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
	linux-kernel

On Wed, Sep 26, 2001 at 06:40:15PM +0100, Alan Cox wrote:

> > 	PIII:
> > 		nothing: 32 cycles
> > 		locked add: 50 cycles
> > 		cpuid: 170 cycles
> > 
> > 	P4:
> > 		nothing: 80 cycles
> > 		locked add: 184 cycles
> > 		cpuid: 652 cycles
> 
> 
> Original core Athlon (step 2 and earlier)
> 
> nothing: 11 cycles
> locked add: 22 cycles
> cpuid: 67 cycles
> 
> generic Athlon is
> 
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles

Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
slower:

nothing: 11 cycles
locked add: 11 cycles
cpuid: 87 cycles

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 20:20                         ` Vojtech Pavlik
@ 2001-09-26 20:24                           ` Vojtech Pavlik
  0 siblings, 0 replies; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-26 20:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea,
	linux-kernel

On Wed, Sep 26, 2001 at 10:20:21PM +0200, Vojtech Pavlik wrote:

> > generic Athlon is
> > 
> > nothing: 11 cycles
> > locked add: 11 cycles
> > cpuid: 64 cycles
> 
> Interestingly enough, my TBird 1.1G insist on cpuid being somewhat
> slower:
> 
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles

Oops, this is indeed just a difference in compiler options.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 17:25                     ` Linus Torvalds
                                         ` (3 preceding siblings ...)
  2001-09-26 17:50                       ` Alan Cox
@ 2001-09-26 23:26                       ` David S. Miller
  2001-09-27 12:10                         ` Alan Cox
  4 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-26 23:26 UTC (permalink / raw)
  To: torvalds; +Cc: alan, bcrl, marcelo, andrea, linux-kernel

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Wed, 26 Sep 2001 10:25:18 -0700 (PDT)

   
   On Wed, 26 Sep 2001, Alan Cox wrote:
   > >
   > > Your Athlons may handle exclusive cache line acquisition more
   > > efficiently (due to memory subsystem performance) but it still
   > > does cost something.
   >
   > On an exclusive line on Athlon a lock cycle is near enough free, its
   > just an ordering constraint. Since the line is in E state no other bus
   > master can hold a copy in cache so the atomicity is there. Ditto for newer
   > Intel processors
   
   You misunderstood the problem, I think: when the line moves from one CPU
   to the other (the exclusive state moves along with it), that is
   _expensive_.

Yes, this was my intended point.  Please see my quoted text above and
note the "exclusive cache line acquisition" with emphasis on the word
"acquisition" meaning you don't have the cache line in E state yet.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
       [not found]   ` <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>
@ 2001-09-27  1:34     ` Vojtech Pavlik
  0 siblings, 0 replies; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-27  1:34 UTC (permalink / raw)
  To: Trond Eivind Glomsrød; +Cc: linux-kernel

On Thu, Sep 27, 2001 at 03:29:27AM +0200, Trond Eivind Glomsrød wrote:

> Vojtech Pavlik <vojtech@suse.cz> writes:
> 
> > On Wed, Sep 26, 2001 at 10:20:21PM +0200, Vojtech Pavlik wrote:
> > 
> > > > generic Athlon is
> > > > 
> > > > nothing: 11 cycles
> > > > locked add: 11 cycles
> > > > cpuid: 64 cycles
> > > 
> > > Interestingly enough, my TBird 1.1G insist on cpuid being somewhat
> > > slower:
> > > 
> > > nothing: 11 cycles
> > > locked add: 11 cycles
> > > cpuid: 87 cycles
> > 
> > Oops, this is indeed just a difference in compiler options.
> 
> No, it's not:
> 
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
> [teg@xyzzy teg]$ 
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles

Interesting: Try `while true; do t; done' and watch it change between 64
and 87 every 2.5 seconds ... :)

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-26 23:26                       ` David S. Miller
@ 2001-09-27 12:10                         ` Alan Cox
  2001-09-27 15:38                           ` Linus Torvalds
  2001-09-27 19:41                           ` David S. Miller
  0 siblings, 2 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-27 12:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: torvalds, alan, bcrl, marcelo, andrea, linux-kernel

> Yes, this was my intended point.  Please see my quoted text above and
> note the "exclusive cache line acquisition" with emphasis on the word
> "acquisition" meaning you don't have the cache line in E state yet.

See prefetching - the CPU prefetching will hide some of the effect and
the spin_lock_prefetch() macro does wonders for the rest.

Alan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* CPU frequency shifting "problems"
  2001-09-26 19:12                               ` Linus Torvalds
@ 2001-09-27 12:22                                 ` Padraig Brady
  2001-09-27 12:44                                   ` Dave Jones
  2001-09-27 23:23                                   ` Linus Torvalds
  0 siblings, 2 replies; 67+ messages in thread
From: Padraig Brady @ 2001-09-27 12:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

  Linus Torvalds wrote:

>In article <Pine.LNX.4.30.0109262036480.8655-100000@Appserv.suse.de>,
>Dave Jones  <davej@suse.de> wrote:
>
>>On Wed, 26 Sep 2001, Linus Torvalds wrote:
>>
>>>>>cpuid: 72 cycles
>>>>>
>>>>cpuid: 79 cycles
>>>>Only slightly worse, but I'd not expected this.
>>>>
>>>That difference can easily be explained by the compiler and options.
>>>
>>Actually repeated runs of the test on that box show it deviating by up
>>to 10 cycles, making it match the results that Alan posted.
>>-O2 made no difference, these deviations still occur. They seem more
>>prominent on the C3 than other boxes I've tried, even with the same
>>compiler toolchain.
>>
>
>Does the C3 do any kind of frequency shifting?
>
Not automatic, but you can set the multiplier dynamically by setting the
MSR.
Russell King has been working on an arch independent framework for this
kind of thing and support for the C3 has recently been added by Dave Jones.
The code is available @:
cvs -d :pserver:cvs@pubcvs.arm.linux.org.uk:/mnt/src/cvsroot login
cvs -d :pserver:cvs@pubcvs.arm.linux.org.uk:/mnt/src/cvsroot co cpufreq

>
>For example, on a transmeta CPU, the TSC will run at a constant
>"nominal" speed (the highest the CPU can go), although the real CPU
>speed will depend on the load of the machine and temperature etc.
>
As does the P4 from what I understand. So a question..
What are the software dependencies on this auto/manual frequency shifting?
The code referenced above scales jiffies appropriately when a manual
frequency change is requested. I'm not sure about the possible consequences
of this: for example, could races be introduced with various busy-loop
locking etc.?
A quick check for the use of jiffies in the kernel:
[padraig@pixelbeat linux]$ find -name "*.[ch]" | xargs grep jiffies | wc -l
   3992

Also, with the auto shifting of the Transmeta/P4, won't this invalidate
the jiffies value? Also, how does this affect the RTLinux guys (and
realtime software in general)?

cheers,
Padraig.

>  So on
>a crusoe CPU you'll see varying speeds (and it depends on the speed
>grade, because that in turn depends on how many longrun steps are being
>actively used). 
>
>For example, on a mostly idle machine I get
>
>	torvalds@kiwi:~ > ./a.out 
>	nothing: 54 cycles
>	locked add: 54 cycles
>	cpuid: 91 cycles
>
>while if I have another window that does an endless loop to keep the CPU
>busy, the _real_ frequency of the CPU scales up, and the machine
>basically becomes faster:
>
>	torvalds@kiwi:~ > ./a.out 
>	nothing: 36 cycles
>	locked add: 36 cycles
>	cpuid: 54 cycles
>
>(The reason why the "nothing" TSC read is expensive on crusoe is because
>of the scaling of the TSC - rdtsc literally has to do a floating point
>multiply-add to scale the clock to the right "nominal" frequency.  Of
>course, "expensive" is still a lot less than the inexplicable 80 cycles
>on a P4). 
>
>(That's a 600MHz part going down to 400MHz in idle, btw)
>
>On a 633MHz part (I don't actually have access to any of the high speed
>grades ;) it ends up being 
>
>fast:
>	nothing: 39 cycles
>	locked add: 40 cycles
>	cpuid: 68 cycles
>
>slow: 
>	nothing: 82 cycles
>	locked add: 84 cycles
>	cpuid: 122 cycles
>
>which corresponds to a 633MHz part going down to 300MHz in idle.
>
>And of course, you can get pretty much anything in between, depending on
>what the load is...
>
>		Linus
>





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-27 12:22                                 ` CPU frequency shifting "problems" Padraig Brady
@ 2001-09-27 12:44                                   ` Dave Jones
  2001-09-27 23:23                                   ` Linus Torvalds
  1 sibling, 0 replies; 67+ messages in thread
From: Dave Jones @ 2001-09-27 12:44 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Linux Kernel Mailing List

On Thu, 27 Sep 2001, Padraig Brady wrote:

> >Does the C3 do any kind of frequency shifting?
> Not automatic, but you can set the multiplier dynamically by setting the
> msr.
> Russell King has been working on an arch independent framework for this
> kind of thing and support for the C3 has recently been added by Dave Jones.

If you're going to try this out on a C3 btw, heed the warning at the
top of the code :) This still needs quite a bit of work.
I just need to find the time to sit down and finish it.
(The x86 bits are all that's preventing Russell from saying
 "This is ready" iirc, so I should get that finished at some point soon)

I'd like to add Transmeta Longrun support to it too, but that can
come later, when I get access to one.

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-27 12:10                         ` Alan Cox
@ 2001-09-27 15:38                           ` Linus Torvalds
  2001-09-27 17:44                             ` Ingo Molnar
  2001-09-27 19:41                           ` David S. Miller
  1 sibling, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-27 15:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel


On Thu, 27 Sep 2001, Alan Cox wrote:
>
> > Yes, this was my intended point.  Please see my quoted text above and
> > note the "exclusive cache line acquisition" with emphasis on the word
> > "acquisition" meaning you don't have the cache line in E state yet.
>
> See prefetching - the CPU prefetching will hide some of the effect and
> the spin_lock_prefetch() macro does wonders for the rest.

prefetching and friends won't do _anything_ for the case of a cache line
bouncing back and forth between CPU's.

In fact, it can easily make things _worse_, simply by having bouncing
happen even more (you bounce it into the CPU for the prefetch, another CPU
bounces it back, and you bounce it in again for the actual lock).

And this isn't at all unlikely if you have a lock that is accessed a _lot_
but held only for short times.

Now, I'm not convinced that pagecache_lock is _that_ critical yet, but is
it one of the top ones? Definitely.

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-27 15:38                           ` Linus Torvalds
@ 2001-09-27 17:44                             ` Ingo Molnar
  0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2001-09-27 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, David S. Miller, bcrl, marcelo, Andrea Arcangeli,
	linux-kernel


On Thu, 27 Sep 2001, Linus Torvalds wrote:

> prefetching and friends won't do _anything_ for the case of a cache
> line bouncing back and forth between CPU's.

yep. that is exactly what was happening with pagecache_lock, while an
8-way system served 300+ MB/sec worth of SPECweb99 HTTP content in 1500
byte packets. Under that kind of workload the pagecache is used
read-mostly, and due to zerocopy (and Linux's hyper-scalable networking
code) there isn't much left that pollutes caches and/or inhibits raw
performance in any way. pagecache_lock was the top non-conceptual
cacheline-miss offender in instruction-level profiles of such workloads.
Does it show up on a dual PIII with 128 MB RAM? Probably not as strongly.
Are there other offenders under other kinds of workloads that have a
bigger effect than pagecache_lock? Probably yes - but this does not
justify ignoring the effects of pagecache_lock.

(to be precise there was another offender - timerlist_lock, we've fixed it
before fixing pagecache_lock, and posted a patch for that one too. It's
available under http://redhat.com/~mingo/scalable-timers/. I know no other
scalability offenders for read-mostly pagecache & network-intensive
workloads for the time being.)

	Ingo


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-27 12:10                         ` Alan Cox
  2001-09-27 15:38                           ` Linus Torvalds
@ 2001-09-27 19:41                           ` David S. Miller
  2001-09-27 22:59                             ` Alan Cox
  1 sibling, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-27 19:41 UTC (permalink / raw)
  To: alan; +Cc: torvalds, bcrl, marcelo, andrea, linux-kernel

   From: Alan Cox <alan@lxorguk.ukuu.org.uk>
   Date: Thu, 27 Sep 2001 13:10:49 +0100 (BST)

   See prefetching - the CPU prefetching will hide some of the effect and
   the spin_lock_prefetch() macro does wonders for the rest.
   
Well, if prefetching can do it faster than avoiding the transaction
altogether, I'm game :-)

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
  2001-09-27 19:41                           ` David S. Miller
@ 2001-09-27 22:59                             ` Alan Cox
  0 siblings, 0 replies; 67+ messages in thread
From: Alan Cox @ 2001-09-27 22:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: alan, torvalds, bcrl, marcelo, andrea, linux-kernel

>    See prefetching - the CPU prefetching will hide some of the effect and
>    the spin_lock_prefetch() macro does wonders for the rest.
>    
> Well, if prefetching can do it faster than avoiding the transaction
> altogether, I'm game :-)

That would depend on the cost of avoidance, the amount of contention and
the distance ahead you can fetch. Avoiding it is also rather more portable,
so I suspect you win.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-27 12:22                                 ` CPU frequency shifting "problems" Padraig Brady
  2001-09-27 12:44                                   ` Dave Jones
@ 2001-09-27 23:23                                   ` Linus Torvalds
  2001-09-28  0:55                                     ` Alan Cox
  2001-09-28  8:55                                     ` Jamie Lokier
  1 sibling, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-27 23:23 UTC (permalink / raw)
  To: Padraig Brady; +Cc: linux-kernel


On Thu, 27 Sep 2001, Padraig Brady wrote:
>
> >
> >For example, on a transmeta CPU, the TSC will run at a constant
> >"nominal" speed (the highest the CPU can go), although the real CPU
> >speed will depend on the load of the machine and temperature etc.
>
> As does the P4 from what I understand.

That might explain why the P4 "rdtsc" is so slow.

> So a question..
> What are the software dependencies on this auto/manual frequency shifting?

None. At least not as long as the CPU _does_ do it automatically, and the
TSC appears to run at a constant speed even if the CPU does not.

For example, the Intel "SpeedStep" CPU's are completely broken under
Linux, and real-time will advance at different speeds in DC and AC modes,
because Intel actually changes the frequency of the TSC _and_ they don't
document how to figure out that it changed.

With a CPU that does make the TSC appear constant-frequency, the fact that
the CPU itself can go faster/slower doesn't matter - from a kernel
perspective that's pretty much equivalent to the different speeds you get
from cache miss behaviour etc.

			Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-27 23:23                                   ` Linus Torvalds
@ 2001-09-28  0:55                                     ` Alan Cox
  2001-09-28  2:12                                       ` Stefan Smietanowski
  2001-09-28  8:55                                     ` Jamie Lokier
  1 sibling, 1 reply; 67+ messages in thread
From: Alan Cox @ 2001-09-28  0:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Padraig Brady, linux-kernel

> For example, the Intel "SpeedStep" CPU's are completely broken under
> Linux, and real-time will advance at different speeds in DC and AC modes,
> because Intel actually changes the frequency of the TSC _and_ they don't
> document how to figure out that it changed.

The change is APM- or ACPI-initiated. Intel won't tell anyone anything
useful, but Microsoft have published some of the required Intel confidential
information, which helps a bit.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-28  0:55                                     ` Alan Cox
@ 2001-09-28  2:12                                       ` Stefan Smietanowski
  0 siblings, 0 replies; 67+ messages in thread
From: Stefan Smietanowski @ 2001-09-28  2:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Hey.

>>For example, the Intel "SpeedStep" CPU's are completely broken under
>>Linux, and real-time will advance at different speeds in DC and AC modes,
>>because Intel actually changes the frequency of the TSC _and_ they don't
>>document how to figure out that it changed.
> 
> The change is APM or ACPI initiated. Intel won't tell anyone anything
> useful but Microsoft have published some of the required intel confidential
> information which helps a bit

Did you just say that Microsoft actually went and did something right 
for a change? As in publishing specs I mean.

*Stands in awe*

:)

// Stefan



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-27 23:23                                   ` Linus Torvalds
  2001-09-28  0:55                                     ` Alan Cox
@ 2001-09-28  8:55                                     ` Jamie Lokier
  2001-09-28 16:11                                       ` Linus Torvalds
  1 sibling, 1 reply; 67+ messages in thread
From: Jamie Lokier @ 2001-09-28  8:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Padraig Brady, linux-kernel

Linus Torvalds wrote:
> With a CPU that does makes TSC appear constant-frequency, the fact that
> the CPU itself can go faster/slower doesn't matter - from a kernel
> perspective that's pretty much equivalent to the different speeds you get
> from cache miss behaviour etc.

On a Transmeta chip, does the TSC clock advance _exactly_ uniformly, or
is there a cumulative error due to speed changes?

I'll clarify.  I imagine that the internal clocks are driven by PLLs,
DLLs or something similar.  Unless multiple oscillators are used, this
means that speed switching is gradual, over several hundred or many more
clock cycles.

You said that Crusoe does a floating point op to scale the TSC value.
Now suppose I have a 600MHz Crusoe.  I calibrate the clock and it comes
out as 600.01MHz.

I can now use `rdtsc' to measure time in userspace, rather more
accurately than gettimeofday().  (In fact I have worked with programs
that do this, for network traffic injection.).  I can do this over a
period of minutes, expecting the clock to match "wall clock" time
reasonably accurately.

Suppose the CPU clock speed changes.  Can I be confident that
600.01*10^6 (+/- small tolerance) cycles will still be counted per
second, or is there a cumulative error due to the gradual clock speed
change and the floating-point scale factor not integrating the gradual
change precisely?

(One hardware implementation that doesn't have this problem is to run a
small counter, say 3 or 4 bits, at the nominal clock speed all the time,
and have the slower core sample that.  But it may use a little more
power, and your note about FP scaling tells me you don't do that).

thanks,
-- Jamie

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-28  8:55                                     ` Jamie Lokier
@ 2001-09-28 16:11                                       ` Linus Torvalds
  2001-09-28 20:29                                         ` Eric W. Biederman
  0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-09-28 16:11 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Padraig Brady, linux-kernel


On Fri, 28 Sep 2001, Jamie Lokier wrote:
>
> On a Transmeta chip, does the TSC clock advance _exactly_ uniformly, or
> is there a cumulative error due to speed changes?
>
> I'll clarify.  I imagine that the internal clocks are driven by PLLs,
> DLLs or something similar.  Unless multiple oscillators are used, this
> means that speed switching is gradual, over several hundred or many more
> clock cycles.

Basically, there's the "slow" timer, and the fast one. The slow one always
runs, and fast one gives the precision but runs at CPU speed.

So yes, there are multiple oscillators, and no, they should not drift on
frequency shifting, because the slow and constant one is used to scale the
fast one. So no cumulative errors.

HOWEVER, anybody who believes that TSC is a "truly accurate clock" will be
sadly mistaken on any machine. Even PLL's drift over time, and as
mentioned, Intel already broke the "you can use TSC as wall time" in their
SpeedStep implementation. Who knows what their future CPU's will do..

> I can now use `rdtsc' to measure time in userspace, rather more
> accurately than gettimeofday(). (In fact I have worked with programs
> that do this, for network traffic injection.).  I can do this over a
> period of minutes, expecting the clock to match "wall clock" time
> reasonably accurately.

It will work on Crusoe.

> (One hardware implementation that doesn't have this problem is to run a
> small counter, say 3 or 4 bits, at the nominal clock speed all the time,
> and have the slower core sample that.  But it may use a little more
> power, and your note about FP scaling tells me you don't do that).

We do that, but the other way around. The thing is, the "nominal clock
speed" doesn't even _exist_ when running normally.

What does exist is the bus clock (well, a multiple of it, but you get the
idea), and that one is stable. I bet PCI devices don't like to be randomly
driven at frequencies "somewhere between 12 and 33MHz" depending on load ;)

But because the stable frequency is the _slow_ one, you can't just scale
that up (well, you could - but then you'd just be running your cycle counter
at 66MHz all the time, you couldn't measure smaller intervals, and people
would be really disappointed). So you need the scaling of the fast one..

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-28 16:11                                       ` Linus Torvalds
@ 2001-09-28 20:29                                         ` Eric W. Biederman
  2001-09-28 22:24                                           ` Jamie Lokier
  0 siblings, 1 reply; 67+ messages in thread
From: Eric W. Biederman @ 2001-09-28 20:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jamie Lokier, Padraig Brady, linux-kernel

Linus Torvalds <torvalds@transmeta.com> writes:

> What does exist is the bus clock (well, a multiple of it, but you get the
> idea), and that one is stable. I bet PCI devices don't like to be randomly
> driven at frequencies "somewhere between 12 and 33MHz" depending on load ;)

I doubt they would like it, but it is perfectly legal (per the PCI spec)
to vary the PCI clock depending upon load.


Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: CPU frequency shifting "problems"
  2001-09-28 20:29                                         ` Eric W. Biederman
@ 2001-09-28 22:24                                           ` Jamie Lokier
  0 siblings, 0 replies; 67+ messages in thread
From: Jamie Lokier @ 2001-09-28 22:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Padraig Brady, linux-kernel

Eric W. Biederman wrote:
> > What does exist is the bus clock (well, a multiple of it, but you get the
> > idea), and that one is stable. I bet PCI devices don't like to be randomly
> > driven at frequencies "somewhere between 12 and 33MHz" depending on load ;)
> 
> I doubt they would like it but it is perfectly legal (PCI spec..) to
> vary the pci clock, depending upon load.   

Yes it is.  Also, the PCI clock is frequency modulated to reduce
electrical interference.  (Or on a more cynical note, to pass the
official emissions tests ;-)

However it's common practice to PLL to the PCI clock, for clock
distribution on a board, so varying the frequency must be done in a
strictly constrained fashion.

-- Jamie

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2001-09-28 22:25 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-25 17:49 Locking comment on shrink_caches() Marcelo Tosatti
2001-09-25 19:57 ` David S. Miller
2001-09-25 18:40   ` Marcelo Tosatti
2001-09-25 20:15     ` David S. Miller
2001-09-25 19:02       ` Marcelo Tosatti
2001-09-25 20:29         ` David S. Miller
2001-09-25 21:00           ` Benjamin LaHaise
2001-09-25 21:55             ` David S. Miller
2001-09-25 22:16               ` Benjamin LaHaise
2001-09-25 22:28                 ` David S. Miller
2001-09-26 16:40                   ` Alan Cox
2001-09-26 17:25                     ` Linus Torvalds
2001-09-26 17:40                       ` Alan Cox
2001-09-26 17:44                         ` Linus Torvalds
2001-09-26 18:01                           ` Benjamin LaHaise
2001-09-26 18:01                         ` Dave Jones
2001-09-26 20:20                         ` Vojtech Pavlik
2001-09-26 20:24                           ` Vojtech Pavlik
2001-09-26 17:43                       ` Richard Gooch
2001-09-26 18:24                         ` Benjamin LaHaise
2001-09-26 18:48                           ` Richard Gooch
2001-09-26 18:58                             ` Davide Libenzi
2001-09-26 17:45                       ` Dave Jones
2001-09-26 17:50                       ` Alan Cox
2001-09-26 17:59                         ` Dave Jones
2001-09-26 18:07                           ` Alan Cox
2001-09-26 18:09                           ` Padraig Brady
2001-09-26 18:22                             ` Dave Jones
2001-09-26 18:24                           ` Linus Torvalds
2001-09-26 18:40                             ` Dave Jones
2001-09-26 19:12                               ` Linus Torvalds
2001-09-27 12:22                                 ` CPU frequency shifting "problems" Padraig Brady
2001-09-27 12:44                                   ` Dave Jones
2001-09-27 23:23                                   ` Linus Torvalds
2001-09-28  0:55                                     ` Alan Cox
2001-09-28  2:12                                       ` Stefan Smietanowski
2001-09-28  8:55                                     ` Jamie Lokier
2001-09-28 16:11                                       ` Linus Torvalds
2001-09-28 20:29                                         ` Eric W. Biederman
2001-09-28 22:24                                           ` Jamie Lokier
2001-09-26 19:04                             ` Locking comment on shrink_caches() George Greer
2001-09-26 18:59                         ` George Greer
2001-09-26 23:26                       ` David S. Miller
2001-09-27 12:10                         ` Alan Cox
2001-09-27 15:38                           ` Linus Torvalds
2001-09-27 17:44                             ` Ingo Molnar
2001-09-27 19:41                           ` David S. Miller
2001-09-27 22:59                             ` Alan Cox
2001-09-25 22:03             ` Andrea Arcangeli
2001-09-25 20:24       ` Rik van Riel
2001-09-25 20:28         ` David S. Miller
2001-09-25 21:05           ` Andrew Morton
2001-09-25 21:48             ` David S. Miller
     [not found]         ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
2001-09-25 22:26           ` David S. Miller
2001-09-26 17:42             ` Ingo Molnar
2001-09-25 22:01       ` Andrea Arcangeli
2001-09-25 22:03         ` David S. Miller
2001-09-25 22:59           ` Andrea Arcangeli
2001-09-25 20:40     ` Josh MacDonald
2001-09-25 19:25       ` Marcelo Tosatti
2001-09-25 21:57   ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2001-09-26  5:04 Dipankar Sarma
2001-09-26  5:31 ` Andrew Morton
2001-09-26  6:57   ` David S. Miller
2001-09-26  7:08   ` Dipankar Sarma
2001-09-26 16:52   ` John Hawkes
     [not found] <fa.cbgmt3v.192gc8r@ifi.uio.no>
     [not found] ` <fa.cd0mtbv.1aigc0v@ifi.uio.no>
     [not found]   ` <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>
2001-09-27  1:34     ` Vojtech Pavlik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox