* Locking comment on shrink_caches()
@ 2001-09-25 17:49 Marcelo Tosatti
2001-09-25 19:57 ` David S. Miller
0 siblings, 1 reply; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 17:49 UTC (permalink / raw)
To: Andrea Arcangeli, Linus Torvalds; +Cc: lkml
Andrea,
Do you really need to do this ?
if (unlikely(!spin_trylock(&pagecache_lock))) {
/* we hold the page lock so the page cannot go away from under us */
spin_unlock(&pagemap_lru_lock);
spin_lock(&pagecache_lock);
spin_lock(&pagemap_lru_lock);
}
Have you actually seen bad hold times of pagecache_lock by
shrink_caches() ?
It's just that I prefer clear locking without those "tricks" (easier to
understand, and harder to miss subtle details).
^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Locking comment on shrink_caches()
From: David S. Miller @ 2001-09-25 19:57 UTC (permalink / raw)
To: marcelo; +Cc: andrea, torvalds, linux-kernel

   From: Marcelo Tosatti <marcelo@conectiva.com.br>
   Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)

   Do you really need to do this ?

   if (unlikely(!spin_trylock(&pagecache_lock))) {
           /* we hold the page lock so the page cannot go away from under us */
           spin_unlock(&pagemap_lru_lock);

           spin_lock(&pagecache_lock);
           spin_lock(&pagemap_lru_lock);
   }

   Have you actually seen bad hold times of pagecache_lock by
   shrink_caches() ?

Marcelo, this is needed because of the spin lock ordering rules.
The pagecache_lock must be obtained before the pagemap_lru_lock
or else deadlock is possible.  The spin_trylock is an optimization.

Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
From: Marcelo Tosatti @ 2001-09-25 18:40 UTC (permalink / raw)
To: David S. Miller; +Cc: andrea, torvalds, linux-kernel

On Tue, 25 Sep 2001, David S. Miller wrote:

> Marcelo, this is needed because of the spin lock ordering rules.
> The pagecache_lock must be obtained before the pagemap_lru_lock
> or else deadlock is possible.  The spin_trylock is an optimization.

No, it is not.  We can simply lock the pagecache_lock and the
pagemap_lru_lock at the beginning of the cleaning function.
page_launder() used to do that.

That's why I asked Andrea if there were long hold times by
shrink_caches().
* Re: Locking comment on shrink_caches()
From: David S. Miller @ 2001-09-25 20:15 UTC (permalink / raw)
To: marcelo; +Cc: andrea, torvalds, linux-kernel

   From: Marcelo Tosatti <marcelo@conectiva.com.br>
   Date: Tue, 25 Sep 2001 15:40:23 -0300 (BRT)

   We can simply lock the pagecache_lock and the pagemap_lru_lock at the
   beginning of the cleaning function.  page_launder() used to do that.

   That's why I asked Andrea if there were long hold times by shrink_caches().

Ok, I see.

I do think it's silly to hold the pagecache_lock during pure scanning
activities of shrink_caches().

It is known that pagecache_lock is the biggest scalability issue on
large SMP systems, hence the page cache locking patches Ingo and
I did.

Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
From: Marcelo Tosatti @ 2001-09-25 19:02 UTC (permalink / raw)
To: David S. Miller; +Cc: andrea, torvalds, linux-kernel

On Tue, 25 Sep 2001, David S. Miller wrote:

> I do think it's silly to hold the pagecache_lock during pure scanning
> activities of shrink_caches().

It may well be, but I would like to see some lockmeter results which
show that _shrink_cache()_ itself is a problem. :)

> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.

Btw, is that one going into 2.5 for sure? (the per-address-space lock)
* Re: Locking comment on shrink_caches()
From: David S. Miller @ 2001-09-25 20:29 UTC (permalink / raw)
To: marcelo; +Cc: andrea, torvalds, linux-kernel

   From: Marcelo Tosatti <marcelo@conectiva.com.br>
   Date: Tue, 25 Sep 2001 16:02:29 -0300 (BRT)

   Btw, is that one going into 2.5 for sure? (the per-address-space lock)

Well, there are two things happening in that patch: per-hash-chain
locks for the page cache itself, and the lock added to the address
space for that page list.

Linus has indicated it will go into 2.5.x, yes.

Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
From: Benjamin LaHaise @ 2001-09-25 21:00 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 01:29:05PM -0700, David S. Miller wrote:
> Well, there are two things happening in that patch.  Per-hash chain
> locks for the page cache itself, and the lock added to the address
> space for that page list.

Last time I looked, those patches made the already ugly vm locking
even worse.  I'd rather try to use some of the rcu techniques for page
cache lookup, and per-page locking for page cache removal, which will
lead to *cleaner* code as well as a much more scalable kernel.

Keep in mind that just because a lock is on someone's hitlist doesn't
mean that it is there for the right reasons.  Look at the
io_request_lock that is held around the bounce buffer copies in the
scsi midlayer.  *shudder*

		-ben
* Re: Locking comment on shrink_caches()
From: David S. Miller @ 2001-09-25 21:55 UTC (permalink / raw)
To: bcrl; +Cc: marcelo, andrea, torvalds, linux-kernel

   From: Benjamin LaHaise <bcrl@redhat.com>
   Date: Tue, 25 Sep 2001 17:00:55 -0400

   Last time I looked, those patches made the already ugly vm locking even
   worse.  I'd rather try to use some of the rcu techniques for page cache
   lookup, and per-page locking for page cache removal which will lead to
   *cleaner* code as well as a much more scalable kernel.

I'm willing to investigate using RCU.  However, per-hashchain locking
is a well-proven technique (inside the networking in particular), which
is why that was the method employed.  At the time the patch was
implemented, the RCU stuff was not fully formulated.

Please note that the problem is lock cachelines in dirty exclusive
state, not a "lock held for a long time" issue.

   Keep in mind that just because a lock is on someone's hitlist doesn't
   mean that it is for the right reasons.  Look at the io_request_lock
   that is held around the bounce buffer copies in the scsi midlayer.
   *shudder*

I agree.  But to my understanding, and after having studied the
pagecache lock usage, it is minimally used and not used anywhere
unnecessarily, as in the io_request_lock example you are citing.

In fact, the pagecache_lock is mostly held for extremely short periods
of time.

Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
From: Benjamin LaHaise @ 2001-09-25 22:16 UTC (permalink / raw)
To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 02:55:47PM -0700, David S. Miller wrote:
> I'm willing to investigate using RCU.  However, per hashchain locking
> is a much proven technique (inside the networking in particular) which
> is why that was the method employed.  At the time the patch was
> implemented, the RCU stuff was not fully formulated.

*nod*

> Please note that the problem is lock cachelines in dirty exclusive
> state, not a "lock held for long time" issue.

Ahh, that's a cpu bug -- one my athlons don't suffer from.

> In fact, the pagecache_lock is mostly held for extremely short periods
> of time.

True, and that is why I would like to see more of the research that
justifies these changes, as well as comparisons with alternate
techniques, before any of these patches make it into the base tree.
Even before that, we need to clean up the code first.

		-ben
* Re: Locking comment on shrink_caches()
From: David S. Miller @ 2001-09-25 22:28 UTC (permalink / raw)
To: bcrl; +Cc: marcelo, andrea, torvalds, linux-kernel

   From: Benjamin LaHaise <bcrl@redhat.com>
   Date: Tue, 25 Sep 2001 18:16:43 -0400

   > Please note that the problem is lock cachelines in dirty exclusive
   > state, not a "lock held for long time" issue.

   Ahh, that's a cpu bug -- one my athlons don't suffer from.

Your Athlons may handle exclusive cache line acquisition more
efficiently (due to memory subsystem performance) but it still
does cost something.

   True, and that is why I would like to see more of the research that
   justifies these changes, as well as comparisons with alternate
   techniques before any of these patches make it into the base tree.
   Even before that, we need to clean up the code first.

As an aside, I actually think the per-hashchain version of the
pagecache locking is cleaner conceptually.  The reason is that it
makes it more clear that we are locking the "identity of page X"
instead of "the page cache".

Franks a lot,
David S. Miller
davem@redhat.com
* Re: Locking comment on shrink_caches()
From: Alan Cox @ 2001-09-26 16:40 UTC (permalink / raw)
To: David S. Miller; +Cc: bcrl, marcelo, andrea, torvalds, linux-kernel

> Ahh, that's a cpu bug -- one my athlons don't suffer from.
>
> Your Athlons may handle exclusive cache line acquisition more
> efficiently (due to memory subsystem performance) but it still
> does cost something.

On an exclusive line on Athlon a lock cycle is near enough free; it's
just an ordering constraint.  Since the line is in E state no other bus
master can hold a copy in cache, so the atomicity is there.  Ditto for
newer Intel processors.
* Re: Locking comment on shrink_caches()
From: Linus Torvalds @ 2001-09-26 17:25 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1697 bytes --]

On Wed, 26 Sep 2001, Alan Cox wrote:
> >
> > Your Athlons may handle exclusive cache line acquisition more
> > efficiently (due to memory subsystem performance) but it still
> > does cost something.
>
> On an exclusive line on Athlon a lock cycle is near enough free, its
> just an ordering constraint. Since the line is in E state no other bus
> master can hold a copy in cache so the atomicity is there. Ditto for
> newer Intel processors

You misunderstood the problem, I think: when the line moves from one
CPU to the other (the exclusive state moves along with it), that is
_expensive_.

Even when you have a backside bus (or cache pushout content snooping)
to allow the cacheline to move directly from one CPU to the other
without having to go through memory, that's a really expensive thing
to do.

So re-acquiring the lock on the same CPU is pretty much free (18 cycles
for Intel, if I remember correctly, and that's _entirely_ due to the
pipeline flush to ensure in-order execution around it).

[ Oh, just for interest I checked my P4, which has a much longer
  pipeline: the cost of an exclusive locked access is a whopping 104
  cycles.  But we already knew that the first-generation P4 does badly
  on many things.

  Just reading the cycle counter is apparently around 80 cycles on a
  P4; it's 32 cycles on a PIII.  Looks like that also stalls the
  pipeline or something.  But cpuid is _really_ horrible.  Test out
  the attached program.

  PIII:
	nothing: 32 cycles
	locked add: 50 cycles
	cpuid: 170 cycles

  P4:
	nothing: 80 cycles
	locked add: 184 cycles
	cpuid: 652 cycles

  Remember: these are for the already-exclusive-cache cases. ]

What are the athlon numbers?

		Linus

[-- Attachment #2: Type: TEXT/PLAIN, Size: 612 bytes --]

#include <stdio.h>

#define rdtsc(low) \
	__asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")

#define TIME(x,y) \
	min = 100000; \
	for (i = 0; i < 1000; i++) { \
		unsigned long start, end; \
		rdtsc(start); \
		x; \
		rdtsc(end); \
		end -= start; \
		if (end < min) \
			min = end; \
	} \
	printf(y ": %lu cycles\n", min);

#define LOCK asm volatile("lock ; addl $0,0(%esp)")
#define CPUID asm volatile("cpuid" : : : "ax", "dx", "cx", "bx")

int main()
{
	unsigned long min;
	int i;

	TIME(/* */, "nothing");
	TIME(LOCK, "locked add");
	TIME(CPUID, "cpuid");
}
* Re: Locking comment on shrink_caches()
From: Alan Cox @ 2001-09-26 17:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel

> PIII:
>	nothing: 32 cycles
>	locked add: 50 cycles
>	cpuid: 170 cycles
>
> P4:
>	nothing: 80 cycles
>	locked add: 184 cycles
>	cpuid: 652 cycles

Original core Athlon (step 2 and earlier):

	nothing: 11 cycles
	locked add: 22 cycles
	cpuid: 67 cycles

Generic Athlon:

	nothing: 11 cycles
	locked add: 11 cycles
	cpuid: 64 cycles

I don't currently have a Palomino core to test.

Wait for AMD to publish graphs of CPUid performance for PIV versus
Athlon 8)

		Alan
* Re: Locking comment on shrink_caches()
From: Linus Torvalds @ 2001-09-26 17:44 UTC (permalink / raw)
To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Alan Cox wrote:
> Original core Athlon (step 2 and earlier):
>	nothing: 11 cycles
>	locked add: 22 cycles
>	cpuid: 67 cycles
>
> Generic Athlon:
>	nothing: 11 cycles
>	locked add: 11 cycles
>	cpuid: 64 cycles

Do you have an actual SMP Athlon to test?  I'd love to see whether that
"locked add" thing is really SMP-safe - it may be the old "AMD turned
off the 'lock' prefix synchronization because it doesn't matter on UP"
trick.  They used to have a bit to do that..

That said, it _can_ be real even on SMP.  There's no reason why a
memory barrier has to be as heavy as it is on some machines (even the
P4 looks positively _fast_ compared to most older machines that did
memory barriers on the bus and took hundreds of much slower cycles to
do it).

> Wait for AMD to publish graphs of CPUid performance for PIV versus
> Athlon 8)

The sad thing is, I think Intel used to suggest that people use "cpuid"
as the way to serialize the core.  So people may actually be _using_ it
for something like semaphores.

I remember that Ingo or somebody suggested we use it for the Linux
"mb()" macro - I _much_ prefer the saner locked zero add to the stack,
and the prediction that Intel would be more likely to optimize "add"
than "cpuid" certainly ended up being surprisingly true on the P4.

		Linus
* Re: Locking comment on shrink_caches()
From: Benjamin LaHaise @ 2001-09-26 18:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, David S. Miller, marcelo, andrea, linux-kernel

On Wed, Sep 26, 2001 at 10:44:14AM -0700, Linus Torvalds wrote:
> Do you have an actual SMP Athlon to test?  I'd love to see if that "locked
> add" thing is really SMP-safe - it may be that it's the old "AMD turned
> off the 'lock' prefix synchronization because it doesn't matter in UP".
> They used to have a bit to do that..

Same here; my dual reports:

[bcrl@toomuch ~]$ ./a.out
nothing: 11 cycles
locked add: 11 cycles
cpuid: 68 cycles

Which is pretty good.

> That said, it _can_ be real even on SMP.  There's no reason why a memory
> barrier would have to be as heavy as it is on some machines (even the P4
> looks positively _fast_ compared to most older machines that did memory
> barriers on the bus and took hundreds of much slower cycles to do it).

I had discussions with a few people from intel about the p4 having much
improved locking performance, including the ability to speculatively
execute locked instructions.  How much of that is enabled in the current
cores is another question entirely (gotta love microcode patches).

		-ben
* Re: Locking comment on shrink_caches()
From: Dave Jones @ 2001-09-26 18:01 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Alan Cox wrote:
> Original core Athlon (step 2 and earlier)
>
>	nothing: 11 cycles
>	locked add: 22 cycles
>	cpuid: 67 cycles
>
> I don't currently have a palomino core to test

Exactly the same as the original core:

	nothing: 11 cycles
	locked add: 11 cycles
	cpuid: 67 cycles

(cpuid varies 63->68)

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs
* Re: Locking comment on shrink_caches()
From: Vojtech Pavlik @ 2001-09-26 20:20 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, Sep 26, 2001 at 06:40:15PM +0100, Alan Cox wrote:
> Generic Athlon is
>
>	nothing: 11 cycles
>	locked add: 11 cycles
>	cpuid: 64 cycles

Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
slower:

	nothing: 11 cycles
	locked add: 11 cycles
	cpuid: 87 cycles

-- 
Vojtech Pavlik
SuSE Labs
* Re: Locking comment on shrink_caches()
From: Vojtech Pavlik @ 2001-09-26 20:24 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, Sep 26, 2001 at 10:20:21PM +0200, Vojtech Pavlik wrote:
> Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
> slower:
>
>	nothing: 11 cycles
>	locked add: 11 cycles
>	cpuid: 87 cycles

Oops, this is indeed just a difference in compiler options.

-- 
Vojtech Pavlik
SuSE Labs
* Re: Locking comment on shrink_caches()
From: Richard Gooch @ 2001-09-26 17:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

Linus Torvalds writes:
> This message is in MIME format. The first part should be readable text,
> while the remaining parts are likely unreadable without MIME-aware tools.
> Send mail to mime@docserver.cac.washington.edu for more info.

Yuk!  MIME!  I thought you hated it too?

> What are the athlon numbers?

Athlon 850 MHz:

	nothing: 11 cycles
	locked add: 12 cycles
	cpuid: 64 cycles

BTW: your code had horrible control-Ms on each line, so the compiler
choked (with a less-than-helpful error message).  Of course, "cat t.c"
showed nothing amiss.  Fortunately emacs doesn't hide information.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca
* Re: Locking comment on shrink_caches()
From: Benjamin LaHaise @ 2001-09-26 18:24 UTC (permalink / raw)
To: Richard Gooch; +Cc: linux-kernel

On Wed, Sep 26, 2001 at 11:43:25AM -0600, Richard Gooch wrote:
> BTW: your code had horrible control-M's on each line. So the compiler
> choked (with a less-than-helpful error message). Of course, cat t.c
> showed nothing amiss. Fortunately emacs doesn't hide information.

You must be using some kind of broken MUA -- neither mutt nor pine
resulted in anything with a trace of 0x0d in it.

		-ben
* Re: Locking comment on shrink_caches()
From: Richard Gooch @ 2001-09-26 18:48 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: linux-kernel

Benjamin LaHaise writes:
> You must be using some kind of broken MUA -- neither mutt nor pine
> resulted in anything with a trace of 0x0d in it.

My MUA doesn't know about MIME at all (part of the reason I hate
MIME).  I save the message to a file and run uudeview 0.5pl13.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca
* Re: Locking comment on shrink_caches()
From: Davide Libenzi @ 2001-09-26 18:58 UTC (permalink / raw)
To: Richard Gooch; +Cc: linux-kernel, Benjamin LaHaise

On 26-Sep-2001 Richard Gooch wrote:
> My MUA doesn't know about MIME at all (part of the reason I hate
> MIME).  I save the message to a file and run uudeview 0.5pl13.

Maybe the file you save is in RFC format ( \r\n ) and uudeview does
not trim it.

- Davide
* Re: Locking comment on shrink_caches()
From: Dave Jones @ 2001-09-26 17:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Linus Torvalds wrote:
> What are the athlon numbers?

	nothing: 11 cycles
	locked add: 11 cycles
	cpuid: 63 cycles

(cpuid varies between 63->68 here)

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs
* Re: Locking comment on shrink_caches()
From: Alan Cox @ 2001-09-26 17:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel

And for completeness:

VIA Cyrix CIII (original generation, 0.18u):

	nothing: 28 cycles
	locked add: 29 cycles
	cpuid: 72 cycles

Pentium Pro:

	nothing: 33 cycles
	locked add: 51 cycles
	cpuid: 98 cycles

IDT WinChip (base comparison - pure in-order machine):

	nothing: 17 cycles
	locked add: 20 cycles
	cpuid: 33 cycles
* Re: Locking comment on shrink_caches()
From: Dave Jones @ 2001-09-26 17:59 UTC (permalink / raw)
To: Alan Cox; +Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Alan Cox wrote:
> VIA Cyrix CIII (original generation 0.18u)
>
>	nothing: 28 cycles
>	locked add: 29 cycles
>	cpuid: 72 cycles

Interesting.  From a newer C3:

	nothing: 30 cycles
	locked add: 31 cycles
	cpuid: 79 cycles

Only slightly worse, but I'd not expected this.  This was from an
866MHz part too, whereas you have a 533 iirc?

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs
* Re: Locking comment on shrink_caches()
From: Alan Cox @ 2001-09-26 18:07 UTC (permalink / raw)
To: Dave Jones; +Cc: Linus Torvalds, David S. Miller, bcrl, marcelo, andrea, linux-kernel

> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
>
> Only slightly worse, but I'd not expected this.
> This was from a 866MHz part too, whereas you have a 533 iirc ?

The 0.13u part has a couple more pipeline steps, I believe.
* Re: Locking comment on shrink_caches()
From: Padraig Brady @ 2001-09-26 18:09 UTC (permalink / raw)
To: Dave Jones; +Cc: Alan Cox, linux-kernel

Dave Jones wrote:
> On Wed, 26 Sep 2001, Alan Cox wrote:
>> VIA Cyrix CIII (original generation 0.18u)
>>
>> nothing: 28 cycles
>> locked add: 29 cycles
>> cpuid: 72 cycles
>
> Interesting. From a newer C3..
>
> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
>
> Only slightly worse, but I'd not expected this.
> This was from a 866MHz part too, whereas you have a 533 iirc ?

Interesting, does the original CIII have a TSC?  Would that affect the
timings Alan got?  The following table may be of use to people (all of
these are Socket 370):

core       size    name           code   Notes
------------------------------------------------------------------------------
samuel     0.18µm  Via Cyrix III  (C5)   128K L1, 0K L2 cache.  FPU doesn't
                                         run @ full clock speed.
samuel II  0.15µm  Via C3         (C5B)  667MHz CIIIs in Dabs are C3s
                                         (128K L1, 64K L2 cache),
                                         (MMX/3D now!), FPU @ full clock
                                         speed.
mathew     0.15µm  Via C3         (C5B)  Mobile samuel II with integrated
                                         north bridge & 2D/3D graphics.
                                         (1.6v)
ezra       0.13µm  Via C3         (C5C)  Debut @ 850MHz, rising to 1GHz
                                         quickly.  (1.35v)
nehemiah   0.13µm  Via C4         (C5X)  Debut @ 1.2GHz (128K L1,
                                         256K L2 cache) (SSE)
esther     0.10µm  Via C4         (C5Y)  ?

C3 availability details:

MHz  FSB (MHz)       V    Socket      Cache                Process  Power  Date
------------------------------------------------------------------------------
667  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ    6-12W  Mar 2001
733  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ    6-12W  May 2001
733  66 / 100 / 133  1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ    1+ W   May 2001 (e series)
750  100 / 133       1.5  Socket 370  L1: 128kB, L2: 64kB  0.15µ    6-12W  May 2001
800  100 / 133       1.5  Socket 370  L1: 128kB, L2: 64kB  0.13µ    7-12W  May 2001 (ezra)
* Re: Locking comment on shrink_caches()
From: Dave Jones @ 2001-09-26 18:22 UTC (permalink / raw)
To: Padraig Brady; +Cc: Alan Cox, linux-kernel

On Wed, 26 Sep 2001, Padraig Brady wrote:
> Interesting, does the original CIII have a TSC?

Yes.

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs
* Re: Locking comment on shrink_caches() 2001-09-26 17:59 ` Dave Jones 2001-09-26 18:07 ` Alan Cox 2001-09-26 18:09 ` Padraig Brady @ 2001-09-26 18:24 ` Linus Torvalds 2001-09-26 18:40 ` Dave Jones 2001-09-26 19:04 ` Locking comment on shrink_caches() George Greer 2 siblings, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-26 18:24 UTC (permalink / raw)
To: Dave Jones; +Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel

On Wed, 26 Sep 2001, Dave Jones wrote:
> On Wed, 26 Sep 2001, Alan Cox wrote:
>
> > VIA Cyrix CIII (original generation 0.18u)
> >
> > nothing: 28 cycles
> > locked add: 29 cycles
> > cpuid: 72 cycles
>
> Interesting. From a newer C3..
>
> nothing: 30 cycles
> locked add: 31 cycles
> cpuid: 79 cycles
>
> Only slightly worse, but I'd not expected this.

That difference can easily be explained by the compiler and options.

You should use "gcc -O2" at least, in order to avoid having gcc do unnecessary spills to memory in between the timings. And there may be some versions of gcc that end up spilling even then.

Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-26 18:24 ` Linus Torvalds @ 2001-09-26 18:40 ` Dave Jones 2001-09-26 19:12 ` Linus Torvalds 2001-09-26 19:04 ` Locking comment on shrink_caches() George Greer 1 sibling, 1 reply; 67+ messages in thread From: Dave Jones @ 2001-09-26 18:40 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Cox, David S. Miller, bcrl, marcelo, andrea, linux-kernel On Wed, 26 Sep 2001, Linus Torvalds wrote: > > > cpuid: 72 cycles > > cpuid: 79 cycles > > Only slightly worse, but I'd not expected this. > That difference can easily be explained by the compiler and options. Actually repeated runs of the test on that box show it deviating by up to 10 cycles, making it match the results that Alan posted. -O2 made no difference, these deviations still occur. They seem more prominent on the C3 than other boxes I've tried, even with the same compiler toolchain. regards, Dave. -- | Dave Jones. http://www.suse.de/~davej | SuSE Labs ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-26 18:40 ` Dave Jones @ 2001-09-26 19:12 ` Linus Torvalds 2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2001-09-26 19:12 UTC (permalink / raw) To: linux-kernel In article <Pine.LNX.4.30.0109262036480.8655-100000@Appserv.suse.de>, Dave Jones <davej@suse.de> wrote: >On Wed, 26 Sep 2001, Linus Torvalds wrote: > >> > > cpuid: 72 cycles >> > cpuid: 79 cycles >> > Only slightly worse, but I'd not expected this. >> That difference can easily be explained by the compiler and options. > >Actually repeated runs of the test on that box show it deviating by up >to 10 cycles, making it match the results that Alan posted. >-O2 made no difference, these deviations still occur. They seem more >prominent on the C3 than other boxes I've tried, even with the same >compiler toolchain. Does the C3 do any kind of frequency shifting? For example, on a transmeta CPU, the TSC will run at a constant "nominal" speed (the highest the CPU can go), although the real CPU speed will depend on the load of the machine and temperature etc. So on a crusoe CPU you'll see varying speeds (and it depends on the speed grade, because that in turn depends on how many longrun steps are being actively used). For example, on a mostly idle machine I get torvalds@kiwi:~ > ./a.out nothing: 54 cycles locked add: 54 cycles cpuid: 91 cycles while if I have another window that does an endless loop to keep the CPU busy, the _real_ frequency of the CPU scales up, and the machine basically becomes faster: torvalds@kiwi:~ > ./a.out nothing: 36 cycles locked add: 36 cycles cpuid: 54 cycles (The reason why the "nothing" TSC read is expensive on crusoe is because of the scaling of the TSC - rdtsc literally has to do a floating point multiply-add to scale the clock to the right "nominal" frequency. Of course, "expensive" is still a lot less than the inexplicable 80 cycles on a P4). 
(That's a 600MHz part going down to 400MHz in idle, btw)

On a 633MHz part (I don't actually have access to any of the high speed grades ;) it ends up being

fast:
 nothing: 39 cycles
 locked add: 40 cycles
 cpuid: 68 cycles

slow:
 nothing: 82 cycles
 locked add: 84 cycles
 cpuid: 122 cycles

which corresponds to a 633MHz part going down to 300MHz in idle.

And of course, you can get pretty much anything in between, depending on what the load is...

Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* CPU frequency shifting "problems" 2001-09-26 19:12 ` Linus Torvalds @ 2001-09-27 12:22 ` Padraig Brady 2001-09-27 12:44 ` Dave Jones 2001-09-27 23:23 ` Linus Torvalds 0 siblings, 2 replies; 67+ messages in thread From: Padraig Brady @ 2001-09-27 12:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Linus Torvalds wrote: >In article <Pine.LNX.4.30.0109262036480.8655-100000@Appserv.suse.de>, >Dave Jones <davej@suse.de> wrote: > >>On Wed, 26 Sep 2001, Linus Torvalds wrote: >> >>>>>cpuid: 72 cycles >>>>> >>>>cpuid: 79 cycles >>>>Only slightly worse, but I'd not expected this. >>>> >>>That difference can easily be explained by the compiler and options. >>> >>Actually repeated runs of the test on that box show it deviating by up >>to 10 cycles, making it match the results that Alan posted. >>-O2 made no difference, these deviations still occur. They seem more >>prominent on the C3 than other boxes I've tried, even with the same >>compiler toolchain. >> > >Does the C3 do any kind of frequency shifting? > Not automatic, but you can set the multiplier dynamically by setting the msr. Russell King has been working on an arch independent framework for this kind of thing and support for the C3 has recently been added by Dave Jones. The code is available @: cvs -d :pserver:cvs@pubcvs.arm.linux.org.uk:/mnt/src/cvsroot login cvs -d :pserver:cvs@pubcvs.arm.linux.org.uk:/mnt/src/cvsroot co cpufreq > >For example, on a transmeta CPU, the TSC will run at a constant >"nominal" speed (the highest the CPU can go), although the real CPU >speed will depend on the load of the machine and temperature etc. > As does the P4 from what I understand. So a question.. What are the software dependencies on this auto/manual frequency shifting? The code referenced above scales jiffies appropriately when a manual frequency change is requested. I'm not sure about the possible consequences of this for e.g. could there be races introduced with various busy loop locking etc. 
A quick check for the use of jiffies in the kernel:

[padraig@pixelbeat linux]$ find -name "*.[ch]" | xargs grep jiffies | wc -l
3992

Also with the auto shifting of the transmeta/P4, won't this invalidate the jiffies value? Also how does this affect the rtLinux guys (and realtime software in general)?

cheers,
Padraig.

> So on
>a crusoe CPU you'll see varying speeds (and it depends on the speed
>grade, because that in turn depends on how many longrun steps are being
>actively used).
>
>For example, on a mostly idle machine I get
>
> torvalds@kiwi:~ > ./a.out
> nothing: 54 cycles
> locked add: 54 cycles
> cpuid: 91 cycles
>
>while if I have another window that does an endless loop to keep the CPU
>busy, the _real_ frequency of the CPU scales up, and the machine
>basically becomes faster:
>
> torvalds@kiwi:~ > ./a.out
> nothing: 36 cycles
> locked add: 36 cycles
> cpuid: 54 cycles
>
>(The reason why the "nothing" TSC read is expensive on crusoe is because
>of the scaling of the TSC - rdtsc literally has to do a floating point
>multiply-add to scale the clock to the right "nominal" frequency. Of
>course, "expensive" is still a lot less than the inexplicable 80 cycles
>on a P4).
>
>(That's a 600MHz part
>going down to 400MHz in idle, btw)
>
>On a 633MHz part (I don't actually have access to any of the high speed
>grades ;) it ends up being
>
>fast:
> nothing: 39 cycles
> locked add: 40 cycles
> cpuid: 68 cycles
>
>slow:
> nothing: 82 cycles
> locked add: 84 cycles
> cpuid: 122 cycles
>
>which corresponds to a 633MHz part going down to 300MHz in idle.
>
>And of course, you can get pretty much anything in between, depending on
>what the load is...
>
> Linus
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady @ 2001-09-27 12:44 ` Dave Jones 2001-09-27 23:23 ` Linus Torvalds 1 sibling, 0 replies; 67+ messages in thread From: Dave Jones @ 2001-09-27 12:44 UTC (permalink / raw) To: Padraig Brady; +Cc: Linux Kernel Mailing List On Thu, 27 Sep 2001, Padraig Brady wrote: > >Does the C3 do any kind of frequency shifting? > Not automatic, but you can set the multiplier dynamically by setting the > msr. > Russell King has been working on an arch independent framework for this > kind of thing and support for the C3 has recently been added by Dave Jones. If you're going to try this out on a C3 btw, heed the warning at the top of the code :) This still needs quite a bit of work. I just need to find the time to sit down and finish it. (The x86 bits are all thats preventing Russell from saying "This is ready" iirc, so I should get that finished at some point soon) I'd like to add Transmeta Longrun support to it too, but that can come later, when I get access to one. regards, Dave. -- | Dave Jones. http://www.suse.de/~davej | SuSE Labs ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady 2001-09-27 12:44 ` Dave Jones @ 2001-09-27 23:23 ` Linus Torvalds 2001-09-28 0:55 ` Alan Cox 2001-09-28 8:55 ` Jamie Lokier 1 sibling, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-09-27 23:23 UTC (permalink / raw)
To: Padraig Brady; +Cc: linux-kernel

On Thu, 27 Sep 2001, Padraig Brady wrote:
>
> >For example, on a transmeta CPU, the TSC will run at a constant
> >"nominal" speed (the highest the CPU can go), although the real CPU
> >speed will depend on the load of the machine and temperature etc.
>
> As does the P4 from what I understand.

That might explain why the P4 "rdtsc" is so slow.

> So a question..
> What are the software dependencies on this auto/manual frequency shifting?

None. At least not as long as the CPU _does_ do it automatically, and the TSC appears to run at a constant speed even if the CPU does not.

For example, the Intel "SpeedStep" CPU's are completely broken under Linux, and real-time will advance at different speeds in DC and AC modes, because Intel actually changes the frequency of the TSC _and_ they don't document how to figure out that it changed.

With a CPU that does make the TSC appear constant-frequency, the fact that the CPU itself can go faster/slower doesn't matter - from a kernel perspective that's pretty much equivalent to the different speeds you get from cache miss behaviour etc.

Linus
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-27 23:23 ` Linus Torvalds @ 2001-09-28 0:55 ` Alan Cox 2001-09-28 2:12 ` Stefan Smietanowski 2001-09-28 8:55 ` Jamie Lokier 1 sibling, 1 reply; 67+ messages in thread From: Alan Cox @ 2001-09-28 0:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: Padraig Brady, linux-kernel > For example, the Intel "SpeedStep" CPU's are completely broken under > Linux, and real-time will advance at different speeds in DC and AC modes, > because Intel actually changes the frequency of the TSC _and_ they don't > document how to figure out that it changed. The change is APM or ACPI initiated. Intel won't tell anyone anything useful but Microsoft have published some of the required intel confidential information which helps a bit ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-28 0:55 ` Alan Cox @ 2001-09-28 2:12 ` Stefan Smietanowski 0 siblings, 0 replies; 67+ messages in thread From: Stefan Smietanowski @ 2001-09-28 2:12 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel Hey. >>For example, the Intel "SpeedStep" CPU's are completely broken under >>Linux, and real-time will advance at different speeds in DC and AC modes, >>because Intel actually changes the frequency of the TSC _and_ they don't >>document how to figure out that it changed. > > The change is APM or ACPI initiated. Intel won't tell anyone anything > useful but Microsoft have published some of the required intel confidential > information which helps a bit Did you just say that Microsoft actually went and did something right for a change? As in publishing specs I mean. *Stands in awe* :) // Stefan ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-27 23:23 ` Linus Torvalds 2001-09-28 0:55 ` Alan Cox @ 2001-09-28 8:55 ` Jamie Lokier 2001-09-28 16:11 ` Linus Torvalds 1 sibling, 1 reply; 67+ messages in thread From: Jamie Lokier @ 2001-09-28 8:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: Padraig Brady, linux-kernel Linus Torvalds wrote: > With a CPU that does makes TSC appear constant-frequency, the fact that > the CPU itself can go faster/slower doesn't matter - from a kernel > perspective that's pretty much equivalent to the different speeds you get > from cache miss behaviour etc. On a Transmeta chip, does the TSC clock advance _exactly_ uniformly, or is there a cumulative error due to speed changes? I'll clarify. I imagine that the internal clocks are driven by PLLs, DLLs or something similar. Unless multiple oscillators are used, this means that speed switching is gradual, over several hundred or many more clock cycles. You said that Crusoe does a floating point op to scale the TSC value. Now suppose I have a 600MHz Crusoe. I calibrate the clock and it comes out as 600.01MHz. I can now use `rdtsc' to measure time in userspace, rather more accurately than gettimeofday(). (In fact I have worked with programs that do this, for network traffic injection.). I can do this over a period of minutes, expecting the clock to match "wall clock" time reasonably accurately. Suppose the CPU clock speed changes. Can I be confident that 600.01*10^6 (+/- small tolerance) cycles will still be counted per second, or is there a cumulative error due to the gradual clock speed change and the floating-point scale factor not integrating the gradual change precisely? (One hardware implementation that doesn't have this problem is to run a small counter, say 3 or 4 bits, at the nominal clock speed all the time, and have the slower core sample that. But it may use a little more power, and your note about FP scaling tells me you don't do that). 
thanks, -- Jamie ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-28 8:55 ` Jamie Lokier @ 2001-09-28 16:11 ` Linus Torvalds 2001-09-28 20:29 ` Eric W. Biederman 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2001-09-28 16:11 UTC (permalink / raw) To: Jamie Lokier; +Cc: Padraig Brady, linux-kernel On Fri, 28 Sep 2001, Jamie Lokier wrote: > > On a Transmeta chip, does the TSC clock advance _exactly_ uniformly, or > is there a cumulative error due to speed changes? > > I'll clarify. I imagine that the internal clocks are driven by PLLs, > DLLs or something similar. Unless multiple oscillators are used, this > means that speed switching is gradual, over several hundred or many more > clock cycles. Basically, there's the "slow" timer, and the fast one. The slow one always runs, and fast one gives the precision but runs at CPU speed. So yes, there are multiple oscillators, and no, they should not drift on frequency shifting, because the slow and constant one is used to scale the fast one. So no cumulative errors. HOWEVER, anybody who believes that TSC is a "truly accurate clock" will be sadly mistaken on any machine. Even PLL's drift over time, and as mentioned, Intel already broke the "you can use TSC as wall time" in their SpeedStep implementation. Who knows what their future CPU's will do.. > I can now use `rdtsc' to measure time in userspace, rather more > accurately than gettimeofday(). (In fact I have worked with programs > that do this, for network traffic injection.). I can do this over a > period of minutes, expecting the clock to match "wall clock" time > reasonably accurately. It will work on Crusoe. > (One hardware implementation that doesn't have this problem is to run a > small counter, say 3 or 4 bits, at the nominal clock speed all the time, > and have the slower core sample that. But it may use a little more > power, and your note about FP scaling tells me you don't do that). We do that, but the other way around. 
The thing is, the "nominal clock speed" doesn't even _exist_ when running normally. What does exist is the bus clock (well, a multiple of it, but you get the idea), and that one is stable. I bet PCI devices don't like to be randomly driven at frequencies "somewhere between 12 and 33MHz" depending on load ;) But because the stable frequency is the _slow_ one, you can't just scale that up (well, you could - you could just run your cycle counter at 66MHz all the time, and you couldn't measure smaller intervals, and people would be really disappointed). So you need the scaling of the fast one.. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-28 16:11 ` Linus Torvalds @ 2001-09-28 20:29 ` Eric W. Biederman 2001-09-28 22:24 ` Jamie Lokier 0 siblings, 1 reply; 67+ messages in thread From: Eric W. Biederman @ 2001-09-28 20:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jamie Lokier, Padraig Brady, linux-kernel Linus Torvalds <torvalds@transmeta.com> writes: > What does exist is the bus clock (well, a multiple of it, but you get the > idea), and that one is stable. I bet PCI devices don't like to be randomly > driven at frequencies "somewhere between 12 and 33MHz" depending on load ;) I doubt they would like it but it is perfectly legal (PCI spec..) to vary the pci clock, depending upon load. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: CPU frequency shifting "problems" 2001-09-28 20:29 ` Eric W. Biederman @ 2001-09-28 22:24 ` Jamie Lokier 0 siblings, 0 replies; 67+ messages in thread From: Jamie Lokier @ 2001-09-28 22:24 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Padraig Brady, linux-kernel Eric W. Biederman wrote: > > What does exist is the bus clock (well, a multiple of it, but you get the > > idea), and that one is stable. I bet PCI devices don't like to be randomly > > driven at frequencies "somewhere between 12 and 33MHz" depending on load ;) > > I doubt they would like it but it is perfectly legal (PCI spec..) to > vary the pci clock, depending upon load. Yes it is. Also, the PCI clock is frequency modulated to reduce electrical interference. (Or on a more cynical note, to pass the official emissions tests ;-) However it's common practice to PLL to the PCI clock, for clock distribution on a board, so varying the frequency must be done in a strictly constrained fashion. -- Jamie ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-26 18:24 ` Linus Torvalds 2001-09-26 18:40 ` Dave Jones @ 2001-09-26 19:04 ` George Greer 1 sibling, 0 replies; 67+ messages in thread
From: George Greer @ 2001-09-26 19:04 UTC (permalink / raw)
To: linux-kernel

On Wed, 26 Sep 2001, Linus Torvalds wrote:
>
>On Wed, 26 Sep 2001, Dave Jones wrote:
>> On Wed, 26 Sep 2001, Alan Cox wrote:
>>
>> > VIA Cyrix CIII (original generation 0.18u)
>> >
>> > nothing: 28 cycles
>> > locked add: 29 cycles
>> > cpuid: 72 cycles
>>
>> Interesting. From a newer C3..
>>
>> nothing: 30 cycles
>> locked add: 31 cycles
>> cpuid: 79 cycles
>>
>> Only slightly worse, but I'd not expected this.
>
>That difference can easily be explained by the compiler and options.
>
>You should use "gcc -O2" at least, in order to avoid having gcc do
>unnecessary spills to memory in between the timings. And there may be some
>versions of gcc that end up spilling even then.

Nice big difference in 'locked add' seen here.

gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85), 2x Pentium 233/MMX:

                  -O0        -O2
nothing:     15 cycles  14 cycles
locked add:  60 cycles  32 cycles
cpuid:       33 cycles  32 cycles

gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85), 2x Pentium 133:

                  -O0        -O2
nothing:     14 cycles  13 cycles
locked add:  76 cycles  25 cycles
cpuid:       31 cycles  30 cycles

--
George Greer, greerga@m-l.org | Genius may have its limitations, but stupidity
http://www.m-l.org/~greerga/  | is not thus handicapped. -- Elbert Hubbard
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-26 17:50 ` Alan Cox 2001-09-26 17:59 ` Dave Jones @ 2001-09-26 18:59 ` George Greer 1 sibling, 0 replies; 67+ messages in thread From: George Greer @ 2001-09-26 18:59 UTC (permalink / raw) To: linux-kernel On Wed, 26 Sep 2001, Alan Cox wrote: >and for completeness > >VIA Cyrix CIII (original generation 0.18u) > >nothing: 28 cycles >locked add: 29 cycles >cpuid: 72 cycles > >Pentium Pro > >nothing: 33 cycles >locked add: 51 cycles >cpuid: 98 cycles > >(base comparison - pure in order machine) > >IDT winchip > >nothing: 17 cycles >locked add: 20 cycles >cpuid: 33 cycles 2x Pentium MMX 233MHz nothing: 14 cycles locked add: 59 cycles cpuid: 31 cycles 2x Pentium 133MHz nothing: 14 cycles locked add: 76 cycles cpuid: 31 cycles cpuid is oddly fast. -- George Greer, greerga@m-l.org | Genius may have its limitations, but stupidity http://www.m-l.org/~greerga/ | is not thus handicapped. -- Elbert Hubbard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-26 17:25 ` Linus Torvalds ` (3 preceding siblings ...) 2001-09-26 17:50 ` Alan Cox @ 2001-09-26 23:26 ` David S. Miller 2001-09-27 12:10 ` Alan Cox 4 siblings, 1 reply; 67+ messages in thread From: David S. Miller @ 2001-09-26 23:26 UTC (permalink / raw) To: torvalds; +Cc: alan, bcrl, marcelo, andrea, linux-kernel From: Linus Torvalds <torvalds@transmeta.com> Date: Wed, 26 Sep 2001 10:25:18 -0700 (PDT) On Wed, 26 Sep 2001, Alan Cox wrote: > > > > Your Athlons may handle exclusive cache line acquisition more > > efficiently (due to memory subsystem performance) but it still > > does cost something. > > On an exclusive line on Athlon a lock cycle is near enough free, its > just an ordering constraint. Since the line is in E state no other bus > master can hold a copy in cache so the atomicity is there. Ditto for newer > Intel processors You misunderstood the problem, I think: when the line moves from one CPU to the other (the exclusive state moves along with it), that is _expensive_. Yes, this was my intended point. Please see my quoted text above and note the "exclusive cache line acquisition" with emphasis on the word "acquisition" meaning you don't have the cache line in E state yet. Franks a lot, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-26 23:26 ` David S. Miller @ 2001-09-27 12:10 ` Alan Cox 2001-09-27 15:38 ` Linus Torvalds 2001-09-27 19:41 ` David S. Miller 0 siblings, 2 replies; 67+ messages in thread From: Alan Cox @ 2001-09-27 12:10 UTC (permalink / raw) To: David S. Miller; +Cc: torvalds, alan, bcrl, marcelo, andrea, linux-kernel > Yes, this was my intended point. Please see my quoted text above and > note the "exclusive cache line acquisition" with emphasis on the word > "acquisition" meaning you don't have the cache line in E state yet. See prefetching - the CPU prefetching will hide some of the effect and the spin_lock_prefetch() macro does wonders for the rest. Alan ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-27 12:10 ` Alan Cox @ 2001-09-27 15:38 ` Linus Torvalds 2001-09-27 17:44 ` Ingo Molnar 2001-09-27 19:41 ` David S. Miller 1 sibling, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2001-09-27 15:38 UTC (permalink / raw) To: Alan Cox; +Cc: David S. Miller, bcrl, marcelo, andrea, linux-kernel On Thu, 27 Sep 2001, Alan Cox wrote: > > > Yes, this was my intended point. Please see my quoted text above and > > note the "exclusive cache line acquisition" with emphasis on the word > > "acquisition" meaning you don't have the cache line in E state yet. > > See prefetching - the CPU prefetching will hide some of the effect and > the spin_lock_prefetch() macro does wonders for the rest. prefetching and friends won't do _anything_ for the case of a cache line bouncing back and forth between CPU's. In fact, it can easily make things _worse_, simply by having bouncing happen even more (you bounce it into the CPU for the prefetch, another CPU bounces it back, and you bounce it in again for the actual lock). And this isn't at all unlikely if you have a lock that is accessed a _lot_ but held only for short times. Now, I'm not convinced that pagecache_lock is _that_ critical yet, but is it one of the top ones? Definitely. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-27 15:38 ` Linus Torvalds @ 2001-09-27 17:44 ` Ingo Molnar 0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2001-09-27 17:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, David S. Miller, bcrl, marcelo, Andrea Arcangeli, linux-kernel

On Thu, 27 Sep 2001, Linus Torvalds wrote:
> prefetching and friends won't do _anything_ for the case of a cache
> line bouncing back and forth between CPU's.

yep. that is exactly what was happening with pagecache_lock, while an 8-way system served 300+ MB/sec worth of SPECweb99 HTTP content in 1500 byte packets. Under that kind of workload the pagecache is used read-mostly, and due to zerocopy (and Linux's hyper-scalable networking code) there isn't much left that pollutes caches and/or inhibits raw performance in any way. pagecache_lock was the top non-conceptual cacheline-miss offender in instruction-level profiles of such workloads.

Does it show up on a dual PIII with 128 MB RAM? Probably not as strongly. Are there other offenders under other kinds of workloads that have a bigger effect than pagecache_lock? Probably yes - but this does not justify ignoring the effects of pagecache_lock.

(to be precise there was another offender - timerlist_lock, we've fixed it before fixing pagecache_lock, and posted a patch for that one too. It's available under http://redhat.com/~mingo/scalable-timers/. I know no other scalability offenders for read-mostly pagecache & network-intensive workloads for the time being.)

Ingo
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-27 12:10 ` Alan Cox 2001-09-27 15:38 ` Linus Torvalds @ 2001-09-27 19:41 ` David S. Miller 2001-09-27 22:59 ` Alan Cox 1 sibling, 1 reply; 67+ messages in thread From: David S. Miller @ 2001-09-27 19:41 UTC (permalink / raw) To: alan; +Cc: torvalds, bcrl, marcelo, andrea, linux-kernel From: Alan Cox <alan@lxorguk.ukuu.org.uk> Date: Thu, 27 Sep 2001 13:10:49 +0100 (BST) See prefetching - the CPU prefetching will hide some of the effect and the spin_lock_prefetch() macro does wonders for the rest. Well, if prefetching can do it faster than avoiding the transaction altogether, I'm game :-) Franks a lot, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-27 19:41 ` David S. Miller @ 2001-09-27 22:59 ` Alan Cox 0 siblings, 0 replies; 67+ messages in thread From: Alan Cox @ 2001-09-27 22:59 UTC (permalink / raw) To: David S. Miller; +Cc: alan, torvalds, bcrl, marcelo, andrea, linux-kernel > See prefetching - the CPU prefetching will hide some of the effect and > the spin_lock_prefetch() macro does wonders for the rest. > > Well, if prefetching can do it faster than avoiding the transaction > altogether, I'm game :-) That would depend on the cost of avoidance, the amount of contention and the distance ahead you can fetch. Avoiding it also rather more portable so I suspect you win ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-25 21:00 ` Benjamin LaHaise 2001-09-25 21:55 ` David S. Miller @ 2001-09-25 22:03 ` Andrea Arcangeli 1 sibling, 0 replies; 67+ messages in thread From: Andrea Arcangeli @ 2001-09-25 22:03 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: David S. Miller, marcelo, torvalds, linux-kernel On Tue, Sep 25, 2001 at 05:00:55PM -0400, Benjamin LaHaise wrote: > even worse. I'd rather try to use some of the rcu techniques for > page cache lookup, and per-page locking for page cache removal > which will lead to *cleaner* code as well as a much more scalable I don't think rcu fits there, truncations and releasing must be extremely efficient too. Andrea ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-25 20:15 ` David S. Miller 2001-09-25 19:02 ` Marcelo Tosatti @ 2001-09-25 20:24 ` Rik van Riel 2001-09-25 20:28 ` David S. Miller [not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com> 2001-09-25 22:01 ` Andrea Arcangeli 2 siblings, 2 replies; 67+ messages in thread From: Rik van Riel @ 2001-09-25 20:24 UTC (permalink / raw) To: David S. Miller; +Cc: marcelo, andrea, torvalds, linux-kernel On Tue, 25 Sep 2001, David S. Miller wrote: > It is known that pagecache_lock is the biggest scalability issue > on large SMP systems, and thus the page cache locking patches > Ingo and myself did. Interesting, most lockmeter data dumps I've seen here indicate the locks in fs/buffer.c as the big problem and have pagecache_lock down in the noise. Or were you measuring loads which are mostly read-only ? regards, Rik -- IA64: a worthy successor to the i860. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-25 20:24 ` Rik van Riel @ 2001-09-25 20:28 ` David S. Miller 2001-09-25 21:05 ` Andrew Morton [not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com> 1 sibling, 1 reply; 67+ messages in thread From: David S. Miller @ 2001-09-25 20:28 UTC (permalink / raw) To: riel; +Cc: marcelo, andrea, torvalds, linux-kernel From: Rik van Riel <riel@conectiva.com.br> Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST) Or were you measuring loads which are mostly read-only ? When Kanoj Sarcar was back at SGI testing 32 processor Origin MIPS systems, pagecache_lock was at the top. Franks a lot, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-25 20:28 ` David S. Miller @ 2001-09-25 21:05 ` Andrew Morton 2001-09-25 21:48 ` David S. Miller 0 siblings, 1 reply; 67+ messages in thread From: Andrew Morton @ 2001-09-25 21:05 UTC (permalink / raw) To: David S. Miller; +Cc: riel, marcelo, andrea, torvalds, linux-kernel "David S. Miller" wrote: > > From: Rik van Riel <riel@conectiva.com.br> > Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST) > > Or were you measuring loads which are mostly read-only ? > > When Kanoj Sarcar was back at SGI testing 32 processor Origin > MIPS systems, pagecache_lock was at the top. But when I asked kumon to test it on his 8-way Xeon, page_cache_lock contention proved to be insignificant. Seems to only be a NUMA thing. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches() 2001-09-25 21:05 ` Andrew Morton @ 2001-09-25 21:48 ` David S. Miller 0 siblings, 0 replies; 67+ messages in thread From: David S. Miller @ 2001-09-25 21:48 UTC (permalink / raw) To: akpm; +Cc: riel, marcelo, andrea, torvalds, linux-kernel, mingo From: Andrew Morton <akpm@zip.com.au> Date: Tue, 25 Sep 2001 14:05:04 -0700 "David S. Miller" wrote: > When Kanoj Sarcar was back at SGI testing 32 processor Origin > MIPS systems, pagecache_lock was at the top. But when I asked kumon to test it on his 8-way Xeon, page_cache_lock contention proved to be insignificant. Seems to only be a NUMA thing. I doubt it is only a NUMA thing. I say this for TUX web benchmarks that tended to hold most of the resident set in memory, the page cache locking changes were measured to improve performance significantly on SMP x86 systems. Ingo would be able to comment further. Franks a lot, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>]
* Re: Locking comment on shrink_caches()
       [not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
@ 2001-09-25 22:26   ` David S. Miller
  2001-09-26 17:42     ` Ingo Molnar
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:26 UTC (permalink / raw)
  To: gerrit; +Cc: riel, marcelo, andrea, torvalds, linux-kernel

   From: Gerrit Huizenga <gerrit@us.ibm.com>
   Date: Tue, 25 Sep 2001 15:15:13 PDT

   I'm very curious as to what workloads are showing pagecache_lock as
   a bottleneck.  We haven't noticed this particular bottleneck in most
   of the workloads we are running.  Is there a good workload that shows
   this type of load?

Again, I defer to Ingo for specifics, but essentially something
like specweb99 where the whole dataset fits in memory.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-25 22:26   ` David S. Miller
@ 2001-09-26 17:42     ` Ingo Molnar
  0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2001-09-26 17:42 UTC (permalink / raw)
  To: David S. Miller
  Cc: gerrit, riel, marcelo, Andrea Arcangeli, Linus Torvalds, linux-kernel

On Tue, 25 Sep 2001, David S. Miller wrote:

>    I'm very curious as to what workloads are showing pagecache_lock as
>    a bottleneck.  We haven't noticed this particular bottleneck in most
>    of the workloads we are running.  Is there a good workload that shows
>    this type of load?
>
> Again, I defer to Ingo for specifics, but essentially something
> like specweb99 where the whole dataset fits in memory.

it was SPECweb99 tests done in 32 GB RAM, 8 CPUs, where the pagecache
was nearly 30 GB big.  We saw visible pagecache_lock contention on such
systems.  Due to TUX's use of zerocopy, page lookups happen at a much
larger frequency and they are not intermixed with memory copies - in
contrast with workloads like dbench.

	Ingo

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-25 20:15       ` David S. Miller
  2001-09-25 19:02         ` Marcelo Tosatti
  2001-09-25 20:24         ` Rik van Riel
@ 2001-09-25 22:01         ` Andrea Arcangeli
  2001-09-25 22:03           ` David S. Miller
  2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 01:15:28PM -0700, David S. Miller wrote:
> I do think it's silly to hold the pagecache_lock during pure scanning
> activities of shrink_caches().

Indeed again.

> It is known that pagecache_lock is the biggest scalability issue on
> large SMP systems, and thus the page cache locking patches Ingo and
> myself did.

yes.  IMHO if we would hold the pagecache lock all the time while
shrinking the cache, then we could kill the lru lock in first place.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-25 22:01         ` Andrea Arcangeli
@ 2001-09-25 22:03           ` David S. Miller
  2001-09-25 22:59             ` Andrea Arcangeli
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-09-25 22:03 UTC (permalink / raw)
  To: andrea; +Cc: marcelo, torvalds, linux-kernel

   From: Andrea Arcangeli <andrea@suse.de>
   Date: Wed, 26 Sep 2001 00:01:02 +0200

   IMHO if we would hold the pagecache lock all the time while shrinking
   the cache, then we could kill the lru lock in first place.

And actually in the pagecache locking patches, doing such a thing
would be impossible :-) since each page needs to grab a different
lock (because the hash chain is potentially different).

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-25 22:03           ` David S. Miller
@ 2001-09-25 22:59             ` Andrea Arcangeli
  0 siblings, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 22:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 03:03:28PM -0700, David S. Miller wrote:
>    From: Andrea Arcangeli <andrea@suse.de>
>    Date: Wed, 26 Sep 2001 00:01:02 +0200
>
>    IMHO if we would hold the pagecache lock all the time while shrinking
>    the cache, then we could kill the lru lock in first place.
>
> And actually in the pagecache locking patches, doing such a thing
> would be impossible :-) since each page needs to grab a different

good further point too :), it would be an option only for mainline.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread
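[Editor's note: the per-hash-chain locking that the exchange above refers to can be sketched in user space. All names here are invented for illustration, using pthread mutexes instead of kernel spinlocks; this is not the actual patch code. The point it shows is why "hold one lock across the whole scan" stops being expressible: every key can land on a different bucket, hence a different lock.]

```c
/*
 * User-space sketch of per-hash-chain locking: instead of one global
 * pagecache_lock, each hash bucket carries its own lock, so lookups
 * of different keys can proceed in parallel.  Hypothetical names.
 */
#include <pthread.h>
#include <stdlib.h>

#define NR_BUCKETS 64

struct entry {
	unsigned long key;
	void *data;
	struct entry *next;
};

struct bucket {
	pthread_mutex_t lock;
	struct entry *head;
};

static struct bucket table[NR_BUCKETS];

static struct bucket *hash_bucket(unsigned long key)
{
	return &table[key % NR_BUCKETS];
}

void cache_init(void)
{
	for (int i = 0; i < NR_BUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].head = NULL;
	}
}

void cache_insert(unsigned long key, void *data)
{
	struct bucket *b = hash_bucket(key);
	struct entry *e = malloc(sizeof(*e));

	e->key = key;
	e->data = data;

	/* Only this chain is locked; all other chains stay usable. */
	pthread_mutex_lock(&b->lock);
	e->next = b->head;
	b->head = e;
	pthread_mutex_unlock(&b->lock);
}

void *cache_lookup(unsigned long key)
{
	struct bucket *b = hash_bucket(key);
	void *data = NULL;

	pthread_mutex_lock(&b->lock);
	for (struct entry *e = b->head; e; e = e->next) {
		if (e->key == key) {
			data = e->data;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return data;
}
```

A scan over many pages would have to take and drop a different bucket lock per page, which is exactly why DaveM says holding "the" pagecache lock for the whole shrink is impossible under the hashed-lock scheme.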
* Re: Locking comment on shrink_caches()
  2001-09-25 18:40   ` Marcelo Tosatti
  2001-09-25 20:15     ` David S. Miller
@ 2001-09-25 20:40     ` Josh MacDonald
  2001-09-25 19:25       ` Marcelo Tosatti
  1 sibling, 1 reply; 67+ messages in thread
From: Josh MacDonald @ 2001-09-25 20:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

Quoting Marcelo Tosatti (marcelo@conectiva.com.br):
>
> On Tue, 25 Sep 2001, David S. Miller wrote:
>
> >    From: Marcelo Tosatti <marcelo@conectiva.com.br>
> >    Date: Tue, 25 Sep 2001 14:49:40 -0300 (BRT)
> >
> >    Do you really need to do this ?
> >
> >    if (unlikely(!spin_trylock(&pagecache_lock))) {
> >            /* we hold the page lock so the page cannot go away from under us */
> >            spin_unlock(&pagemap_lru_lock);
> >
> >            spin_lock(&pagecache_lock);
> >            spin_lock(&pagemap_lru_lock);
> >    }
> >
> >    Have you actually seen bad hold times of pagecache_lock by
> >    shrink_caches() ?
> >
> > Marcelo, this is needed because of the spin lock ordering rules.
> > The pagecache_lock must be obtained before the pagemap_lru_lock
> > or else deadlock is possible.  The spin_trylock is an optimization.
>
> No, it is not.
>
> We can simply lock the pagecache_lock and the pagemap_lru_lock at the
> beginning of the cleaning function.  page_launder() used to do that.

Since your main concern seems to be simplicity, the code can remain
the way it is and be far more readable with, e.g.,

/* Acquire lock1 while holding lock2--reverse order. */
#define spin_reverse_lock(lock1,lock2)			\
	if (unlikely(!spin_trylock(&lock1))) {		\
		spin_unlock(&lock2);			\
		spin_lock(&lock1);			\
		spin_lock(&lock2);			\
	}

You can't argue for simple in favor of increasing lock contention,
but you can keep it readable.

-josh

^ permalink raw reply	[flat|nested] 67+ messages in thread
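[Editor's note: Josh's macro is the generic shape of the trick Andrea used. A user-space analogue with pthread mutexes (hypothetical names; a sketch, not kernel code) may make the ordering argument easier to follow: the documented order is "lock1 before lock2", so a caller already holding lock2 must either opportunistically trylock lock1 or drop lock2 and reacquire both in order, otherwise two such callers can deadlock.]

```c
/*
 * Sketch of the trylock pattern from spin_reverse_lock(), using
 * pthread mutexes in place of kernel spinlocks.  The lock order is
 * "lock1 before lock2"; this helper lets a caller that already holds
 * lock2 acquire lock1 without risking an AB/BA deadlock.
 */
#include <pthread.h>

/* Acquire lock1 while already holding lock2 (reverse of the order). */
static void lock_in_reverse(pthread_mutex_t *lock1, pthread_mutex_t *lock2)
{
	if (pthread_mutex_trylock(lock1) != 0) {
		/* Fast path failed: fall back to the documented order. */
		pthread_mutex_unlock(lock2);
		pthread_mutex_lock(lock1);
		pthread_mutex_lock(lock2);
	}
}
```

Note the window in the slow path where lock2 is dropped: any state it protected may change before it is retaken. That is what the `/* we hold the page lock so the page cannot go away from under us */` comment in the original snippet is covering, with the page lock pinning the page across that window.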
* Re: Locking comment on shrink_caches()
  2001-09-25 20:40     ` Josh MacDonald
@ 2001-09-25 19:25       ` Marcelo Tosatti
  0 siblings, 0 replies; 67+ messages in thread
From: Marcelo Tosatti @ 2001-09-25 19:25 UTC (permalink / raw)
  To: Josh MacDonald; +Cc: linux-kernel

On Tue, 25 Sep 2001, Josh MacDonald wrote:

> Quoting Marcelo Tosatti (marcelo@conectiva.com.br):
> >
> > On Tue, 25 Sep 2001, David S. Miller wrote:
> >
> > >    Do you really need to do this ?
> > >
> > >    if (unlikely(!spin_trylock(&pagecache_lock))) {
> > >            /* we hold the page lock so the page cannot go away from under us */
> > >            spin_unlock(&pagemap_lru_lock);
> > >
> > >            spin_lock(&pagecache_lock);
> > >            spin_lock(&pagemap_lru_lock);
> > >    }
> > >
> > >    Have you actually seen bad hold times of pagecache_lock by
> > >    shrink_caches() ?
> > >
> > > Marcelo, this is needed because of the spin lock ordering rules.
> > > The pagecache_lock must be obtained before the pagemap_lru_lock
> > > or else deadlock is possible.  The spin_trylock is an optimization.
> >
> > No, it is not.
> >
> > We can simply lock the pagecache_lock and the pagemap_lru_lock at the
> > beginning of the cleaning function.  page_launder() used to do that.
>
> Since your main concern seems to be simplicity, the code can remain
> the way it is and be far more readable with, e.g.,
>
> /* Acquire lock1 while holding lock2--reverse order. */
> #define spin_reverse_lock(lock1,lock2)			\
>	if (unlikely(!spin_trylock(&lock1))) {		\
>		spin_unlock(&lock2);			\
>		spin_lock(&lock1);			\
>		spin_lock(&lock2);			\
>	}
>
> You can't argue for simple in favor of increasing lock contention,
> but you can keep it readable.

Making the code readable is different from making it logically simple.

I've already seen pretty subtle races on the VM which lived on for a
long time (eg the latest race which Hugh and me found on
add_to_swap_cache/try_to_swap_out, which had been there since
2.4.early), so I prefer to make the code as simple as possible.

If there really are long hold times in shrink_cache(), then I agree to
keep the current snippet of code to avoid that.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-25 19:57 ` David S. Miller
  2001-09-25 18:40   ` Marcelo Tosatti
@ 2001-09-25 21:57   ` Andrea Arcangeli
  1 sibling, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-09-25 21:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: marcelo, torvalds, linux-kernel

On Tue, Sep 25, 2001 at 12:57:58PM -0700, David S. Miller wrote:
> or else deadlock is possible.  The spin_trylock is an optimization.

Indeed.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
@ 2001-09-26  5:04 Dipankar Sarma
  2001-09-26  5:31 ` Andrew Morton
  0 siblings, 1 reply; 67+ messages in thread
From: Dipankar Sarma @ 2001-09-26  5:04 UTC (permalink / raw)
  To: davem; +Cc: marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel, hawkes

In article <20010925.132816.52117370.davem@redhat.com> David S. Miller wrote:
>    From: Rik van Riel <riel@conectiva.com.br>
>    Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
>
>    Or were you measuring loads which are mostly read-only ?
>
> When Kanoj Sarcar was back at SGI testing 32 processor Origin
> MIPS systems, pagecache_lock was at the top.

John Hawkes from SGI had published some AIM7 numbers that showed
pagecache_lock to be a bottleneck above 4 processors.  At 32 processors,
half the CPU cycles were spent on waiting for pagecache_lock.  The
thread is at -

http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2

Thanks
Dipankar
--
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-26  5:04 Dipankar Sarma
@ 2001-09-26  5:31 ` Andrew Morton
  2001-09-26  6:57   ` David S. Miller
    ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Andrew Morton @ 2001-09-26  5:31 UTC (permalink / raw)
  To: dipankar
  Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel, hawkes

Dipankar Sarma wrote:
>
> In article <20010925.132816.52117370.davem@redhat.com> David S. Miller wrote:
> >    From: Rik van Riel <riel@conectiva.com.br>
> >    Date: Tue, 25 Sep 2001 17:24:21 -0300 (BRST)
> >
> >    Or were you measuring loads which are mostly read-only ?
> >
> > When Kanoj Sarcar was back at SGI testing 32 processor Origin
> > MIPS systems, pagecache_lock was at the top.
>
> John Hawkes from SGI had published some AIM7 numbers that showed
> pagecache_lock to be a bottleneck above 4 processors.  At 32 processors,
> half the CPU cycles were spent on waiting for pagecache_lock.  The
> thread is at -
>
> http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2

That's NUMA hardware.  The per-hashqueue locking change made
a big improvement on that hardware.  But when it was used on
Intel hardware it made no measurable difference at all.

Sorry, but the patch adds complexity, and unless a significant
throughput benefit can be demonstrated on less exotic hardware,
why use it?

Here are kumon's test results from March, with and without
the hashed lock patch:

-------- Original Message --------
Subject: Re: [Fwd: Re: [Lse-tech] AIM7 scaling, pagecache_lock, multiqueue scheduler]
Date: Thu, 15 Mar 2001 18:03:55 +0900
From: kumon@flab.fujitsu.co.jp
Reply-To: kumon@flab.fujitsu.co.jp
To: Andrew Morton <andrewm@uow.edu.au>
CC: kumon@flab.fujitsu.co.jp, ahirai@flab.fujitsu.co.jp,
    John Hawkes <hawkes@engr.sgi.com>, kumon@flab.fujitsu.co.jp
In-Reply-To: <3AB032B3.87940521@uow.edu.au>, <3AB0089B.CF3496D2@uow.edu.au>
    <200103150234.LAA28075@asami.proc> <3AB032B3.87940521@uow.edu.au>

OK, the followings are a result of our brief measurement with
WebBench (mindcraft type) of 2.4.2 and 2.4.2+pcl.

Workload: WebBench 3.0 (static get)
Machine:  Profusion 8way 550MHz/1MB cache 1GB mem.
Server:   Apache 1.3.9-8 (w/ SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
          obtained from RedHat.
Clients:  32 clients each has 2 requesting threads.

The following number is Request per sec.

              242      242+pcl   ratio
   -------------------------------------
   1SMP       1,603    1,584     0.99
   2(1+1)SMP  2,443    2,437     1.00
   4(1+3)SMP  4,420    4,426     1.00
   8(4+4)SMP  5,381    5,400     1.00

#No idle time observed in the 1 to 4 SMP runs.
#Only 8 SMP cases shows cpu-idle time, but it is about 2.1-2.8% of the
#total CPU time.

Note: The load of two buses of Profusion system isn't balance, because
the number of CPUs on each bus is unbalance.

Summary: From the above brief test, (+pcl) patch doesn't show a
measurable performance gain.

-

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-26  5:31 ` Andrew Morton
@ 2001-09-26  6:57   ` David S. Miller
  0 siblings, 0 replies; 67+ messages in thread
From: David S. Miller @ 2001-09-26  6:57 UTC (permalink / raw)
  To: akpm; +Cc: dipankar, marcelo, riel, andrea, torvalds, linux-kernel, hawkes

   From: Andrew Morton <akpm@zip.com.au>
   Date: Tue, 25 Sep 2001 22:31:32 -0700

   Here are kumon's test results from March, with and without
   the hashed lock patch:

Please elaborate on what the webbench-3.0 static gets was really
doing.  Was this test composed of multiple accesses to the same or a
small set of files?  If so, that is indeed the case where the page
cache locking patches won't help at all.

The more diversified the set of files being accessed, the greater the
gain from the locking changes.  You have to give the cpus at least a
chance at accessing different hash chains :-)

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-26  5:31 ` Andrew Morton
  2001-09-26  6:57   ` David S. Miller
@ 2001-09-26  7:08   ` Dipankar Sarma
  2001-09-26 16:52   ` John Hawkes
  2 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-09-26  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel,
      anton, jdoelle

On Tue, Sep 25, 2001 at 10:31:32PM -0700, Andrew Morton wrote:
> Dipankar Sarma wrote:
> >
> > John Hawkes from SGI had published some AIM7 numbers that showed
> > pagecache_lock to be a bottleneck above 4 processors.  At 32 processors,
> > half the CPU cycles were spent on waiting for pagecache_lock.  The
> > thread is at -
> >
> > http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
>
> That's NUMA hardware.  The per-hashqueue locking change made
> a big improvement on that hardware.  But when it was used on
> Intel hardware it made no measurable difference at all.
>
> Sorry, but the patch adds complexity, and unless a significant
> throughput benefit can be demonstrated on less exotic hardware,
> why use it?

I agree that on NUMA systems, contention and lock wait times degenerate
non-linearly, thereby skewing the actual impact.

IIRC, there were discussions on lse-tech about pagecache_lock and
dbench numbers published by Juergen Doelle (on 8way Intel) and Anton
Blanchard on 16way PPC.  Perhaps they can shed some light on this.

Thanks
Dipankar
--
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: Locking comment on shrink_caches()
  2001-09-26  5:31 ` Andrew Morton
  2001-09-26  6:57   ` David S. Miller
  2001-09-26  7:08   ` Dipankar Sarma
@ 2001-09-26 16:52   ` John Hawkes
  2 siblings, 0 replies; 67+ messages in thread
From: John Hawkes @ 2001-09-26 16:52 UTC (permalink / raw)
  To: Andrew Morton, dipankar
  Cc: davem, marcelo, riel, Andrea Arcangeli, torvalds, linux-kernel, hawkes

From: "Andrew Morton" <akpm@zip.com.au>
> > John Hawkes from SGI had published some AIM7 numbers that showed
> > pagecache_lock to be a bottleneck above 4 processors.  At 32 processors,
> > half the CPU cycles were spent on waiting for pagecache_lock.  The
> > thread is at -
> >
> > http://marc.theaimsgroup.com/?l=lse-tech&m=98459051027582&w=2
>
> That's NUMA hardware.  The per-hashqueue locking change made
> a big improvement on that hardware.  But when it was used on
> Intel hardware it made no measurable difference at all.

More specifically, that was on SGI Origin2000 32p mips64 ccNUMA
hardware.  The pagecache_lock bottleneck is substantially less on SGI
Itanium ccNUMA hardware running those AIM7 workloads.  I'm seeing
moderately significant contention on the Big Kernel Lock, mostly from
sys_lseek() and ext2_get_block().

John Hawkes
hawkes@sgi.com

^ permalink raw reply	[flat|nested] 67+ messages in thread
[parent not found: <fa.cbgmt3v.192gc8r@ifi.uio.no>]
[parent not found: <fa.cd0mtbv.1aigc0v@ifi.uio.no>]
[parent not found: <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>]
* Re: Locking comment on shrink_caches()
       [not found]     ` <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>
@ 2001-09-27  1:34       ` Vojtech Pavlik
  0 siblings, 0 replies; 67+ messages in thread
From: Vojtech Pavlik @ 2001-09-27  1:34 UTC (permalink / raw)
  To: Trond Eivind Glomsrød; +Cc: linux-kernel

On Thu, Sep 27, 2001 at 03:29:27AM +0200, Trond Eivind Glomsrød wrote:
> Vojtech Pavlik <vojtech@suse.cz> writes:
>
> > On Wed, Sep 26, 2001 at 10:20:21PM +0200, Vojtech Pavlik wrote:
> >
> > > > generic Athlon is
> > > >
> > > > nothing: 11 cycles
> > > > locked add: 11 cycles
> > > > cpuid: 64 cycles
> > >
> > > Interestingly enough, my TBird 1.1G insists on cpuid being somewhat
> > > slower:
> > >
> > > nothing: 11 cycles
> > > locked add: 11 cycles
> > > cpuid: 87 cycles
> >
> > Oops, this is indeed just a difference in compiler options.
>
> No, it's not:
>
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles
> [teg@xyzzy teg]$
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 87 cycles
> [teg@xyzzy teg]$ ./t
> nothing: 11 cycles
> locked add: 11 cycles
> cpuid: 64 cycles

Interesting: Try

	while true; do t; done

and watch it change between 64 and 87 every 2.5 seconds ... :)

--
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread
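[Editor's note: the "nothing / locked add / cpuid" numbers traded above come from a small TSC microbenchmark. The original `t` program is not shown anywhere in the thread, so the following is only a guessed reconstruction of its shape (x86-specific, GCC inline assembly): time N iterations of the instruction under test between two `rdtsc` reads and divide.]

```c
/*
 * Guessed reconstruction of a cycle-counting microbenchmark like the
 * "./t" program quoted above.  x86 only; requires GCC inline asm.
 */
#include <stdint.h>

#define ITERS 100000

/* Read the CPU's timestamp counter. */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Average cycles per call of op(), including loop/call overhead. */
static uint64_t time_op(void (*op)(void))
{
	uint64_t start = rdtsc();
	for (int i = 0; i < ITERS; i++)
		op();
	return (rdtsc() - start) / ITERS;
}

static void op_nothing(void) { }

static void op_locked_add(void)
{
	static volatile int x;
	__asm__ __volatile__("lock addl $1, %0" : "+m"(x));
}

static void op_cpuid(void)
{
	uint32_t eax = 0;
	/* cpuid is a serializing instruction: it drains the pipeline,
	 * which is why it costs far more than a plain locked add. */
	__asm__ __volatile__("cpuid"
			     : "+a"(eax)
			     :
			     : "ebx", "ecx", "edx", "memory");
}
```

Wrapped in a main() that prints the three `time_op()` results, this produces output in the same "nothing: N cycles" format as the thread. The 64-vs-87 flip Vojtech observes would then be a property of the CPU (e.g. frequency or thermal behavior), not of this measurement loop.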
end of thread, other threads:[~2001-09-28 22:25 UTC | newest]
Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-25 17:49 Locking comment on shrink_caches() Marcelo Tosatti
2001-09-25 19:57 ` David S. Miller
2001-09-25 18:40 ` Marcelo Tosatti
2001-09-25 20:15 ` David S. Miller
2001-09-25 19:02 ` Marcelo Tosatti
2001-09-25 20:29 ` David S. Miller
2001-09-25 21:00 ` Benjamin LaHaise
2001-09-25 21:55 ` David S. Miller
2001-09-25 22:16 ` Benjamin LaHaise
2001-09-25 22:28 ` David S. Miller
2001-09-26 16:40 ` Alan Cox
2001-09-26 17:25 ` Linus Torvalds
2001-09-26 17:40 ` Alan Cox
2001-09-26 17:44 ` Linus Torvalds
2001-09-26 18:01 ` Benjamin LaHaise
2001-09-26 18:01 ` Dave Jones
2001-09-26 20:20 ` Vojtech Pavlik
2001-09-26 20:24 ` Vojtech Pavlik
2001-09-26 17:43 ` Richard Gooch
2001-09-26 18:24 ` Benjamin LaHaise
2001-09-26 18:48 ` Richard Gooch
2001-09-26 18:58 ` Davide Libenzi
2001-09-26 17:45 ` Dave Jones
2001-09-26 17:50 ` Alan Cox
2001-09-26 17:59 ` Dave Jones
2001-09-26 18:07 ` Alan Cox
2001-09-26 18:09 ` Padraig Brady
2001-09-26 18:22 ` Dave Jones
2001-09-26 18:24 ` Linus Torvalds
2001-09-26 18:40 ` Dave Jones
2001-09-26 19:12 ` Linus Torvalds
2001-09-27 12:22 ` CPU frequency shifting "problems" Padraig Brady
2001-09-27 12:44 ` Dave Jones
2001-09-27 23:23 ` Linus Torvalds
2001-09-28 0:55 ` Alan Cox
2001-09-28 2:12 ` Stefan Smietanowski
2001-09-28 8:55 ` Jamie Lokier
2001-09-28 16:11 ` Linus Torvalds
2001-09-28 20:29 ` Eric W. Biederman
2001-09-28 22:24 ` Jamie Lokier
2001-09-26 19:04 ` Locking comment on shrink_caches() George Greer
2001-09-26 18:59 ` George Greer
2001-09-26 23:26 ` David S. Miller
2001-09-27 12:10 ` Alan Cox
2001-09-27 15:38 ` Linus Torvalds
2001-09-27 17:44 ` Ingo Molnar
2001-09-27 19:41 ` David S. Miller
2001-09-27 22:59 ` Alan Cox
2001-09-25 22:03 ` Andrea Arcangeli
2001-09-25 20:24 ` Rik van Riel
2001-09-25 20:28 ` David S. Miller
2001-09-25 21:05 ` Andrew Morton
2001-09-25 21:48 ` David S. Miller
[not found] ` <200109252215.f8PMFDa02034@eng2.beaverton.ibm.com>
2001-09-25 22:26 ` David S. Miller
2001-09-26 17:42 ` Ingo Molnar
2001-09-25 22:01 ` Andrea Arcangeli
2001-09-25 22:03 ` David S. Miller
2001-09-25 22:59 ` Andrea Arcangeli
2001-09-25 20:40 ` Josh MacDonald
2001-09-25 19:25 ` Marcelo Tosatti
2001-09-25 21:57 ` Andrea Arcangeli
-- strict thread matches above, loose matches on Subject: below --
2001-09-26 5:04 Dipankar Sarma
2001-09-26 5:31 ` Andrew Morton
2001-09-26 6:57 ` David S. Miller
2001-09-26 7:08 ` Dipankar Sarma
2001-09-26 16:52 ` John Hawkes
[not found] <fa.cbgmt3v.192gc8r@ifi.uio.no>
[not found] ` <fa.cd0mtbv.1aigc0v@ifi.uio.no>
[not found] ` <i1m66a5o1zc.fsf@verden.pvv.ntnu.no>
2001-09-27 1:34 ` Vojtech Pavlik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox