* Re: Minutes from Feb 21 LSE Call
2003-02-22 23:15 ` Larry McVoy
@ 2003-02-22 23:23 ` Christoph Hellwig
2003-02-22 23:54 ` Mark Hahn
2003-02-22 23:44 ` Martin J. Bligh
` (2 subsequent siblings)
3 siblings, 1 reply; 157+ messages in thread
From: Christoph Hellwig @ 2003-02-22 23:23 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn,
David S. Miller, linux-kernel
On Sat, Feb 22, 2003 at 03:15:52PM -0800, Larry McVoy wrote:
> Show me one OS which scales to 32 CPUs on an I/O load and run lmbench
> on a single CPU. Then take that same CPU and stuff it into a uniprocessor
motherboard and run the same benchmarks under Linux. The Linux one
> will blow away the multi threaded one. Come on, prove me wrong, show
> me the data.
I could ask the SGI Eagan folks to do that with an Altix and an IA64
Whitebox - oh wait, both OSes would be Linux..
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-22 23:23 ` Christoph Hellwig
@ 2003-02-22 23:54 ` Mark Hahn
0 siblings, 0 replies; 157+ messages in thread
From: Mark Hahn @ 2003-02-22 23:54 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-kernel
> I could ask the SGI Eagan folks to do that with an Altix and an IA64
> Whitebox - oh wait, both OSes would be Linux..
The only public info I've seen is "round-trip in as little as 40ns",
which is too vague to be useful, and sounds WAY optimistic - perhaps
that's just between two CPUs in a single brick. Remember that
LMBench shows memory latencies of O(100ns) for even fast uniprocessors.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-22 23:15 ` Larry McVoy
2003-02-22 23:23 ` Christoph Hellwig
@ 2003-02-22 23:44 ` Martin J. Bligh
2003-02-24 4:56 ` Larry McVoy
2003-02-22 23:57 ` Jeff Garzik
2003-02-23 23:57 ` Bill Davidsen
3 siblings, 1 reply; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-22 23:44 UTC (permalink / raw)
To: Larry McVoy; +Cc: Mark Hahn, David S. Miller, linux-kernel
>> OK, so now you've slid from talking about PCs to 2-way to 4-way ...
>> perhaps because your original argument was fatally flawed.
>
> Nice attempt at deflection but it won't work.
On your part or mine? Seemingly yours.
> Your position is that
> there is no money in PC's, only in big iron. Last I checked, "big iron"
> doesn't include $25K 4 way machines, now does it?
I would call 4x a "big machine" which is what I originally said.
> You claimed that
> Dell was making the majority of their profits from servers.
I think that's probably true (nobody can be certain, as we don't have the
numbers).
> To refresh
> your memory: "I bet they still make more money on servers than desktops
> and notebooks combined". Are you still claiming that?
Yup.
> If so, please
> provide some data to back it up because, as Mark and others have pointed
> out, the bulk of their servers are headless desktop machines in tower
> or rackmount cases.
So what? They're still servers. I can no more provide data to back it up
than you can to contradict it, because they don't release those figures.
Note my sentence began "I bet", not "I have cast iron evidence".
> Let's get back to your position. You want to shovel stuff in the kernel
> for the benefit of the 32 way / 64 way etc boxes.
Actually, I'm focussed on 16-way at the moment, and have never run on,
or published numbers for anything higher. If you need to exaggerate
to make your point, then go ahead, but it's pretty transparent.
> I don't see that as wise. You could prove me wrong.
> Here's how you do it: go get oprofile
> or whatever that tool is which lets you run apps and count cache misses.
> Start including before/after runs of each microbench in lmbench and
> some time sharing loads with and without your changes. When you can do
> that and you don't add any more bus traffic, you're a genius and
> I'll shut up.
I don't feel the need to do that to prove my point, but if you feel the
need to do it to prove yours, go ahead.
> But that's a false promise because by definition, fine grained threading
> adds more bus traffic. It's kind of hard to not have that happen, the
> caches have to stay coherent somehow.
Adding more bus traffic is fine if you increase throughput. Focussing
on just one tiny aspect of performance is ludicrous. Look at the big
picture. Run some non-micro benchmarks. Analyse the results. Compare
2.4 vs 2.5 (or any set of patches I've put into the kernel of your choice)
On UP, 2P or whatever you care about.
You seem to think the maintainers are morons that we can just slide crap
straight by ... give them a little more credit than that.
> Tell it to Google. That's probably one of the largest applications in
> the world; I was the 4th engineer there, and I didn't think that the
> cluster added complexity at all. On the contrary, it made things go
> one hell of a lot faster.
As I've explained to you many times before, it depends on the system.
Some things split easily, some don't.
>> You don't believe we can make it scale without screwing up the low end,
>> I do believe we can do that.
>
> I'd like a little more than "I think I can, I think I can, I think I can".
> The people who are saying "no you can't, no you can't, no you can't" have
> seen this sort of work done before and there is no data which shows that
> it is possible and all sorts of data which shows that it is not.
The only data that's relevant is what we've done to Linux. If you want
to run the numbers, and show some useful metric on a semi-realistic
benchmark, I'd love to see them.
> Show me one OS which scales to 32 CPUs on an I/O load and run lmbench
> on a single CPU. Then take that same CPU and stuff it into a uniprocessor
> motherboard and run the same benchmarks under Linux. The Linux one
> will blow away the multi threaded one.
Nobody has ever really focussed before on an OS that scales across the
board from UP to big iron ... a closed development system is bad at
resolving that sort of thing. The really interesting comparison is UP
or 2x SMP on Linux with and without the scalability changes that have
made it into the tree.
> Come on, prove me wrong, show me the data.
I don't have to *prove* you wrong. I'm happy in my own personal knowledge
that you're wrong, and things seem to be going along just fine, thanks.
If you want to change the attitude of the maintainers, I suggest you
generate the data yourself.
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-22 23:44 ` Martin J. Bligh
@ 2003-02-24 4:56 ` Larry McVoy
2003-02-24 5:06 ` William Lee Irwin III
2003-02-24 5:16 ` Martin J. Bligh
0 siblings, 2 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-24 4:56 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, Mark Hahn, David S. Miller, linux-kernel
> > Your position is that
> > there is no money in PC's, only in big iron. Last I checked, "big iron"
> > doesn't include $25K 4 way machines, now does it?
>
> I would call 4x a "big machine" which is what I originally said.
Nonsense. You were talking about 16/32/64 way boxes, go read your own mail.
In fact, you said so in this message.
Furthermore, I can prove that isn't what you are talking about. Show me
the performance gains you are getting on 4way systems from your changes.
Last I checked, things scaled pretty nicely on 4 ways.
> > You claimed that
> > Dell was making the majority of their profits from servers.
>
> I think that's probably true (nobody can be certain, as we don't have the
> numbers).
Yes, we do. You just don't like what the numbers are saying. You can
work backward from the size of the server market and the percentages
claimed by Sun, HP, IBM, etc. If you do that, you'll see that even
if Dell was making 100% margins on every server they sold, that still
wouldn't be 51% of their profits.
It's not "probably true"; it's not physically possible that it is true,
and if you don't know that, you are simply waving your hands and not
doing any math.
> > To refresh
> > your memory: "I bet they still make more money on servers than desktops
> > and notebooks combined". Are you still claiming that?
>
> Yup.
Well, you are flat out 100% wrong.
> > If so, please
> > provide some data to back it up because, as Mark and others have pointed
> > out, the bulk of their servers are headless desktop machines in tower
> > or rackmount cases.
>
> So what? They're still servers. I can no more provide data to back it up
> than you can to contradict it, because they don't release those figures.
Read the mail I've posted on topic, the data is there. Or better yet,
don't trust me, go work it out for yourself, it isn't hard.
> > I don't see that as wise. You could prove me wrong.
> > Here's how you do it: go get oprofile
> > or whatever that tool is which lets you run apps and count cache misses.
> > Start including before/after runs of each microbench in lmbench and
> > some time sharing loads with and without your changes. When you can do
> > that and you don't add any more bus traffic, you're a genius and
> > I'll shut up.
>
> I don't feel the need to do that to prove my point, but if you feel the
> need to do it to prove yours, go ahead.
Ahh, now we're getting somewhere. As soon as we get anywhere near real
numbers, you don't want anything to do with it. Why is that?
> You seem to think the maintainers are morons that we can just slide crap
> straight by ... give them a little more credit than that.
It happens all the time.
> > Come on, prove me wrong, show me the data.
>
> I don't have to *prove* you wrong. I'm happy in my own personal knowledge
> that you're wrong, and things seem to be going along just fine, thanks.
Wow. Compelling. "It is so because I say it is so". Jeez, forgive me
if I'm not falling all over myself to have that sort of engineering being
the basis for scaling work.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 4:56 ` Larry McVoy
@ 2003-02-24 5:06 ` William Lee Irwin III
2003-02-24 6:00 ` Mark Hahn
2003-02-24 15:06 ` Alan Cox
2003-02-24 5:16 ` Martin J. Bligh
1 sibling, 2 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-24 5:06 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn,
David S. Miller, linux-kernel
On Sun, Feb 23, 2003 at 08:56:16PM -0800, Larry McVoy wrote:
> Furthermore, I can prove that isn't what you are talking about. Show me
> the performance gains you are getting on 4way systems from your changes.
> Last I checked, things scaled pretty nicely on 4 ways.
Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 5:06 ` William Lee Irwin III
@ 2003-02-24 6:00 ` Mark Hahn
2003-02-24 6:02 ` William Lee Irwin III
2003-02-24 15:06 ` Alan Cox
1 sibling, 1 reply; 157+ messages in thread
From: Mark Hahn @ 2003-02-24 6:00 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Larry McVoy, linux-kernel
> > Last I checked, things scaled pretty nicely on 4 ways.
>
> Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x.
"Doctor, it hurts..."
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 6:00 ` Mark Hahn
@ 2003-02-24 6:02 ` William Lee Irwin III
0 siblings, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-24 6:02 UTC (permalink / raw)
To: Mark Hahn; +Cc: Larry McVoy, linux-kernel
At some point in the past, Larry McVoy wrote:
>>> Last I checked, things scaled pretty nicely on 4 ways.
At some point in the past, I wrote:
>> Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x.
On Mon, Feb 24, 2003 at 01:00:22AM -0500, Mark Hahn wrote:
> "Doctor, it hurts..."
Doing disk io is supposed to hurt? I'll file this in the "sick and
wrong" category along with RBJ and Hohensee.
In the meantime, compare to 2.5.x.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 5:06 ` William Lee Irwin III
2003-02-24 6:00 ` Mark Hahn
@ 2003-02-24 15:06 ` Alan Cox
2003-02-24 23:18 ` William Lee Irwin III
1 sibling, 1 reply; 157+ messages in thread
From: Alan Cox @ 2003-02-24 15:06 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn,
David S. Miller, Linux Kernel Mailing List
On Mon, 2003-02-24 at 05:06, William Lee Irwin III wrote:
> On Sun, Feb 23, 2003 at 08:56:16PM -0800, Larry McVoy wrote:
> > Furthermore, I can prove that isn't what you are talking about. Show me
> > the performance gains you are getting on 4way systems from your changes.
> > Last I checked, things scaled pretty nicely on 4 ways.
>
> Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x.
You have strange ideas of typical workloads. The parallel mkfs one is a good
one though, because it's also a lot better on one CPU in 2.5
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 15:06 ` Alan Cox
@ 2003-02-24 23:18 ` William Lee Irwin III
0 siblings, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-24 23:18 UTC (permalink / raw)
To: Alan Cox
Cc: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn,
David S. Miller, Linux Kernel Mailing List
On Mon, 2003-02-24 at 05:06, William Lee Irwin III wrote:
>> Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x.
On Mon, Feb 24, 2003 at 03:06:53PM +0000, Alan Cox wrote:
> You have strange ideas of typical workloads. The parallel mkfs one is a good
> one though, because it's also a lot better on one CPU in 2.5
The results I saw were that this did not affect 2.5 in any interesting
way and 2.4 behaved "very badly".
It's a simple way to get lots of disk io going without a complex
benchmark. There are good reasons and real workloads why things were
done to fix this.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 4:56 ` Larry McVoy
2003-02-24 5:06 ` William Lee Irwin III
@ 2003-02-24 5:16 ` Martin J. Bligh
2003-02-24 6:58 ` Larry McVoy
1 sibling, 1 reply; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-24 5:16 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
> Nonsense. You were talking about 16/32/64 way boxes, go read your own
> mail. In fact, you said so in this message.
Where? I never mentioned 32 / 64 way boxes, for starters ...
> Furthermore, I can prove that isn't what you are talking about. Show me
> the performance gains you are getting on 4way systems from your changes.
> Last I checked, things scaled pretty nicely on 4 ways.
Depends what you mean by "your changes". If you do a before and after
comparison on a 4x machine on the scalability changes IBM LTC has made, I
think you'd find a dramatic difference. Of course, it depends to some
extent on what tests you run. Maybe running bitkeeper (or whatever you're
testing) just eats cpu, and doesn't do much interprocess communication or
disk IO (compared to the CPU load), in which case it'll scale pretty well on
anything as long as it's multithreaded enough. If you're just worried about
one particular app, yes of course you could tweak the system to go faster
for it ... but that's not what a general purpose OS is about.
> Yes, we do. You just don't like what the numbers are saying. You can
> work backward from the size of the server market and the percentages
> claimed by Sun, HP, IBM, etc. If you do that, you'll see that even
> if Dell was making 100% margins on every server they sold, that still
> wouldn't be 51% of their profits.
Ummm ... now go back to what we were actually talking about. Linux margins.
You think a significant percentage of the desktops they sell run Linux?
>> > To refresh
>> > your memory: "I bet they still make more money on servers than desktops
>> > and notebooks combined". Are you still claiming that?
>>
>> Yup.
>
> Well, you are flat out 100% wrong.
In the context we were talking about (Linux), I seriously doubt it.
Apologies if I didn't feel the need to continuously restate the context in
every email to stop you from trying to twist the argument.
> Ahh, now we're getting somewhere. As soon as we get anywhere near real
> numbers, you don't want anything to do with it. Why is that?
Because I don't see why I should waste my time running benchmarks just to
prove you wrong. I don't respect you that much, and it seems the
maintainers don't either. When you become somebody with the stature in the
Linux community of, say, Linus or Andrew, I'd be prepared to spend a lot
more time running benchmarks on any concerns you might have.
>> I don't have to *prove* you wrong. I'm happy in my own personal knowledge
>> that you're wrong, and things seem to be going along just fine, thanks.
>
> Wow. Compelling. "It is so because I say it is so". Jeez, forgive me
> if I'm not falling all over myself to have that sort of engineering being
> the basis for scaling work.
Ummm ... and your argument is different because of what? You've run some
tiny little microfocused benchmark, seen a couple of bus cycles, and
projected the results out? Not very impressive, really, is it? Go run a
real benchmark and prove it makes a difference if you want to sway people's
opinions. Until then, I suspect the current status quo will continue in
terms of us getting patches accepted.
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 5:16 ` Martin J. Bligh
@ 2003-02-24 6:58 ` Larry McVoy
2003-02-24 7:39 ` Martin J. Bligh
` (3 more replies)
0 siblings, 4 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-24 6:58 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Sun, Feb 23, 2003 at 09:16:38PM -0800, Martin J. Bligh wrote:
> Ummm ... now go back to what we were actually talking about. Linux margins.
> You think a significant percentage of the desktops they sell run Linux?
The real discussion was the justification for scaling work beyond the
small SMPs. You tried to make the point that there is no money in PC's so
any work to scale Linux up would help hardware companies stay financially
healthy. I and others pointed out that there is indeed a pile of money
in PC's; that's the vast majority of the hardware Dell sells. They don't
sell anything bigger than an 8 way and they only have one of those.
We went on to do the digging to figure out that it's impossible that
Dell makes a substantial portion of their profits from the big servers.
The point being that there is a company generating $32B/year in sales and
almost all of that is in uniprocessors. Directly countering your statement
that there is no margin in PC's. They are making $2B/year in profits, QED.
Which brings us back to the point. If the world is not heading towards
an 8 way on every desk then it is really questionable to make a lot of
changes to the kernel to make it work really well on 8-ways. Yeah, I'm
sure it makes you feel good, but it's more of an intellectual exercise than
anything which really benefits the vast majority of the kernel user base.
> > Ahh, now we're getting somewhere. As soon as we get anywhere near real
> > numbers, you don't want anything to do with it. Why is that?
>
> Because I don't see why I should waste my time running benchmarks just to
> prove you wrong. I don't respect you that much, and it seems the
> maintainers don't either. When you become somebody with the stature in the
> Linux community of, say, Linus or Andrew I'd be prepared to spend a lot
> more time running benchmarks on any concerns you might have.
Who cares if you respect me, what does that have to do with proper
engineering? Do you think that I'm the only person who wants to see
numbers? You think Linus doesn't care about this? Maybe you missed
the whole IA32 vs IA64 instruction cache thread. It sure sounded like
he cares. How about Alan? He stepped up and pointed out that less
is more. How about Mark? He knows a thing or two about the topic?
In fact, I think you'd be hard pressed to find anyone who wouldn't be
interested in seeing the cache effects of a patch.
People care about performance, both scaling up and scaling down. A lot of
performance changes are measured poorly, in a way that makes the changes
look good but doesn't expose the hidden costs of the change. What I'm
saying is that those sorts of measurements screwed over performance in
the past, why are you trying to repeat old mistakes?
> > Wow. Compelling. "It is so because I say it is so". Jeez, forgive me
> > if I'm not falling all over myself to have that sort of engineering being
> > the basis for scaling work.
>
> Ummm ... and your argument is different because of what? You've run some
> tiny little microfocused benchmark, seen a couple of bus cycles, and
> projected the results out?
My argument is different because every effort which has gone in the
direction you are going has ended up with a kernel that worked well on
big boxes and sucked rocks on little boxes. And all of them started
with kernels which performed quite nicely on uniprocessors.
If I was waving my hands and saying "I'm an old fart and I think this
won't work" and that was it, you'd have every right to tell me to piss
off. I'd tell me to piss off. But that's not what is going on here.
What's going on is that a pile of smart people have tried over and over
to do what you claim you will do and they all failed. They all ended up
with kernels that gave up lots of uniprocessor performance and justified
it by throwing more processors at that problem. You haven't said a
single thing to refute that and when challenged to measure the parts
which lead to those results you respond with "nah, nah, I don't respect
you so I don't have to measure it". Come on, *you* should want to know
if what I'm saying is true. You're an engineer, not a marketing drone,
of course you should want to know, why wouldn't you?
Linux is a really fast system right now. The code paths are short and
it is possible to use the OS almost as if it were a library, the cost is
so little that you really can mmap stuff in as you need, something that
people have wanted since Multics. There will always be many more uses
of Linux in small systems than large, simply because there will always
be more small systems. Keeping Linux working well on small systems is
going to have a dramatically larger positive benefit for the world than
scaling it to 64 processors. So who do you want to help? An elite
few or everyone?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 6:58 ` Larry McVoy
@ 2003-02-24 7:39 ` Martin J. Bligh
2003-02-24 16:17 ` Larry McVoy
2003-02-24 7:51 ` William Lee Irwin III
` (2 subsequent siblings)
3 siblings, 1 reply; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-24 7:39 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
>> Ummm ... now go back to what we were actually talking about. Linux
>> margins. You think a significant percentage of the desktops they sell
>> run Linux?
>
> The real discussion was the justification for scaling work beyond the
> small SMPs. You tried to make the point that there is no money in PC's so
> any work to scale Linux up would help hardware companies stay financially
> healthy.
More or less, yes.
> The point being that there is a company generating $32B/year in sales and
> almost all of that is in uniprocessors. Directly countering your
> statement that there is no margin in PC's. They are making $2B/year in
> profits, QED.
Which is totally irrelevant. It's the *LINUX* market that matters. What
part of that do you find so hard to understand?
> Which brings us back to the point. If the world is not heading towards
> an 8 way on every desk then it is really questionable to make a lot of
> changes to the kernel to make it work really well on 8-ways. Yeah, I'm
> sure it makes you feel good, but it's more of an intellectual exercise than
> anything which really benefits the vast majority of the kernel user base.
It makes IBM money, ergo they pay me. I enjoy doing it, ergo I work for
them. Most of the work benefits smaller systems as well, ergo we get our
patches accepted. So everyone's happy, apart from you, who keeps whining.
>> Because I don't see why I should waste my time running benchmarks just to
>> prove you wrong. I don't respect you that much, and it seems the
>> maintainers don't either. When you become somebody with the stature in
>> the Linux community of, say, Linus or Andrew I'd be prepared to spend a
>> lot more time running benchmarks on any concerns you might have.
>
> Who cares if you respect me, what does that have to do with proper
> engineering? Do you think that I'm the only person who wants to see
> numbers? You think Linus doesn't care about this? Maybe you missed
> the whole IA32 vs IA64 instruction cache thread. It sure sounded like
> he cares. How about Alan? He stepped up and pointed out that less
> is more. How about Mark? He knows a thing or two about the topic?
> In fact, I think you'd be hard pressed to find anyone who wouldn't be
> interested in seeing the cache effects of a patch.
So now we've slid from talking about bus traffic from fine-grained locking,
which is mostly just you whining in ignorance of the big picture, to cache
effects, which are obviously important. Nice try at twisting the
conversation. Again.
> People care about performance, both scaling up and scaling down. A lot of
> performance changes are measured poorly, in a way that makes the changes
> look good but doesn't expose the hidden costs of the change. What I'm
> saying is that those sorts of measurements screwed over performance in
> the past, why are you trying to repeat old mistakes?
One way to measure those changes poorly would be to do what you were
advocating earlier - look at one tiny metric of a microbenchmark, rather
than the actual throughput of the machine. So pardon me if I take your
concerns, and file them in the appropriate place.
> My argument is different because every effort which has gone in the
> direction you are going has ended up with a kernel that worked well on
> big boxes and sucked rocks on little boxes. And all of them started
> with kernels which performed quite nicely on uniprocessors.
So you're trying to say that fine-grained locking ruins uniprocessor
performance now? Or did you have some other change in mind?
> If I was waving my hands and saying "I'm an old fart and I think this
> won't work" and that was it, you'd have every right to tell me to piss
> off. I'd tell me to piss off. But that's not what is going on here.
> What's going on is that a pile of smart people have tried over and over
> to do what you claim you will do and they all failed. They all ended up
> with kernels that gave up lots of uniprocessor performance and justified
> it by throwing more processors at that problem. You haven't said a
> single thing to refute that and when challenged to measure the parts
> which lead to those results you respond with "nah, nah, I don't respect
> you so I don't have to measure it". Come on, *you* should want to know
> if what I'm saying is true. You're an engineer, not a marketing drone,
> of course you should want to know, why wouldn't you?
You just don't get it, do you? Your head is so vastly inflated that you
think everyone should run around researching whatever *you* happen to think
is interesting. Do your own benchmarking if you think it's a problem.
You're the one whining about this.
> Linux is a really fast system right now. The code paths are short and
> it is possible to use the OS almost as if it were a library, the cost is
> so little that you really can mmap stuff in as you need, something that
> people have wanted since Multics. There will always be many more uses
> of Linux in small systems than large, simply because there will always
> be more small systems. Keeping Linux working well on small systems is
> going to have a dramatically larger positive benefit for the world than
> scaling it to 64 processors. So who do you want to help? An elite
> few or everyone?
Everyone. And we can do that, and make large systems work at the same time.
Despite the fact you don't believe me. And despite the fact that you can't
grasp the difference between the number 16 and the number 64.
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 7:39 ` Martin J. Bligh
@ 2003-02-24 16:17 ` Larry McVoy
2003-02-24 16:49 ` Martin J. Bligh
2003-02-24 18:22 ` Minutes from Feb 21 LSE Call John W. M. Stevens
0 siblings, 2 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-24 16:17 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Sun, Feb 23, 2003 at 11:39:34PM -0800, Martin J. Bligh wrote:
> > The point being that there is a company generating $32B/year in sales and
> > almost all of that is in uniprocessors. Directly countering your
> > statement that there is no margin in PC's. They are making $2B/year in
> > profits, QED.
>
> Which is totally irrelevant. It's the *LINUX* market that matters. What
> part of that do you find so hard to understand?
OK, so you can't handle the reality that the server market overall doesn't
make your point so you retreat to the Linux market. OK, fine. All the
data anyone has ever seen has Linux running on *smaller* servers, not
larger. Show me all the cases where people replaced 4 CPU NT boxes with
8 CPU Linux boxes.
The point being that if in the overall market place, big iron isn't
dominating, you have one hell of a tough time making the case that the
Linux market place is somehow profoundly different and needs larger
boxes to do the same job.
In fact, the opposite is true. Linux squeezes substantially more
performance out of the same hardware than the commercial OS offerings,
NT or Unix. So where is the market force which says "oh, switching to
Linux? Better get more CPUs".
> It makes IBM money, ergo they pay me. I enjoy doing it, ergo I work for
> them. Most of the work benefits smaller systems as well, ergo we get our
> patches accepted. So everyone's happy, apart from you, who keeps whining.
Indeed I do, I'm good at it. You're about to find out how good. It's
quite effective to simply focus attention on a problem area. Here's
my promise to you: there will be a ton of attention focussed on the
scaling patches until you and anyone else doing them start showing
up with cache miss counters as part of the submission process.
> So now we've slid from talking about bus traffic from fine-grained locking,
> which is mostly just you whining in ignorance of the big picture, to cache
> effects, which are obviously important. Nice try at twisting the
> conversation. Again.
You need to take a deep breath and try and understand that the focus of
the conversation is Linux, not your ego or mine. Getting mad at me just
wastes energy; stay focussed on the real issue, Linux.
> > People care about performance, both scaling up and scaling down. A lot of
> > performance changes are measured poorly, in a way that makes the changes
> > look good but doesn't expose the hidden costs of the change. What I'm
> > saying is that those sorts of measurements screwed over performance in
> > the past, why are you trying to repeat old mistakes?
>
> One way to measure those changes poorly would be to do what you were
> advocating earlier - look at one tiny metric of a microbenchmark, rather
> than the actual throughput of the machine. So pardon me if I take your
> concerns, and file them in the appropriate place.
You apparently missed the point where I have said (a bunch of times):
run the benchmarks you want and report cache miss counters from before
and after the patch for the same runs. Microbenchmarks would be
a really bad way to do that; you really want to run a real application
because you need it fighting for the cache.
> > My argument is different because every effort which has gone in the
> > direction you are going has ended up with a kernel that worked well on
> > big boxes and sucked rocks on little boxes. And all of them started
> > with kernels which performed quite nicely on uniprocessors.
>
> So you're trying to say that fine-grained locking ruins uniprocessor
> performance now?
I've been saying that for almost 10 years, check the archives.
> You just don't get it, do you? Your head is so vastly inflated that you
> think everyone should run around researching whatever *you* happen to think
> is interesting. Do your own benchmarking if you think it's a problem.
That's exactly what I'll do if you don't learn how to do it yourself. I'm
astounded that any competent engineer wouldn't want to know the effects of
their changes, I think you actually do but are just too pissed right now
to see it.
> > Linux is a really fast system right now. [etc]
>
> Everyone. And we can do that, and make large systems work at the same time.
> Despite the fact you don't believe me. And despite the fact that you can't
> grasp the difference between the number 16 and the number 64.
See other postings on this one. All engineers in your position have said
"we're just trying to get to N cpus where N = ~2x where we are today and
it won't hurt uniprocessor performance". They *all* say that. And they
all end up with a slow uniprocessor OS. Unlike security and a number of
other invasive features, the SMP stuff can't be configed out or you end
up with an #ifdef-ed mess like IRIX.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 16:17 ` Larry McVoy
@ 2003-02-24 16:49 ` Martin J. Bligh
2003-02-25 0:41 ` Server shipments [was Re: Minutes from Feb 21 LSE Call] Larry McVoy
2003-02-24 18:22 ` Minutes from Feb 21 LSE Call John W. M. Stevens
1 sibling, 1 reply; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-24 16:49 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
>> > The point being that there is a company generating $32B/year in sales
>> > and almost all of that is in uniprocessors. Directly countering your
>> > statement that there is no margin in PC's. They are making $2B/year in
>> > profits, QED.
>>
>> Which is totally irrelevant. It's the *LINUX* market that matters. What
>> part of that do you find so hard to understand?
>
> OK, so you can't handle the reality that the server market overall doesn't
> make your point so you retreat to the Linux market. OK, fine. All the
Errm, no. That was the conversation all along - you just took some remarks
out of context.
> The point being that if in the overall market place, big iron isn't
> dominating, you have one hell of a tough time making the case that the
> Linux market place is somehow profoundly different and needs larger
> boxes to do the same job.
Dominating in terms of volume? No. My position is that hardware companies
selling Linux make more money on servers than on desktops. We're working
on scalability ... that means CPUs, memory, disk IO, networking,
everything. That improves the efficiency of servers ... "large
machines" (which your original message defined as, and I quote, "4 or more CPU
SMP machines"), 2x boxes, and even larger 1x machines. If you're being more
specific about things like NUMA changes, please point to examples of
patches you think degrade performance on UP / 2x or whatever.
> Indeed I do, I'm good at it. You're about to find out how good. It's
> quite effective to simply focus attention on a problem area. Here's
> my promise to you: there will be a ton of attention focussed on the
> scaling patches until you and anyone else doing them starts showing
> up with cache miss counters as part of the submission process.
Here's my promise to you: people listen to you far less than you think, and
our patches will continue to go into the kernel.
>> So now we've slid from talking about bus traffic from fine-grained
>> locking, which is mostly just you whining in ignorance of the big
>> picture, to cache effects, which are obviously important. Nice try at
>> twisting the conversation. Again.
>
> You need to take a deep breath and try and understand that the focus of
> the conversation is Linux, not your ego or mine. Getting mad at me just
> wastes energy, stay focussed on the real issue, Linux.
So exactly what do you think is the problem? It seems to keep shifting
mysteriously. Name some patches that got accepted into mainline ... if
they're broken, that'll give us some clues about what's bad for the future,
and we can fix them.
>> One way to measure those changes poorly would be to do what you were
>> advocating earlier - look at one tiny metric of a microbenchmark, rather
>> than the actual throughput of the machine. So pardon me if I take your
>> concerns, and file them in the appropriate place.
>
> You apparently missed the point where I have said (a bunch of times):
> run the benchmarks you want and report cache miss counters from before
> and after the patch for the same runs. Microbenchmarks would be
> a really bad way to do that; you really want to run a real application
> because you need it fighting for the cache.
One statistic (e.g. cache miss counters) isn't the big picture. If throughput
goes up or remains the same on all machines, that's what's important.
>> So you're trying to say that fine-grained locking ruins uniprocessor
>> performance now?
>
> I've been saying that for almost 10 years, check the archives.
And you haven't worked out that locks compile away to nothing on UP yet? I
think you might be better off pulling your head out of where it's currently
residing, and pointing it at the source code.
>> You just don't get it, do you? Your head is so vastly inflated that you
>> think everyone should run around researching whatever *you* happen to
>> think is interesting. Do your own benchmarking if you think it's a
>> problem.
>
> That's exactly what I'll do if you don't learn how to do it yourself. I'm
> astounded that any competent engineer wouldn't want to know the effects of
> their changes, I think you actually do but are just too pissed right now
> to see it.
Cool, I'd love to see some benchmarks ... and real throughput numbers from
them, not just microstatistics.
> See other postings on this one. All engineers in your position have said
> "we're just trying to get to N cpus where N = ~2x where we are today and
> it won't hurt uniprocessor performance". They *all* say that. And they
> all end up with a slow uniprocessor OS. Unlike security and a number of
> other invasive features, the SMP stuff can't be configed out or you end
> up with an #ifdef-ed mess like IRIX.
Try looking up "abstraction" in a dictionary. Linus doesn't take #ifdef's
in the main code.
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-24 16:49 ` Martin J. Bligh
@ 2003-02-25 0:41 ` Larry McVoy
2003-02-25 0:41 ` Martin J. Bligh
0 siblings, 1 reply; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 0:41 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
More data from news.com.
Dell has 19% of the server market with $531M/quarter in sales[1] over
212,750 machines per quarter[2].
That means that the average sale price for a server from Dell was $2495.
The average sale price of all servers from all companies is $9347.
I still don't see the big profits touted by the scaling fanatics, anyone
care to explain it?
[1] http://news.com.com/2100-1001-983892.html
[2] http://news.com.com/2100-1001-982004.html
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 0:41 ` Server shipments [was Re: Minutes from Feb 21 LSE Call] Larry McVoy
@ 2003-02-25 0:41 ` Martin J. Bligh
2003-02-25 0:54 ` Larry McVoy
2003-02-25 1:09 ` David Lang
0 siblings, 2 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 0:41 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
> More data from news.com.
>
> Dell has 19% of the server market with $531M/quarter in sales[1] over
> 212,750 machines per quarter[2].
>
> That means that the average sale price for a server from Dell was $2495.
>
> The average sale price of all servers from all companies is $9347.
>
> I still don't see the big profits touted by the scaling fanatics, anyone
> care to explain it?
Sigh. If you're so convinced that there's no money in larger systems,
why don't you write to Sam Palmisano and explain to him the error of
his ways? I'm sure IBM has absolutely no market data to go on ...
If only he could receive an explanation of the error of his ways from
Larry McVoy, I'm sure he'd turn the ship around, for you obviously have
all the facts, figures, and experience of the server market to make this
kind of decision. I await the email from our CEO that tells us how
much he respects you, and has taken this decision at your bidding.
M.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 0:41 ` Martin J. Bligh
@ 2003-02-25 0:54 ` Larry McVoy
2003-02-25 2:00 ` Tupshin Harper
2003-02-25 3:00 ` Martin J. Bligh
2003-02-25 1:09 ` David Lang
1 sibling, 2 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 0:54 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Mon, Feb 24, 2003 at 04:41:04PM -0800, Martin J. Bligh wrote:
> > More data from news.com.
> >
> > Dell has 19% of the server market with $531M/quarter in sales[1] over
> > 212,750 machines per quarter[2].
> >
> > That means that the average sale price for a server from Dell was $2495.
> >
> > The average sale price of all servers from all companies is $9347.
> >
> > I still don't see the big profits touted by the scaling fanatics, anyone
> > care to explain it?
>
> Sigh. If you're so convinced that there's no money in larger systems,
> why don't you write to Sam Palmisano and explain to him the error of
> his ways? I'm sure IBM has absolutely no market data to go on ...
Numbers talk, bullshit walks. Got shoes?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 0:54 ` Larry McVoy
@ 2003-02-25 2:00 ` Tupshin Harper
2003-02-25 3:54 ` Martin J. Bligh
2003-02-25 3:00 ` Martin J. Bligh
1 sibling, 1 reply; 157+ messages in thread
From: Tupshin Harper @ 2003-02-25 2:00 UTC (permalink / raw)
To: linux-kernel
This conversation has not only gotten out of hand, it's gotten quite
silly. People are arguing semantics and relative economic value where a
few simple assertions should do:
1) There is a significant interest from developers and users in having
Linux run efficiently on *small* platforms.
2) There is a significant interest from developers and users in having
Linux run efficiently on *large* platforms.
3) There is disagreement on whether it is possible to accomplish 1 and 2
simultaneously.
4) There is disagreement on whether adequate testing is taking place to
make sure 2 doesn't degrade 1 (or vice versa).
This leads to two choices:
a) Fork. Obviously to be avoided at all reasonable costs.
b) Identify reasonable improvements to the testing methodology so that
any design conflicts are identified immediately instead of gradually
accumulating and degrading performance over time.
I vote b (surprise, surprise); however, this just changes the debate to
"what is reasonable testing methodology?" That, however, is a debate much
more worth having than "who ships more of what" and "who said what when".
Given that a fairly thorough performance testing suite is already in
place, it would seem to be up to the advocates for the "threatened"
computing environment (large or small) to convince the "testers that be"
that certain tests should be added. It is inherently unreasonable to
expect the developer of a feature/change to be unbiased and neutral with
respect to that feature, therefore it is unreasonable to expect them to
prove beyond a reasonable doubt that their feature has no negative
impact. The best that they can do is convince themselves that the
feature passes the really deep sniff test. The rest is up to the
community. The ability of a third party to critique code changes is a
large part of why the bazaar nature of linux development is so valuable.
-Tupshin
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 2:00 ` Tupshin Harper
@ 2003-02-25 3:54 ` Martin J. Bligh
0 siblings, 0 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 3:54 UTC (permalink / raw)
To: Tupshin Harper, linux-kernel
> Given that a fairly thorough performance testing suite is already in
> place, it would seem to be up to the advocates for the "threatened"
> computing environment (large or small) to convince the "testers that be"
> that certain tests should be added. It is inherently unreasonable to
> expect the developer of a feature/change to be unbiased and neutral with
> respect to that feature, therefore it is unreasonable to expect them to
> prove beyond a reasonable doubt that their feature has no negative
> impact. The best that they can do is convince themselves that the
> feature passes the really deep sniff test. The rest is up to the
> community. The ability of a third party to critique code changes is a
> large part of why the bazaar nature of linux development is so valuable.
An excellent and well thought out summary, and exactly why I welcome
Larry's proposal to do some testing and produce specific numbers on
specific patches instead of hand-waving and spreading FUD. This kind of
arrangement is exactly why the open development model will allow Linux to
win out in the long term.
M.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 0:54 ` Larry McVoy
2003-02-25 2:00 ` Tupshin Harper
@ 2003-02-25 3:00 ` Martin J. Bligh
2003-02-25 3:13 ` Larry McVoy
2003-02-25 17:37 ` Andrea Arcangeli
1 sibling, 2 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 3:00 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
>> > More data from news.com.
>> >
>> > Dell has 19% of the server market with $531M/quarter in sales[1] over
>> > 212,750 machines per quarter[2].
>> >
>> > That means that the average sale price for a server from Dell was
>> > $2495.
>> >
>> > The average sale price of all servers from all companies is $9347.
>> >
>> > I still don't see the big profits touted by the scaling fanatics,
>> > anyone care to explain it?
>>
>> Sigh. If you're so convinced that there's no money in larger systems,
>> why don't you write to Sam Palmisano and explain to him the error of
>> his ways? I'm sure IBM has absolutely no market data to go on ...
>
> Numbers talk, bullshit walks. Got shoes?
Bullshit numbers walk too. Remember the context? Linux.
Linux servers vs. Linux desktops. If you think the Linux desktop market is
large, I'd like some of whatever you're smoking, as it's obviously good
stuff.
I think there's money in big iron, you don't seem to. That's fine, you're
not paying my salary (thank $deity).
Perhaps a person with the slightest understanding of basic arithmetic would
see that this:
>> > That means that the average sale price for a server from Dell was
>> > $2495.
>> >
>> > The average sale price of all servers from all companies is $9347.
means that somebody other than Dell is making the money on the big servers.
As Dell is a PC company, no real surprise there.
By the way ... you remember when I said that Linux could scale upwards
without hurting the low end? And that the reason we'd succeed in that where
Solaris et al failed was because the development model was different?
When you said you'd go run some UP benchmarks, that's *exactly* where the
development model is different. It's open enough that you can go do that
sort of thing, and if errors are made, you can point them out. I honestly
welcome the benchmark results you provide ... it's the strength of the
system.
M.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 3:00 ` Martin J. Bligh
@ 2003-02-25 3:13 ` Larry McVoy
2003-02-25 4:11 ` Martin J. Bligh
2003-02-25 17:37 ` Andrea Arcangeli
1 sibling, 1 reply; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 3:13 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Mon, Feb 24, 2003 at 07:00:42PM -0800, Martin J. Bligh wrote:
> >> > That means that the average sale price for a server from Dell was
> >> > $2495.
> >> >
> >> > The average sale price of all servers from all companies is $9347.
>
> means that somebody other than Dell is making the money on the big servers.
What part of "all servers from all companies" did you not understand?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 3:13 ` Larry McVoy
@ 2003-02-25 4:11 ` Martin J. Bligh
2003-02-25 4:17 ` Larry McVoy
0 siblings, 1 reply; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 4:11 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
>> >> > That means that the average sale price for a server from Dell was
>> >> > $2495.
>> >> >
>> >> > The average sale price of all servers from all companies is $9347.
>>
>> means that somebody other than Dell is making the money on the big
>> servers.
>
> What part of "all servers from all companies" did you not understand?
Average price from Dell: $2495
Average price overall: $9347
Conclusion ... Dell makes cheaper servers than average, presumably smaller.
M.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 4:11 ` Martin J. Bligh
@ 2003-02-25 4:17 ` Larry McVoy
2003-02-25 4:21 ` Martin J. Bligh
2003-02-25 22:02 ` Gerrit Huizenga
0 siblings, 2 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 4:17 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Mon, Feb 24, 2003 at 08:11:21PM -0800, Martin J. Bligh wrote:
> > What part of "all servers from all companies" did you not understand?
>
> Average price from Dell: $2495
> Average price overall: $9347
>
> Conclusion ... Dell makes cheaper servers than average, presumably smaller.
So how many CPUs do you think you get in a $9K server?
Better yet, since you work for IBM, how many servers do they ship in a year
with 16 CPUs?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 4:17 ` Larry McVoy
@ 2003-02-25 4:21 ` Martin J. Bligh
2003-02-25 4:37 ` Larry McVoy
2003-02-25 22:02 ` Gerrit Huizenga
1 sibling, 1 reply; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 4:21 UTC (permalink / raw)
To: Larry McVoy; +Cc: linux-kernel
>> > What part of "all servers from all companies" did you not understand?
>>
>> Average price from Dell: $2495
>> Average price overall: $9347
>>
>> Conclusion ... Dell makes cheaper servers than average, presumably
>> smaller.
>
> So how many CPUs do you think you get in a $9K server?
Not sure. Average by price is probably 4 or a little over.
> Better yet, since you work for IBM, how many servers do they ship in a
> year with 16 CPUs?
Will look. If I can find that data, and it's releasable, I'll send it out.
What's more interesting is how much money they make on machines with, say,
more than 4 CPUs. But I doubt I'll be allowed to release that info ;-)
M.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 4:21 ` Martin J. Bligh
@ 2003-02-25 4:37 ` Larry McVoy
0 siblings, 0 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 4:37 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Mon, Feb 24, 2003 at 08:21:57PM -0800, Martin J. Bligh wrote:
> > So how many CPUs do you think you get in a $9K server?
>
> Not sure. Average by price is probably 4 or a little over.
Nope. For $12K you can get
4x 1.9Ghz
512MB
No networking
1 disk
No operating system
That's as cheap as it gets. And I don't know about you, but I have a
tough time believing that anyone buys a 4 CPU box without an OS, without
networking, with 0.5GB of RAM, and with one disk.
If you think you are getting a realistic 4 CPU server for $9K from
a vendor, you're dreaming.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 4:17 ` Larry McVoy
2003-02-25 4:21 ` Martin J. Bligh
@ 2003-02-25 22:02 ` Gerrit Huizenga
2003-02-25 23:19 ` Larry McVoy
1 sibling, 1 reply; 157+ messages in thread
From: Gerrit Huizenga @ 2003-02-25 22:02 UTC (permalink / raw)
To: Larry McVoy; +Cc: Martin J. Bligh, linux-kernel
On Mon, 24 Feb 2003 20:17:01 PST, Larry McVoy wrote:
> On Mon, Feb 24, 2003 at 08:11:21PM -0800, Martin J. Bligh wrote:
> > > What part of "all servers from all companies" did you not understand?
> >
> > Average price from Dell: $2495
> > Average price overall: $9347
> >
> > Conclusion ... Dell makes cheaper servers than average, presumably smaller.
>
> So how many CPUs do you think you get in a $9K server?
Did the numbers track add-on prices, as opposed to base servers? Most
servers are sold with one CPU and lots of extra slots. Need to dig
down to the add-on data to find upgrades to more CPUs and more memory
in the field (and more disk drives).
> Better yet, since you work for IBM, how many servers do they ship in a year
> with 16 CPUs?
Proprietary data, unfortunately. And I'm not sure if even internally
the totals are rolled up as pSeries, xSeries, zSeries, iSeries, etc. and
broken down by linux/aix/NT/VM/etc. Nor do most big companies efficiently
track the size of a machine at a customer site after hardware upgrades
(I know for Sequent this in particular was a painful problem - sold a
two-way and supported an 18-way machine later).
gerrit
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 22:02 ` Gerrit Huizenga
@ 2003-02-25 23:19 ` Larry McVoy
2003-02-25 23:46 ` Gerhard Mack
0 siblings, 1 reply; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 23:19 UTC (permalink / raw)
To: Gerrit Huizenga; +Cc: Larry McVoy, Martin J. Bligh, linux-kernel
On Tue, Feb 25, 2003 at 02:02:28PM -0800, Gerrit Huizenga wrote:
> On Mon, 24 Feb 2003 20:17:01 PST, Larry McVoy wrote:
> > On Mon, Feb 24, 2003 at 08:11:21PM -0800, Martin J. Bligh wrote:
> > > > What part of "all servers from all companies" did you not understand?
> > >
> > > Average price from Dell: $2495
> > > Average price overall: $9347
> > >
> > > Conclusion ... Dell makes cheaper servers than average, presumably smaller.
> >
> > So how many CPUs do you think you get in a $9K server?
>
> Did the numbers track add-on prices, as opposed to base server? Most
> servers are sold with one CPU and lots of extra slots. Need to dig
> down to the add-on data to find upgrades to more CPUs and more memory
> in the field (and more disk drives).
I included the URLs so you could check for yourself, but I arrived at
those numbers by taking the worldwide revenue associated with servers
and dividing by the number of units shipped. I would expect that would
include the add-on stuff.
I'm sure IBM makes money on their high-end stuff, but I'd suspect that
it is more bragging rights than what keeps the lights on.
I think the point which was missed in this whole thread is that even if
IBM has fantastic margins today on big iron, it's unlikely to stay that
way. The world is catching up. I can buy a dual 1.8GHz AMD box for
about $1500. 4-ways are more, maybe $10K or so. So you have the cheapo
white boxes coming at you from the low end.
On the high end, go look at what customers want. They are mostly taking
those big boxes and partitioning them. Sooner or later some bright boy
is going to realize that they could put 4 4 way boxes in one rack and
call it a 16 way box with 4 way partitioning "pre-installed".
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 23:19 ` Larry McVoy
@ 2003-02-25 23:46 ` Gerhard Mack
2003-02-26 4:23 ` Jesse Pollard
0 siblings, 1 reply; 157+ messages in thread
From: Gerhard Mack @ 2003-02-25 23:46 UTC (permalink / raw)
To: Larry McVoy; +Cc: Gerrit Huizenga, Martin J. Bligh, linux-kernel
On Tue, 25 Feb 2003, Larry McVoy wrote:
[snip]
> On the high end, go look at what customers want. They are mostly taking
> those big boxes and partitioning them. Sooner or later some bright boy
> is going to realize that they could put 4 4 way boxes in one rack and
> call it a 16 way box with 4 way partitioning "pre-installed".
Er, you mean like what racksaver.com does with their 2 dual-CPU servers in
a box?
Gerhard
--
Gerhard Mack
gmack@innerfire.net
<>< As a computer I find your faith in technology amusing.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 23:46 ` Gerhard Mack
@ 2003-02-26 4:23 ` Jesse Pollard
2003-02-26 5:05 ` William Lee Irwin III
2003-02-26 5:27 ` Bernd Eckenfels
0 siblings, 2 replies; 157+ messages in thread
From: Jesse Pollard @ 2003-02-26 4:23 UTC (permalink / raw)
To: Gerhard Mack, Larry McVoy; +Cc: Gerrit Huizenga, Martin J. Bligh, linux-kernel
On Tuesday 25 February 2003 17:46, Gerhard Mack wrote:
> On Tue, 25 Feb 2003, Larry McVoy wrote:
[snip]
> > On the high end, go look at what customers want. They are mostly taking
> > those big boxes and partitioning them. Sooner or later some bright boy
> > is going to realize that they could put 4 4 way boxes in one rack and
> > call it a 16 way box with 4 way partitioning "pre-installed".
>
> er you mean like what racksaver.com does with their 2 dual CPU servers in
> a box?
And that is not "Big Iron".
Sorry - Big Iron is a 1-5 TFlop single system image, shared memory, with a
streaming vector processor...
Something like a Cray X1, single processor, for instance.
Or a 1024-processor Cray T3, again single system image, even if it doesn't
have a streaming vector processor.
I don't see that any of the current cluster systems provide the throughput
of such a system. Not even IBM's SP series. Aggregate measures of theoretical
throughput just don't add up. Practical throughput is almost always only 80%
of the theoretical (i.e. the advertised) throughput. Most cannot handle the data
I/O requirement, much less the IPC latency.
Sure, 3 microseconds sounds nice for Myrinet, but nothing beats 17 clock ticks,
where each tick is 4 ns, for the first 64-bit word of data... followed by the
next word in 4 ns per bus. (And that is on a slow processor....)
The output is fed to memory on every clock tick. (Most Cray processors have 4
memory buses for each processor - two for input data, one for output data,
and one for the instruction stream; and each has the same cycle time... Now
go to 4/8/16/32 processors without reducing that timing. That requires some
CAREFUL hardware design.)
And you better believe that there are big margins on such a system. You only
have to sell 8 to 16 units to exceed the yearly profit of most computer
companies. Do I have hard numbers on the units? no. I don't work for Cray.
I have used their systems for the last 12 years, and until the Earth Simulator
came on line, there was nothing that came close to their throughput for
weather modeling, finite element analysis, or other large problem types.
None of the microprocessors (possibly excepting the Power4) can come close -
when you look at the processor internals, they all have only a single memory
bus, running approximately 1-2 GB/second to cache.
Look at the cray this way: ALL of main memory is cache... with 4 ports to
it... for EACH processor...
Would I like to see Linux running on these? Yes. Can I pay for it? No. I'm
not in such a position where I could buy one. Would customers buy one?
Perhaps - if the price were right or the need great enough. Would having
Linux on it save the vendor money? I don't know. I hope that it would.
Unfortunately, there are too many things missing from Linux for it to be
considered:
job and process checkpoint/restart (with files/pipes/sockets intact)
batch job processors (REAL batch jobs ... not just cron)
resource accounting and resource allocation control
compartmented mode security support
truly large filesystem support (10 TB online, 300+ TB nearline in one fs)
large file support (100-300 GB in one file at least)
large process support
(10GB processes, 10-1000 threads... I can dream, can't I? :-)
automatic hardware failover support
hot swap components (disks, tapes, memory, processors)
to make a short list.
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-26 4:23 ` Jesse Pollard
@ 2003-02-26 5:05 ` William Lee Irwin III
2003-02-26 5:27 ` Bernd Eckenfels
1 sibling, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-26 5:05 UTC (permalink / raw)
To: Jesse Pollard
Cc: Gerhard Mack, Larry McVoy, Gerrit Huizenga, Martin J. Bligh,
linux-kernel
On Tue, Feb 25, 2003 at 10:23:04PM -0600, Jesse Pollard wrote:
> And that is not "Big Iron".
> sorry - Big Iron is a 1-5 TFlop single system image, shared memory, with
> streaming vector processor...
Thank you for putting things in their perspectives.
This is why I call x86en maxed to their architectural limits "midrange",
which is a kind overestimate given their sickeningly enormous deficits.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-26 4:23 ` Jesse Pollard
2003-02-26 5:05 ` William Lee Irwin III
@ 2003-02-26 5:27 ` Bernd Eckenfels
2003-02-26 9:36 ` Eric W. Biederman
2003-02-26 12:09 ` Jesse Pollard
1 sibling, 2 replies; 157+ messages in thread
From: Bernd Eckenfels @ 2003-02-26 5:27 UTC (permalink / raw)
To: linux-kernel
In article <03022522230400.04587@tabby> you wrote:
> Something like a Cray X1, single processor for instance.
> Or a 1024 processor Cray T3, again single system image, even if it doesn't
> have a streaming vector processor.
>
> I don't see that any of the current cluster systems provide the throughput
> of such a system. Not even IBMs' SP series.
This clearly depends on the workload. For most vector processors
partitioning does not make sense. And don't forget, most of those systems are
pure compute servers used for scientific computing.
> The output is fed to memory on every clock tick. (most Cray processors have 4
> memory busses for each processor - two for input data, one for output data
> and one for the instruction stream
The fastest Cray on top500.org is the T3E1200 at rank _22_; the fastest IBM is
ranked _2_ with a Power3 processor. There are 13 IBM systems before the
first (fastest) Cray system. Of course those GFlops are measured for
parallel problems, but there are a lot out there.
And all those numbers are totally uninteresting for DB or Storage Servers.
Even a SAP SD Benchmark would not be fun on a Cray.
> I have used their systems for the last 12 years, and until the Earth Simulator
> came on line, there was nothing that came close to their throughput for
> weather modeling, finite element analysis, or other large problem types.
that's clearly wrong. http://www.top500.org/lists/lists.php?Y=2002&M=06
There are a lot of Power3 and Alpha systems before the first Cray.
Greetings
Bernd
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-26 5:27 ` Bernd Eckenfels
@ 2003-02-26 9:36 ` Eric W. Biederman
2003-02-26 12:09 ` Jesse Pollard
1 sibling, 0 replies; 157+ messages in thread
From: Eric W. Biederman @ 2003-02-26 9:36 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
Bernd Eckenfels <ecki@calista.eckenfels.6bone.ka-ip.net> writes:
> In article <03022522230400.04587@tabby> you wrote:
> > The output is fed to memory on every clock tick. (most Cray processors have 4
>
> > memory busses for each processor - two for input data, one for output data
> > and one for the instruction stream
>
> The fastest Cray on top500.org is T3E1200 on rank _22_, the fastest IBM is
> ranked _2_ with a Power3 processor. There are 13 IBM systems before the
> first (fastest) Cray system. Of course those GFlops are measured for
> parallel problems, but there are a lot out there.
And it is especially interesting when you note that among ranks 2-5 the
ratings are so close that a strong breeze can cause an upset. And that #5
is composed of dual-CPU P4 Xeon nodes....
Eric
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-26 5:27 ` Bernd Eckenfels
2003-02-26 9:36 ` Eric W. Biederman
@ 2003-02-26 12:09 ` Jesse Pollard
2003-02-26 16:42 ` Geert Uytterhoeven
1 sibling, 1 reply; 157+ messages in thread
From: Jesse Pollard @ 2003-02-26 12:09 UTC (permalink / raw)
To: Bernd Eckenfels, linux-kernel
On Tuesday 25 February 2003 23:27, Bernd Eckenfels wrote:
> In article <03022522230400.04587@tabby> you wrote:
> > Something like a Cray X1, single processor for instance.
> > Or a 1024 processor Cray T3, again single system image, even if it
> > doesn't have a streaming vector processor.
> >
> > I don't see that any of the current cluster systems provide the
> > throughput of such a system. Not even IBMs' SP series.
>
> This clearly depends on the workload. For most vector processors
> partitioning does not make sense. And dont forget, most of those systems
> are pure compute servers used fr scientific computing.
Not as much as you would expect. I've sat next to (a cubicle over from) some
people doing benchmarking on the IBM SP 3 (a 330-node quad-processor system
and a newer one). Neither could achieve the "advertised" speed on real
problems.
> > The output is fed to memory on every clock tick. (most Cray processors
> > have 4 memory busses for each processor - two for input data, one for
> > output data and one for the instruction stream
>
> The fastest Cray on top500.org is T3E1200 on rank _22_, the fastest IBM is
> ranked _2_ with a Power3 processor. There are 13 IBM systems before the
> first (fastest) Cray system. Of course those GFlops are measured for
> parallel problems, but there are a lot out there.
The T3 achieves its speed based on the torus network. The processors
are only 400 MHz Alphas, 4 to a processing element. The IBM achieves
its speed from a carefully crafted benchmark to show the fastest aggregate
computation possible. It is not a practical usage. Basically the computation
is split into the largest possible chunks, each chunk is run on independent
systems, and the results are merged at the very end of the computation. (I've
used them too and have access to two of them.)
It takes something in the neighborhood of 60-100 processors in a T3 to
equal one Cray-architecture processor (even on a C90). A 32-processor C90
easily kept up with a T3 until you exceeded 900 processors in the T3. (I had
access to each of those too.)
> And all those numbers are totally uninteresting for DB or Storage Servers.
> Even a SAP SD Benchmark would not be fun on a Cray.
The Cray has been known to support 200+ GB filesystems with 300+ TB
nearline storage, with a maximum of 11-second access to data when that
data has been migrated to tape... Admittedly, the time gets longer if the file
exceeds about 100 MB, since it must then access multiple tapes in parallel.
> > I have used their systems for the last 12 years, and until the Earth
> > Simulator came on line, there was nothing that came close to their
> > throughput for weather modeling, finite element analysis, or other large
> > problem types.
>
> that's clearly wrong. http://www.top500.org/lists/lists.php?Y=2002&M=06
What you are actually looking at is a custom benchmark, carefully crafted
to show the fastest aggregate computation possible. It is not a practical
usage. The aggregate Cray system throughput (if you max out an X1 cluster)
exceeds even the Earth Simulator's. Unfortunately, one of these hasn't been
sold yet.
One of the biggest weaknesses in the IBM world is the SP switch. The lack
of a true shared-memory programming model limits the systems to very
coarse-grained parallelism. It really is just a collection of very fast small
servers. There is no "single system image". The OS and all core utilities
must be duplicated on each node or the cluster will not boot.
> There are a lot of Power3 and Alpha systems before the first Cray.
Ah, no. The first Cray came before the Pentium... The company made a profit
off its first sale of one system. There was no Power3 or Alpha chip.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-26 12:09 ` Jesse Pollard
@ 2003-02-26 16:42 ` Geert Uytterhoeven
0 siblings, 0 replies; 157+ messages in thread
From: Geert Uytterhoeven @ 2003-02-26 16:42 UTC (permalink / raw)
To: Jesse Pollard; +Cc: Bernd Eckenfels, Linux Kernel Development
On Wed, 26 Feb 2003, Jesse Pollard wrote:
> On Tuesday 25 February 2003 23:27, Bernd Eckenfels wrote:
> > There are a lot of Power3 and Alpha systems before the first Cray.
>
> Ah no. The first cray was before the Pentium... The company made a profit
> off of its first sale on one system. There was no power 3 or alpha chip.
I think Bernd was speaking about the Top 500, not about a historical timeline.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 3:00 ` Martin J. Bligh
2003-02-25 3:13 ` Larry McVoy
@ 2003-02-25 17:37 ` Andrea Arcangeli
1 sibling, 0 replies; 157+ messages in thread
From: Andrea Arcangeli @ 2003-02-25 17:37 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
On Mon, Feb 24, 2003 at 07:00:42PM -0800, Martin J. Bligh wrote:
> Solaris et al failed was because the development model was different?
Solaris can't be recompiled UP, AFAIK. This whole discussion about UP
performance is almost pointless on Linux, since we have CONFIG_SMP and we
can recompile it.
Especially if what you care about is the desktop (not the UP server), the only
kernel bits that matter for the desktop are the VM, the scheduler, and
I/O latency, and perhaps clear_page too. The rest is all a matter of
X/kde/qt/glibc-dynamiclinking/opengl/memorybloatwithmultiplelibs/etc.
The kernel's raw performance in the fast paths doesn't matter much for the
desktop; even if syscalls were twice as slow, desktop users
wouldn't notice much.
Andrea
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
2003-02-25 0:41 ` Martin J. Bligh
2003-02-25 0:54 ` Larry McVoy
@ 2003-02-25 1:09 ` David Lang
1 sibling, 0 replies; 157+ messages in thread
From: David Lang @ 2003-02-25 1:09 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel
If you want to say that sales of Linux servers generate more profits than
sales of Linux desktops, then you have a chance of being right, not because
the server market is so large, but because the desktop market is so small.
However, if the Linux desktop were to get 10% of the market in terms of new
sales (it's already in the 5%-7% range according to some reports, but a
large percentage of that is in repurposed Windows desktops), then the
sales and profits of the desktops would easily outclass the sales and
profits of servers due to the sheer volume.
IBM and Sun make a lot of sales on the theory that their machines (you
know, the ones in fancy PC cases with PC power supplies and IDE drives)
are somehow more reliable than an x86 machine. As people really start
analysing the cost/performance of the machines and implement HA because
they need 24x7 coverage and even the big boys' boxes need to be updated,
people realize that they can buy multiple cheap boxes and get HA for less
than the cost of buying the one 'professional' box (in some cases they can
afford to buy the multiple smaller boxes and replace them every year for
less than the cost of the professional box over 3 years). And as more
folks use Linux on the small(er) machines, it breaks down the risk barrier.
One of the big reasons people have traditionally used small numbers of
large boxes was that the licensing costs have been significant; well, Linux
doesn't have a per-server license cost (unless you really want to pay one),
so that's also no longer an issue.
There are some jobs that require large machines instead of clusters;
databases are still one of them (at least as far as I have been able to
learn), but a lot of other jobs are being moved to multiple smaller boxes
(or to multiple logical boxes on one large box, which is what Larry is
advocating), and in spite of the doomsayers the problems are being worked
out (can you imagine the reaction from telling a sysadmin team managing
one server in 1970 that in 2000 a similar-sized team would be managing
hundreds or thousands of servers a la Google? :-) Yes, it takes planning and
discipline, but it's not nearly as hard as people imagine before they get
started down that path).
David Lang
On Mon, 24 Feb 2003, Martin J. Bligh wrote:
> Date: Mon, 24 Feb 2003 16:41:04 -0800
> From: Martin J. Bligh <mbligh@aracnet.com>
> To: Larry McVoy <lm@bitmover.com>
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: Server shipments [was Re: Minutes from Feb 21 LSE Call]
>
> > More data from news.com.
> >
> > Dell has 19% of the server market with $531M/quarter in sales[1] over
> > 212,750 machines per quarter[2].
> >
> > That means that the average sale price for a server from Dell was $2495.
> >
> > The average sale price of all servers from all companies is $9347.
> >
> > I still don't see the big profits touted by the scaling fanatics, anyone
> > care to explain it?
>
> Sigh. If you're so convinced that there's no money in larger systems,
> why don't you write to Sam Palmisano and explain to him the error of
> his ways? I'm sure IBM has absolutely no market data to go on ...
>
> If only he could receive an explanation of the error of his ways from
> Larry McVoy, I'm sure he'd turn the ship around, for you obviously have
> all the facts, figures, and experience of the server market to make this
> kind of decision. I await the email from the our CEO that tells us how
> much he respects you, and has taken this decision at your bidding.
>
> M.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 16:17 ` Larry McVoy
2003-02-24 16:49 ` Martin J. Bligh
@ 2003-02-24 18:22 ` John W. M. Stevens
1 sibling, 0 replies; 157+ messages in thread
From: John W. M. Stevens @ 2003-02-24 18:22 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel
On Mon, Feb 24, 2003 at 08:17:16AM -0800, Larry McVoy wrote:
> On Sun, Feb 23, 2003 at 11:39:34PM -0800, Martin J. Bligh wrote:
>
> See other postings on this one. All engineers in your position have said
> "we're just trying to get to N cpus where N = ~2x where we are today and
> it won't hurt uniprocessor performance". They *all* say that. And they
> all end up with a slow uniprocessor OS. Unlike security and a number of
> other invasive features, the SMP stuff can't be configed out
Heck, you can't even configure it out on so-called UP systems.
The moment you introduce DMA into a system, you have an (admittedly,
constrained) SMP system.
And of course, simple interruption is another, constrained, kind of
"virtual SMP", yes?
Anybody who's done any USB HC programming is horribly aware of this
fact, trust me! ;-)
> or you end
> up with an #ifdef-ed mess like IRIX.
Why ifdef it everywhere?
#ifdef SMP
#define lock( mutex ) smpLock( mutex )
#else
#define lock( mutex )
#endif
Do that once, use the lock macro, and forget about it (except in
cases where you have to worry about DMA, interruption, or some other
kind of MP, of course).
My (limited, only about 600 machines) experience is that Linux is
inevitably less stable on non-Intel, and on non-UP machines. Before
worrying about scalability, my opinion is that worrying about getting
the simplest (dual processor) machines as stable as UP machines, first,
would be both a better ROI, and a good basis for higher levels of
scalability.
Mind you, there is a perfectly simple reason (for Linux being less
stable on non-Intel, non-UP machines) that this is true: the
Linux development methodology pretty much makes this an emergent
property.
Interesting discussion, though . . . from my experience, the commercial
Unices use fine grained locking.
Luck,
John S.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 6:58 ` Larry McVoy
2003-02-24 7:39 ` Martin J. Bligh
@ 2003-02-24 7:51 ` William Lee Irwin III
2003-02-24 15:47 ` Larry McVoy
2003-02-24 13:28 ` Alan Cox
2003-02-24 18:44 ` Davide Libenzi
3 siblings, 1 reply; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-24 7:51 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel
On Sun, Feb 23, 2003 at 10:58:26PM -0800, Larry McVoy wrote:
> Linux is a really fast system right now. The code paths are short and
> it is possible to use the OS almost as if it were a library, the cost is
> so little that you really can mmap stuff in as you need, something that
> people have wanted since Multics. There will always be many more uses
> of Linux in small systems than large, simply because there will always
> be more small systems. Keeping Linux working well on small systems is
> going to have a dramatically larger positive benefit for the world than
> scaling it to 64 processors. So who do you want to help? An elite
> few or everyone?
I don't know what kind of joke you think I'm trying to play here.
"Scalability" is about making the kernel properly adapt to the size of
the system. This means UP. This means embedded. This means mid-range
x86 bigfathighmem turds. This means SGI Altix. I have _personally_
written patches to decrease the space footprint of pidhashes and other
data structures so that embedded systems function more optimally.
It's not about crapping all over the low end. It's not about degrading
performance on commonly available systems. It's about increasing the
range of systems on which Linux performs well and is useful.
Maintaining the performance of Linux on commonly available systems is
not only deeply ingrained as one of a set of personal standards amongst
all kernel hackers involved with scalability, it's also a prerequisite
for patch acceptance that is rigorously enforced by maintainers. To
further demonstrate this, look at the pgd_ctor patches, which markedly
reduced the overhead of pgd setup and teardown on UP lowmem systems and
were very minor improvements on PAE systems.
Now it's time to turn the question back around on you. Why do you not
want Linux to work well on a broader range of systems than it does now?
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 7:51 ` William Lee Irwin III
@ 2003-02-24 15:47 ` Larry McVoy
2003-02-24 16:00 ` Martin J. Bligh
` (2 more replies)
0 siblings, 3 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-24 15:47 UTC (permalink / raw)
To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel
On Sun, Feb 23, 2003 at 11:51:42PM -0800, William Lee Irwin III wrote:
> Now it's time to turn the question back around on you. Why do you not
> want Linux to work well on a broader range of systems than it does now?
I never said that I didn't. I'm just taking issue with the chosen path,
which has been demonstrated not to work.
"Let's scale Linux by multi threading"
"Err, that really sucked for everyone who has tried it in the past, all
the code paths got long and uniprocessor performance suffered"
"Oh, but we won't do that, that would be bad".
"Great, how about you measure the changes carefully and really show that?"
"We don't need to measure the changes, we know we'll do it right".
And just like every other time this comes up in every other engineering
organization, the focus is on 2x wherever we are today. It is *never*
about getting to 100x or 1000x.
If you were looking at the problem assuming that the same code had to
run on uniprocessor and a 1000 way smp, right now, today, and designing
for it, I doubt very much we'd have anything to argue about. A lot of
what I'm saying starts to become obviously true as you increase the
number of CPUs but engineers are always seduced into making it go 2x
farther than it does today. Unfortunately, each of those 2x increases
comes at some cost and they add up.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 15:47 ` Larry McVoy
@ 2003-02-24 16:00 ` Martin J. Bligh
2003-02-24 16:23 ` Benjamin LaHaise
2003-02-24 23:36 ` William Lee Irwin III
2 siblings, 0 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-24 16:00 UTC (permalink / raw)
To: Larry McVoy, William Lee Irwin III, linux-kernel
> I never said that I didn't. I'm just taking issue with the chosen path,
> which has been demonstrated not to work.
>
> "Let's scale Linux by multi threading"
>
> "Err, that really sucked for everyone who has tried it in the past,
> all the code paths got long and uniprocessor performance suffered"
>
> "Oh, but we won't do that, that would be bad".
>
> "Great, how about you measure the changes carefully and really show
> that?"
>
> "We don't need to measure the changes, we know we'll do it right".
Most of the threading changes have been things like one thread per CPU, which
would seem to scale up and down rather well to me ... could you illustrate
by pointing to an example of something that's changed in that area which
you think is bad? Yes, if Linux started 2000 kernel threads on a UP system,
that would obviously be bad.
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 15:47 ` Larry McVoy
2003-02-24 16:00 ` Martin J. Bligh
@ 2003-02-24 16:23 ` Benjamin LaHaise
2003-02-24 16:25 ` yodaiken
2003-02-24 16:31 ` Minutes from Feb 21 LSE Call Larry McVoy
2003-02-24 23:36 ` William Lee Irwin III
2 siblings, 2 replies; 157+ messages in thread
From: Benjamin LaHaise @ 2003-02-24 16:23 UTC (permalink / raw)
To: Larry McVoy, William Lee Irwin III, Martin J. Bligh, Larry McVoy,
linux-kernel
On Mon, Feb 24, 2003 at 07:47:25AM -0800, Larry McVoy wrote:
> If you were looking at the problem assuming that the same code had to
> run on uniprocessor and a 1000 way smp, right now, today, and designing
> for it, I doubt very much we'd have anything to argue about. A lot of
> what I'm saying starts to become obviously true as you increase the
> number of CPUs but engineers are always seduced into making it go 2x
> farther than it does today. Unfortunately, each of those 2x increases
> comes at some cost and they add up.
Good point. However, we are in a position to compare test results of
older linux kernels against newer, and to recompile code out of the
kernel for specific applications. I'm curious if there is a collection
of lmbench results of hand configured and compiled kernels vs the vendor
module based kernels across 2.0, 2.2, 2.4 and recent 2.5 on the same
uniprocessor and dual-processor configurations. That would really give
us a better idea of the cost difference between a properly tuned kernel
and what people actually use for support reasons, and whether we're
winning or losing.
-ben
--
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 16:23 ` Benjamin LaHaise
@ 2003-02-24 16:25 ` yodaiken
2003-02-24 18:20 ` Gerrit Huizenga
2003-02-24 16:31 ` Minutes from Feb 21 LSE Call Larry McVoy
1 sibling, 1 reply; 157+ messages in thread
From: yodaiken @ 2003-02-24 16:25 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Larry McVoy, William Lee Irwin III, Martin J. Bligh, Larry McVoy,
linux-kernel
On Mon, Feb 24, 2003 at 11:23:14AM -0500, Benjamin LaHaise wrote:
> Good point. However, we are in a position to compare test results of
> older linux kernels against newer, and to recompile code out of the
> kernel for specific applications. I'm curious if there is a collection
> of lmbench results of hand configured and compiled kernels vs the vendor
> module based kernels across 2.0, 2.2, 2.4 and recent 2.5 on the same
> uniprocessor and dual processor configuration. That would really give
> us a better idea of how a properly tuned kernel vs what people actually
> use for support reasons is costing us, and if we're winning or losing.
It's interesting to me that the people supporting the scale-up do not
carefully do such benchmarks and indeed have a rather cavalier attitude
to testing and benchmarking; or perhaps they don't think it's worth
publishing.
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
www.fsmlabs.com www.rtlinux.com
1+ 505 838 9109
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 16:25 ` yodaiken
@ 2003-02-24 18:20 ` Gerrit Huizenga
2003-02-25 1:51 ` Minutes from Feb 21 LSE Call - publishing performance data Craig Thomas
0 siblings, 1 reply; 157+ messages in thread
From: Gerrit Huizenga @ 2003-02-24 18:20 UTC (permalink / raw)
To: yodaiken
Cc: Benjamin LaHaise, Larry McVoy, William Lee Irwin III,
Martin J. Bligh, Larry McVoy, linux-kernel
On Mon, 24 Feb 2003 09:25:33 MST, yodaiken@fsmlabs.com wrote:
> It's interesting to me that the people supporting the scale up do not
> carefully do such benchmarks and indeed have a rather cavalier attitude
> to testing and benchmarking: or perhaps they don't think it's worth
> publishing.
I'm afraid it is the latter half that is closer to correct. Within
IBM's Linux Technology Center, we have a good sized performance team
and a tightly coupled set of developers who can internally share a
lot of real benchmark data. Unfortunately, the rules of SPEC and TPC
don't allow us to release data unless it is carefully (and time-
consumingly) audited, and IBM has a history of not dumping the output
of a few hundred runs of benchmarks out in the open and then claiming
that it is all valid, without doing a lot of internal validation first.
I'm sure other large companies doing Linux stuff have similar hurdles.
In some cases, ours are probably higher than average (IBM as an
entity has zero interest in pissing off the TPC or SPEC).
We do have a few papers out there, check OLS for the large database
workload one that steps through 2.4 performance changes (stock
2.4 vs. a set of patches we pushed to UL & RHAT) that increase
database performance about, oh, I forget, 5-fold... And there
is occasional other data sent out on web server stuff, some
microbenchmark data (see the continuing stream of data from mbligh,
for instance). Also, the contest data, OSDL data, etc. etc.
shows comparisons and trends for anyone who cares to pay attention.
It *would* be nice if someone could publish a compendium of performance
data, but that would be asking a lot...
gerrit
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call - publishing performance data
2003-02-24 18:20 ` Gerrit Huizenga
@ 2003-02-25 1:51 ` Craig Thomas
0 siblings, 0 replies; 157+ messages in thread
From: Craig Thomas @ 2003-02-25 1:51 UTC (permalink / raw)
To: Gerrit Huizenga
Cc: yodaiken, Benjamin LaHaise, Larry McVoy, William Lee Irwin III,
Martin J. Bligh, Larry McVoy, linux-kernel
On Mon, 2003-02-24 at 10:20, Gerrit Huizenga wrote:
>
> We do have a few papers out there, check OLS for the large database
> workload one that steps through 2.4 performance changes (stock
> 2.4 vs. a set of patches we pushed to UL & RHAT) that increase
> database performance about, oh, I forget, 5-fold... And there
> is occasional other data sent out on web server stuff, some
> microbenchmark data (see the continuing stream of data from mbligh,
> for instance). Also, the contest data, OSDL data, etc. etc.
> shows comparisons and trends for anyone who cares to pay attention.
>
> It *would* be nice if someone could publish a compendium of performance
> data, but that would be asking a lot...
>
> gerrit
> -
OSDL is trying to provide something like this for the 2.5 kernel. It is
an interest of ours to provide this sort of data. We have been building
database workload information and generating test results from our STP
test framework.
We are in the midst of creating content for a Linux Stability Results
web page. http://www.osdl.org/projects/26lnxstblztn/results/ There is
a great desire on our part to share good performance data for the kernel
as it evolves. I would like to ask you guys what you would like to
see on a page like this. I feel that we could create a single site where
anyone can get access to performance and reliability information about
the Linux kernel as we move toward the 2.6 version.
The page is set up now so that anyone can contribute content to the page
by editing an html template file to point to test and performance data.
If anyone is interested in this concept, email me privately or
cliffw@osdl.org
--
Craig Thomas <craiger@osdl.org>
OSDL
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 16:23 ` Benjamin LaHaise
2003-02-24 16:25 ` yodaiken
@ 2003-02-24 16:31 ` Larry McVoy
1 sibling, 0 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-24 16:31 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Larry McVoy, William Lee Irwin III, Martin J. Bligh, linux-kernel
On Mon, Feb 24, 2003 at 11:23:14AM -0500, Benjamin LaHaise wrote:
> kernel for specific applications. I'm curious if there is a collection
> of lmbench results of hand configured and compiled kernels vs the vendor
> module based kernels across 2.0, 2.2, 2.4 and recent 2.5 on the same
> uniprocessor and dual processor configuration.
If someone were willing to build the init script infrastructure to
reboot to a new kernel, run the test, etc., I'll buy a couple of
machines and just let them run through this. I'd like to do it
with the cache miss counters turned on so if P4's do a nicer job
of counting than Athlons, I'll get those.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 15:47 ` Larry McVoy
2003-02-24 16:00 ` Martin J. Bligh
2003-02-24 16:23 ` Benjamin LaHaise
@ 2003-02-24 23:36 ` William Lee Irwin III
2003-02-25 0:23 ` Larry McVoy
2 siblings, 1 reply; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-24 23:36 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel
On Sun, Feb 23, 2003 at 11:51:42PM -0800, William Lee Irwin III wrote:
>> Now it's time to turn the question back around on you. Why do you not
>> want Linux to work well on a broader range of systems than it does now?
On Mon, Feb 24, 2003 at 07:47:25AM -0800, Larry McVoy wrote:
> I never said that I didn't. I'm just taking issue with the choosen path
> which has been demonstrated to not work.
> "Let's scale Linux by multi threading"
> "Err, that really sucked for everyone who has tried it in the past, all
> the code paths got long and uniprocessor performance suffered"
> "Oh, but we won't do that, that would be bad".
> "Great, how about you measure the changes carefully and really show that?"
> "We don't need to measure the changes, we know we'll do it right".
The changes are getting measured. By and large, if it's slower on UP
it's rejected. There's a dedicated benchmark crew, of which Randy Hron
is an important member, that benchmarks such things very consistently.
Internal benchmarking includes both free and non-free benchmarks; dbench,
tiobench, kernel compiles, contest, and so on are the publishable bits.
Also, code paths are not necessarily getting longer. Single-
threaded efficiency lowers lock hold time and helps small systems too,
and numerous improvements with buffer_heads, task searching, file
truncation, and the like, are of that flavor.
On Mon, Feb 24, 2003 at 07:47:25AM -0800, Larry McVoy wrote:
> And just like every other time this comes up in every other engineering
> organization, the focus is on 2x wherever we are today. It is *never*
> about getting to 100x or 1000x.
> If you were looking at the problem assuming that the same code had to
> run on uniprocessor and a 1000 way smp, right now, today, and designing
> for it, I doubt very much we'd have anything to argue about. A lot of
> what I'm saying starts to become obviously true as you increase the
> number of CPUs but engineers are always seduced into making it go 2x
> farther than it does today. Unfortunately, each of those 2x increases
> comes at some cost and they add up.
Linux is a patchwork kernel. No coherent design will ever shine through.
Scaling the kernel incrementally merely becomes that much more difficult.
The small system performance standards aren't getting lowered.
Also note there are various efforts to scale the kernel _downward_ to
smaller embedded systems, partly by controlling "bloated" hash tables'
sizes and partly by making major subsystems optional and partly by
supporting systems with no MMU. This is not a one-way street, though I
myself am clearly pointed in the upward direction.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 23:36 ` William Lee Irwin III
@ 2003-02-25 0:23 ` Larry McVoy
2003-02-25 2:37 ` Werner Almesberger
2003-02-25 4:42 ` William Lee Irwin III
0 siblings, 2 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 0:23 UTC (permalink / raw)
To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel
> The changes are getting measured. By and large if it's slower on UP
> it's rejected.
Suppose I have an application which has a working set which just exactly
fits in the I+D caches, including the related OS stuff.
Someone makes some change to the OS and the benchmark for that change is
smaller than the I+D caches but the change increased the I+D cache space
needed.
The benchmark will not show any slowdown, correct?
My application no longer fits and will suffer, correct?
The point is that if you are putting SMP changes into the system, you
have to be held to a higher standard for measurement given the past
track record of SMP changes increasing code length and cache footprints.
So "measuring" doesn't mean "it's not slower on XYZ microbenchmark".
It means "under the following work loads the cache misses went down or
stayed the same for before and after tests".
And if you said that all changes should be held to this standard, not
just scaling changes, I'd agree with you. But scaling changes are the
"bad guy" in my mind, they are not to be trusted, so they should be held
to this standard first. If we can get everyone to step up to this bat,
that's all to the good.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 0:23 ` Larry McVoy
@ 2003-02-25 2:37 ` Werner Almesberger
2003-02-25 4:42 ` William Lee Irwin III
1 sibling, 0 replies; 157+ messages in thread
From: Werner Almesberger @ 2003-02-25 2:37 UTC (permalink / raw)
To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel
Larry McVoy wrote:
> The point is that if you are putting SMP changes into the system, you
> have to be held to a higher standard for measurement given the past
> track record of SMP changes increasing code length and cache footprints.
So you probably want to run this benchmark on a synthetic CPU a la
cachegrind. The difficult part would be to come up with a reasonably
understandable additive metric for cache pressure.
(I guess there goes another call to arms to academia :-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 0:23 ` Larry McVoy
2003-02-25 2:37 ` Werner Almesberger
@ 2003-02-25 4:42 ` William Lee Irwin III
2003-02-25 4:54 ` Larry McVoy
1 sibling, 1 reply; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 4:42 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel
At some point in the past, I wrote:
>> The changes are getting measured. By and large if it's slower on UP
>> it's rejected.
On Mon, Feb 24, 2003 at 04:23:09PM -0800, Larry McVoy wrote:
> Suppose I have an application which has a working set which just exactly
> fits in the I+D caches, including the related OS stuff.
> Someone makes some change to the OS and the benchmark for that change is
> smaller than the I+D caches but the change increased the I+D cache space
> needed.
> The benchmark will not show any slowdown, correct?
> My application no longer fits and will suffer, correct?
Well, it's often clear from the code whether it'll have a larger cache
footprint or not, so it's probably not that large a problem. OTOH it is
a real problem that little cache or TLB profiling is going on. I tried
once or twice and actually came up with a function or two that should
be inlined instead of uninlined in very short order. Much low-hanging
fruit could be gleaned from those kinds of profiles.
It's also worthwhile noting increased cache footprints are actually
very often degradations on SMP and especially NUMA. The notion that
optimizing for SMP and/or NUMA involves increasing cache footprint
on anything doesn't really sound plausible, though I'll admit that
the mistake of trusting microbenchmarks too far on SMP has probably
already been committed at least once. Userspace owns the cache; using
cache for the kernel is "cache pollution", which should be minimized.
Going too far out on the space end of time/space tradeoff curves is
every bit as bad for SMP as UP, and really horrible for NUMA.
On Mon, Feb 24, 2003 at 04:23:09PM -0800, Larry McVoy wrote:
> The point is that if you are putting SMP changes into the system, you
> have to be held to a higher standard for measurement given the past
> track record of SMP changes increasing code length and cache footprints.
> So "measuring" doesn't mean "it's not slower on XYZ microbenchmark".
> It means "under the following work loads the cache misses went down or
> stayed the same for before and after tests".
This kind of measurement is actually relatively unusual. I'm definitely
interested in it, as there appear to be some deficits wrt. locality of
reference that show up as big profile spikes on NUMA boxen. With care
exercised good solutions should also trim down cache misses on UP also.
Cache and TLB miss profile driven development sounds very attractive.
On Mon, Feb 24, 2003 at 04:23:09PM -0800, Larry McVoy wrote:
> And if you said that all changes should be held to this standard, not
> just scaling changes, I'd agree with you. But scaling changes are the
> "bad guy" in my mind, they are not to be trusted, so they should be held
> to this standard first. If we can get everyone to step up to this bat,
> that's all to the good.
Let me put it this way: IBM sells tiny boxen too, from 4x, to UP, to
whatever. And people are simultaneously actively trying to scale
downward to embedded bacteria or whatever. So the small systems are
being neither ignored nor sacrificed for anything else.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 4:42 ` William Lee Irwin III
@ 2003-02-25 4:54 ` Larry McVoy
2003-02-25 6:00 ` William Lee Irwin III
0 siblings, 1 reply; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 4:54 UTC (permalink / raw)
To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel
> Userspace owns the cache; using
> cache for the kernel is "cache pollution", which should be minimized.
> Going too far out on the space end of time/space tradeoff curves is
> every bit as bad for SMP as UP, and really horrible for NUMA.
Cool, I agree 100% with this.
> > So "measuring" doesn't mean "it's not slower on XYZ microbenchmark".
> > It means "under the following work loads the cache misses went down or
> > stayed the same for before and after tests".
>
> This kind of measurement is actually relatively unusual. I'm definitely
> interested in it, as there appear to be some deficits wrt. locality of
> reference that show up as big profile spikes on NUMA boxen. With care
> exercised good solutions should also trim down cache misses on UP also.
> Cache and TLB miss profile driven development sounds very attractive.
Again, I'm with you all the way on this. If the scale up guys can adopt
this as a mantra, I'm a lot less concerned that anything bad will happen.
Tim at OSDL and I have been talking about trying to work out some benchmarks
to test for this. I came up with the idea of adding a "-s XXX" which means
"touch XXX bytes between each iteration" to each LMbench test. One problem
is the lack of page coloring will make the numbers bounce around too much.
We talked that over with Linus and he suggested using the big TLB hack to
get around that. Assuming we can deal with the page coloring, do you think
that there is any merit in taking microbenchmarks, adding an artificial
working set, and running those?
> Let me put it this way: IBM sells tiny boxen too, from 4x, to UP, to
> whatever. And people are simultaneously actively trying to scale
> downward to embedded bacteria or whatever.
That's really great, I know it's a lot less sexy but it's important.
I'd love to see as much attention on making Linux work on tiny embedded
platforms as there is on making it work on big iron. Small is cool too.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 4:54 ` Larry McVoy
@ 2003-02-25 6:00 ` William Lee Irwin III
2003-02-25 7:00 ` Val Henson
0 siblings, 1 reply; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 6:00 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel
At some point in the past, I wrote:
>> This kind of measurement is actually relatively unusual. I'm definitely
>> interested in it, as there appear to be some deficits wrt. locality of
>> reference that show up as big profile spikes on NUMA boxen. With care
>> exercised good solutions should also trim down cache misses on UP also.
>> Cache and TLB miss profile driven development sounds very attractive.
On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote:
> Again, I'm with you all the way on this. If the scale up guys can adopt
> this as a mantra, I'm a lot less concerned that anything bad will happen.
I don't know about mantras, but we're getting to the point where lock
contention is a non-issue on midrange SMP and straight line efficiency
is beyond the range of "obviously it should be done some other way."
The time to chase cache pollution is certainly coming.
On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote:
> Tim at OSDL and I have been talking about trying to work out some benchmarks
> to test for this. I came up with the idea of adding a "-s XXX" which means
> "touch XXX bytes between each iteration" to each LMbench test. One problem
> is the lack of page coloring will make the numbers bounce around too much.
> We talked that over with Linus and he suggested using the big TLB hack to
> get around that. Assuming we can deal with the page coloring, do you think
> that there is any merit in taking microbenchmarks, adding an artificial
> working set, and running those?
Page coloring needs to get into the kernel at some point. Using large
TLB entries will artificially tie this to TLB effects and fragmentation,
in addition to pagetable space conservation (on x86 anyway). So I really
don't see any way to deal with reproducibility issues on this front but
just doing page coloring. Everything else that does it as a side effect
would unduly disturb the results, IMHO.
At some point in the past, I wrote:
>> Let me put it this way: IBM sells tiny boxen too, from 4x, to UP, to
>> whatever. And people are simultaneously actively trying to scale
>> downward to embedded bacteria or whatever.
On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote:
> That's really great, I know it's a lot less sexy but it's important.
> I'd love to see as much attention on making Linux work on tiny embedded
> platforms as there is on making it work on big iron. Small is cool too.
There is, but unfortunately the participation of embedded vendors in
the development cycle is not as visible as that of large system vendors.
More direct, frequent, and vocal input from embedded kernel hackers
would be very valuable, as many "corner cases" with automatic kernel
scaling should occur on the small end, not just the large end.
I've had some brief attempts to explain to me the motives and methods
of embedded system vendors and the like, but I've failed to absorb
enough to get a "big picture" or much of any notion as to why embedded
kernel hackers aren't participating as much in the development cycle.
On the large system side, it's very clear that issues in the core VM
and other parts of the kernel must be addressed to achieve the goals,
and hence participation in the development cycle is outright mandatory.
It's not "working effectively". It's a requirement. And part of that
"requirement" bit is we have to work with constraints never enforced
before, including maintaining the scalability curve on the low end.
It's hard, though probably not impossible, and absolutely required.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 6:00 ` William Lee Irwin III
@ 2003-02-25 7:00 ` Val Henson
0 siblings, 0 replies; 157+ messages in thread
From: Val Henson @ 2003-02-25 7:00 UTC (permalink / raw)
To: William Lee Irwin III, linux-kernel
On Mon, Feb 24, 2003 at 10:00:53PM -0800, William Lee Irwin III wrote:
> On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote:
> > That's really great, I know it's a lot less sexy but it's important.
> > I'd love to see as much attention on making Linux work on tiny embedded
> > platforms as there is on making it work on big iron. Small is cool too.
>
> There is, but unfortunately the participation of embedded vendors in
> the development cycle is not as visible as that of large system vendors.
> More direct, frequent, and vocal input from embedded kernel hackers
> would be very valuable, as many "corner cases" with automatic kernel
> scaling should occur on the small end, not just the large end.
>
> I've had some brief attempts to explain to me the motives and methods
> of embedded system vendors and the like, but I've failed to absorb
> enough to get a "big picture" or much of any notion as to why embedded
> kernel hackers aren't participating as much in the development cycle.
Speaking as a former Linux developer for an embedded[1] systems
vendor, it's because embedded companies aren't the size of IBM and
don't have money to spend on software development beyond the "make it
work on our boards" point. One of the many reasons I'm a _former_
embedded Linux developer.
-VAL
[1] Okay, our boards had up to 4 processors and 1GB memory. But the
same principles applied.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 6:58 ` Larry McVoy
2003-02-24 7:39 ` Martin J. Bligh
2003-02-24 7:51 ` William Lee Irwin III
@ 2003-02-24 13:28 ` Alan Cox
2003-02-25 5:19 ` Chris Wedgwood
2003-02-24 18:44 ` Davide Libenzi
3 siblings, 1 reply; 157+ messages in thread
From: Alan Cox @ 2003-02-24 13:28 UTC (permalink / raw)
To: Larry McVoy; +Cc: Martin J. Bligh, Linux Kernel Mailing List
On Mon, 2003-02-24 at 06:58, Larry McVoy wrote:
> Which brings us back to the point. If the world is not heading towards
> an 8 way on every desk then it is really questionable to make a lot of
> changes to the kernel to make it work really well on 8-ways.
_If_ it harms performance on small boxes. Otherwise you turn Linux into
Irix and your market doesn't look so hot in 3 or 4 years' time. Featuritis
is a slow creeping death.
The definitive Linux box appears to be $199 from Walmart right now, and
it's not SMP.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 13:28 ` Alan Cox
@ 2003-02-25 5:19 ` Chris Wedgwood
2003-02-25 5:26 ` William Lee Irwin III
` (3 more replies)
0 siblings, 4 replies; 157+ messages in thread
From: Chris Wedgwood @ 2003-02-25 5:19 UTC (permalink / raw)
To: Alan Cox; +Cc: Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
On Mon, Feb 24, 2003 at 01:28:30PM +0000, Alan Cox wrote:
> _If_ it harms performance on small boxes.
You mean like the general slowdown from 2.4 -> 2.5?
It seems to me that for small boxes, 2.5.x is marginally slower at most
things than 2.4.x.
I'm hoping that as the code solidifies and things are tuned this gap will
go away and 2.5.x will inch ahead... hoping....
> The definitive Linux box appears to be $199 from Walmart right now,
> and its not SMP.
In two years this kind of hardware will probably be SMP (HT or some
variant).
--cw
^ permalink raw reply [flat|nested] 157+ messages in thread* Re: Minutes from Feb 21 LSE Call
2003-02-25 5:19 ` Chris Wedgwood
@ 2003-02-25 5:26 ` William Lee Irwin III
2003-02-25 21:21 ` Chris Wedgwood
2003-02-25 6:17 ` Martin J. Bligh
` (2 subsequent siblings)
3 siblings, 1 reply; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 5:26 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
On Mon, Feb 24, 2003 at 01:28:30PM +0000, Alan Cox wrote:
>> _If_ it harms performance on small boxes.
On Mon, Feb 24, 2003 at 09:19:56PM -0800, Chris Wedgwood wrote:
> You mean like the general slowdown from 2.4 -> 2.5?
> It seems to me that for small boxes, 2.5.x is marginally slower at most
> things than 2.4.x.
> I'm hoping that as the code solidifies and things are tuned this gap will
> go away and 2.5.x will inch ahead... hoping....
Could you help identify the regressions? Profiles? Workload?
On Mon, Feb 24, 2003 at 01:28:30PM +0000, Alan Cox wrote:
>> The definitive Linux box appears to be $199 from Walmart right now,
>> and its not SMP.
On Mon, Feb 24, 2003 at 09:19:56PM -0800, Chris Wedgwood wrote:
> In two years this kind of hardware will probably be SMP (HT or some
I'm a programmer, not an economist (despite utility functions and Nash
equilibria). Don't tell me what's definitive, give me some profiles.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread* Re: Minutes from Feb 21 LSE Call
2003-02-25 5:26 ` William Lee Irwin III
@ 2003-02-25 21:21 ` Chris Wedgwood
2003-02-25 21:14 ` Martin J. Bligh
2003-02-25 21:21 ` William Lee Irwin III
0 siblings, 2 replies; 157+ messages in thread
From: Chris Wedgwood @ 2003-02-25 21:21 UTC (permalink / raw)
To: William Lee Irwin III, Alan Cox, Larry McVoy, Martin J. Bligh,
Linux Kernel Mailing List
On Mon, Feb 24, 2003 at 09:26:02PM -0800, William Lee Irwin III wrote:
> Could you help identify the regressions? Profiles? Workload?
Is the OSDL data that Cliff White pointed out sufficient to work with,
or do you want specific tests run with oprofile outputs?
--cw
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 21:21 ` Chris Wedgwood
@ 2003-02-25 21:14 ` Martin J. Bligh
2003-02-25 21:21 ` William Lee Irwin III
1 sibling, 0 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 21:14 UTC (permalink / raw)
To: Chris Wedgwood, William Lee Irwin III, Linux Kernel Mailing List
>> Could you help identify the regressions? Profiles? Workload?
>
> Is the OSDL data that Cliff White pointed out sufficient to work with,
> or do you want specific tests run with oprofile outputs?
It's a great start, but profiles would really help if you can grab them.
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 21:21 ` Chris Wedgwood
2003-02-25 21:14 ` Martin J. Bligh
@ 2003-02-25 21:21 ` William Lee Irwin III
2003-02-25 22:08 ` Larry McVoy
1 sibling, 1 reply; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 21:21 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
On Mon, Feb 24, 2003 at 09:26:02PM -0800, William Lee Irwin III wrote:
>> Could you help identify the regressions? Profiles? Workload?
On Tue, Feb 25, 2003 at 01:21:15PM -0800, Chris Wedgwood wrote:
> Is the OSDL data that Cliff White pointed out sufficient to work with,
> or do you want specific tests run with oprofile outputs?
oprofile is what's needed. Looks like he's taking care of that too.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 21:21 ` William Lee Irwin III
@ 2003-02-25 22:08 ` Larry McVoy
2003-02-25 22:10 ` William Lee Irwin III
2003-02-25 22:37 ` Chris Wedgwood
0 siblings, 2 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 22:08 UTC (permalink / raw)
To: William Lee Irwin III, Chris Wedgwood, Alan Cox, Larry McVoy,
Martin J. Bligh, Linux Kernel Mailing List
On Tue, Feb 25, 2003 at 01:21:34PM -0800, William Lee Irwin III wrote:
> On Mon, Feb 24, 2003 at 09:26:02PM -0800, William Lee Irwin III wrote:
> >> Could you help identify the regressions? Profiles? Workload?
>
> On Tue, Feb 25, 2003 at 01:21:15PM -0800, Chris Wedgwood wrote:
> > Is the OSDL data that Cliff White pointed out sufficient to work with,
> > or do you want specific tests run with oprofile outputs?
>
> oprofile is what's needed. Looks like he's taking care of that too.
Without doing something about the page coloring problem (and he might be)
the numbers will be fairly meaningless.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 22:08 ` Larry McVoy
@ 2003-02-25 22:10 ` William Lee Irwin III
2003-02-25 22:37 ` Chris Wedgwood
1 sibling, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 22:10 UTC (permalink / raw)
To: Larry McVoy, Chris Wedgwood, Alan Cox, Larry McVoy,
Martin J. Bligh, Linux Kernel Mailing List
On Tue, Feb 25, 2003 at 01:21:34PM -0800, William Lee Irwin III wrote:
>> oprofile is what's needed. Looks like he's taking care of that too.
On Tue, Feb 25, 2003 at 02:08:11PM -0800, Larry McVoy wrote:
> Without doing something about the page coloring problem (and he might be)
> the numbers will be fairly meaningless.
Hmm, point. Let's see if we can get Cliff to apply the new patch that
one guy put out yesterday or so.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 22:08 ` Larry McVoy
2003-02-25 22:10 ` William Lee Irwin III
@ 2003-02-25 22:37 ` Chris Wedgwood
2003-02-25 22:58 ` Larry McVoy
1 sibling, 1 reply; 157+ messages in thread
From: Chris Wedgwood @ 2003-02-25 22:37 UTC (permalink / raw)
To: Larry McVoy, William Lee Irwin III, Alan Cox, Larry McVoy,
Martin J. Bligh, Linux Kernel Mailing List
On Tue, Feb 25, 2003 at 02:08:11PM -0800, Larry McVoy wrote:
> Without doing something about the page coloring problem (and he
> might be) the numbers will be fairly meaningless.
page coloring problem?
i was under the impression on anything 8-way-associative or better the
page coloring improvements were negligible for real-world benchmarks
(ie. kernel compiles)
... or is this more an artifact that even though the improvements for
real-world are negligible, micro-benchmarks are susceptible to these
variations, thus making things like the std. dev. larger than it would
otherwise be?
--cw
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 22:37 ` Chris Wedgwood
@ 2003-02-25 22:58 ` Larry McVoy
0 siblings, 0 replies; 157+ messages in thread
From: Larry McVoy @ 2003-02-25 22:58 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Larry McVoy, William Lee Irwin III, Alan Cox, Martin J. Bligh,
Linux Kernel Mailing List
> ... or is this more an artifact that even though the improvements for
> real-world are negligible, micro-benchmarks are susceptible to these
> variations, thus making things like the std. dev. larger than it would
> otherwise be?
Bingo. If you are trying to measure whether something adds cache misses
you really want reproducible runs.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 5:19 ` Chris Wedgwood
2003-02-25 5:26 ` William Lee Irwin III
@ 2003-02-25 6:17 ` Martin J. Bligh
2003-02-25 17:11 ` Cliff White
2003-02-25 21:28 ` William Lee Irwin III
2003-02-25 19:20 ` Alan Cox
2003-02-25 19:59 ` Scott Robert Ladd
3 siblings, 2 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 6:17 UTC (permalink / raw)
To: Chris Wedgwood, Alan Cox; +Cc: Larry McVoy, Linux Kernel Mailing List
>> _If_ it harms performance on small boxes.
>
> You mean like the general slowdown from 2.4 -> 2.5?
>
> It seems to me that for small boxes, 2.5.x is marginally slower at most
> things than 2.4.x.
Can you name a benchmark, or at least do something reproducible between
versions, and produce a 2.4 vs 2.5 profile? Let's at least try to fix it ...
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 6:17 ` Martin J. Bligh
@ 2003-02-25 17:11 ` Cliff White
2003-02-25 17:17 ` William Lee Irwin III
` (2 more replies)
2003-02-25 21:28 ` William Lee Irwin III
1 sibling, 3 replies; 157+ messages in thread
From: Cliff White @ 2003-02-25 17:11 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List,
cliffw
> >> _If_ it harms performance on small boxes.
> >
> > You mean like the general slowdown from 2.4 -> 2.5?
> >
> > It seems to me that for small boxes, 2.5.x is marginally slower at most
> > things than 2.4.x.
>
> Can you name a benchmark, or at least do something reproducible between
> versions, and produce a 2.4 vs 2.5 profile? Let's at least try to fix it ...
>
> M.
Well, here's one bit of data. Easy enough to do if you have a web browser.
LMBench 2.0 on 1-way and 2-way, kernels 2.4.18 and 2.5.60
1-way (stp1-003 stp1-002)
2.4.18 http://khack.osdl.org/stp/7443/
2.5.60 http://khack.osdl.org/stp/265622/
2-way (stp2-003 stp2-000)
2.4.18 http://khack.osdl.org/stp/3165/
2.5.60 http://khack.osdl.org/stp/265643/
Interesting items for me are the fork/exec/sh times and some of the file + VM
numbers
LMBench 2.0 Data ( items selected from total of five runs )
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host OS Mhz null null open selct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
stp2-003. Linux 2.4.18 1000 0.39 0.67 3.89 4.99 30.4 0.93 3.06 344. 1403 4465
stp2-000. Linux 2.5.60 1000 0.41 0.77 4.34 5.57 32.6 1.15 3.59 245. 1406 5795
stp1-003. Linux 2.4.18 1000 0.32 0.46 2.60 3.21 16.6 0.79 2.52 104. 918. 4460
stp1-002. Linux 2.5.60 1000 0.33 0.47 2.83 3.47 16.0 0.94 2.70 143. 1212 5292
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
stp2-003. Linux 2.4.18 2.680 6.2100 15.8 7.9400 110.7 26.4 111.1
stp2-000. Linux 2.5.60 1.590 5.0700 17.6 7.5800 79.8 11.0 113.6
stp1-003. Linux 2.4.18 0.590 3.4700 11.1 4.8200 134.3 30.8 131.7
stp1-002. Linux 2.5.60 1.000 3.5400 11.2 4.1400 129.6 30.4 127.8
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
stp2-003. Linux 2.4.18 2.680 9.071 17.5 26.9 46.2 34.4 60.0 62.9
stp2-000. Linux 2.5.60 1.590 8.414 13.2 21.2 43.2 28.3 54.1 97.1
stp1-003. Linux 2.4.18 0.590 3.623 6.98 11.7 28.2 17.8 38.4 300K
stp1-002. Linux 2.5.60 1.050 4.591 8.54 14.8 31.8 20.0 41.0 67.1
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page
Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
stp2-003. Linux 2.4.18 34.6 7.2490 110.9 17.9 2642.0 0.771 3.00000
stp2-000. Linux 2.5.60 40.0 9.2780 113.3 23.3 4592.0 0.543 3.00000
stp1-003. Linux 2.4.18 28.8 4.8890 107.5 11.3 686.0 0.621 2.00000
stp1-002. Linux 2.5.60 32.4 6.4290 112.9 16.2 1455.0 0.465 2.00000
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
stp2-003. Linux 2.4.18 563. 277. 263. 437.0 552.8 249.1 180.7 553. 215.2
stp2-000. Linux 2.5.60 603. 516. 151. 436.3 549.0 238.0 171.9 548. 233.7
stp1-003. Linux 2.4.18 1009 820. 404. 414.3 467.0 167.2 154.1 466. 236.2
stp1-002. Linux 2.5.60 806. 584. 69.1 408.0 461.7 161.1 149.1 461. 233.5
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
---------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- ---- ----- ------ -------- -------
stp2-003. Linux 2.4.18 1000 3.464 8.0820 110.9
stp2-000. Linux 2.5.60 1000 3.545 8.2790 110.6
stp1-003. Linux 2.4.18 1000 2.994 6.9850 121.4
stp1-002. Linux 2.5.60 1000 3.023 7.0530 122.5
------------------
cliffw
^ permalink raw reply [flat|nested] 157+ messages in thread* Re: Minutes from Feb 21 LSE Call
2003-02-25 17:11 ` Cliff White
@ 2003-02-25 17:17 ` William Lee Irwin III
2003-02-25 17:38 ` Linus Torvalds
2003-02-25 19:48 ` Martin J. Bligh
2 siblings, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 17:17 UTC (permalink / raw)
To: Cliff White
Cc: Martin J. Bligh, Chris Wedgwood, Alan Cox, Larry McVoy,
Linux Kernel Mailing List
On Tue, Feb 25, 2003 at 09:11:38AM -0800, Cliff White wrote:
> Interesting items for me are the fork/exec/sh times and some of the file + VM
> numbers
> LMBench 2.0 Data ( items selected from total of five runs )
Okay, got profiles for the individual tests you're interested in?
Also, what are the statistical significance cutoffs?
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 17:11 ` Cliff White
2003-02-25 17:17 ` William Lee Irwin III
@ 2003-02-25 17:38 ` Linus Torvalds
2003-02-25 19:54 ` Dave Jones
2003-02-25 19:48 ` Martin J. Bligh
2 siblings, 1 reply; 157+ messages in thread
From: Linus Torvalds @ 2003-02-25 17:38 UTC (permalink / raw)
To: linux-kernel
In article <200302251711.h1PHBct16624@mail.osdl.org>,
Cliff White <cliffw@osdl.org> wrote:
>
>Well, here's one bit of data. Easy enough to do if you have a web browser.
>LMBench 2.0 on 1-way and 2-way, kernels 2.4.18 and 2.5.60
>1-way (stp1-003 stp1-002)
>2.4.18 http://khack.osdl.org/stp/7443/
>2.5.60 http://khack.osdl.org/stp/265622/
>
>2-way (stp2-003 stp2-000)
>2.4.18 http://khack.osdl.org/stp/3165/
>2.5.60 http://khack.osdl.org/stp/265643/
>
>Interesting items for me are the fork/exec/sh times and some of the file + VM
>numbers
>LMBench 2.0 Data ( items selected from total of five runs )
>
>Processor, Processes - times in microseconds - smaller is better
>----------------------------------------------------------------
>Host OS Mhz null null open selct sig sig fork exec sh
> call I/O stat clos TCP inst hndl proc proc proc
>--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
>stp2-003. Linux 2.4.18 1000 0.39 0.67 3.89 4.99 30.4 0.93 3.06 344. 1403 4465
>stp2-000. Linux 2.5.60 1000 0.41 0.77 4.34 5.57 32.6 1.15 3.59 245. 1406 5795
Note that those numbers will look quite different (at least on a P4) if
you use a modern library that uses the "sysenter" stuff. The difference
ends up being something like this:
Host OS Mhz null null open selct sig sig fork exec sh
call I/O stat clos inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
i686-linu Linux 2.5.30 2380 0.8 1.1 3 5 0.04K 1.1 3 0.2K 1K 3K
i686-linu Linux 2.5.62 2380 0.2 0.6 3 4 0.04K 0.7 3 0.2K 1K 3K
(Yeah, I've never run a 2.4.x kernel on this machine, so..) In other
words, the system call has been speeded up quite noticeably.
Yes, if you don't take advantage of sysenter, then all the sysenter
support will just make us look worse ;(
I'm surprised by your "sh proc" changes; they are quite big. I guess
it's rmap and highmem that bite us, and yes, we've gotten slower there.
Linus
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 17:38 ` Linus Torvalds
@ 2003-02-25 19:54 ` Dave Jones
2003-02-26 2:04 ` Linus Torvalds
0 siblings, 1 reply; 157+ messages in thread
From: Dave Jones @ 2003-02-25 19:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
On Tue, Feb 25, 2003 at 05:38:31PM +0000, Linus Torvalds wrote:
> Yes, if you don't take advantage of sysenter, then all the sysenter
> support will just make us look worse ;(
Andi's patch[1] to remove one of the wrmsr's from the context switch
fast path should win back at least some of the lost microbenchmark
points. (Full info at http://bugzilla.kernel.org/show_bug.cgi?id=350)
Dave
[1] http://bugzilla.kernel.org/attachment.cgi?id=140&action=view
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 19:54 ` Dave Jones
@ 2003-02-26 2:04 ` Linus Torvalds
0 siblings, 0 replies; 157+ messages in thread
From: Linus Torvalds @ 2003-02-26 2:04 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-kernel
On Tue, 25 Feb 2003, Dave Jones wrote:
>
> > Yes, if you don't take advantage of sysenter, then all the sysenter
> > support will just make us look worse ;(
>
> Andi's patch[1] to remove one of the wrmsr's from the context switch
> fast path should win back at least some of the lost microbenchmark
> points.
But the patch is fundamentally broken wrt preemption at least, and it
looks totally unfixable.
It's also overly complex, for no apparent reason. The simple way to avoid
the wrmsr of SYSENTER_CS is to just cache a per-cpu copy in memory,
preferably in some location that is already in the cache at context switch
time for other reasons.
Linus
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 17:11 ` Cliff White
2003-02-25 17:17 ` William Lee Irwin III
2003-02-25 17:38 ` Linus Torvalds
@ 2003-02-25 19:48 ` Martin J. Bligh
2 siblings, 0 replies; 157+ messages in thread
From: Martin J. Bligh @ 2003-02-25 19:48 UTC (permalink / raw)
To: Cliff White
Cc: Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List
> Interesting items for me are the fork/exec/sh times and some of the file
> + VM numbers
For the ones where you see degradation in fork/exec type stuff, any chance
you could rerun them with 62-mjb3 with the objrmap stuff in it? That should
fix a lot of the overhead.
Thanks,
M.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 6:17 ` Martin J. Bligh
2003-02-25 17:11 ` Cliff White
@ 2003-02-25 21:28 ` William Lee Irwin III
1 sibling, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-25 21:28 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List
At some point in the past, Chris Wedgwood wrote:
>> It seems to me for small boxes, 2.5.x is marginally slower at most
>> things than 2.4.x.
On Mon, Feb 24, 2003 at 10:17:05PM -0800, Martin J. Bligh wrote:
> Can you name a benchmark, or at least do something reproducible between
> versions, and produce a 2.4 vs 2.5 profile? Let's at least try to fix it ...
Looks like Cliff's got some good data.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 5:19 ` Chris Wedgwood
2003-02-25 5:26 ` William Lee Irwin III
2003-02-25 6:17 ` Martin J. Bligh
@ 2003-02-25 19:20 ` Alan Cox
2003-02-25 19:59 ` Scott Robert Ladd
3 siblings, 0 replies; 157+ messages in thread
From: Alan Cox @ 2003-02-25 19:20 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
On Tue, 2003-02-25 at 05:19, Chris Wedgwood wrote:
> > The definitive Linux box appears to be $199 from Walmart right now,
> > and it's not SMP.
>
> In two years this kind of hardware probably will be SMP (HT or some
> variant).
Not if it costs money. If the cheapest reasonable x86 CPU is one that has
chosen to avoid HT and SMP, it won't have HT and SMP. Think 4xUSB2
connectors, a brick PSU, and no user-adjustable components.
^ permalink raw reply [flat|nested] 157+ messages in thread
* RE: Minutes from Feb 21 LSE Call
2003-02-25 5:19 ` Chris Wedgwood
` (2 preceding siblings ...)
2003-02-25 19:20 ` Alan Cox
@ 2003-02-25 19:59 ` Scott Robert Ladd
2003-02-25 20:18 ` jlnance
2003-02-25 21:19 ` Chris Wedgwood
3 siblings, 2 replies; 157+ messages in thread
From: Scott Robert Ladd @ 2003-02-25 19:59 UTC (permalink / raw)
To: Chris Wedgwood, Alan Cox
Cc: Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
Chris Wedgwood wrote:
> > The definitive Linux box appears to be $199 from Walmart right now,
> > and it's not SMP.
>
> In two years this kind of hardware probably will be SMP (HT or some
> variant).
HT is not the same thing as SMP; while the chip may appear to be two
processors, it is actually equivalent to 1.1 to 1.3 processors, depending
on the application.
Multicore processors and true SMP systems are unlikely to become mainstream
consumer items, given the premium price charged for such systems.
That given, I see some value in a stripped-down, low-overhead,
consumer-focused Linux that targets uniprocessor and HT systems, to be used
in the typical business or gaming PC. I'm not sure such is achievable with
the current config options; perhaps I should try to see how small a kernel I
can build for a simple ia32 system...
..Scott
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Professional programming for science and engineering;
Interesting and unusual bits of very free code.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 19:59 ` Scott Robert Ladd
@ 2003-02-25 20:18 ` jlnance
2003-02-25 20:59 ` Scott Robert Ladd
2003-02-25 21:19 ` Chris Wedgwood
1 sibling, 1 reply; 157+ messages in thread
From: jlnance @ 2003-02-25 20:18 UTC (permalink / raw)
To: linux-kernel
On Tue, Feb 25, 2003 at 02:59:05PM -0500, Scott Robert Ladd wrote:
> > In two years this kind of hardware probably will be SMP (HT or some
> > variant).
>
> HT is not the same thing as SMP; while the chip may appear to be two
> processors, it is actually equivalent to 1.1 to 1.3 processors, depending
> on the application.
>
> Multicore processors and true SMP systems are unlikely to become mainstream
> consumer items, given the premium price charged for such systems.
I think the difference between SMP and HT is likely to decrease rather
than increase in the future. Even now people want to put multiple CPUs
on the same piece of silicon. Once you do that, it only makes sense to
start sharing things between them. If you had a system with 2 CPUs
which shared a common L1 cache, is that going to be an HT or an SMP system?
Or you could go further and have 2 CPUs which share an FPU. There are
all sorts of combinations you could come up with. I think designers
will experiment and find the one that gives the most throughput for
the least money.
Jim
^ permalink raw reply [flat|nested] 157+ messages in thread
* RE: Minutes from Feb 21 LSE Call
2003-02-25 20:18 ` jlnance
@ 2003-02-25 20:59 ` Scott Robert Ladd
0 siblings, 0 replies; 157+ messages in thread
From: Scott Robert Ladd @ 2003-02-25 20:59 UTC (permalink / raw)
To: jlnance, linux-kernel
jlnance@unity.ncsu.edu wrote:
> I think the difference between SMP and HT is likely to decrease rather
> than increase in the future. Even now people want to put multiple CPUs
> on the same piece of silicon. Once you do that, it only makes sense to
> start sharing things between them. If you had a system with 2 CPUs
> which shared a common L1 cache, is that going to be an HT or an SMP system?
> Or you could go further and have 2 CPUs which share an FPU. There are
> all sorts of combinations you could come up with. I think designers
> will experiment and find the one that gives the most throughput for
> the least money.
IBM's forthcoming Power5 will have two cores, each with SMT (the generic
term for HyperThreading); it will present itself to the OS as four
processors. Those four processors, however, are not equal; SMT is certainly
valuable, but it can only be as effective as multiple cores if it in effect
*becomes* multiple cores (and, as such, turns into SMP).
I'm writing a chapter on memory architectures in my parallel programming
book; it's giving me a bit of a headache, as the issues you raise are both
important and complex. We have multiple levels of caches, NUMA
architectures, clusters, SMP, HT... the list just goes on and on, infinite
in diversity and combinations. Vendors will continue to experiment; I doubt
very much that any one architecture will take center stage.
I hope Linux handles the brain-sprain better than I am at the moment! ;)
..Scott
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-25 19:59 ` Scott Robert Ladd
2003-02-25 20:18 ` jlnance
@ 2003-02-25 21:19 ` Chris Wedgwood
2003-02-25 21:38 ` Scott Robert Ladd
1 sibling, 1 reply; 157+ messages in thread
From: Chris Wedgwood @ 2003-02-25 21:19 UTC (permalink / raw)
To: Scott Robert Ladd
Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
On Tue, Feb 25, 2003 at 02:59:05PM -0500, Scott Robert Ladd wrote:
> HT is not the same thing as SMP; while the chip may appear to be two
> processors, it is actually equivalent 1.1 to 1.3 processors,
> depending on the application.
You can't have non-integer numbers of processors. HT is a hack that
makes what appear to be two processors out of common silicon.
The fact it's slower than a real dual-CPU box is irrelevant in some
sense; you still need SMP smarts to deal with it. It's only important
when you want to know why performance increases aren't apparent or you
lose performance in some cases... (i.e. the other virtual CPU thrashing
the cache).
> Multicore processors and true SMP systems are unlikely to become
> mainstream consumer items, given the premium price charged for such
> systems.
I overstated things in thinking SMP/HT would be in low-end hardware
within two years.
As Alan pointed out, since the 'Walmart' class hardware is 'whatever
is cheapest' then perhaps HT/SMT/whatever won't be common place for
super-low end boxes in two years --- but I would be surprised if it
didn't gain considerable market share elsewhere.
> That given, I see some value in a stripped-down, low-overhead,
> consumer-focused Linux that targets uniprocessor and HT systems, to
> be used in the typical business or gaming PC.
UP != HT
HT is SMP with magic requirements. For multiple physical CPUs the
requirements become even more complex; you want to try to group tasks
to physical CPUs, not logical ones, lest you thrash the cache.
Presumably there are other tweaks possible too: cache lines don't
bounce between logical CPUs on a physical CPU, for example, so some locks
and other data structures will be much faster to access than those
which actually do need cache lines to migrate between different
physical CPUs. I'm not sure if this specific property can be
exploited in the general case though.
> I'm not sure such is achievable with the current config options;
> perhaps I should try to see how small a kernel I can build for a
> simple ia32 system...
Present 2.5.x looks like it will have smarts for HT as a subset of
NUMA.
If HT does become more common and similar things abound, I'm not sure
if it even makes sense to have a UP kernel for certain platforms
and/or CPUs --- since a mere BIOS change will affect what is
'virtually' apparent to the OS.
--cw
^ permalink raw reply [flat|nested] 157+ messages in thread
* RE: Minutes from Feb 21 LSE Call
2003-02-25 21:19 ` Chris Wedgwood
@ 2003-02-25 21:38 ` Scott Robert Ladd
0 siblings, 0 replies; 157+ messages in thread
From: Scott Robert Ladd @ 2003-02-25 21:38 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List
Chris Wedgwood wrote:
SRL>HT is not the same thing as SMP; while the chip may appear to be
SRL>two processors, it is actually equivalent 1.1 to 1.3 processors,
SRL>depending on the application.
>
CW> You can't have non-integer numbers of processors. HT is a hack
CW> that makes what appear to be two processors out of common
CW> silicon.
I'm aware of that. ;) I'm well aware of the architecture needed to support
HT.
> The fact it's slower than a real dual-CPU box is irrelevant in some
> sense; you still need SMP smarts to deal with it. It's only important
> when you want to know why performance increases aren't apparent or you
> lose performance in some cases... (i.e. the other virtual CPU thrashing
> the cache).
Performance differences *are* quite relevant when it comes to thread
scheduling; the two virtual CPUs are not necessarily equivalent in
performance.
> As Alan pointed out, since the 'Walmart' class hardware is 'whatever
> is cheapest' then perhaps HT/SMT/whatever won't be common place for
> super-low end boxes in two years --- but I would be surprised if it
> didn't gain considerable market share elsewhere.
I suspect HT/SMT will be common for people who have multimedia systems,
for video editing and high-end gaming.
I doubt we'll see SMT toasters, though.
> UP != HT
An HT system is still a single, physical processor; HT is not equivalent to
a multicore chip, either. Much depends on memory and connection models; a
dual-core chip may be faster or slower than two similar physical SMP
processors, depending on the architecture.
I was speaking in terms of Intel's push to add HT to all of their P4s.
Systems with a single CPU will likely have HT; that still doesn't make them
as powerful as a true dual processor (or dual core CPU) system.
> HT is SMP with magic requirements. For multiple physical CPUs the
> requirements become even more complex; you want to try to group tasks
> to physical CPUs, not logical ones lest you thrash the cache.
Exactly. This is why HT is not the same thing as two physical CPUs. The OS
must be aware of this to effectively schedule jobs. So I think we generally
agree.
> If HT does become more common and similar things abound, I'm not sure
> if it even makes sense to have a UP kernel for certain platforms
> and/or CPUs --- since a mere BIOS change will affect what is
> 'virtually' apparent to the OS.
A good point.
..Scott
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 6:58 ` Larry McVoy
` (2 preceding siblings ...)
2003-02-24 13:28 ` Alan Cox
@ 2003-02-24 18:44 ` Davide Libenzi
3 siblings, 0 replies; 157+ messages in thread
From: Davide Libenzi @ 2003-02-24 18:44 UTC (permalink / raw)
To: Larry McVoy; +Cc: Linux Kernel Mailing List
On Sun, 23 Feb 2003, Larry McVoy wrote:
> > Because I don't see why I should waste my time running benchmarks just to
> > prove you wrong. I don't respect you that much, and it seems the
> > maintainers don't either. When you become somebody with the stature in the
> > Linux community of, say, Linus or Andrew I'd be prepared to spend a lot
> > more time running benchmarks on any concerns you might have.
>
> Who cares if you respect me, what does that have to do with proper
> engineering? Do you think that I'm the only person who wants to see
> numbers? You think Linus doesn't care about this? Maybe you missed
> the whole IA32 vs IA64 instruction cache thread. It sure sounded like
> he cares. How about Alan? He stepped up and pointed out that less
> is more. How about Mark? He knows a thing or two about the topic.
> In fact, I think you'd be hard pressed to find anyone who wouldn't be
> interested in seeing the cache effects of a patch.
>
> People care about performance, both scaling up and scaling down. A lot of
> performance changes are measured poorly, in a way that makes the changes
> look good but doesn't expose the hidden costs of the change. What I'm
> saying is that those sorts of measurements screwed over performance in
> the past, why are you trying to repeat old mistakes?
Larry, how many times has this kind of discussion come up during the last
few years? I think you remember pretty well, because it was always you
on that side of the river pushing back "Barbarians" with your UP sword.
The point is that people ( especially young ones ) like to dig where others
failed; it's normal. It's attractive like honey for bears. Let them try;
many will fail, but chances are that someone will succeed, making it
worth the try. And trust Linus, who is more on your wavelength than on
the huge-scalability one.
- Davide
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-22 23:15 ` Larry McVoy
2003-02-22 23:23 ` Christoph Hellwig
2003-02-22 23:44 ` Martin J. Bligh
@ 2003-02-22 23:57 ` Jeff Garzik
2003-02-23 23:57 ` Bill Davidsen
3 siblings, 0 replies; 157+ messages in thread
From: Jeff Garzik @ 2003-02-22 23:57 UTC (permalink / raw)
To: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn,
David S. Miller, linux-kernel
On Sat, Feb 22, 2003 at 03:15:52PM -0800, Larry McVoy wrote:
> or rackmount cases. I fail to see how there are better margins on the
> same hardware in a rackmount box for $800 when the desktop costs $750.
> Those rack mount power supplies and cases are not as cheap as the desktop
> ones, so I see no difference in the margins.
Oh, it's definitely different hardware. Maybe the 16550-related portion
of the ASIC is the same :) but just do an lspci to see huge differences in
motherboard chipsets, on-board parts, more complicated BIOS, remote
management bells and whistles, etc. Even the low-end rackmounts.
But the better margins come simply from the mentality, IMO. Desktops
just aren't "as important" to a business compared to servers, so IT
shops are willing to spend more money to not only get better hardware,
but also the support services that accompany it. Selling servers
to enterprise data centers means bigger, more concentrated cash pool.
Jeff
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-22 23:15 ` Larry McVoy
` (2 preceding siblings ...)
2003-02-22 23:57 ` Jeff Garzik
@ 2003-02-23 23:57 ` Bill Davidsen
2003-02-24 6:22 ` Val Henson
3 siblings, 1 reply; 157+ messages in thread
From: Bill Davidsen @ 2003-02-23 23:57 UTC (permalink / raw)
To: Larry McVoy; +Cc: Linux Kernel Mailing List
On Sat, 22 Feb 2003, Larry McVoy wrote:
> > We would never try to propose such a change, and never have.
> > Name a scalability change that's hurt the performance of UP by 5%.
> > There isn't one.
>
> This is *exactly* the reasoning that every OS marketing weenie has used
> for the last 20 years to justify their "feature" of the week.
>
> The road to slow bloated code is paved one cache miss at a time. You
> may quote me on that. In fact, print it out and put it above your
> monitor and look at it every day. One cache miss at a time. How much
> does one cache miss add to any benchmark? .001%? Less.
>
> But your pet features didn't slow the system down. Nope, they just made
> the cache smaller, which you didn't notice because whatever artificial
> benchmark you ran didn't happen to need the whole cache.
Clearly this is the case: the benefit of a change must balance the
negative effects. Making the code paths longer hurts free cache; having
more of them should not. More code is not always slower code, and doesn't
always have more impact on cache use. You identify something which must be
considered, but it's not the only thing to consider. Linux should be
stable, not moribund.
> You need to understand that system resources belong to the user. Not the
> kernel. The goal is to have all of the kernel code running under any
> load be less than 1% of the CPU. Your 5% number up there would pretty
> much double the amount of time we spend in the kernel for most workloads.
Who profits? For most users a bit more system time resulting in better
disk performance would be a win, or at least non-lose. This isn't black
and white.
On Sat, 22 Feb 2003, Larry McVoy wrote:
> Let's get back to your position. You want to shovel stuff in the kernel
> for the benefit of the 32 way / 64 way etc boxes. I don't see that as
> wise. You could prove me wrong. Here's how you do it: go get oprofile
> or whatever that tool is which lets you run apps and count cache misses.
> Start including before/after runs of each microbench in lmbench and
> some time sharing loads with and without your changes. When you can do
> that and you don't add any more bus traffic, you're a genius and
> I'll shut up.
Code only costs when it's executed. Linux is somewhat heading to the place
where a distro has a few useful configs, and then people who care about the
last bit of whatever they see as a bottleneck can build their own from
"make config." So it is possible to add features for big machines without
any impact on the builds which don't use the features. It goes without
saying that this is hard. I would guess that it results in more bugs as
well, if one path or another is "the less-traveled way."
>
> But that's a false promise because by definition, fine grained threading
> adds more bus traffic. It's kind of hard to not have that happen, the
> caches have to stay coherent somehow.
Clearly. And things which require more locking will pay some penalty for
this. But a quick scan of this list on keyword "lockless" will show that
people are thinking about this.
I don't think developers will buy ignoring part of the market to
completely optimize for another. Linux will grow by being ubiquitous, not
by winning some battle and losing the war. It's not a niche-market OS.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-23 23:57 ` Bill Davidsen
@ 2003-02-24 6:22 ` Val Henson
2003-02-24 6:41 ` William Lee Irwin III
0 siblings, 1 reply; 157+ messages in thread
From: Val Henson @ 2003-02-24 6:22 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Larry McVoy, Linux Kernel Mailing List
On Sun, Feb 23, 2003 at 06:57:09PM -0500, Bill Davidsen wrote:
> On Sat, 22 Feb 2003, Larry McVoy wrote:
> >
> > But that's a false promise because by definition, fine grained threading
> > adds more bus traffic. It's kind of hard to not have that happen, the
> > caches have to stay coherent somehow.
>
> Clearly. And things which require more locking will pay some penalty for
> this. But a quick scan of this list on keyword "lockless" will show that
> people are thinking about this.
Lockless algorithms still generate bus traffic when you do the atomic
compare-and-swap or load-linked or whatever hardware instruction you
use to implement your lockless algorithm. Caches still have to stay
coherent, lock or no lock.
-VAL
^ permalink raw reply [flat|nested] 157+ messages in thread
* Re: Minutes from Feb 21 LSE Call
2003-02-24 6:22 ` Val Henson
@ 2003-02-24 6:41 ` William Lee Irwin III
0 siblings, 0 replies; 157+ messages in thread
From: William Lee Irwin III @ 2003-02-24 6:41 UTC (permalink / raw)
To: Val Henson; +Cc: Bill Davidsen, Larry McVoy, Linux Kernel Mailing List
On Sun, Feb 23, 2003 at 06:57:09PM -0500, Bill Davidsen wrote:
>> Clearly. And things which require more locking will pay some penalty for
>> this. But a quick scan of this list on keyword "lockless" will show that
>> people are thinking about this.
On Sun, Feb 23, 2003 at 11:22:30PM -0700, Val Henson wrote:
> Lockless algorithms still generate bus traffic when you do the atomic
> compare-and-swap or load-linked or whatever hardware instruction you
> use to implement your lockless algorithm. Caches still have to stay
> coherent, lock or no lock.
Not all lockless algorithms operate on the "access everything with
atomic operations" principle. RCU, for example, uses no atomic
operations on the read side, which is actually fewer atomic operations
than standard rwlocks use for the read side.
-- wli
^ permalink raw reply [flat|nested] 157+ messages in thread