From: Willy Tarreau <w@1wt.eu>
To: Ingo Molnar <mingo@elte.hu>
Cc: linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>,
Arjan van de Ven <arjan@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Subject: Re: [patch] CFS scheduler, -v19
Date: Tue, 17 Jul 2007 23:44:41 +0200
Message-ID: <20070717214441.GA20208@1wt.eu>
In-Reply-To: <20070709223950.GA29653@elte.hu>
Hi Ingo,
sorry for the long delay, I've spent a week doing non-kernel work.
On Tue, Jul 10, 2007 at 12:39:50AM +0200, Ingo Molnar wrote:
>
> * Willy Tarreau <w@1wt.eu> wrote:
>
> > > The biggest user-visible change in -v19 is reworked sleeper
> > > fairness: it's similar in behavior to -v18 but works more
> > > consistently across nice levels. Fork-happy workloads (like kernel
> > > builds) should behave better as well. There are also a handful of
> > > speedups: unsigned math, 32-bit speedups, O(1) task pickup,
> > > debloating and other micro-optimizations.
> >
> > Interestingly, I also noticed the possibility of O(1) task pickup when
> > playing with v18, but did not detect any noticeable improvement with
> > it. Of course, it depends on the workload and I probably didn't
> > perform the most relevant tests.
>
> yeah - it's a small tweak. CFS is O(31) in sleep/wakeup so it's now all
> a big O(1) family again :)
Yes, that's what I once tried to explain to someone: what I like about log(N)
algorithms is that even with a very large N, log(N) stays small, and it is
sometimes faster to perform log(N) fast operations than one slow operation.
That's also why I don't care about balanced trees: my unbalanced trees may
hold 32 levels for 32 carefully chosen values, while a balanced tree will
have 5 levels (the worst-case difference between the two). If I can insert
and delete a node 6 times faster, I always win. And quite frankly, I'm not
interested in the 32-entry case in a tree :-)
> > V19 works very well here on 2.6.20.14. I could start 32k busy loops at
> > nice +19 (I exhausted the 32k pids limit), and could still perform
> > normal operations. I noticed that 'vmstat' scans all the pid entries
> > under /proc, which takes ages to collect data before displaying a
> > line. Obviously, the system sometimes shows some short latencies, but
> > not much more than what you get from an SSH through a remote DSL
> > connection.
>
> great! I did not try to push it this far, yet.
Well, I borrowed two 1GB sticks because I discovered that one of my 512MB
sticks had a defective bit. It finally gave me an opportunity to push the
test this far.
> > Here's a vmstat 1 output :
> >
> >     r b w swpd   free buff cache si so bi bo  in cs us sy id
> > 32437 0 0    0 809724  488  6196  0  0  1  0 135  0 24 72  4
> > 32436 0 0    0 811336  488  6196  0  0  0  0 717  0 78 22  0
>
> crazy :-)
indeed :-)
> > Amusingly, I started mpg123 during this test and it skipped quite a
> > bit. After setting all tasks to SCHED_IDLE, it did not skip anymore.
> > All this seems to behave like one could expect.
>
> yeah. It behaves better than i expected in fact - 32K tasks is pushing
> things quite a bit. (we've got a 32K PID limit for example)
Yes, and in fact, I suspect that we still have an O(N) or O(N^2) pid
allocation algorithm somewhere (I did not look at the code), because forking
was very, very slow when reaching those numbers. I'll possibly check this
when I have some spare time, because it reminds me of a trivial source port
ring allocator I wrote a few years ago, which was O(1). With 32k pids, it
would only require 64kB of RAM for the whole system, and we could even
optimize it to spread the CPUs' entry points in order to nearly always avoid
lock contention.
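A minimal sketch of what an O(1) ring allocator of this kind could look like. This is not the source port allocator mentioned above, whose code is not shown in this mail; the names id_ring, id_ring_init, id_ring_alloc and id_ring_free are made up for illustration. It assumes 16-bit entries, which is what makes 32k IDs fit in 64kB, and it leaves out locking and the per-CPU entry points entirely.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 32768u            /* 32k IDs, as in the PID limit discussed above */

/* Hypothetical layout: a circular queue holding every currently free ID. */
struct id_ring {
    uint16_t slot[RING_SIZE];       /* 32768 entries * 2 bytes = 64 kB */
    uint32_t head;                  /* next slot to allocate from */
    uint32_t tail;                  /* next slot to release into */
    uint32_t free_count;            /* number of IDs currently free */
};

/* Fill the ring with each ID exactly once; all IDs start out free. */
static void id_ring_init(struct id_ring *r)
{
    for (uint32_t i = 0; i < RING_SIZE; i++)
        r->slot[i] = (uint16_t)i;
    r->head = 0;
    r->tail = 0;
    r->free_count = RING_SIZE;
}

/* O(1): hand out the ID that has been free the longest, or -1 if none is left. */
static int id_ring_alloc(struct id_ring *r)
{
    int id;

    if (r->free_count == 0)
        return -1;
    id = r->slot[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->free_count--;
    return id;
}

/* O(1): queue a released ID at the tail so it is reused as late as possible. */
static void id_ring_free(struct id_ring *r, uint16_t id)
{
    assert(r->free_count < RING_SIZE);  /* more frees than allocations is a bug */
    r->slot[r->tail] = id;
    r->tail = (r->tail + 1) % RING_SIZE;
    r->free_count++;
}
```

Neither operation scans anything, so the cost does not depend on how many IDs are in use. The sketch does not detect a double free of a single ID; it only catches the case where more IDs are released than were ever handed out.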
> > I also started 30k processes distributed in 130 groups of 234 chained
> > by pipes in which one byte is passed. I get an average of 8000 in the
> > run queue. The context switch rate is very low and sometimes even null
> > in this test, maybe some of them are starving, I really do not know :
> >
> >    r b w swpd   free buff cache si so bi bo  in cs us sy id
> > 7752 0 1    0 656892  244  4196  0  0  0  0 725  0 16 84  0
>
> hm, could you profile this? We could have some bottleneck somewhere
> (likely not in the scheduler) with that many tasks being runnable. [
> With CFS you can actually run a profiler under this workload ;-) ]
I'll probably try some time later (not this weekend, I have some 2.4 work
to do).
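For anyone wanting to reproduce something like the pipe-chain workload quoted above, here is a rough sketch of a single group. It is not the original test program, which is not included in this thread. The function name run_pipe_chain and the exact topology (a linear chain through which one byte travels once) are assumptions; the original test used 130 groups of 234 processes and may well have kept the byte circulating.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a chain of `n` processes linked by pipes and push one byte through it.
 * Returns the byte read at the far end, or -1 on error.
 * Note: this reaps every child of the calling process, not only its own stages.
 */
static int run_pipe_chain(int n, unsigned char token)
{
    int first[2], in_fd, ok = 1;
    unsigned char c = 0;

    if (n < 1 || pipe(first) < 0)
        return -1;
    in_fd = first[0];                   /* read end feeding the next stage */

    for (int i = 0; i < n && ok; i++) {
        int next[2];
        pid_t pid;

        if (pipe(next) < 0) {
            ok = 0;
            break;
        }
        pid = fork();
        if (pid == 0) {
            /* Child: relay one byte from in_fd to next[1], then exit. */
            close(first[1]);
            close(next[0]);
            if (read(in_fd, &c, 1) == 1 && write(next[1], &c, 1) == 1)
                _exit(0);
            _exit(1);
        }
        /* Parent: keep only the new read end for the following stage. */
        close(in_fd);
        close(next[1]);
        in_fd = next[0];
        if (pid < 0)
            ok = 0;
    }

    /* Inject the byte; on an earlier failure just close so stages see EOF. */
    if (ok && write(first[1], &token, 1) != 1)
        ok = 0;
    close(first[1]);
    if (ok && read(in_fd, &c, 1) != 1)
        ok = 0;
    close(in_fd);

    while (wait(NULL) > 0)
        ;                               /* reap every stage */
    return ok ? c : -1;
}
```

Calling run_pipe_chain(234, 'x') would correspond to one group of the size mentioned above; every stage blocks in read() until its predecessor has written, so at most one process in the chain is runnable at any time.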
> > In my tree, I have replaced the rbtree with the ebtree we talked
> > about, but it did not bring any performance boost because, even though
> > insert() and delete() are faster, the scheduler is already quite good
> > at avoiding them as much as possible, mostly relying on rb_next()
> > which has the same cost in both trees. All in all, the only variations
> > I noticed were caused by cacheline alignment when I tried to reorder
> > fields in the eb_node. So I will stop my experimentations here since I
> > don't see any more room for improvement.
>
> well, just a little bit of improvement would be nice to have too :)
Yes, but I prefer to merge it where it really brings something (I'll have a
look at epoll; I noticed epoll_ctl() was 30% slower under 2.6 with an rbtree
than it is under 2.4 with a hash). Then people will tell me "you're completely
dumb, you could have improved it that way!" and then, once it's optimized to
be always faster than the rbtree, we can switch CFS to it again ;-)
> Ingo
Cheers,
Willy