From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Nicholas Miell <nmiell@comcast.net>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 2.6.23
Date: Thu, 11 Oct 2007 10:34:17 +0800
Message-ID: <1192070057.3019.17.camel@ymzhang>
In-Reply-To: <20071010101452.GA25433@elte.hu>
On Wed, 2007-10-10 at 12:14 +0200, Ingo Molnar wrote:
> * Nicholas Miell <nmiell@comcast.net> wrote:
>
> > Does CFS still generate the following sysbench graphs with 2.6.23, or
> > did that get fixed?
> >
> > http://people.freebsd.org/~kris/scaling/linux-pgsql.png
> > http://people.freebsd.org/~kris/scaling/linux-mysql.png
I have also reproduced the same issue on a couple of machines.
>
> as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:
>
> http://redhat.com/~mingo/misc/sysbench.jpg
>
> As you can see it in the graph, v2.6.23 schedules much more consistently
> too. [ v2.6.22 has a small (but potentially statistically insignificant)
> edge at 4-6 clients, and CFS has a slightly better peak (which is
> statistically insignificant). ]
>
> ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
> 1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
> nor in the setup - everything is pretty close to the defaults. )
I used the Fedora Core 8 Test2 distribution, whose glibc-2.6.90-13 already
fixes the old malloc scalability issue. The machine has two physical
2.66GHz quad-core processors, 8 cores in total. The regression is about 28%.
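A regression figure like this is usually read as the throughput drop
relative to the baseline kernel. A minimal sketch of that arithmetic,
assuming 2.6.22 is the baseline; the function name regression_pct and the
inputs are placeholders for illustration, not the measured numbers:

```shell
#!/bin/sh
# Percentage drop of "new" throughput relative to "base" throughput.
# $1 = baseline throughput, $2 = throughput on the kernel under test.
regression_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Placeholder inputs (not measurements): baseline normalized to 100.
regression_pct 100 72
```

Real inputs would be the transactions-per-second values sysbench reports
for each kernel.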
>
> i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it
> apparently got resolved after various changes to the test environment:
>
> http://jeffr-tech.livejournal.com/10103.html
>
> " [<CFS>] has virtually no dropoff and performs better under load than
> the default 2.6.21 scheduler. " (paraphrased)
>
> (The new link you posted, just a few hours after the release of v2.6.23,
> has not been reported to lkml before AFAICS - when did you become aware
> of it? If you learned about it before v2.6.23 it might have been useful
> to report it to the v2.6.23 regression list.)
I tested 2.6.22 and all 2.6.23-rc kernels. All 2.6.23-rc kernels have
the same regression. The test results are stable.
> At a quick glance there are no .configs or other testing details at or
> around that URL that i could use to reproduce their result precisely, so
> at least a minimal bugreport would be nice.
Command line used to run the test:
#sysbench --test=oltp --mysql-user=root --mysql-db=mysql --max-time=120 \
    --max-requests=0 --oltp-read-only=on --num-threads=16 run
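To get a scaling curve like the graphs linked above rather than a single
data point, the same command can be swept over several thread counts. A
minimal sketch: the helper name sysbench_cmd and the list of thread counts
are assumptions made for illustration, not part of the runs reported here.
It only prints the commands, so they can be reviewed first.

```shell
#!/bin/sh
# Print the sysbench OLTP command line for a given thread count.
# All options are taken from the command above; only --num-threads varies.
sysbench_cmd() {
    printf '%s\n' "sysbench --test=oltp --mysql-user=root --mysql-db=mysql --max-time=120 --max-requests=0 --oltp-read-only=on --num-threads=$1 run"
}

# Illustrative thread counts (assumption), chosen to bracket the 8 cores.
for n in 1 2 4 8 16 32; do
    sysbench_cmd "$n"
done
```

Piping the output to sh runs the sweep, one 120-second run per line.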
> In any case, here are a few general comments about sysbench numbers:
>
> Sysbench is a pretty 'batched' workload: it benefits most from batchy
> scheduling: the client doing as much work as it can, then server doing
> as much work as it can - and so on. The longer the client can work the
> more cache-efficient the workload is. Any round-trip to the server due
> to pesky preemption only blows up the cache footprint of the workload
> and gives lower throughput.
>
> This kind of workload would probably run best on DOS or Windows 3.11,
> with no preemptive scheduling done at all. In other words: run both
> mysqld and the client as SCHED_FIFO to get the best performance out of
> it. So in that sense the workload is a bit similar to dbench.
>
> The other thing is that mysqld does _tons_ of sys_time() calls, so GTOD
> differences between .22 and .23 might cause extra overhead - especially
> with 8 CPUs/cores. Does the sys_time() scalability patch below improve
> sysbench performance for you? (i'm not sure about psqld)
>
> If it's indeed due to batched vs. well-spread-out scheduling behavior
> (which is possible), there are a few things you could do to make
> scheduling more batched:
>
> 1) start the DB daemon up as SCHED_BATCH:
>
> schedtool -B -e service mysqld restart
>
> (and do the same with the client-side commands as well)
>
> or:
>
> schedtool -B $$
>
> to mark the parent shell as SCHED_BATCH - then start up the DB and
> start the client workload. (All other tasks not started from this
> shell will still be SCHED_OTHER, so only your mysql workload will be
> affected.) For example "beagled" already runs under SCHED_BATCH by
> default.
>
> SCHED_BATCH will cause the scheduler to batch up the workload more.
> You basically tell the scheduler: "this workload really wants
> throughput above all", and the scheduler takes that hint and acts
> upon it. (it's still not as drastic as SCHED_FIFO, it's somewhere
> between SCHED_OTHER and SCHED_FIFO, in terms of batching. Start up
> your DB and your client as SCHED_FIFO via "schedtool -F -p 10 ..." to
> establish the best-case batching win.)
>
> 2) check out the v22 CFS backport patch which has the latest & greatest
> scheduler code, from http://people.redhat.com/mingo/cfs-scheduler/ .
> Does performance go up for you with it? It's somewhat less
> preemption-eager, which might as well make the crucial difference for
> sysbench.
>
> 3) if it's enabled, disable CONFIG_PREEMPT=y. CONFIG_PREEMPT can cause
> unwanted overscheduling and cache-trashing under overload.
Below are the PREEMPT-related settings from my kernel config file:
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set
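For anyone comparing setups, the preemption model can be read out of a
config file mechanically. A minimal sketch, assuming a plain-text .config;
the function name preempt_model is made up for this example:

```shell
#!/bin/sh
# Report which preemption model a kernel .config selects.
# $1 = path to the config file. Prints the enabled option name.
preempt_model() {
    for opt in CONFIG_PREEMPT_NONE CONFIG_PREEMPT_VOLUNTARY CONFIG_PREEMPT; do
        # Anchor both ends so CONFIG_PREEMPT does not match CONFIG_PREEMPT_BKL.
        if grep -q "^${opt}=y\$" "$1"; then
            echo "$opt"
            return 0
        fi
    done
    echo "unknown"
    return 1
}
```

Run against the fragment above, "preempt_model .config" prints
CONFIG_PREEMPT_VOLUNTARY.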
-yanmin