Re: Linux 2.6.23 - Zhang, Yanmin

From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Nicholas Miell <nmiell@comcast.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 2.6.23
Date: Thu, 11 Oct 2007 10:34:17 +0800
Message-ID: <1192070057.3019.17.camel@ymzhang>
In-Reply-To: <20071010101452.GA25433@elte.hu>

On Wed, 2007-10-10 at 12:14 +0200, Ingo Molnar wrote:
> * Nicholas Miell <nmiell@comcast.net> wrote:
> 
> > Does CFS still generate the following sysbench graphs with 2.6.23, or 
> > did that get fixed?
> >
> > http://people.freebsd.org/~kris/scaling/linux-pgsql.png 
> > http://people.freebsd.org/~kris/scaling/linux-mysql.png
I also see the same issue on a couple of machines.

> 
> as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:
> 
>     http://redhat.com/~mingo/misc/sysbench.jpg
> 
> As you can see it in the graph, v2.6.23 schedules much more consistently 
> too. [ v2.6.22 has a small (but potentially statistically insignificant) 
> edge at 4-6 clients, and CFS has a slightly better peak (which is 
> statistically insignificant). ]
> 
> ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
>   1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
>   nor in the setup - everything is pretty close to the defaults. )
I used the Fedora Core 8 Test2 distribution, whose glibc-2.6.90-13 already
fixes the old malloc scalability issue. The machine has 2 physical 2.66GHz
quad-core processors, 8 cores in total. The regression is about 28%.


> 
> i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it 
> apparently got resolved after various changes to the test environment:
> 
>    http://jeffr-tech.livejournal.com/10103.html
> 
>  " [<CFS>] has virtually no dropoff and performs better under load than
>    the default 2.6.21 scheduler. " (paraphrased)
> 
> (The new link you posted, just a few hours after the release of v2.6.23, 
> has not been reported to lkml before AFAICS - when did you become aware 
> of it? If you learned about it before v2.6.23 it might have been useful 
> to report it to the v2.6.23 regression list.)
I tested 2.6.22 and all the 2.6.23-rc kernels. All 2.6.23-rc kernels have
the same regression. The test results are stable.

> At a quick glance there are no .configs or other testing details at or 
> around that URL that i could use to reproduce their result precisely, so 
> at least a minimal bugreport would be nice.
Command line used to run the test:
# sysbench --test=oltp --mysql-user=root --mysql-db=mysql --max-time=120 \
    --max-requests=0 --oltp-read-only=on --num-threads=16 run
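
To see how throughput changes with load, the same command can be repeated at
several thread counts. Below is a minimal sketch in Python that only builds and
prints the command lines. The flags are the ones from the command above; the
list of thread counts and the helper names are assumptions made for
illustration, not part of the original test setup.

```python
import shlex

# Flags taken from the command line above; only --num-threads varies.
BASE_ARGS = [
    "sysbench",
    "--test=oltp",
    "--mysql-user=root",
    "--mysql-db=mysql",
    "--max-time=120",
    "--max-requests=0",
    "--oltp-read-only=on",
]


def build_sysbench_cmd(num_threads):
    """Return the argv list for one sysbench run with the given thread count."""
    return BASE_ARGS + [f"--num-threads={num_threads}", "run"]


def sweep_commands(thread_counts):
    """Return one argv list per thread count, in the order given."""
    return [build_sysbench_cmd(n) for n in thread_counts]


# Hypothetical sweep; 16 is the only thread count named in the mail.
THREAD_COUNTS = [1, 2, 4, 8, 16, 32]

for cmd in sweep_commands(THREAD_COUNTS):
    # Print a shell-quoted command line; nothing is executed here.
    print(shlex.join(cmd))
```

Each printed line can be pasted into a root shell, or each argv list can be
passed to `subprocess.run(cmd, check=True)` on a machine where sysbench and
mysqld are set up.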

> In any case, here are a few general comments about sysbench numbers:
> 
> Sysbench is a pretty 'batched' workload: it benefits most from batchy 
> scheduling: the client doing as much work as it can, then server doing 
> as much work as it can - and so on. The longer the client can work the 
> more cache-efficient the workload is. Any round-trip to the server due 
> to pesky preemption only blows up the cache footprint of the workload 
> and gives lower throughput.
> 
> This kind of workload would probably run best on DOS or Windows 3.11, 
> with no preemptive scheduling done at all. In other words: run both 
> mysqld and the client as SCHED_FIFO to get the best performance out of 
> it. So in that sense the workload is a bit similar to dbench.
> 
> The other thing is that mysqld does _tons_ of sys_time() calls, so GTOD 
> differences between .22 and .23 might cause extra overhead - especially 
> with 8 CPUs/cores. Does the sys_time() scalability patch below improve 
> sysbench performance for you? (i'm not sure about psqld)
> 
> If it's indeed due to batched vs. well-spread-out scheduling behavior 
> (which is possible), there are a few things you could do to make 
> scheduling more batched:
> 
> 1) start the DB daemon up as SCHED_BATCH:
> 
>      schedtool -B -e service mysqld restart
> 
>    (and do the same with the client-side commands as well)
> 
>    or:
> 
>        schedtool -B $$
> 
>    to mark the parent shell as SCHED_BATCH - then start up the DB and 
>    start the client workload. (All other tasks not started from this 
>    shell will still be SCHED_OTHER, so only your mysql workload will be 
>    affected.) For example "beagled" already runs under SCHED_BATCH by 
>    default.
> 
>    SCHED_BATCH will cause the scheduler to batch up the workload more. 
>    You basically tell the scheduler: "this workload really wants
>    throughput above all", and the scheduler takes that hint and acts 
>    upon it. (it's still not as drastic as SCHED_FIFO, it's somewhere 
>    between SCHED_OTHER and SCHED_FIFO, in terms of batching. Start up 
>    your DB and your client as SCHED_FIFO via "schedtool -F -p 10 ..." to 
>    establish the best-case batching win.)
> 
> 2) check out the v22 CFS backport patch which has the latest & greatest 
>    scheduler code, from http://people.redhat.com/mingo/cfs-scheduler/ . 
>    Does performance go up for you with it? It's somewhat less
>    preemption-eager, which might as well make the crucial difference for
>    sysbench.
> 
> 3) if it's enabled, disable CONFIG_PREEMPT=y. CONFIG_PREEMPT can cause
>    unwanted overscheduling and cache-trashing under overload.
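
What "schedtool -B $$" in suggestion 1) does can also be done from inside a
process. Below is a minimal sketch, assuming Python 3 on Linux; the helper name
is hypothetical and this is not how schedtool itself is implemented. The policy
is inherited by child processes, which is why marking the parent shell (or a
launcher like this) is enough to cover everything started from it.

```python
import os


def set_batch(pid=0):
    """Switch the given process (0 = the caller) to SCHED_BATCH.

    SCHED_BATCH requires a static priority of 0. Returns the policy
    the kernel reports for the process afterwards.
    """
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    return os.sched_getscheduler(pid)
```

A launcher would call `set_batch()` and then `os.execvp(argv[0], argv)` to
start the DB daemon or the client, roughly like "schedtool -B -e ...".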
Below is the PREEMPT-related part of my kernel config file.

CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set


-yanmin
