From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Andreas Mohr <andi@lisas.de>
Cc: LKML <linux-kernel@vger.kernel.org>, linux-acpi@vger.kernel.org
Subject: Re: Dynamic configure max_cstate
Date: Tue, 28 Jul 2009 10:42:15 +0800
Message-ID: <1248748935.2560.669.camel@ymzhang>
In-Reply-To: <20090727073338.GA12669@rhlx01.hs-esslingen.de>
On Mon, 2009-07-27 at 09:33 +0200, Andreas Mohr wrote:
> Hi,
>
> > When running a fio workload, I found sometimes cpu C state has
> > big impact on the result. Mostly, fio is a disk I/O workload
> > which doesn't spend much time with cpu, so cpu switches to C2/C3
> > frequently and the latency is big.
>
> Rather than inventing ways to limit ACPI Cx state usefulness, we should
> perhaps be thinking of what's wrong here.
Andreas,
Thanks for your kind comments.
>
> And your complaint might just fit into a thought I had recently:
> are we actually taking ACPI Cx exit latency into account, for timers???
I tried both tickless and non-tickless kernels. The results are similar.
Originally, I also thought it was related to timers. As you know, the block I/O layer
has many timers, but such timers normally don't expire. For example, an I/O request
is submitted to the driver, the driver delivers it to the disk, and the hardware triggers
an interrupt after finishing the I/O. Mostly it is the I/O submission and the interrupt,
not the timer, that drive the I/O.
>
> If we program a timer to fire at some point, then it is quite imaginable
> that any ACPI Cx exit latency due to the CPU being idle at that moment
> could add to actual timer trigger time significantly.
>
> To combat this, one would need to tweak the timer expiration time
> to include the exit latency. But of course once the CPU is running
> again, one would need to re-add the latency amount (read: reprogram the
> timer hardware, ugh...) to prevent the timer from firing too early.
>
> Given that one would need to reprogram timer hardware quite often,
> I don't know whether taking Cx exit latency into account is feasible.
> OTOH analysis of the single next timer value and actual hardware reprogramming
> would have to be done only once (in ACPI sleep and wake paths each),
> thus it might just turn out to be very beneficial after all
> (minus prolonging ACPI Cx path activity and thus aggravating CPU power
> savings, of course).
>
> Arjan mentioned examples of maybe 10us for C2 and 185us for C3/C4 in an
> article.
>
> OTOH even 185us is only 0.185ms, which, when compared to disk seek
> latency (around 7ms still, except for SSD), doesn't seem to be all that much.
> Or what kind of ballpark figure do you have for percentage of I/O
> deterioration?
I have lots of fio sub-test cases which test I/O on a single disk and on JBOD (a disk
box which mostly has 12~13 disks) on Nehalem machines. Your analysis of disk seek
is reasonable. I found sequential buffered read has the worst regression while random
read is far better. For example, I start 12 processes per disk and every disk has 24
1GB files. There are 12 disks. The sequential read fio result is about 593MB/s
with idle=poll, and about 375MB/s without idle=poll. Read block size is 4KB.
Another example is a single fio direct sequential read (block size is 4KB) on a single
SATA disk. The result is about 28MB/s without idle=poll and about 32.5MB/s with
idle=poll.
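As a rough sanity check on these figures, here is a minimal sketch of the
arithmetic. It only uses the throughput numbers above. The unit conversions
(4KB = 4096 bytes, 1MB = 10^6 bytes) and the idea that the direct read keeps
one request outstanding at a time are assumptions, not something measured.

```python
# Figures quoted from the message above; the unit conversions are assumptions.
BLOCK_BYTES = 4 * 1024   # "block size is 4KB", assumed to mean 4096 bytes
MB = 1_000_000           # assumed: 1 MB = 10^6 bytes


def relative_drop(with_poll, without_poll):
    """Fraction of throughput lost when idle=poll is not used."""
    return (with_poll - without_poll) / with_poll


def per_request_us(throughput_mb_s):
    """Average time per 4KB request in microseconds, for one serial stream."""
    return BLOCK_BYTES / (throughput_mb_s * MB) * 1e6


jbod_drop = relative_drop(593.0, 375.0)   # 12-disk buffered sequential read
sata_drop = relative_drop(32.5, 28.0)     # single-disk direct sequential read

# Extra time per request on the single SATA disk when C states are allowed.
extra_us = per_request_us(28.0) - per_request_us(32.5)

print(f"JBOD drop: {jbod_drop:.1%}")
print(f"SATA drop: {sata_drop:.1%}")
print(f"extra time per 4KB request: {extra_us:.1f} us")
```

Under those assumptions the drop is roughly 37% on the JBOD and roughly 14% on
the single disk, and the single-disk case works out to about 20us of extra
time per 4KB request, which is in the range of the C2/C3 exit latencies you
quoted rather than of a disk seek.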
How did I find that C states have an impact on the disk I/O result? Frankly, I found a
regression between kernels 2.6.27 and 2.6.28. Bisect located a nonstop TSC patch, but the
patch is quite good. I found the patch changes the default clocksource from hpet to
tsc. Then I tried all clocksources and got the best result with the acpi_pm clocksource,
but oprofile data shows acpi_pm has more cpu utilization. The jiffies clocksource has
the worst result but the least cpu utilization. As you know, fio calls gettimeofday frequently.
Then I tried the boot parameters processor.max_cstate and idle=poll.
I get a similar result with processor.max_cstate=1 as with idle=poll.
I also ran the testing on 2 Stoakley machines and didn't find such issues.
/proc/acpi/processor/CPUXXX/power shows the Stoakley cpu only has C1.
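For anyone who wants to make the same check, a minimal sketch follows. It
assumes the kernel exposes the cpuidle sysfs directory
(/sys/devices/system/cpu/cpuN/cpuidle/stateM with the files name, latency,
usage and time) instead of the /proc file mentioned above. The function name
read_cpuidle_states is only illustrative.

```python
from pathlib import Path


def read_cpuidle_states(cpuidle_dir):
    """Return one dict per stateN directory found under cpuidle_dir.

    Assumes each state directory holds the files name, latency (exit
    latency in us), usage (entry count) and time (residency in us).
    """
    base = Path(cpuidle_dir)
    # Sort numerically so that state10 does not come before state2.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))

    states = []
    for state in state_dirs:
        def field(name, state=state):
            return (state / name).read_text().strip()

        states.append({
            "name": field("name"),
            "exit_latency_us": int(field("latency")),
            "usage": int(field("usage")),
            "time_us": int(field("time")),
        })
    return states
```

Calling read_cpuidle_states("/sys/devices/system/cpu/cpu0/cpuidle") on a
machine that only has C1 should list no deeper state, and an empty list means
the directory is not present at all.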
> I'm wondering whether we might have an even bigger problem with disk I/O
> related to this than just the raw ACPI exit latency value itself.
We might have. I'm still doing more testing. With Venki's tool (which reads/writes MSRs),
I collected some C state switch statistics.
Current cpuidle takes cpu utilization into good consideration, but doesn't
consider devices. So with an I/O model driven by request delivery and interrupts,
with little cpu utilization, performance might be hurt if C state exit has a long
latency.
Yanmin