Re: IO queueing and complete affinity w/ threads: Some results

From: "Alan D. Brunelle" <Alan.Brunelle@hp.com>
To: "Alan D. Brunelle" <Alan.Brunelle@hp.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>,
	npiggin@suse.de, dgc@sgi.com, arjan@linux.intel.com
Subject: Re: IO queueing and complete affinity w/ threads: Some results
Date: Thu, 14 Feb 2008 10:36:40 -0500	[thread overview]
Message-ID: <47B46008.9020804@hp.com> (raw)
In-Reply-To: <47B0B69B.1050807@hp.com>

Taking a step back, I went to a very simple test environment:

o  4-way IA64
o  2 disks (on separate RAID controller, handled by separate ports on the same FC HBA - generates different IRQs).
o  Using write-cached tests - keep all IOs inside of the RAID controller's cache, so no perturbations due to platter accesses.

Basically:

o  CPU 0 handled IRQs for /dev/sds
o  CPU 2 handled IRQs for /dev/sdaa
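
The mail does not say how the IRQs were bound to CPUs 0 and 2. One common way on Linux is to write a hex CPU bitmask to /proc/irq/<irq>/smp_affinity; the sketch below assumes that interface, and the IRQ numbers you would pass in are hypothetical (they are not given above).

```python
def cpu_to_affinity_mask(cpu: int) -> str:
    """Return the hex bitmask string that selects a single CPU.

    Covers CPUs 0-31 only; larger systems use comma-separated
    32-bit groups, which this sketch does not handle.
    """
    if not 0 <= cpu < 32:
        raise ValueError("this sketch only handles CPUs 0-31")
    return format(1 << cpu, "x")


def set_irq_affinity(irq: int, cpu: int) -> None:
    """Steer one IRQ to one CPU (needs root; irq is a hypothetical number)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_to_affinity_mask(cpu))


if __name__ == "__main__":
    print(cpu_to_affinity_mask(0))  # mask for CPU 0 -> 1
    print(cpu_to_affinity_mask(2))  # mask for CPU 2 -> 4
```

set_irq_affinity() is only defined here, not called, since it needs root and a real IRQ number.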

We placed an IO generator on CPU1 (for /dev/sds) and CPU3 (for /dev/sdaa). The IO generator performed 4KiB sequential direct AIOs in a very small range (2MB - well within the controller cache on the external storage device). We have found that this is a simple way to maximize throughput, and thus be able to watch the system for effects without worrying about odd seek & other platter-induced issues. Each test took about 6 minutes to run (ran a specific amount of IO, so we could compare & contrast system measurements).
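
As a rough illustration of that access pattern (not the actual generator used for these runs), the sketch below pins the calling process to one CPU and produces the wrapping sequence of 4KiB offsets. It assumes "2MB" means 2 MiB; the function names are made up for this example, and the direct AIO submission itself is left out.

```python
import os

BLOCK_SIZE = 4 * 1024            # 4KiB per IO
RANGE_BYTES = 2 * 1024 * 1024    # assumed: "2MB" taken as 2 MiB


def pin_to_cpu(cpu: int) -> None:
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


def sequential_offsets(n_ios: int,
                       block_size: int = BLOCK_SIZE,
                       range_bytes: int = RANGE_BYTES) -> list:
    """Byte offsets for n_ios sequential IOs that wrap inside the range."""
    blocks = range_bytes // block_size   # 512 blocks with the defaults
    return [(i % blocks) * block_size for i in range(n_ios)]
```

For example, pin_to_cpu(1) followed by a loop over sequential_offsets(n) would mimic the /dev/sds generator's placement and offset pattern.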

First: overall performance

2.6.24 (no patches)              : 106.90 MB/sec

2.6.24 + original patches + rq=0 : 103.09 MB/sec
                            rq=1 :  98.81 MB/sec

2.6.24 + kthreads patches + rq=0 : 106.85 MB/sec
                            rq=1 : 107.16 MB/sec
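
To put those numbers on one scale, here is a small sketch that expresses each result as a percentage change against unpatched 2.6.24. The figures are copied from the list above; the dictionary keys are just labels chosen for this example.

```python
BASELINE_MB_S = 106.90  # 2.6.24 (no patches)

RESULTS_MB_S = {
    "original rq=0": 103.09,
    "original rq=1": 98.81,
    "kthreads rq=0": 106.85,
    "kthreads rq=1": 107.16,
}


def relative_change(mb_s: float, baseline: float = BASELINE_MB_S) -> float:
    """Percentage change in throughput relative to the baseline."""
    return 100.0 * (mb_s - baseline) / baseline


if __name__ == "__main__":
    for label, mb_s in RESULTS_MB_S.items():
        print(f"{label}: {relative_change(mb_s):+.2f}%")
```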

So, the kthreads patches work much better here - and are on par with or better than straight 2.6.24. I also ran Caliper (akin to Oprofile; proprietary and ia64-specific, sorry), and looked at the cycles used. On ia64, back-end bubbles are deadly, and can be caused by cache misses &c. Looking at the gross data:

Kernel                                CPU_CYCLES       BACK END BUBBLES  100.0 * (BEB/CC)
--------------------------------   -----------------  -----------------  ----------------
2.6.24 (no patches)              : 2,357,215,454,852    231,547,237,267   9.8%

2.6.24 + original patches + rq=0 : 2,444,895,579,790    242,719,920,828   9.9%
                            rq=1 : 2,551,175,203,455    148,586,145,513   5.8%

2.6.24 + kthreads patches + rq=0 : 2,359,376,156,043    255,563,975,526  10.8%
                            rq=1 : 2,350,539,631,362    208,888,961,094   8.9%
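
The last column is simply 100.0 * (BEB/CC). A short sketch that recomputes it from the raw counters in the table (labels are chosen for this example only):

```python
# (CPU_CYCLES, BACK END BUBBLES) copied from the table above
COUNTERS = {
    "2.6.24":        (2_357_215_454_852, 231_547_237_267),
    "original rq=0": (2_444_895_579_790, 242_719_920_828),
    "original rq=1": (2_551_175_203_455, 148_586_145_513),
    "kthreads rq=0": (2_359_376_156_043, 255_563_975_526),
    "kthreads rq=1": (2_350_539_631_362, 208_888_961_094),
}


def bubble_pct(cpu_cycles: int, bubbles: int) -> float:
    """Back-end bubbles as a percentage of total CPU cycles."""
    return 100.0 * bubbles / cpu_cycles


if __name__ == "__main__":
    for label, (cc, beb) in COUNTERS.items():
        print(f"{label}: {bubble_pct(cc, beb):.1f}%")
```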

For both the original & kthreads patches we see a /significant/ drop in bubbles when setting rq=1 over rq=0. This shows up in extra CPU cycles available (not spent in %system) - a graph is provided up on http://free.linux.hp.com/~adb/jens/cached_mps.png - it shows the results from stats extracted from running mpstat in conjunction with the IO runs.

Combining %sys & %soft IRQ, we see:

Kernel                              % user     % sys   % iowait   % idle
--------------------------------   --------  --------  --------  --------
2.6.24 (no patches)              :   0.141%   10.088%   43.949%   45.819%

2.6.24 + original patches + rq=0 :   0.123%   11.361%   43.507%   45.008%
                            rq=1 :   0.156%    6.030%   44.021%   49.794%

2.6.24 + kthreads patches + rq=0 :   0.163%   10.402%   43.744%   45.686%
                            rq=1 :   0.156%    8.160%   41.880%   49.804%
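
The rq=0 vs rq=1 difference in the % sys column can be read off directly; the sketch below just does the subtraction, in percentage points and as a fraction of the rq=0 value. Values are copied from the table; the labels are illustrative.

```python
# % sys (including %soft IRQ) from the table above: (rq=0, rq=1)
SYS_PCT = {
    "original": (11.361, 6.030),
    "kthreads": (10.402, 8.160),
}


def sys_reduction(rq0: float, rq1: float):
    """Return (percentage points saved, relative reduction in percent)."""
    points = rq0 - rq1
    return points, 100.0 * points / rq0


if __name__ == "__main__":
    for label, (rq0, rq1) in SYS_PCT.items():
        points, rel = sys_reduction(rq0, rq1)
        print(f"{label}: {points:.3f} points lower ({rel:.1f}% of rq=0 %sys)")
```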

The good news (I think) is that even with rq=0 with the kthreads patches we're getting on-par performance w/ 2.6.24, so the default case should be ok...

I've only done a few runs by hand with this - these results are from one representative run out of the bunch - but at least this (I believe) shows what this patch stream is intending to do: optimize placement of IO completion handling to minimize cache & TLB disruptions. Freeing up cycles in the kernel is always helpful! :-)

I'm going to try similar runs on an AMD64 w/ Oprofile and see what results I get there... (BTW: I'll be dropping testing of the original patch sequence; the kthreads patches look better in general, both in terms of code & results. Coincidence?)

Alan
