From: Jens Axboe <axboe@suse.de>
To: Chris Mason <mason@suse.com>
Cc: Andrea Arcangeli <andrea@suse.de>, Andrew Morton <akpm@osdl.org>,
marcelo@conectiva.com.br, linux-kernel@vger.kernel.org
Subject: Re: RFC on io-stalls patch
Date: Tue, 15 Jul 2003 16:29:50 +0200 [thread overview]
Message-ID: <20030715142950.GZ833@suse.de> (raw)
In-Reply-To: <1058278688.4012.57.camel@tiny.suse.com>
On Tue, Jul 15 2003, Chris Mason wrote:
> On Tue, 2003-07-15 at 05:17, Jens Axboe wrote:
>
> > > BTW, the contest run times vary pretty wildy. My 3 compiles with
> > > io_load running on 2.4.21 were 603s, 443s and 515s. This doesn't make
> > > the average of the 3 numbers invalid, but we need a more stable metric.
> >
> > Mine are pretty consistent [1], I'd suspect that it isn't contest but your
> > drive tcq skewing things. But it would be nice to test with other things
> > as well, I just used contest because it was at hand.
>
> I hacked up my random io generator a bit, combined with tiobench it gets
> pretty consistent numbers. rio is attached, it does lots of different
> things but the basic idea is to create a sparse file and then do io of
> random sizes into random places in the file.
>
> So, random writes with a large max record size can starve readers pretty
> well. I've been running it like so:
>
> #!/bin/sh
> #
> # rio sparse file size of 2G, writes only, max record size 1MB
> #
> rio -s 2G -W -m 1m &
> kid=$!
> sleep 1
> tiobench.pl --threads 8
> kill $kid
> wait
>
> tiobench is nice because it also gives latency numbers, I'm playing a
> bit to see how the number of tiobench threads changes things, but the
> contest output is much easier to compare. After a little formatting:
>
> Sequential Reads
> File Blk Num Avg Maximum Lat% Lat% CPU
> Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
> ------------ ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
> 2.4.22-pre5 1792 4096 8 6.92 2.998% 4.363 9666.86 0.02245 0.00000 231
> 2.4.21 1792 4096 8 8.40 3.275% 3.052 3249.79 0.00131 0.00000 256
>
> Random Reads
> File Blk Num Avg Maximum Lat% Lat% CPU
> Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
> ------------ ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
> 2.4.22-pre5 1792 4096 8 1.41 0.540% 20.932 604.13 0.00000 0.00000 260
> 2.4.21 1792 4096 8 0.65 0.540% 41.458 2689.96 0.05000 0.00000 120
>
> Sequential Writes
> File Blk Num Avg Maximum Lat% Lat% CPU
> Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
> ------------ ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
> 2.4.22-pre5 1792 4096 8 13.77 8.793% 1.550 3416.72 0.00567 0.00000 157
> 2.4.21 1792 4096 8 15.38 8.847% 1.169 47134.93 0.00719 0.00305 174
>
> Random Writes
> File Blk Num Avg Maximum Lat% Lat% CPU
> Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
> ------------ ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
> 2.4.22-pre5 1792 4096 8 0.68 1.470% 0.027 12.91 0.00000 0.00000 46
> 2.4.21 1792 4096 8 0.67 0.598% 0.043 67.21 0.00000 0.00000 112
>
> rio output:
> 2.4.22-pre5 total io 2683.087 MB, 428.000 seconds 6.269 MB/s
> 2.4.21 total io 3231.576 MB, 381.000 seconds 8.482 MB/s
> 2.4.22-pre5 writes: 2683.087 MB, 6.269 MB/s
> 2.4.21 writes: 3231.576 MB, 8.482 MB/s
>
> Without breaking tiobench apart to skip the write phase it is hard to
> tell where the rio throughput loss is. My guess is that most of it was
> lost during the sequential write tiobench phase.
>
> Since jens is getting consistent contest numbers, this isn't meant to
> replace it. Just another data point worth looking at. Jens if you want
> to send me your current patch I can give it a quick spin here.
Very nice Chris! I'm still getting the ide numbers here, the patch I'm
using at present looks like the following.
===== drivers/block/ll_rw_blk.c 1.47 vs edited =====
--- 1.47/drivers/block/ll_rw_blk.c Fri Jul 11 10:30:54 2003
+++ edited/drivers/block/ll_rw_blk.c Tue Jul 15 11:09:38 2003
@@ -458,6 +458,7 @@
INIT_LIST_HEAD(&q->rq.free);
q->rq.count = 0;
+ q->rq.pending[READ] = q->rq.pending[WRITE] = 0;
q->nr_requests = 0;
si_meminfo(&si);
@@ -549,18 +550,33 @@
static struct request *get_request(request_queue_t *q, int rw)
{
struct request *rq = NULL;
- struct request_list *rl;
+ struct request_list *rl = &q->rq;
- rl = &q->rq;
- if (!list_empty(&rl->free) && !blk_oversized_queue(q)) {
+ if (blk_oversized_queue(q)) {
+ int rlim = q->nr_requests >> 5;
+
+ if (rlim < 4)
+ rlim = 4;
+
+ /*
+ * if its a write, or we have more than a handful of reads
+ * pending, bail out
+ */
+ if ((rw == WRITE) || (rw == READ && rl->pending[READ] > rlim))
+ return NULL;
+ }
+
+ if (!list_empty(&rl->free)) {
rq = blkdev_free_rq(&rl->free);
list_del(&rq->queue);
rl->count--;
+ rl->pending[rw]++;
rq->rq_status = RQ_ACTIVE;
rq->cmd = rw;
rq->special = NULL;
rq->q = q;
}
+
return rq;
}
@@ -866,13 +882,25 @@
* assume it has free buffers and check waiters
*/
if (q) {
+ struct request_list *rl = &q->rq;
int oversized_batch = 0;
if (q->can_throttle)
oversized_batch = blk_oversized_queue_batch(q);
- q->rq.count++;
- list_add(&req->queue, &q->rq.free);
- if (q->rq.count >= q->batch_requests && !oversized_batch) {
+ rl->count++;
+ /*
+ * paranoia check
+ */
+ if (req->cmd == READ || req->cmd == WRITE)
+ rl->pending[req->cmd]--;
+ if (rl->pending[READ] > q->nr_requests)
+ printf("reads: %u\n", rl->pending[READ]);
+ if (rl->pending[WRITE] > q->nr_requests)
+ printf("writes: %u\n", rl->pending[WRITE]);
+ if (rl->pending[READ] + rl->pending[WRITE] > q->nr_requests)
+ printf("too many: %u + %u > %u\n", rl->pending[READ], rl->pending[WRITE], q->nr_requests);
+ list_add(&req->queue, &rl->free);
+ if (rl->count >= q->batch_requests && !oversized_batch) {
smp_mb();
if (waitqueue_active(&q->wait_for_requests))
wake_up(&q->wait_for_requests);
@@ -947,7 +975,7 @@
static int __make_request(request_queue_t * q, int rw,
struct buffer_head * bh)
{
- unsigned int sector, count;
+ unsigned int sector, count, sync;
int max_segments = MAX_SEGMENTS;
struct request * req, *freereq = NULL;
int rw_ahead, max_sectors, el_ret;
@@ -958,6 +986,7 @@
count = bh->b_size >> 9;
sector = bh->b_rsector;
+ sync = test_and_clear_bit(BH_Sync, &bh->b_state);
rw_ahead = 0; /* normal case; gets changed below for READA */
switch (rw) {
@@ -1124,6 +1153,8 @@
blkdev_release_request(freereq);
if (should_wake)
get_request_wait_wakeup(q, rw);
+ if (sync)
+ __generic_unplug_device(q);
spin_unlock_irq(&io_request_lock);
return 0;
end_io:
===== fs/buffer.c 1.89 vs edited =====
--- 1.89/fs/buffer.c Tue Jun 24 23:11:29 2003
+++ edited/fs/buffer.c Mon Jul 14 16:59:59 2003
@@ -1142,6 +1142,7 @@
bh = getblk(dev, block, size);
if (buffer_uptodate(bh))
return bh;
+ set_bit(BH_Sync, &bh->b_state);
ll_rw_block(READ, 1, &bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
===== include/linux/blkdev.h 1.24 vs edited =====
--- 1.24/include/linux/blkdev.h Fri Jul 4 19:18:12 2003
+++ edited/include/linux/blkdev.h Tue Jul 15 10:06:04 2003
@@ -66,6 +66,7 @@
struct request_list {
unsigned int count;
+ unsigned int pending[2];
struct list_head free;
};
===== include/linux/fs.h 1.82 vs edited =====
--- 1.82/include/linux/fs.h Wed Jul 9 20:42:38 2003
+++ edited/include/linux/fs.h Tue Jul 15 10:13:13 2003
@@ -221,6 +221,7 @@
BH_Launder, /* 1 if we can throttle on this buffer */
BH_Attached, /* 1 if b_inode_buffers is linked into a list */
BH_JBD, /* 1 if it has an attached journal_head */
+ BH_Sync, /* 1 if the buffer is a sync read */
BH_PrivateStart,/* not a state bit, but the first bit available
* for private allocation by other entities
--
Jens Axboe
next prev parent reply other threads:[~2003-07-15 14:15 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-07-08 20:06 RFC on io-stalls patch Marcelo Tosatti
2003-07-10 13:57 ` Jens Axboe
2003-07-11 14:13 ` Chris Mason
2003-07-12 0:20 ` Nick Piggin
2003-07-12 18:37 ` Chris Mason
2003-07-12 7:37 ` Jens Axboe
2003-07-12 7:48 ` Jens Axboe
2003-07-12 18:32 ` Chris Mason
2003-07-13 0:33 ` Andrea Arcangeli
2003-07-13 9:01 ` Jens Axboe
2003-07-13 16:20 ` Chris Mason
2003-07-13 16:45 ` Jeff Garzik
2003-07-13 19:33 ` Andrea Arcangeli
2003-07-13 17:47 ` Jens Axboe
2003-07-13 19:35 ` Andrea Arcangeli
2003-07-14 0:36 ` Chris Mason
2003-07-13 19:19 ` Andrea Arcangeli
2003-07-14 5:49 ` Jens Axboe
2003-07-14 12:23 ` Marcelo Tosatti
2003-07-14 13:12 ` Jens Axboe
2003-07-14 19:51 ` Jens Axboe
2003-07-14 20:09 ` Chris Mason
2003-07-14 20:19 ` Andrea Arcangeli
2003-07-14 21:24 ` Chris Mason
2003-07-15 5:46 ` Jens Axboe
2003-07-14 20:09 ` Marcelo Tosatti
2003-07-14 20:24 ` Andrea Arcangeli
2003-07-14 20:34 ` Chris Mason
2003-07-15 5:35 ` Jens Axboe
[not found] ` <20030714224528.GU16313@dualathlon.random>
2003-07-15 5:40 ` Jens Axboe
[not found] ` <1058229360.13317.364.camel@tiny.suse.com>
2003-07-15 5:43 ` Jens Axboe
[not found] ` <20030714175238.3eaddd9a.akpm@osdl.org>
[not found] ` <20030715020706.GC16313@dualathlon.random>
2003-07-15 5:45 ` Jens Axboe
2003-07-15 6:01 ` Andrea Arcangeli
2003-07-15 6:08 ` Jens Axboe
2003-07-15 7:03 ` Andrea Arcangeli
2003-07-15 8:28 ` Jens Axboe
2003-07-15 9:12 ` Chris Mason
2003-07-15 9:17 ` Jens Axboe
2003-07-15 9:18 ` Jens Axboe
2003-07-15 9:30 ` Chris Mason
2003-07-15 10:03 ` Andrea Arcangeli
2003-07-15 10:11 ` Jens Axboe
2003-07-15 14:18 ` Chris Mason
2003-07-15 14:29 ` Jens Axboe [this message]
2003-07-16 17:06 ` Chris Mason
2003-07-15 9:22 ` Chris Mason
2003-07-15 9:59 ` Andrea Arcangeli
2003-07-15 9:48 ` Andrea Arcangeli
2003-07-14 20:16 ` Andrea Arcangeli
2003-07-14 20:17 ` Marcelo Tosatti
2003-07-14 20:27 ` Andrea Arcangeli
2003-07-15 5:26 ` Jens Axboe
2003-07-15 5:48 ` Andrea Arcangeli
2003-07-15 6:01 ` Jens Axboe
2003-07-15 6:33 ` Andrea Arcangeli
2003-07-15 11:22 ` Alan Cox
2003-07-15 11:27 ` Jens Axboe
2003-07-16 12:43 ` Andrea Arcangeli
2003-07-16 12:46 ` Jens Axboe
2003-07-16 12:59 ` Andrea Arcangeli
2003-07-16 13:04 ` Jens Axboe
2003-07-16 13:11 ` Andrea Arcangeli
2003-07-16 13:21 ` Jens Axboe
2003-07-16 13:44 ` Andrea Arcangeli
2003-07-16 14:00 ` Jens Axboe
2003-07-16 14:24 ` Andrea Arcangeli
2003-07-16 16:49 ` Andrew Morton
-- strict thread matches above, loose matches on Subject: below --
2003-07-15 18:47 Shane Shrybman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20030715142950.GZ833@suse.de \
--to=axboe@suse.de \
--cc=akpm@osdl.org \
--cc=andrea@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=marcelo@conectiva.com.br \
--cc=mason@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.