From: Werner Almesberger
Subject: Re: elevator priorities vs. full request queues
Date: Mon, 12 Jul 2004 20:52:27 -0300
To: Jens Axboe
Cc: linux-fsdevel@vger.kernel.org
Message-ID: <20040712205227.A12285@almesberger.net>
References: <20040622012502.B1325@almesberger.net> <20040622074852.GW12881@suse.de> <20040622052644.D1325@almesberger.net> <20040622101434.GB12881@suse.de> <20040622160859.I1325@almesberger.net> <20040623101430.GI1120@suse.de>
In-Reply-To: <20040623101430.GI1120@suse.de>; from axboe@suse.de on Wed, Jun 23, 2004 at 12:14:31PM +0200
List-Id: linux-fsdevel.vger.kernel.org

Jens Axboe wrote:
> Something like this (probably a little half-assed, and definitely very
> untested :-).

Nevertheless, it seems to work well enough :-) The only bug I've noticed
is that calculations related to bi_rw need to be unsigned long, for
64-bit compatibility, i.e.

+#define bio_set_prio(bio, prio) do {					\
+	WARN_ON(prio >= (1 << BIO_PRIO_BITS));				\
+	(bio)->bi_rw &= ((1UL << BIO_PRIO_SHIFT) - 1);			\
+	(bio)->bi_rw |= ((unsigned long) (prio) << BIO_PRIO_SHIFT);	\
+} while (0)

I've adapted your per-process IO priority idea, and used it as follows:

--- linux-2.6.7-orig/include/linux/sched.h	Wed Jun 16 02:18:57 2004
+++ linux-2.6.7/include/linux/sched.h	Sun Jul 11 15:00:31 2004
@@ -505,6 +505,7 @@ struct task_struct {
 	struct backing_dev_info *backing_dev_info;
 
 	struct io_context *io_context;
+	int ioprio;
 
 	unsigned long ptrace_message;
 	siginfo_t *last_siginfo; /* For ptrace use.  */
--- linux-2.6.7-orig/fs/buffer.c	Wed Jun 16 02:19:36 2004
+++ linux-2.6.7/fs/buffer.c	Mon Jul 12 08:25:41 2004
@@ -2789,6 +2789,8 @@ void submit_bh(int rw, struct buffer_hea
 	bio->bi_end_io = end_bio_bh_io_sync;
 	bio->bi_private = bh;
 
+	bio_set_prio(bio, current->ioprio);
+
 	submit_bio(rw, bio);
 }
--- linux-2.6.7-orig/drivers/block/ll_rw_blk.c	Sun Jul 11 14:20:07 2004
+++ linux-2.6.7/drivers/block/ll_rw_blk.c	Mon Jul 12 08:20:41 2004
@@ -2320,7 +2320,7 @@ static inline void blk_partition_remap(s
 	if (bdev != bdev->bd_contains) {
 		struct hd_struct *p = bdev->bd_part;
 
-		switch (bio->bi_rw) {
+		switch (bio->bi_rw & BIO_RW) {
 		case READ:
 			p->read_sectors += bio_sectors(bio);
 			p->reads++;
@@ -2451,7 +2451,7 @@ void submit_bio(int rw, struct bio *bio)
 	BIO_BUG_ON(!bio->bi_size);
 	BIO_BUG_ON(!bio->bi_io_vec);
 
-	bio->bi_rw = rw;
+	bio->bi_rw |= rw;
 	if (rw & WRITE)
 		mod_page_state(pgpgout, count);
 	else

Because I'm lazy, I'm using a default priority of zero, so I don't need
any explicit initialization.

I've been playing with this for a few hours, and even a request-happy
load with random accesses through AIO, which normally basically kills
the machine, doesn't impress my high-priority reader anymore. I haven't
looked into fairness issues, though.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/