From: Jens Axboe <jens.axboe@oracle.com>
To: "Jenkins, Lee" <Lee.Jenkins@hp.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: Re: I/O alignment
Date: Wed, 11 Mar 2009 10:57:55 +0100
Message-ID: <20090311095754.GL11787@kernel.dk>
In-Reply-To: <20090310195408.GJ11787@kernel.dk>
On Tue, Mar 10 2009, Jens Axboe wrote:
> On Tue, Mar 10 2009, Jenkins, Lee wrote:
> > Is there a way to control the alignment of I/O offsets? The HOWTO
> > shows bsrange= and bs_unaligned=, but these seem to be related to the
> > size of the I/O, not the offset.
> >
> > In our lab testing it appears from blktrace dumps that I/Os are
> > boundary-aligned based on the size of the I/O. For example, in a test
> > of 64KB Random Reads all the I/O addresses were multiples of 64KB (128
> > sectors). This alignment has a profound impact on I/O performance for
> > certain disk array configurations. Ideally we'd like to be able to
> > control the alignment to match our customers' run-time environment.
>
> That is correct, fio will use your minimum block size as the alignment
> block as well. This is needed for the random map and doing verifies, for
> instance. But I see your point, being able to specifically set your
> minimum alignment is indeed useful. It would have to be with the
> 'norandommap' option, at least that would be the easiest.
>
> I'll add such an option for you tomorrow. Suggestions for option name
> would be appreciated, I'm not very good with coming up with good names
> :-)

This should work, I hope. It adds a blockalign/ba option (thanks, Lee :-)
that aligns random offsets to the given boundary. The option requires
norandommap; if you do not set it, fio will complain and turn the random
map off for you. So if you use bs=64k and ba=4k for your test, you will
get ios of 64k in size whose offsets are aligned to 4k.

I have committed the patch, so you can also just update to the latest
version instead of applying this one manually.
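
To make the bs=64k/ba=4k case concrete, here is a rough standalone sketch of the arithmetic. It is not fio code: effective_align() and pick_aligned_offset() are hypothetical names that only mirror the defaulting rule in the fixup_options() hunk and the "b * ba" computation in the io_u.c hunk below, and rand() stands in for fio's own random generator.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper mirroring the defaulting rule in the patch:
 * an unset (zero) blockalign, or a non-random workload, falls back
 * to the minimum block size.
 */
static unsigned int effective_align(unsigned int ba, unsigned int min_bs,
				    int is_random)
{
	if (!ba || !is_random)
		return min_bs;
	return ba;
}

/*
 * Hypothetical helper: pick a random offset that is a multiple of
 * 'align'. The file is treated as file_size / align slots and one is
 * chosen at random. This sketch does not handle an I/O that would run
 * past the end of the file, and rand() is only a stand-in generator.
 */
static unsigned long long pick_aligned_offset(unsigned long long file_size,
					      unsigned int align)
{
	unsigned long long max_blocks, b;

	assert(align > 0);

	max_blocks = file_size / align;
	if (!max_blocks)
		return 0;

	b = (unsigned long long) rand() % max_blocks;
	return b * align;
}
```

With min_bs=65536 and ba=4096 on a random workload, effective_align() yields 4096, so every offset from pick_aligned_offset() is a multiple of 4k while the I/O size stays at 64k. Leaving ba unset gives back the old behaviour of 64k-aligned offsets.
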
diff --git a/HOWTO b/HOWTO
index 4e52e65..999f777 100644
--- a/HOWTO
+++ b/HOWTO
@@ -327,6 +327,14 @@ bs=int The block size used for the io units. Defaults to 4k. Values
can do so by passing an empty read size - bs=,8k will set
8k for writes and leave the read default value.
+blockalign=int
+ba=int At what boundary to align random IO offsets. Defaults to
+ the same as 'blocksize', i.e. the minimum blocksize given.
+ Minimum alignment is typically 512b for using direct IO,
+ though it usually depends on the hardware block size. This
+ option is mutually exclusive with using a random map for
+ files, so it will turn off that option.
+
blocksize_range=irange
bsrange=irange Instead of giving a single block size, specify a range
and fio will mix the issued io block sizes. The issued
diff --git a/fio.h b/fio.h
index b6ffe60..a9e2e3b 100644
--- a/fio.h
+++ b/fio.h
@@ -429,6 +429,7 @@ struct thread_options {
unsigned long long start_offset;
unsigned int bs[2];
+ unsigned int ba[2];
unsigned int min_bs[2];
unsigned int max_bs[2];
struct bssplit *bssplit;
diff --git a/init.c b/init.c
index 4ae3baf..80d098d 100644
--- a/init.c
+++ b/init.c
@@ -273,6 +273,21 @@ static int fixup_options(struct thread_data *td)
o->rw_min_bs = min(o->min_bs[DDIR_READ], o->min_bs[DDIR_WRITE]);
+ /*
+ * For random IO, allow blockalign offset other than min_bs.
+ */
+ if (!o->ba[DDIR_READ] || !td_random(td))
+ o->ba[DDIR_READ] = o->min_bs[DDIR_READ];
+ if (!o->ba[DDIR_WRITE] || !td_random(td))
+ o->ba[DDIR_WRITE] = o->min_bs[DDIR_WRITE];
+
+ if ((o->ba[DDIR_READ] != o->min_bs[DDIR_READ] ||
+ o->ba[DDIR_WRITE] != o->min_bs[DDIR_WRITE]) &&
+ !td->o.norandommap) {
+ log_err("fio: Any use of blockalign= turns off randommap\n");
+ td->o.norandommap = 1;
+ }
+
if (!o->file_size_high)
o->file_size_high = o->file_size_low;
diff --git a/io_u.c b/io_u.c
index 27014c8..476658e 100644
--- a/io_u.c
+++ b/io_u.c
@@ -95,7 +95,7 @@ static unsigned long long last_block(struct thread_data *td, struct fio_file *f,
if (max_size > f->real_file_size)
max_size = f->real_file_size;
- max_blocks = max_size / (unsigned long long) td->o.min_bs[ddir];
+ max_blocks = max_size / (unsigned long long) td->o.ba[ddir];
if (!max_blocks)
return 0;
@@ -212,7 +212,7 @@ static int get_next_offset(struct thread_data *td, struct io_u *io_u)
b = (f->last_pos - f->file_offset) / td->o.min_bs[ddir];
}
- io_u->offset = b * td->o.min_bs[ddir];
+ io_u->offset = b * td->o.ba[ddir];
if (io_u->offset >= f->io_size) {
dprint(FD_IO, "get_next_offset: offset %llu >= io_size %llu\n",
io_u->offset, f->io_size);
diff --git a/options.c b/options.c
index 73815bb..9700110 100644
--- a/options.c
+++ b/options.c
@@ -793,6 +793,16 @@ static struct fio_option options[] = {
.parent = "rw",
},
{
+ .name = "ba",
+ .alias = "blockalign",
+ .type = FIO_OPT_STR_VAL_INT,
+ .off1 = td_var_offset(ba[DDIR_READ]),
+ .off2 = td_var_offset(ba[DDIR_WRITE]),
+ .minval = 1,
+ .help = "IO block offset alignment",
+ .parent = "rw",
+ },
+ {
.name = "bsrange",
.alias = "blocksize_range",
.type = FIO_OPT_RANGE,
--
Jens Axboe