From: Jamie Lokier <jamie@shareable.org>
To: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>,
hch@lst.de, akpm@osdl.org, davem@redhat.com,
Ulrich Drepper <drepper@redhat.com>,
Linus Torvalds <torvalds@osdl.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] New iovec support & VFS changes
Date: Tue, 20 Dec 2005 16:59:39 +0000
Message-ID: <20051220165939.GA16465@mail.shareable.org>
In-Reply-To: <1135095487.19193.90.camel@localhost.localdomain>

Badari Pulavarty wrote:
> I was trying to add support for preadv()/pwritev() for threaded
> databases. Currently the patch is in -mm tree.
>
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/support-for-preadv-pwritev.patch
>
> This needs a new set of system calls. Ulrich Drepper pointed out
> that, instead of adding a system call for the limited functionality
> it provides, why not we add new iovec interface as follows (offset-per-
> segment) which provides greater functionality & flexibility.
>
> +struct niovec
> +{
> + void __user *iov_base;
> + __kernel_size_t iov_len;
> + __kernel_loff_t iov_off; /* NEW */
> +};
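
As an illustration of the offset-per-segment idea quoted above, here is a minimal user-space sketch in C. It is not the proposed kernel interface: `struct niovec_sketch`, `preadn_emulated()` and `demo_scattered_read()` are hypothetical names, and the emulation simply issues one pread() per segment, which loses whatever batching a real vectored syscall could offer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space mirror of the proposed struct niovec. */
struct niovec_sketch {
    void  *iov_base;  /* buffer for this segment */
    size_t iov_len;   /* number of bytes to transfer */
    off_t  iov_off;   /* file offset of this segment (the new field) */
};

/*
 * Emulate a read with per-segment offsets using one pread() per segment.
 * Returns the total bytes read, or -1 if the very first pread() fails.
 * Stops early on a short read, or on an error after some data arrived.
 */
static ssize_t preadn_emulated(int fd, const struct niovec_sketch *vec, int cnt)
{
    ssize_t total = 0;

    assert(vec != NULL && cnt >= 0);
    for (int i = 0; i < cnt; i++) {
        ssize_t n = pread(fd, vec[i].iov_base, vec[i].iov_len, vec[i].iov_off);
        if (n < 0)
            return total > 0 ? total : -1;
        total += n;
        if ((size_t)n < vec[i].iov_len)
            break;  /* short read, e.g. end of file */
    }
    return total;
}

/* Usage: fetch "789" and "01" from a scratch file with a single call. */
static int demo_scattered_read(void)
{
    char path[] = "/tmp/niovecXXXXXX";
    char a[3], b[2];
    struct niovec_sketch vec[2] = {
        { a, sizeof a, 7 },  /* bytes 7..9 */
        { b, sizeof b, 0 },  /* bytes 0..1 */
    };
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);  /* the file disappears once fd is closed */
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    ssize_t n = preadn_emulated(fd, vec, 2);
    close(fd);
    return (n == 5 && memcmp(a, "789", 3) == 0 && memcmp(b, "01", 2) == 0) ? 0 : -1;
}
```

Because each segment carries its own offset, the segments need not be contiguous or ordered in the file, which is the extra flexibility the quoted proposal is after.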

For a database, it's also helpful to know when an operation is going
to block on I/O (i.e. because the data isn't cached, or write buffers
are full) and if that's going to happen, move it to another thread, or
move other operations to another thread. This allows a program to
continue to work on other things concurrently with I/O more
effectively than thread pool guesswork.

So if you add these new syscalls, it would be helpful to add a "flags"
argument to each of them, and define one flag: "don't block on I/O".
When the flag is set, the syscalls should do as much reading or
writing as they can without blocking, and then return the count, or
EAGAIN if nothing could be transferred without blocking.
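
The semantics suggested above can be modelled in user space. This is only a toy model, not kernel code: `XF_NOBLOCK`, `struct cached_file` and `read_model()` are hypothetical names, the "page cache" is just a prefix of an in-memory buffer, and it assumes EAGAIN is returned only when not even the first requested byte is available.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define XF_NOBLOCK 0x1  /* hypothetical "don't block on I/O" flag */

/* Toy stand-in for a file: only the first `cached` bytes are in memory. */
struct cached_file {
    const char *data;
    size_t      size;    /* total file size */
    size_t      cached;  /* bytes [0, cached) are readable without I/O */
};

/*
 * Model of a flagged read. With XF_NOBLOCK set, copy as much as is cached
 * and return that count; return -1 with errno = EAGAIN if even the first
 * requested byte would need I/O. Without the flag the model "blocks",
 * which here just means it reads up to the end of the file.
 */
static ssize_t read_model(const struct cached_file *f, void *buf,
                          size_t len, size_t off, int flags)
{
    assert(f != NULL && buf != NULL && f->cached <= f->size);
    if (off >= f->size)
        return 0;                       /* end of file */

    size_t limit = (flags & XF_NOBLOCK) ? f->cached : f->size;
    if (off >= limit) {
        errno = EAGAIN;                 /* nothing available without blocking */
        return -1;
    }

    size_t n = limit - off;
    if (n > len)
        n = len;
    memcpy(buf, f->data + off, n);
    return (ssize_t)n;
}
```

With a 10-byte file whose first 4 bytes are cached, an 8-byte flagged read at offset 0 returns 4, and the same read at offset 4 fails with EAGAIN, so the caller learns exactly where blocking would begin.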

(FreeBSD's sendfile() has an SF_NODISKIO flag which means this, and it
is used in exactly that way: so a program can move the sendfile() to
another thread iff that is necessary to avoid blocking the program.)
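
The calling pattern described here can be sketched as follows: attempt the operation with the no-block flag on the main thread, and hand over only the part that would block. All names (`try_fn`, `struct deferred_queue`, `submit_read()`, `cached_prefix_attempt()`) are hypothetical, and the worker thread is reduced to a plain queue so the sketch stays self-contained; a real program would have a thread draining that queue with ordinary blocking calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical non-blocking attempt: bytes done, or -1 with errno set. */
typedef ssize_t (*try_fn)(void *ctx, size_t off, size_t len);

struct deferred_op { size_t off, len; };

#define QUEUE_MAX 16
struct deferred_queue {
    struct deferred_op ops[QUEUE_MAX];
    int count;
};

/*
 * Do as much of [off, off + len) as possible inline and queue the rest for
 * a worker. Returns bytes completed inline, or -1 on a genuine error or a
 * full queue. A short count caused by end of file is also queued; the
 * worker is expected to discover the end of file itself.
 */
static ssize_t submit_read(try_fn attempt, void *ctx, size_t off, size_t len,
                           struct deferred_queue *q)
{
    assert(attempt != NULL && q != NULL);

    ssize_t n = attempt(ctx, off, len);
    if (n < 0) {
        if (errno != EAGAIN)
            return -1;      /* real failure: report it to the caller */
        n = 0;              /* nothing was possible without blocking */
    }
    if ((size_t)n < len) {
        if (q->count == QUEUE_MAX) {
            errno = ENOBUFS;
            return -1;
        }
        q->ops[q->count].off = off + (size_t)n;
        q->ops[q->count].len = len - (size_t)n;
        q->count++;
    }
    return n;
}

/* Stub attempt for demonstration: *ctx is how many leading bytes are cached. */
static ssize_t cached_prefix_attempt(void *ctx, size_t off, size_t len)
{
    size_t cached = *(const size_t *)ctx;

    if (off >= cached) {
        errno = EAGAIN;
        return -1;
    }
    size_t n = cached - off;
    return (ssize_t)(n < len ? n : len);
}
```

The main thread never sleeps in this scheme: a fully cached request completes inline with nothing queued, and only the uncached tail of a request costs a hand-off.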

There's also a case for making these into async I/O operations.
However, if there is any possibility of async I/O blocking a task for
a long time (which there is with Linux async I/O apparently), that is
not half as useful as a flag to stop I/O when it would block, and let
the program decide what to do.

I mention this precisely because it's relevant to I/O performance of
databases and similar programs, and therefore a reason to have a
"flags" argument to these new syscalls, even if no flags are defined
at first.

-- Jamie