From: Jens Axboe <jens.axboe@oracle.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: marty <martyleisner@yahoo.com>,
linux-kernel@vger.kernel.org, martin.leisner@xerox.com
Subject: Re: disk IO directly from PCI memory to block device sectors
Date: Fri, 26 Sep 2008 13:34:04 +0200 [thread overview]
Message-ID: <20080926113404.GY2677@kernel.dk> (raw)
In-Reply-To: <20080926101954.GW2677@kernel.dk>
On Fri, Sep 26 2008, Jens Axboe wrote:
> Another alternative would be using splice - if the pci device exposed a
> char device node, you could support ->splice_read() there which would
> just fill the pages into the pipe buffer. Then change the block device
> fops ->splice_write() to go direct to the block device through a bio
> instead of using the page cache based generic_file_splice_write(). Such
> a change would actually make sense to do, if the block device has been
> opened with O_DIRECT. And it would get you about the same performance as
> doing it in-kernel, the only extra overhead would be two syscalls per
> 64k (well probably only one extra syscall, since you probably need an
> ioctl/syscall to initiate the in-kernel activity as well). So just about
> as free as you could get.
Something like this, totally untested but should get the point across.
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 57e2786..fd06032 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -24,6 +24,7 @@
#include <linux/uio.h>
#include <linux/namei.h>
#include <linux/log2.h>
+#include <linux/splice.h>
#include <asm/uaccess.h>
#include "internal.h"
@@ -1224,6 +1225,77 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
return blkdev_ioctl(file->f_mapping->host, file, cmd, arg);
}
+static void block_splice_end_io(struct bio *bio, int err)
+{
+ bio_put(bio);
+}
+
+static int pipe_to_disk(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+ struct splice_desc *sd)
+{
+ struct block_device *bdev = I_BDEV(sd->u.file->f_mapping->host);
+ struct bio *bio;
+ int ret, bs;
+
+ bs = queue_hardsect_size(bdev_get_queue(bdev));
+ if (sd->pos & (bs - 1))
+ return -EINVAL;
+
+ ret = buf->ops->confirm(pipe, buf);
+ if (unlikely(ret))
+ return ret;
+
+ bio = bio_alloc(GFP_KERNEL, 1);
+ bio->bi_sector = sd->pos / bs;
+ bio->bi_bdev = bdev;
+ bio->bi_end_io = block_splice_end_io;
+
+ bio_add_page(bio, buf->page, buf->len, buf->offset);
+
+ submit_bio(WRITE, bio);
+ return buf->len;
+}
+
+/*
+ * Splice to file opened with O_DIRECT. Bypass caching completely and
+ * just go direct-to-bio
+ */
+static ssize_t __block_splice_write(struct pipe_inode_info *pipe,
+ struct file *out, loff_t *ppos, size_t len,
+ unsigned int flags)
+{
+ struct splice_desc sd = {
+ .total_len = len,
+ .flags = flags,
+ .pos = *ppos,
+ .u.file = out,
+ };
+ struct inode *inode = out->f_mapping->host;
+ ssize_t ret;
+
+ if (unlikely(*ppos & 511))
+ return -EINVAL;
+
+ inode_double_lock(inode, pipe->inode);
+ ret = __splice_from_pipe(pipe, &sd, pipe_to_disk);
+ inode_double_unlock(inode, pipe->inode);
+
+ if (ret > 0)
+ *ppos += ret;
+
+ return ret;
+}
+
+static ssize_t block_splice_write(struct pipe_inode_info *pipe,
+ struct file *out, loff_t *ppos, size_t len,
+ unsigned int flags)
+{
+ if (out->f_flags & O_DIRECT)
+ return __block_splice_write(pipe, out, ppos, len, flags);
+
+ return generic_file_splice_write(pipe, out, ppos, len, flags);
+}
+
static const struct address_space_operations def_blk_aops = {
.readpage = blkdev_readpage,
.writepage = blkdev_writepage,
@@ -1249,7 +1321,7 @@ const struct file_operations def_blk_fops = {
.compat_ioctl = compat_blkdev_ioctl,
#endif
.splice_read = generic_file_splice_read,
- .splice_write = generic_file_splice_write,
+ .splice_write = block_splice_write,
};
int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
--
Jens Axboe
next prev parent reply other threads:[~2008-09-26 11:34 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-26 7:29 disk IO directly from PCI memory to block device sectors marty
2008-09-26 8:03 ` Jens Axboe
2008-09-26 8:46 ` Alan Cox
2008-09-26 9:11 ` Jens Axboe
2008-09-26 10:06 ` Alan Cox
2008-09-26 10:19 ` Jens Axboe
2008-09-26 11:34 ` Jens Axboe [this message]
2008-09-26 15:51 ` Leisner, Martin
2008-09-29 13:02 ` Jens Axboe
2008-10-01 19:05 ` Jens Axboe
2008-10-02 16:15 ` Leon Woestenberg
2008-10-02 16:32 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080926113404.GY2677@kernel.dk \
--to=jens.axboe@oracle.com \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.leisner@xerox.com \
--cc=martyleisner@yahoo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.