From: Chris Mason <chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org
Subject: [PATCH RFC] Add cache-only generic file read
Date: Sun, 4 Oct 2009 19:20:23 -0400 [thread overview]
Message-ID: <20091004232023.GA20160@think> (raw)
Hello everyone,
We're still hammering out O_DIRECT code for btrfs that doesn't go
through the page cache. Until things are done, it helps to have
O_DIRECT that goes through the page cache and just immediately
invalidates. It doesn't give us AIO, but it is still useful for
some workloads.
Btrfs has had O_DIRECT writes for a while by invalidating the cache in
its own file_write. Reads are harder because filemap.c doesn't quite
export enough. Rather than export things and make a btrfs read, I've
made a generic cache-only read function that includes O_DIRECT
invalidation.
What does everyone think of adding something like this:
-chris
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2adaa25..9ee97a9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2203,6 +2203,9 @@ extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
extern int file_read_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size);
int generic_write_checks(struct file *file, loff_t *pos, size_t *count, int isblk);
extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsigned long, loff_t);
+extern ssize_t generic_file_cached_read(struct kiocb *iocb,
+ const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos);
extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long,
loff_t *);
extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
diff --git a/mm/filemap.c b/mm/filemap.c
index ef169f3..506d769 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1269,6 +1269,85 @@ int generic_segment_checks(const struct iovec *iov,
EXPORT_SYMBOL(generic_segment_checks);
/**
+ * generic_file_cached_read - generic filesystem read routine, no O_DIRECT
+ * is done
+ * @iocb: kernel I/O control block
+ * @iov: io vector request
+ * @nr_segs: number of segments in the iovec
+ * @pos: current file position
+ *
+ * This is the "read()" routine for all filesystems
+ * that can use the page cache directly, but do not support
+ * O_DIRECT. The O_DIRECT part is emulated with page cache invalidatation
+ */
+ssize_t
+generic_file_cached_read(struct kiocb *iocb, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos)
+{
+ struct file *filp = iocb->ki_filp;
+ ssize_t retval;
+ unsigned long seg;
+ size_t count;
+ loff_t *ppos = &iocb->ki_pos;
+ struct address_space *mapping;
+ struct inode *inode;
+ loff_t size;
+ size_t iov_len = 0;
+
+ count = 0;
+ retval = generic_segment_checks(iov, &nr_segs, &count, VERIFY_WRITE);
+ if (retval)
+ return retval;
+
+ mapping = filp->f_mapping;
+ inode = mapping->host;
+ size = i_size_read(inode);
+
+ /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
+ if (filp->f_flags & O_DIRECT) {
+ if (!count)
+ goto out; /* skip atime */
+
+ iov_len = iov_length(iov, nr_segs);
+
+ if (pos < size) {
+ retval = filemap_write_and_wait_range(mapping, pos,
+ pos + iov_len - 1);
+ if (retval)
+ goto out;
+ }
+ }
+
+ for (seg = 0; seg < nr_segs; seg++) {
+ read_descriptor_t desc;
+
+ desc.written = 0;
+ desc.arg.buf = iov[seg].iov_base;
+ desc.count = iov[seg].iov_len;
+ if (desc.count == 0)
+ continue;
+ desc.error = 0;
+ do_generic_file_read(filp, ppos, &desc, file_read_actor);
+ retval += desc.written;
+ if (desc.error) {
+ retval = retval ?: desc.error;
+ break;
+ }
+ if (desc.count > 0)
+ break;
+ }
+ if ((filp->f_flags & O_DIRECT) && pos < size) {
+ invalidate_inode_pages2_range(mapping,
+ pos >> PAGE_CACHE_SHIFT,
+ (pos + iov_len - 1) >> PAGE_CACHE_SHIFT);
+ }
+ file_accessed(filp);
+out:
+ return retval;
+}
+EXPORT_SYMBOL(generic_file_cached_read);
+
+/**
* generic_file_aio_read - generic filesystem read routine
* @iocb: kernel I/O control block
* @iov: io vector request
--
1.6.4.1
next reply other threads:[~2009-10-04 23:21 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-04 23:20 Chris Mason [this message]
2009-10-05 21:58 ` [PATCH RFC] Add cache-only generic file read Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091004232023.GA20160@think \
--to=chris.mason@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).