From: Christoph Hellwig <hch@lst.de>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-ext4@vger.kernel.org
Subject: [PATCH 2/5] vfs: Add page_cache_seek_hole_data helper
Date: Thu, 29 Jun 2017 06:54:17 -0700 [thread overview]
Message-ID: <20170629135420.21357-3-hch@lst.de> (raw)
In-Reply-To: <20170629135420.21357-1-hch@lst.de>
From: Andreas Gruenbacher <agruenba@redhat.com>
Both ext4 and xfs implement seeking for the next hole or piece of data
in unwritten extents by scanning the page cache, and both versions share
the same bug when iterating the buffers of a page: the start offset into
the page isn't taken into account, so when a page fits more than two
filesystem blocks, things will go wrong. For example, on a filesystem
with a block size of 1k, the following command will fail:
xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" foo
In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.
Introduce a generic vfs helper for seeking in the page cache that gets
this right. The next commits will replace the filesystem specific
implementations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: dropped the export]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/buffer.c | 124 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/buffer_head.h | 2 +
2 files changed, 126 insertions(+)
diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..b3674eb7c9c0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3492,6 +3492,130 @@ int bh_submit_read(struct buffer_head *bh)
}
EXPORT_SYMBOL(bh_submit_read);
+/*
+ * Seek for SEEK_DATA / SEEK_HOLE within @page, starting at @lastoff.
+ *
+ * Returns the offset within the file on success, and -ENOENT otherwise.
+ */
+static loff_t
+page_seek_hole_data(struct page *page, loff_t lastoff, int whence)
+{
+ loff_t offset = page_offset(page);
+ struct buffer_head *bh, *head;
+ bool seek_data = whence == SEEK_DATA;
+
+ if (lastoff < offset)
+ lastoff = offset;
+
+ bh = head = page_buffers(page);
+ do {
+ offset += bh->b_size;
+ if (lastoff >= offset)
+ continue;
+
+ /*
+ * Unwritten extents that have data in the page cache covering
+ * them can be identified by the BH_Unwritten state flag.
+ * Pages with multiple buffers might have a mix of holes, data
+ * and unwritten extents - any buffer with valid data in it
+ * should have BH_Uptodate flag set on it.
+ */
+
+ if ((buffer_unwritten(bh) || buffer_uptodate(bh)) == seek_data)
+ return lastoff;
+
+ lastoff = offset;
+ } while ((bh = bh->b_this_page) != head);
+ return -ENOENT;
+}
+
+/*
+ * Seek for SEEK_DATA / SEEK_HOLE in the page cache.
+ *
+ * Within unwritten extents, the page cache determines which parts are holes
+ * and which are data: unwritten and uptodate buffer heads count as data;
+ * everything else counts as a hole.
+ *
+ * Returns the resulting offset on successs, and -ENOENT otherwise.
+ */
+loff_t
+page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
+ int whence)
+{
+ pgoff_t index = offset >> PAGE_SHIFT;
+ pgoff_t end = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+ loff_t lastoff = offset;
+ struct pagevec pvec;
+
+ if (length <= 0)
+ return -ENOENT;
+
+ pagevec_init(&pvec, 0);
+
+ do {
+ unsigned want, nr_pages, i;
+
+ want = min_t(unsigned, end - index, PAGEVEC_SIZE);
+ nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index, want);
+ if (nr_pages == 0)
+ break;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pvec.pages[i];
+
+ /*
+ * At this point, the page may be truncated or
+ * invalidated (changing page->mapping to NULL), or
+ * even swizzled back from swapper_space to tmpfs file
+ * mapping. However, page->index will not change
+ * because we have a reference on the page.
+ *
+ * If current page offset is beyond where we've ended,
+ * we've found a hole.
+ */
+ if (whence == SEEK_HOLE &&
+ lastoff < page_offset(page))
+ goto check_range;
+
+ /* Searching done if the page index is out of range. */
+ if (page->index >= end)
+ goto not_found;
+
+ lock_page(page);
+ if (likely(page->mapping == inode->i_mapping) &&
+ page_has_buffers(page)) {
+ lastoff = page_seek_hole_data(page, lastoff, whence);
+ if (lastoff >= 0) {
+ unlock_page(page);
+ goto check_range;
+ }
+ }
+ unlock_page(page);
+ lastoff = page_offset(page) + PAGE_SIZE;
+ }
+
+ /* Searching done if fewer pages returned than wanted. */
+ if (nr_pages < want)
+ break;
+
+ index = pvec.pages[i - 1]->index + 1;
+ pagevec_release(&pvec);
+ } while (index < end);
+
+ /* When no page at lastoff and we are not done, we found a hole. */
+ if (whence != SEEK_HOLE)
+ goto not_found;
+
+check_range:
+ if (lastoff < offset + length)
+ goto out;
+not_found:
+ lastoff = -ENOENT;
+out:
+ pagevec_release(&pvec);
+ return lastoff;
+}
+
void __init buffer_init(void)
{
unsigned long nrpages;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index bd029e52ef5e..ad4e024ce17e 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -201,6 +201,8 @@ void write_boundary_block(struct block_device *bdev,
sector_t bblock, unsigned blocksize);
int bh_uptodate_or_lock(struct buffer_head *bh);
int bh_submit_read(struct buffer_head *bh);
+loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
+ loff_t length, int whence);
extern int buffer_heads_over_limit;
--
2.11.0
next prev parent reply other threads:[~2017-06-29 13:54 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
2017-06-29 13:54 ` [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk Christoph Hellwig
2017-07-01 2:45 ` Darrick J. Wong
2017-06-29 13:54 ` Christoph Hellwig [this message]
2017-06-29 13:54 ` [PATCH 3/5] vfs: Add iomap_seek_hole and iomap_seek_data helpers Christoph Hellwig
2017-06-29 13:54 ` [PATCH 4/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Christoph Hellwig
2017-06-29 13:54 ` [PATCH 5/5] ext4: " Christoph Hellwig
2017-06-30 11:51 ` Andreas Gruenbacher
2017-06-30 12:11 ` Andreas Gruenbacher
2017-06-30 17:37 ` Christoph Hellwig
2017-07-01 7:03 ` Darrick J. Wong
2017-07-02 15:24 ` Christoph Hellwig
2017-07-03 15:03 ` Andreas Gruenbacher
2017-07-03 16:21 ` Darrick J. Wong
2017-07-03 22:58 ` Dave Chinner
2017-07-07 21:27 ` Andreas Gruenbacher
2017-07-07 21:27 ` [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data} Andreas Gruenbacher
2017-07-12 9:17 ` Christoph Hellwig
2017-07-07 21:28 ` [PATCH v4 2/3] iomap: Switch from blkno to physical offset Andreas Gruenbacher
2017-07-12 9:20 ` Christoph Hellwig
2017-07-07 21:28 ` [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data Andreas Gruenbacher
2017-07-25 12:16 ` Jan Kara
2017-07-25 12:19 ` Andreas Gruenbacher
2017-07-25 12:45 ` [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA Jan Kara
2017-08-29 13:46 ` Andreas Gruenbacher
2017-06-29 18:47 ` lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Darrick J. Wong
2017-06-29 18:53 ` Christoph Hellwig
2017-07-03 15:11 ` Andreas Gruenbacher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170629135420.21357-3-hch@lst.de \
--to=hch@lst.de \
--cc=agruenba@redhat.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).