* [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap
@ 2017-06-26 14:25 Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 1/5] vfs: Add page_cache_seek_hole_data helper Andreas Gruenbacher
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2017-06-26 14:25 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Gruenbacher, linux-xfs, linux-ext4
Here is an updated version of the lseek and iomap patches. Changes:
* Some cleanups to iomap_seek_{hole,data}_actor.
* GFS2 patches omitted for now; those can be discussed later.
* Per request, lseek and iomap patches posted in a single patch queue.
Thanks,
Andreas
Andreas Gruenbacher (5):
vfs: Add page_cache_seek_hole_data helper
ext4: Switch to page_cache_seek_hole_data
xfs: Switch to page_cache_seek_hole_data
vfs: Add iomap_seek_hole_data helper
xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
fs/buffer.c | 125 +++++++++++++++++++++++++
fs/ext4/file.c | 144 +++--------------------------
fs/iomap.c | 91 ++++++++++++++++++
fs/xfs/xfs_file.c | 218 +++-----------------------------------------
include/linux/buffer_head.h | 2 +
include/linux/iomap.h | 3 +
6 files changed, 248 insertions(+), 335 deletions(-)
--
2.7.5
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 1/5] vfs: Add page_cache_seek_hole_data helper
2017-06-26 14:25 [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap Andreas Gruenbacher
@ 2017-06-26 14:25 ` Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 2/5] ext4: Switch to page_cache_seek_hole_data Andreas Gruenbacher
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2017-06-26 14:25 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Gruenbacher, linux-xfs, linux-ext4
Both ext4 and xfs implement seeking for the next hole or piece of data
in unwritten extents by scanning the page cache, and both versions share
the same bug when iterating the buffers of a page: the start offset into
the page isn't taken into account, so when a page fits more than two
filesystem blocks, things will go wrong. For example, on a filesystem
with a block size of 1k, the following command will fail:
xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" foo
In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.
Introduce a generic vfs helper for seeking in the page cache that gets
this right. The next commits will replace the filesystem specific
implementations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/buffer.c | 125 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/buffer_head.h | 2 +
2 files changed, 127 insertions(+)
diff --git a/fs/buffer.c b/fs/buffer.c
index 4d5d03b..7702269 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3485,6 +3485,131 @@ int bh_submit_read(struct buffer_head *bh)
}
EXPORT_SYMBOL(bh_submit_read);
+/*
+ * Seek for SEEK_DATA / SEEK_HOLE within @page, starting at @lastoff.
+ *
+ * Returns the offset within the file on success, and -ENOENT otherwise.
+ */
+static loff_t
+page_seek_hole_data(struct page *page, loff_t lastoff, int whence)
+{
+ loff_t offset = page_offset(page);
+ struct buffer_head *bh, *head;
+ bool seek_data = whence == SEEK_DATA;
+
+ if (lastoff < offset)
+ lastoff = offset;
+
+ bh = head = page_buffers(page);
+ do {
+ offset += bh->b_size;
+ if (lastoff >= offset)
+ continue;
+
+ /*
+ * Unwritten extents that have data in the page cache covering
+ * them can be identified by the BH_Unwritten state flag.
+ * Pages with multiple buffers might have a mix of holes, data
+ * and unwritten extents - any buffer with valid data in it
+ * should have BH_Uptodate flag set on it.
+ */
+
+ if ((buffer_unwritten(bh) || buffer_uptodate(bh)) == seek_data)
+ return lastoff;
+
+ lastoff = offset;
+ } while ((bh = bh->b_this_page) != head);
+ return -ENOENT;
+}
+
+/*
+ * Seek for SEEK_DATA / SEEK_HOLE in the page cache.
+ *
+ * Within unwritten extents, the page cache determines which parts are holes
+ * and which are data: unwritten and uptodate buffer heads count as data;
+ * everything else counts as a hole.
+ *
+ * Returns the resulting offset on successs, and -ENOENT otherwise.
+ */
+loff_t
+page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
+ int whence)
+{
+ pgoff_t index = offset >> PAGE_SHIFT;
+ pgoff_t end = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+ loff_t lastoff = offset;
+ struct pagevec pvec;
+
+ if (length <= 0)
+ return -ENOENT;
+
+ pagevec_init(&pvec, 0);
+
+ do {
+ unsigned want, nr_pages, i;
+
+ want = min_t(unsigned, end - index, PAGEVEC_SIZE);
+ nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index, want);
+ if (nr_pages == 0)
+ break;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pvec.pages[i];
+
+ /*
+ * At this point, the page may be truncated or
+ * invalidated (changing page->mapping to NULL), or
+ * even swizzled back from swapper_space to tmpfs file
+ * mapping. However, page->index will not change
+ * because we have a reference on the page.
+ *
+ * If current page offset is beyond where we've ended,
+ * we've found a hole.
+ */
+ if (whence == SEEK_HOLE &&
+ lastoff < page_offset(page))
+ goto check_range;
+
+ /* Searching done if the page index is out of range. */
+ if (page->index >= end)
+ goto not_found;
+
+ lock_page(page);
+ if (likely(page->mapping == inode->i_mapping) &&
+ page_has_buffers(page)) {
+ lastoff = page_seek_hole_data(page, lastoff, whence);
+ if (lastoff >= 0) {
+ unlock_page(page);
+ goto check_range;
+ }
+ }
+ unlock_page(page);
+ lastoff = page_offset(page) + PAGE_SIZE;
+ }
+
+ /* Searching done if fewer pages returned than wanted. */
+ if (nr_pages < want)
+ break;
+
+ index = pvec.pages[i - 1]->index + 1;
+ pagevec_release(&pvec);
+ } while (index < end);
+
+ /* When no page at lastoff and we are not done, we found a hole. */
+ if (whence != SEEK_HOLE)
+ goto not_found;
+
+check_range:
+ if (lastoff < offset + length)
+ goto out;
+not_found:
+ lastoff = -ENOENT;
+out:
+ pagevec_release(&pvec);
+ return lastoff;
+}
+EXPORT_SYMBOL_GPL(page_cache_seek_hole_data);
+
void __init buffer_init(void)
{
unsigned long nrpages;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index e0abeba..c8dae55 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -202,6 +202,8 @@ void write_boundary_block(struct block_device *bdev,
sector_t bblock, unsigned blocksize);
int bh_uptodate_or_lock(struct buffer_head *bh);
int bh_submit_read(struct buffer_head *bh);
+loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
+ loff_t length, int whence);
extern int buffer_heads_over_limit;
--
2.7.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/5] ext4: Switch to page_cache_seek_hole_data
2017-06-26 14:25 [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 1/5] vfs: Add page_cache_seek_hole_data helper Andreas Gruenbacher
@ 2017-06-26 14:25 ` Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 3/5] xfs: " Andreas Gruenbacher
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2017-06-26 14:25 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Gruenbacher, linux-xfs, linux-ext4
Fix SEEK_HOLE / SEEK_DATA when a page fits more than two filesystem
blocks (see the commit that introduces page_cache_seek_hole_data).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/ext4/file.c | 144 ++++++---------------------------------------------------
1 file changed, 13 insertions(+), 131 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 94eeef2..3e8f3d1 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -461,129 +461,6 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
}
/*
- * Here we use ext4_map_blocks() to get a block mapping for a extent-based
- * file rather than ext4_ext_walk_space() because we can introduce
- * SEEK_DATA/SEEK_HOLE for block-mapped and extent-mapped file at the same
- * function. When extent status tree has been fully implemented, it will
- * track all extent status for a file and we can directly use it to
- * retrieve the offset for SEEK_DATA/SEEK_HOLE.
- */
-
-/*
- * When we retrieve the offset for SEEK_DATA/SEEK_HOLE, we would need to
- * lookup page cache to check whether or not there has some data between
- * [startoff, endoff] because, if this range contains an unwritten extent,
- * we determine this extent as a data or a hole according to whether the
- * page cache has data or not.
- */
-static int ext4_find_unwritten_pgoff(struct inode *inode,
- int whence,
- ext4_lblk_t end_blk,
- loff_t *offset)
-{
- struct pagevec pvec;
- unsigned int blkbits;
- pgoff_t index;
- pgoff_t end;
- loff_t endoff;
- loff_t startoff;
- loff_t lastoff;
- int found = 0;
-
- blkbits = inode->i_sb->s_blocksize_bits;
- startoff = *offset;
- lastoff = startoff;
- endoff = (loff_t)end_blk << blkbits;
-
- index = startoff >> PAGE_SHIFT;
- end = (endoff - 1) >> PAGE_SHIFT;
-
- pagevec_init(&pvec, 0);
- do {
- int i, num;
- unsigned long nr_pages;
-
- num = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
- nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
- (pgoff_t)num);
- if (nr_pages == 0)
- break;
-
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
- struct buffer_head *bh, *head;
-
- /*
- * If current offset is smaller than the page offset,
- * there is a hole at this offset.
- */
- if (whence == SEEK_HOLE && lastoff < endoff &&
- lastoff < page_offset(pvec.pages[i])) {
- found = 1;
- *offset = lastoff;
- goto out;
- }
-
- if (page->index > end)
- goto out;
-
- lock_page(page);
-
- if (unlikely(page->mapping != inode->i_mapping)) {
- unlock_page(page);
- continue;
- }
-
- if (!page_has_buffers(page)) {
- unlock_page(page);
- continue;
- }
-
- if (page_has_buffers(page)) {
- lastoff = page_offset(page);
- bh = head = page_buffers(page);
- do {
- if (buffer_uptodate(bh) ||
- buffer_unwritten(bh)) {
- if (whence == SEEK_DATA)
- found = 1;
- } else {
- if (whence == SEEK_HOLE)
- found = 1;
- }
- if (found) {
- *offset = max_t(loff_t,
- startoff, lastoff);
- unlock_page(page);
- goto out;
- }
- lastoff += bh->b_size;
- bh = bh->b_this_page;
- } while (bh != head);
- }
-
- lastoff = page_offset(page) + PAGE_SIZE;
- unlock_page(page);
- }
-
- /* The no. of pages is less than our desired, we are done. */
- if (nr_pages < num)
- break;
-
- index = pvec.pages[i - 1]->index + 1;
- pagevec_release(&pvec);
- } while (index <= end);
-
- if (whence == SEEK_HOLE && lastoff < endoff) {
- found = 1;
- *offset = lastoff;
- }
-out:
- pagevec_release(&pvec);
- return found;
-}
-
-/*
* ext4_seek_data() retrieves the offset for SEEK_DATA.
*/
static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
@@ -591,7 +468,7 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
struct inode *inode = file->f_mapping->host;
struct extent_status es;
ext4_lblk_t start, last, end;
- loff_t dataoff, isize;
+ loff_t dataoff, length, isize;
int blkbits;
int ret;
@@ -630,8 +507,10 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
* it will be as a data or a hole according to page
* cache that has data or not.
*/
- if (ext4_find_unwritten_pgoff(inode, SEEK_DATA,
- es.es_lblk + es.es_len, &dataoff))
+ length = ((loff_t)(last + es.es_len) << blkbits) - dataoff;
+ dataoff = page_cache_seek_hole_data(inode, dataoff,
+ length, SEEK_DATA);
+ if (dataoff >= 0)
break;
last += es.es_len;
dataoff = (loff_t)last << blkbits;
@@ -654,7 +533,7 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
struct inode *inode = file->f_mapping->host;
struct extent_status es;
ext4_lblk_t start, last, end;
- loff_t holeoff, isize;
+ loff_t holeoff, length, isize;
int blkbits;
int ret;
@@ -689,10 +568,13 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
* it will be as a data or a hole according to page
* cache that has data or not.
*/
- if (ext4_es_is_unwritten(&es) &&
- ext4_find_unwritten_pgoff(inode, SEEK_HOLE,
- last + es.es_len, &holeoff))
- break;
+ if (ext4_es_is_unwritten(&es)) {
+ length = ((loff_t)(last + es.es_len) << blkbits) - holeoff;
+ holeoff = page_cache_seek_hole_data(inode, holeoff,
+ length, SEEK_HOLE);
+ if (holeoff >= 0)
+ break;
+ }
last += es.es_len;
holeoff = (loff_t)last << blkbits;
--
2.7.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 3/5] xfs: Switch to page_cache_seek_hole_data
2017-06-26 14:25 [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 1/5] vfs: Add page_cache_seek_hole_data helper Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 2/5] ext4: Switch to page_cache_seek_hole_data Andreas Gruenbacher
@ 2017-06-26 14:25 ` Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 4/5] vfs: Add iomap_seek_hole_data helper Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Andreas Gruenbacher
4 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2017-06-26 14:25 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Gruenbacher, linux-xfs, linux-ext4
Fix SEEK_HOLE / SEEK_DATA when a page fits more than two filesystem
blocks (see the commit that introduces page_cache_seek_hole_data).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/xfs/xfs_file.c | 197 +++---------------------------------------------------
1 file changed, 9 insertions(+), 188 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 86701a0..962dafd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -981,191 +981,6 @@ xfs_file_readdir(
}
/*
- * This type is designed to indicate the type of offset we would like
- * to search from page cache for xfs_seek_hole_data().
- */
-enum {
- HOLE_OFF = 0,
- DATA_OFF,
-};
-
-/*
- * Lookup the desired type of offset from the given page.
- *
- * On success, return true and the offset argument will point to the
- * start of the region that was found. Otherwise this function will
- * return false and keep the offset argument unchanged.
- */
-STATIC bool
-xfs_lookup_buffer_offset(
- struct page *page,
- loff_t *offset,
- unsigned int type)
-{
- loff_t lastoff = page_offset(page);
- bool found = false;
- struct buffer_head *bh, *head;
-
- bh = head = page_buffers(page);
- do {
- /*
- * Unwritten extents that have data in the page
- * cache covering them can be identified by the
- * BH_Unwritten state flag. Pages with multiple
- * buffers might have a mix of holes, data and
- * unwritten extents - any buffer with valid
- * data in it should have BH_Uptodate flag set
- * on it.
- */
- if (buffer_unwritten(bh) ||
- buffer_uptodate(bh)) {
- if (type == DATA_OFF)
- found = true;
- } else {
- if (type == HOLE_OFF)
- found = true;
- }
-
- if (found) {
- *offset = lastoff;
- break;
- }
- lastoff += bh->b_size;
- } while ((bh = bh->b_this_page) != head);
-
- return found;
-}
-
-/*
- * This routine is called to find out and return a data or hole offset
- * from the page cache for unwritten extents according to the desired
- * type for xfs_seek_hole_data().
- *
- * The argument offset is used to tell where we start to search from the
- * page cache. Map is used to figure out the end points of the range to
- * lookup pages.
- *
- * Return true if the desired type of offset was found, and the argument
- * offset is filled with that address. Otherwise, return false and keep
- * offset unchanged.
- */
-STATIC bool
-xfs_find_get_desired_pgoff(
- struct inode *inode,
- struct xfs_bmbt_irec *map,
- unsigned int type,
- loff_t *offset)
-{
- struct xfs_inode *ip = XFS_I(inode);
- struct xfs_mount *mp = ip->i_mount;
- struct pagevec pvec;
- pgoff_t index;
- pgoff_t end;
- loff_t endoff;
- loff_t startoff = *offset;
- loff_t lastoff = startoff;
- bool found = false;
-
- pagevec_init(&pvec, 0);
-
- index = startoff >> PAGE_SHIFT;
- endoff = XFS_FSB_TO_B(mp, map->br_startoff + map->br_blockcount);
- end = (endoff - 1) >> PAGE_SHIFT;
- do {
- int want;
- unsigned nr_pages;
- unsigned int i;
-
- want = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
- nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
- want);
- if (nr_pages == 0)
- break;
-
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
- loff_t b_offset;
-
- /*
- * At this point, the page may be truncated or
- * invalidated (changing page->mapping to NULL),
- * or even swizzled back from swapper_space to tmpfs
- * file mapping. However, page->index will not change
- * because we have a reference on the page.
- *
- * If current page offset is beyond where we've ended,
- * we've found a hole.
- */
- if (type == HOLE_OFF && lastoff < endoff &&
- lastoff < page_offset(pvec.pages[i])) {
- found = true;
- *offset = lastoff;
- goto out;
- }
- /* Searching done if the page index is out of range. */
- if (page->index > end)
- goto out;
-
- lock_page(page);
- /*
- * Page truncated or invalidated(page->mapping == NULL).
- * We can freely skip it and proceed to check the next
- * page.
- */
- if (unlikely(page->mapping != inode->i_mapping)) {
- unlock_page(page);
- continue;
- }
-
- if (!page_has_buffers(page)) {
- unlock_page(page);
- continue;
- }
-
- found = xfs_lookup_buffer_offset(page, &b_offset, type);
- if (found) {
- /*
- * The found offset may be less than the start
- * point to search if this is the first time to
- * come here.
- */
- *offset = max_t(loff_t, startoff, b_offset);
- unlock_page(page);
- goto out;
- }
-
- /*
- * We either searching data but nothing was found, or
- * searching hole but found a data buffer. In either
- * case, probably the next page contains the desired
- * things, update the last offset to it so.
- */
- lastoff = page_offset(page) + PAGE_SIZE;
- unlock_page(page);
- }
-
- /*
- * The number of returned pages less than our desired, search
- * done.
- */
- if (nr_pages < want)
- break;
-
- index = pvec.pages[i - 1]->index + 1;
- pagevec_release(&pvec);
- } while (index <= end);
-
- /* No page at lastoff and we are not done - we found a hole. */
- if (type == HOLE_OFF && lastoff < endoff) {
- *offset = lastoff;
- found = true;
- }
-out:
- pagevec_release(&pvec);
- return found;
-}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 4/5] vfs: Add iomap_seek_hole_data helper
2017-06-26 14:25 [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap Andreas Gruenbacher
` (2 preceding siblings ...)
2017-06-26 14:25 ` [PATCH v3 3/5] xfs: " Andreas Gruenbacher
@ 2017-06-26 14:25 ` Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Andreas Gruenbacher
4 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2017-06-26 14:25 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Gruenbacher, linux-xfs, linux-ext4
Filesystems can use this for implementing lseek SEEK_HOLE / SEEK_DATA
support via iomap.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/iomap.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/iomap.h | 3 ++
2 files changed, 94 insertions(+)
diff --git a/fs/iomap.c b/fs/iomap.c
index c71a64b..ced7423 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -584,6 +584,97 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
}
EXPORT_SYMBOL_GPL(iomap_fiemap);
+static loff_t
+iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
+ void *data, struct iomap *iomap)
+{
+ switch(iomap->type) {
+ case IOMAP_UNWRITTEN:
+ offset = page_cache_seek_hole_data(inode, offset, length,
+ SEEK_HOLE);
+ if (offset < 0)
+ break;
+ /* fall through */
+ case IOMAP_HOLE:
+ *(loff_t *)data = offset;
+ return 0;
+ }
+ return length;
+}
+
+static loff_t
+iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
+ void *data, struct iomap *iomap)
+{
+ switch(iomap->type) {
+ case IOMAP_UNWRITTEN:
+ offset = page_cache_seek_hole_data(inode, offset, length,
+ SEEK_DATA);
+ if (offset >= 0)
+ break;
+ /* fall through */
+ case IOMAP_HOLE:
+ return length;
+ }
+ *(loff_t *)data = offset;
+ return 0;
+}
+
+/*
+ * Filesystem helper for lseek SEEK_HOLE / SEEK_DATA.
+ */
+loff_t
+iomap_seek_hole_data(struct inode *inode, loff_t offset,
+ int whence, const struct iomap_ops *ops)
+{
+ static loff_t (*actor)(struct inode *, loff_t, loff_t, void *,
+ struct iomap *);
+ loff_t size = i_size_read(inode);
+ loff_t length;
+
+ /* Nothing to be found beyond the end of the file. */
+ if (offset >= size)
+ return -ENXIO;
+ length = size - offset;
+
+ switch(whence) {
+ case SEEK_HOLE:
+ actor = iomap_seek_hole_actor;
+ break;
+
+ case SEEK_DATA:
+ actor = iomap_seek_data_actor;
+ break;
+ }
+
+ while (length > 0) {
+ loff_t ret;
+
+ ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops,
+ &offset, actor);
+ if (ret <= 0) {
+ if (ret < 0)
+ return ret;
+ break;
+ }
+ offset += ret;
+ length -= ret;
+ }
+
+ if (length <= 0) {
+ /* There is an implicit hole at the end of the file. */
+ if (whence != SEEK_HOLE)
+ return -ENXIO;
+
+ /* The last segment can extend beyond the end of the file. */
+ if (offset > size)
+ offset = size;
+ }
+
+ return offset;
+}
+EXPORT_SYMBOL_GPL(iomap_seek_hole_data);
+
/*
* Private flags for iomap_dio, must not overlap with the public ones in
* iomap.h:
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 69f4e94..ca0a5ec 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -84,6 +84,9 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
loff_t start, loff_t len, const struct iomap_ops *ops);
+loff_t iomap_seek_hole_data(struct inode *inode, loff_t offset,
+ int whence, const struct iomap_ops *ops);
+
/*
* Flags for direct I/O ->end_io:
--
2.7.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
2017-06-26 14:25 [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap Andreas Gruenbacher
` (3 preceding siblings ...)
2017-06-26 14:25 ` [PATCH v3 4/5] vfs: Add iomap_seek_hole_data helper Andreas Gruenbacher
@ 2017-06-26 14:25 ` Andreas Gruenbacher
2017-06-27 0:34 ` Darrick J. Wong
4 siblings, 1 reply; 8+ messages in thread
From: Andreas Gruenbacher @ 2017-06-26 14:25 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Andreas Gruenbacher, linux-xfs, linux-ext4
Switch to the iomap_seek_hole_data vfs helper for implementing lseek
SEEK_HOLE / SEEK_DATA. __xfs_seek_hole_data can go away once it's no
longer used by the quota code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/xfs/xfs_file.c | 21 +++++----------------
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 962dafd..94fe89a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1131,29 +1131,18 @@ xfs_seek_hole_data(
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
uint lock;
- loff_t offset, end;
- int error = 0;
+ loff_t offset;
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
lock = xfs_ilock_data_map_shared(ip);
-
- end = i_size_read(inode);
- offset = __xfs_seek_hole_data(inode, start, end, whence);
- if (offset < 0) {
- error = offset;
- goto out_unlock;
- }
-
- offset = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
2017-06-26 14:25 ` [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Andreas Gruenbacher
@ 2017-06-27 0:34 ` Darrick J. Wong
2017-06-27 12:56 ` Dave Chinner
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2017-06-27 0:34 UTC (permalink / raw)
To: Andreas Gruenbacher
Cc: linux-fsdevel, linux-xfs, linux-ext4, Christoph Hellwig
[adding Christoph to cc]
On Mon, Jun 26, 2017 at 04:25:18PM +0200, Andreas Gruenbacher wrote:
> Switch to the iomap_seek_hole_data vfs helper for implementing lseek
> SEEK_HOLE / SEEK_DATA. __xfs_seek_hole_data can go away once it's no
> longer used by the quota code.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
> fs/xfs/xfs_file.c | 21 +++++----------------
> 1 file changed, 5 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 962dafd..94fe89a 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1131,29 +1131,18 @@ xfs_seek_hole_data(
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> uint lock;
> - loff_t offset, end;
> - int error = 0;
> + loff_t offset;
>
> if (XFS_FORCED_SHUTDOWN(mp))
> return -EIO;
>
> lock = xfs_ilock_data_map_shared(ip);
> -
> - end = i_size_read(inode);
> - offset = __xfs_seek_hole_data(inode, start, end, whence);
> - if (offset < 0) {
> - error = offset;
> - goto out_unlock;
> - }
> -
> - offset = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
> -
> -out_unlock:
> + offset = iomap_seek_hole_data(inode, start, whence, &xfs_iomap_ops);
Hm. We grab the data map ilock above, then we call
iomap_seek_hole_data, which (eventually) calls xfs_file_iomap_begin,
which tries to grab the data map ilock. We shouldn't be grabbing the
ilock twice, obviously, but on the other hand...
...under the old code, we'd take the ilock and do the whole block map
and page cache scans without ever dropping the ilock. This new iomap
based thing only holds the ilock during ->iomap_begin, which makes me
worry that someone else can wander in and mess with things while we're
looking for holes/data?
--D
FWIW generic/285 blows up with this:
[ 2975.947417] run fstests generic/285 at 2017-06-26 10:13:48
[ 2976.474195] ============================================
[ 2976.474856] WARNING: possible recursive locking detected
[ 2976.475392] 4.12.0-rc6-dgc #2 Tainted: G W
[ 2976.475875] --------------------------------------------
[ 2976.476361] seek_sanity_tes/18280 is trying to acquire lock:
[ 2976.476874] (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa0143eb7>] xfs_ilock+0x137/0x330 [xfs]
[ 2976.478009]
but task is already holding lock:
[ 2976.479328] (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa0143eb7>] xfs_ilock+0x137/0x330 [xfs]
[ 2976.480506]
other info that might help us debug this:
[ 2976.481295] Possible unsafe locking scenario:
[ 2976.481973] CPU0
[ 2976.482253] ----
[ 2976.482556] lock(&xfs_nondir_ilock_class);
[ 2976.482960] lock(&xfs_nondir_ilock_class);
[ 2976.483360]
*** DEADLOCK ***
[ 2976.486560] May be due to missing lock nesting notation
[ 2976.487274] 1 lock held by seek_sanity_tes/18280:
[ 2976.487775] #0: (&xfs_nondir_ilock_class){++++..}, at: [<ffffffffa0143eb7>] xfs_ilock+0x137/0x330 [xfs]
[ 2976.489006]
stack backtrace:
[ 2976.489760] CPU: 0 PID: 18280 Comm: seek_sanity_tes Tainted: G W 4.12.0-rc6-dgc #2
[ 2976.491308] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 2976.492907] Call Trace:
[ 2976.493422] dump_stack+0x85/0xc7
[ 2976.494049] __lock_acquire+0x1567/0x15c0
[ 2976.494782] ? _raw_spin_unlock+0x31/0x50
[ 2976.495499] lock_acquire+0xac/0x200
[ 2976.496485] ? lock_acquire+0xac/0x200
[ 2976.497347] ? xfs_ilock+0x137/0x330 [xfs]
[ 2976.497949] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[ 2976.498799] down_read_nested+0x49/0xb0
[ 2976.499414] ? xfs_ilock+0x137/0x330 [xfs]
[ 2976.500153] xfs_ilock+0x137/0x330 [xfs]
[ 2976.503377] xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[ 2976.504281] xfs_file_iomap_begin+0x8e/0xd40 [xfs]
[ 2976.504981] ? xfs_iunlock+0x2ab/0x310 [xfs]
[ 2976.505630] ? xfs_ilock+0x137/0x330 [xfs]
[ 2976.506198] iomap_apply+0x48/0xe0
[ 2976.506760] iomap_seek_hole_data+0xa6/0x100
[ 2976.507510] ? iomap_to_fiemap+0x80/0x80
[ 2976.508163] xfs_seek_hole_data+0x6a/0xb0 [xfs]
[ 2976.508903] xfs_file_llseek+0x1c/0x30 [xfs]
[ 2976.509497] SyS_lseek+0x8d/0xb0
[ 2976.509936] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 2976.510622] RIP: 0033:0x7f43f2468b67
[ 2976.511162] RSP: 002b:00007ffd07380e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000008
[ 2976.512312] RAX: ffffffffffffffda RBX: 00007f43f2452b20 RCX: 00007f43f2468b67
[ 2976.513448] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
[ 2976.514769] RBP: 0000000000001011 R08: 0000000000000000 R09: 0000000000000016
[ 2976.515995] R10: 00000000000000c2 R11: 0000000000000202 R12: 00007f43f2452b78
[ 2976.517290] R13: 00007f43f2452b78 R14: 0000000000002710 R15: 0000000000403e26
[ 2976.733289] XFS (pmem1): Unmounting Filesystem
[ 2976.850354] XFS (pmem1): Mounting V4 Filesystem
[ 2976.856274] XFS (pmem1): Ending clean mount
> xfs_iunlock(ip, lock);
>
> - if (error)
> - return error;
> - return offset;
> + if (offset < 0)
> + return offset;
> + return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
> }
>
> STATIC loff_t
> --
> 2.7.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
2017-06-27 0:34 ` Darrick J. Wong
@ 2017-06-27 12:56 ` Dave Chinner
0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2017-06-27 12:56 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Andreas Gruenbacher, linux-fsdevel, linux-xfs, linux-ext4,
Christoph Hellwig
On Mon, Jun 26, 2017 at 05:34:35PM -0700, Darrick J. Wong wrote:
> [adding Christoph to cc]
>
> On Mon, Jun 26, 2017 at 04:25:18PM +0200, Andreas Gruenbacher wrote:
> > Switch to the iomap_seek_hole_data vfs helper for implementing lseek
> > SEEK_HOLE / SEEK_DATA. __xfs_seek_hole_data can go away once it's no
> > longer used by the quota code.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > ---
> > fs/xfs/xfs_file.c | 21 +++++----------------
> > 1 file changed, 5 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 962dafd..94fe89a 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1131,29 +1131,18 @@ xfs_seek_hole_data(
> > struct xfs_inode *ip = XFS_I(inode);
> > struct xfs_mount *mp = ip->i_mount;
> > uint lock;
> > - loff_t offset, end;
> > - int error = 0;
> > + loff_t offset;
> >
> > if (XFS_FORCED_SHUTDOWN(mp))
> > return -EIO;
> >
> > lock = xfs_ilock_data_map_shared(ip);
> > -
> > - end = i_size_read(inode);
> > - offset = __xfs_seek_hole_data(inode, start, end, whence);
> > - if (offset < 0) {
> > - error = offset;
> > - goto out_unlock;
> > - }
> > -
> > - offset = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
> > -
> > -out_unlock:
> > + offset = iomap_seek_hole_data(inode, start, whence, &xfs_iomap_ops);
>
> Hm. We grab the data map ilock above, then we call
> iomap_seek_hole_data, which (eventually) calls xfs_file_iomap_begin,
> which tries to grab the data map ilock. We shouldn't be grabbing the
> ilock twice, obviously, but on the other hand...
>
> ...under the old code, we'd take the ilock and do the whole block map
> and page cache scans without ever dropping the ilock.
Which I'm pretty sure I've previously pointed out is broken and
needed fixing (lockdep reports, IIRC), as the lock order is iolock
-> page lock -> ilock.
(yes, I'm using "iolock" as shorthand for inode->i_rwsem)
> This new iomap
> based thing only holds the ilock during ->iomap_begin, which makes me
> worry that someone else can wander in and mess with things while we're
> looking for holes/data?
Locking won't prevent seek races with concurrent modifications from
the perspective of userspace.
i.e. we can lock the inode down, seek to data, unlock it, and before
we get back to userspace something else punches out that data. So by
the time the app gets to use the position set by the seek, there's a
hole where it's being told there *was* data....
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-06-27 12:56 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-26 14:25 [PATCH v3 0/5] lseek SEEK_HOLE / SEEK_DATA fix and switch to iomap Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 1/5] vfs: Add page_cache_seek_hole_data helper Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 2/5] ext4: Switch to page_cache_seek_hole_data Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 3/5] xfs: " Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 4/5] vfs: Add iomap_seek_hole_data helper Andreas Gruenbacher
2017-06-26 14:25 ` [PATCH v3 5/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Andreas Gruenbacher
2017-06-27 0:34 ` Darrick J. Wong
2017-06-27 12:56 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).