All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Liu <jeff.liu@oracle.com>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, hughd@google.com,
	xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com,
	lczerner@redhat.com, Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [PATCH 8/8 v3] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support
Date: Sat, 27 Oct 2012 18:05:59 +0800	[thread overview]
Message-ID: <508BB207.1010506@oracle.com> (raw)
In-Reply-To: <1351257825-3701-9-git-send-email-wenqing.lz@taobao.com>

Hi Zheng,

On 10/26/2012 09:23 PM, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
>
> This patch makes ext4 really support SEEK_DATA/SEEK_HOLE flags.  Block-mapped
> and extent-mapped files are fully implemented together because ext4_map_blocks
> hides this differences.
>
> After applying this patch, it will cause a failure in xfstest #285 when the file
> is block-mapped due to block-mapped file isn't support fallocate(2).
>
> I had tried to use ext4_ext_walk_space() to retrieve the offset for a
> extent-mapped file.  But finally I decide to keep using ext4_map_blocks() to
> support SEEK_DATA/SEEK_HOLE because ext4_map_blocks() can hide the difference
> between block-mapped file and extent-mapped file.  Moreover, in next step,
> extent status tree will track all extent status, and we can get all mappings
> from this tree.  So I think that using ext4_map_blocks() is a better choice.
>
> CC: "Theodore Ts'o" <tytso@mit.edu>
> CC: Hugh Dickins <hughd@google.com>
> Signed-off-by: Jie Liu <jeff.liu@oracle.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
> v3 <- v2:
>   - determine an unwritten extent as a data/hole according to the whether there
>     has some data in page cache or not.
>   - remove ext4_seek_data_hole(), and call ext4_seek_data/hole() directly.
>   - fix some bugs
>
> Please apply this patch before testing this patch using xfstest #285 due to
> there is a bug in xfstest (http://patchwork.xfs.org/patch/4276/).
>
> The function ext4_find_unwritten_pgoff() is inspired by Jeff's work for xfs
> (d126d43f631f996daeee5006714fed914be32368).  Thus, I add a Signed-off-by for
> his crediting.
>
> Jeff, I really appreicate if you could allow me to add Signed-off-by in this
> patch.  Please let me know if you think it is not well.  Thanks!
I'm fine. :)
Finally, maybe I'll try to tweak that routine with page cache lookup for 
unwritten extents to be a
VFS helper so that Btrfs can make use of it as well, thank you!

-Jeff
>
>   fs/ext4/file.c | 334 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 332 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index bf3966b..2f5759e 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -24,6 +24,7 @@
>   #include <linux/mount.h>
>   #include <linux/path.h>
>   #include <linux/quotaops.h>
> +#include <linux/pagevec.h>
>   #include "ext4.h"
>   #include "ext4_jbd2.h"
>   #include "xattr.h"
> @@ -286,6 +287,324 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
>   }
>   
>   /*
> + * Here we use ext4_map_blocks() to get a block mapping for a extent-based
> + * file rather than ext4_ext_walk_space() because we can introduce
> + * SEEK_DATA/SEEK_HOLE for block-mapped and extent-mapped file at the same
> + * function.  When extent status tree has been fully implemented, it will
> + * track all extent status for a file and we can directly use it to
> + * retrieve the offset for SEEK_DATA/SEEK_HOLE.
> + */
> +
> +/*
> + * When we retrieve the offset for SEEK_DATA/SEEK_HOLE, we would need to
> + * lookup page cache to check whether or not there has some data between
> + * [startoff, endoff] because, if this range contains an unwritten extent,
> + * we determine this extent as a data or a hole according to whether the
> + * page cache has data or not.
> + */
> +static int ext4_find_unwritten_pgoff(struct inode *inode,
> +				     int origin,
> +				     struct ext4_map_blocks *map,
> +				     loff_t *offset)
> +{
> +	struct pagevec pvec;
> +	unsigned int blkbits;
> +	pgoff_t index;
> +	pgoff_t end;
> +	loff_t endoff;
> +	loff_t startoff;
> +	loff_t lastoff;
> +	int found = 0;
> +
> +	blkbits = inode->i_sb->s_blocksize_bits;
> +	startoff = *offset;
> +	lastoff = startoff;
> +	endoff = (map->m_lblk + map->m_len) << blkbits;
> +
> +	index = startoff >> PAGE_CACHE_SHIFT;
> +	end = endoff >> PAGE_CACHE_SHIFT;
> +
> +	pagevec_init(&pvec, 0);
> +	do {
> +		int i, num;
> +		unsigned long nr_pages;
> +
> +		num = min_t(pgoff_t, end - index, PAGEVEC_SIZE);
> +		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
> +					  (pgoff_t)num);
> +		if (nr_pages == 0) {
> +			if (origin == SEEK_DATA)
> +				break;
> +
> +			BUG_ON(origin != SEEK_HOLE);
> +			/*
> +			 * If this is the first time to go into the loop and
> +			 * offset is not beyond the end offset, it will be a
> +			 * hole at this offset
> +			 */
> +			if (lastoff == startoff || lastoff < endoff)
> +				found = 1;
> +			break;
> +		}
> +
> +		/*
> +		 * If this is the first time to go into the loop and
> +		 * offset is smaller than the first page offset, it will be a
> +		 * hole at this offset.
> +		 */
> +		if (lastoff == startoff && origin == SEEK_HOLE &&
> +		    lastoff < page_offset(pvec.pages[0])) {
> +			found = 1;
> +			break;
> +		}
> +
> +		for (i = 0; i < nr_pages; i++) {
> +			struct page *page = pvec.pages[i];
> +			struct buffer_head *bh, *head;
> +
> +			/*
> +			 * If the current offset is not beyond the end of given
> +			 * range, it will be a hole.
> +			 */
> +			if (lastoff < endoff && origin == SEEK_HOLE &&
> +			    page->index > end) {
> +				found = 1;
> +				*offset = lastoff;
> +				goto out;
> +			}
> +
> +			lock_page(page);
> +
> +			if (unlikely(page->mapping != inode->i_mapping)) {
> +				unlock_page(page);
> +				continue;
> +			}
> +
> +			if (!page_has_buffers(page)) {
> +				unlock_page(page);
> +				continue;
> +			}
> +
> +			if (page_has_buffers(page)) {
> +				lastoff = page_offset(page);
> +				bh = head = page_buffers(page);
> +				do {
> +					if (buffer_uptodate(bh) ||
> +					    buffer_unwritten(bh)) {
> +						if (origin == SEEK_DATA)
> +							found = 1;
> +					} else {
> +						if (origin == SEEK_HOLE)
> +							found = 1;
> +					}
> +					if (found) {
> +						*offset = max_t(loff_t,
> +							startoff, lastoff);
> +						unlock_page(page);
> +						goto out;
> +					}
> +					lastoff += bh->b_size;
> +					bh = bh->b_this_page;
> +				} while (bh != head);
> +			}
> +
> +			lastoff = page_offset(page) + PAGE_SIZE;
> +			unlock_page(page);
> +		}
> +
> +		/*
> +		 * The no. of pages is less than our desired, that would be a
> +		 * hole in there.
> +		 */
> +		if (nr_pages < num && origin == SEEK_HOLE) {
> +			found = 1;
> +			*offset = lastoff;
> +			break;
> +		}
> +
> +		index = pvec.pages[i - 1]->index + 1;
> +		pagevec_release(&pvec);
> +	} while (index <= end);
> +
> +out:
> +	pagevec_release(&pvec);
> +	return found;
> +}
> +
> +/*
> + * ext4_seek_data() retrieves the offset for SEEK_DATA.
> + */
> +static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
> +{
> +	struct inode *inode = file->f_mapping->host;
> +	struct ext4_map_blocks map;
> +	struct extent_status es;
> +	ext4_lblk_t start, last, end;
> +	loff_t dataoff, isize;
> +	int blkbits;
> +	int ret = 0;
> +
> +	mutex_lock(&inode->i_mutex);
> +
> +	isize = i_size_read(inode);
> +	if (offset >= isize) {
> +		mutex_unlock(&inode->i_mutex);
> +		return -ENXIO;
> +	}
> +
> +	blkbits = inode->i_sb->s_blocksize_bits;
> +	start = offset >> blkbits;
> +	last = start;
> +	end = isize >> blkbits;
> +	dataoff = offset;
> +
> +	do {
> +		map.m_lblk = last;
> +		map.m_len = end - last + 1;
> +		ret = ext4_map_blocks(NULL, inode, &map, 0);
> +		if (ret > 0 && !(map.m_flags & EXT4_MAP_UNWRITTEN)) {
> +			if (last != start)
> +				dataoff = last << blkbits;
> +			break;
> +		}
> +
> +		/*
> +		 * If there is a delay extent at this offset,
> +		 * it will be as a data.
> +		 */
> +		es.start = last;
> +		(void)ext4_es_find_extent(inode, &es);
> +		if (last >= es.start &&
> +		    last < es.start + es.len) {
> +			if (last != start)
> +				dataoff = last << blkbits;
> +			break;
> +		}
> +
> +		/*
> +		 * If there is a unwritten extent at this offset,
> +		 * it will be as a data or a hole according to page
> +		 * cache that has data or not.
> +		 */
> +		if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> +			int unwritten;
> +			unwritten = ext4_find_unwritten_pgoff(inode, SEEK_DATA,
> +							      &map, &dataoff);
> +			if (unwritten)
> +				break;
> +		}
> +
> +		last++;
> +		dataoff = last << blkbits;
> +	} while (last <= end);
> +
> +	mutex_unlock(&inode->i_mutex);
> +
> +	if (dataoff > isize)
> +		return -ENXIO;
> +
> +	if (dataoff < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET))
> +		return -EINVAL;
> +	if (dataoff > maxsize)
> +		return -EINVAL;
> +
> +	if (dataoff != file->f_pos) {
> +		file->f_pos = dataoff;
> +		file->f_version = 0;
> +	}
> +
> +	return dataoff;
> +}
> +
> +/*
> + * ext4_seek_hole() retrieves the offset for SEEK_HOLE.
> + */
> +static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
> +{
> +	struct inode *inode = file->f_mapping->host;
> +	struct ext4_map_blocks map;
> +	struct extent_status es;
> +	ext4_lblk_t start, last, end;
> +	loff_t holeoff, isize;
> +	int blkbits;
> +	int ret = 0;
> +
> +	mutex_lock(&inode->i_mutex);
> +
> +	isize = i_size_read(inode);
> +	if (offset >= isize) {
> +		mutex_unlock(&inode->i_mutex);
> +		return -ENXIO;
> +	}
> +
> +	blkbits = inode->i_sb->s_blocksize_bits;
> +	start = offset >> blkbits;
> +	last = start;
> +	end = isize >> blkbits;
> +	holeoff = offset;
> +
> +	do {
> +		map.m_lblk = last;
> +		map.m_len = end - last + 1;
> +		ret = ext4_map_blocks(NULL, inode, &map, 0);
> +		if (ret > 0 && !(map.m_flags & EXT4_MAP_UNWRITTEN)) {
> +			last += ret;
> +			holeoff = last << blkbits;
> +			continue;
> +		}
> +
> +		/*
> +		 * If there is a delay extent at this offset,
> +		 * we will skip this extent.
> +		 */
> +		es.start = last;
> +		(void)ext4_es_find_extent(inode, &es);
> +		if (last >= es.start &&
> +		    last < es.start + es.len) {
> +			last = es.start + es.len;
> +			holeoff = last << blkbits;
> +			continue;
> +		}
> +
> +		/*
> +		 * If there is a unwritten extent at this offset,
> +		 * it will be as a data or a hole according to page
> +		 * cache that has data or not.
> +		 */
> +		if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> +			int unwritten;
> +			unwritten = ext4_find_unwritten_pgoff(inode, SEEK_HOLE,
> +							      &map, &holeoff);
> +			if (!unwritten) {
> +				last += ret;
> +				holeoff = last << blkbits;
> +				continue;
> +			}
> +		}
> +
> +		/* find a hole */
> +		break;
> +	} while (last <= end);
> +
> +	mutex_unlock(&inode->i_mutex);
> +
> +	if (holeoff > isize)
> +		holeoff = isize;
> +
> +	if (holeoff < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET))
> +		return -EINVAL;
> +	if (holeoff > maxsize)
> +		return -EINVAL;
> +
> +	if (holeoff != file->f_pos) {
> +		file->f_pos = holeoff;
> +		file->f_version = 0;
> +	}
> +
> +	return holeoff;
> +}
> +
> +/*
>    * ext4_llseek() handles both block-mapped and extent-mapped maxbytes values
>    * by calling generic_file_llseek_size() with the appropriate maxbytes
>    * value for each.
> @@ -300,8 +619,19 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
>   	else
>   		maxbytes = inode->i_sb->s_maxbytes;
>   
> -	return generic_file_llseek_size(file, offset, origin,
> -					maxbytes, i_size_read(inode));
> +	switch (origin) {
> +	case SEEK_SET:
> +	case SEEK_CUR:
> +	case SEEK_END:
> +		return generic_file_llseek_size(file, offset, origin,
> +						maxbytes, i_size_read(inode));
> +	case SEEK_DATA:
> +		return ext4_seek_data(file, offset, maxbytes);
> +	case SEEK_HOLE:
> +		return ext4_seek_hole(file, offset, maxbytes);
> +	}
> +
> +	return -EINVAL;
>   }
>   
>   const struct file_operations ext4_file_operations = {


  reply	other threads:[~2012-10-27 10:16 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-26 13:23 [PATCH 0/8 v3] ext4: extent status tree (step 1) Zheng Liu
2012-10-26 13:23 ` [PATCH 1/8 v3] ext4: add two structures supporting extent status tree Zheng Liu
2012-10-26 13:23 ` [PATCH 2/8 v3] ext4: add operations on " Zheng Liu
2012-11-08 23:21   ` Theodore Ts'o
2012-11-09  2:22     ` Zheng Liu
2012-10-26 13:23 ` [PATCH 3/8 v3] ext4: initialize " Zheng Liu
2012-10-26 13:23 ` [PATCH 4/8 v3] ext4: let ext4 maintain " Zheng Liu
2012-10-26 13:23 ` [PATCH 5/8 v3] ext4: add some tracepoints in " Zheng Liu
2012-10-26 13:23 ` [PATCH 6/8 v3] ext4: reimplement ext4_find_delay_alloc_range on " Zheng Liu
2012-10-26 13:23 ` [PATCH 7/8 v3] ext4: reimplement fiemap " Zheng Liu
2012-10-26 13:23 ` [PATCH 8/8 v3] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support Zheng Liu
2012-10-27 10:05   ` Jeff Liu [this message]
2012-10-27 15:30     ` Zheng Liu
2012-11-19  3:17 ` [PATCH 0/8 v3] ext4: extent status tree (step 1) Theodore Ts'o
2012-11-19  5:28   ` Zheng Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=508BB207.1010506@oracle.com \
    --to=jeff.liu@oracle.com \
    --cc=achender@linux.vnet.ibm.com \
    --cc=gnehzuil.liu@gmail.com \
    --cc=hughd@google.com \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=wenqing.lz@taobao.com \
    --cc=xiaoqiangnk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.