* [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together.
@ 2011-02-24 16:15 Yongqiang Yang
2011-02-24 17:17 ` Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Yongqiang Yang @ 2011-02-24 16:15 UTC (permalink / raw)
To: linux-ext4; +Cc: sandeen, Yongqiang Yang
1. lookup dirty pages with specified range in pagecache.
If no page is got, then there is no delayed-extent and
return with EXT_CONTINUE.
2. find the 1st mapped buffer,
3. check if the mapped buffer is both in the request range
and a delayed buffer. If not, there is no delayed-extent,
then return.
4. a delayed-extent is found, the extent will be collected.
Reported by Chris Mason <chris.mason@oracle.com>:
We've had reports on btrfs that cp is giving us files full of zeros
instead of actually copying them. It was tracked down to a bug with
the btrfs fiemap implementation where it was returning holes for
delalloc ranges.
Newer versions of cp are trusting fiemap to tell it where the holes
are, which does seem like a pretty neat trick.
I decided to give xfs and ext4 a shot with a few tests cases too, xfs
passed with all the ones btrfs was getting wrong, and ext4 got the basic
delalloc case right.
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1
$ fiemap-test foo
ext: 0 logical: [ 0.. 255] phys: 0.. 255
flags: 0x007 tot: 256
Horray! But once we throw a hole in, things go bad:
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
$ fiemap-test foo
< no output >
We've got a delalloc extent after the hole and ext4 fiemap didn't find
it. If I run sync to kick the delalloc out:
$sync
$ fiemap-test foo
ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
flags: 0x001 tot: 256
fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
got there. It's full of pretty comments so I know it isn't mine, but
you can grab it here:
http://oss.oracle.com/~mason/fiemap-test.c
xfsqa has a fiemap program too.
After Fix, test results are as follows:
ext: 0 logical: [ 256.. 511] phys: 0.. 255
flags: 0x007 tot: 256
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x001 tot: 256
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
$ sync
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=3
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=5
$ fiemap-test foo
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x000 tot: 256
ext: 1 logical: [ 768.. 1023] phys: 0.. 255
flags: 0x006 tot: 256
ext: 2 logical: [ 1280.. 1535] phys: 0.. 255
flags: 0x007 tot: 256
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
fs/ext4/extents.c | 183 +++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 144 insertions(+), 39 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ccce8a7..710d46e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3778,42 +3778,155 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
/*
* Callback function called for each extent to gather FIEMAP information.
*/
+#define NR_PAGES_PER_TIME 128
+
static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
struct ext4_ext_cache *newex, struct ext4_extent *ex,
void *data)
{
+ struct page *pages[NR_PAGES_PER_TIME];
+ __u64 logical;
+ __u64 physical;
+ __u64 length;
+
+ loff_t size = i_size_read(inode);
+ pgoff_t index = 0;
+ ext4_lblk_t end = 0;
+
+ __u32 flags = 0;
+ int ret = 0;
struct fiemap_extent_info *fieinfo = data;
+
unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
- __u64 logical;
- __u64 physical;
- __u64 length;
- __u32 flags = 0;
- int error;
logical = (__u64)newex->ec_block << blksize_bits;
if (newex->ec_start == 0) {
+ /*
+ * No extent in extent-tree contains block @newex->ec_start,
+ * then the block may stay in 1)a hole or 2)delayed-extent.
+ *
+ * Holes or delayed-extents are processed as follows.
+ * 1. lookup dirty pages with specified range in pagecache.
+ * If no page is got, then there is no delayed-extent and
+ * return with EXT_CONTINUE.
+ * 2. find the 1st mapped buffer,
+ * 3. check if the mapped buffer is both in the request range
+ * and a delayed buffer. If not, there is no delayed-extent,
+ * then return.
+ * 4. a delayed-extent is found, the extent will be collected.
+ */
+ pgoff_t last_offset;
pgoff_t offset;
- struct page *page;
struct buffer_head *bh = NULL;
+ struct buffer_head *head = NULL;
+ unsigned int nr_pages = NR_PAGES_PER_TIME;
offset = logical >> PAGE_SHIFT;
- page = find_get_page(inode->i_mapping, offset);
- if (!page || !page_has_buffers(page))
- return EXT_CONTINUE;
-
- bh = page_buffers(page);
-
- if (!bh)
- return EXT_CONTINUE;
-
- if (buffer_delay(bh)) {
- flags |= FIEMAP_EXTENT_DELALLOC;
- page_cache_release(page);
+repeat:
+ index = 0;
+ last_offset = offset;
+ head = NULL;
+ ret = find_get_pages_tag(inode->i_mapping, &offset,
+ PAGECACHE_TAG_DIRTY, nr_pages, pages);
+
+ if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
+ /* First time, try to find a mapped buffer. */
+ if (ret == 0)
+ /* just a hole. */
+ return EXT_CONTINUE;
+
+ /* Try to find the 1st mapped buffer. */
+ end = ((__u64)pages[0]->index << PAGE_SHIFT) >> blksize_bits;
+ for (index = 0; index < ret; index ++) {
+ if (!page_has_buffers(pages[index]))
+ goto out;
+ head = page_buffers(pages[index]);
+ if (!head)
+ goto out;
+
+ bh = head;
+ do {
+ if (buffer_mapped(bh)) {
+ /* get the 1st mapped buffer. */
+ if (end > newex->ec_block + newex->ec_len)
+ /* The buffer is out of the request range. */
+ goto out;
+ goto found_mapped_buffer;
+ }
+ bh = bh->b_this_page;
+ end ++;
+ } while (bh != head);
+ }
+
+ if (index == ret)
+ /* No mapped buffer found? */
+ BUG();
} else {
- page_cache_release(page);
- return EXT_CONTINUE;
+ /*Find contiguous delayed buffers. */
+ if (ret > 0 && pages[0]->index == last_offset)
+ head = page_buffers(pages[0]);
+ bh = head;
+ }
+
+found_mapped_buffer:
+ if (bh && buffer_delay(bh)) {
+ /* 1st or contiguous delayed buffer found. */
+ if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
+ /* 1st delayed buffer found, record the start of extent. */
+ flags |= FIEMAP_EXTENT_DELALLOC;
+ newex->ec_block = end;
+ logical = (__u64)end << blksize_bits;
+ }
+ /* Find contiguous delayed buffers. */
+ do {
+ if (!buffer_delay(bh))
+ goto found_delayed_extent;
+ bh = bh->b_this_page;
+ end ++;
+ } while (bh != head);
+
+ for (index ++; index < ret; index ++) {
+ if (!page_has_buffers(pages[index])) {
+ bh = NULL;
+ break;
+ }
+ head = page_buffers(pages[index]);
+ if (!head) {
+ bh = NULL;
+ break;
+ }
+ if (pages[index]->index != pages[0]->index + index) {
+ /* Blocks are not contiguous. */
+ bh = NULL;
+ break;
+ }
+ bh = head;
+ do {
+ if (!buffer_delay(bh))
+ /* Delayed-extent ends. */
+ goto found_delayed_extent;
+ bh = bh->b_this_page;
+ end ++;
+ } while (bh != head);
+ }
+ } else if (!(flags & FIEMAP_EXTENT_DELALLOC))
+ /* a hole found. */
+ goto out;
+
+found_delayed_extent:
+ newex->ec_len = min(end - newex->ec_block,
+ (ext4_lblk_t)EXT_INIT_MAX_LEN);
+ if (ret == nr_pages && newex->ec_len < EXT_INIT_MAX_LEN
+ && bh && buffer_delay(bh)) {
+ /* Have not collected an extent and continue. */
+ for (index =0; index < ret; index ++)
+ page_cache_release(pages[index]);
+ goto repeat;
}
+
+ for (index =0; index < ret; index ++)
+ page_cache_release(pages[index]);
}
physical = (__u64)newex->ec_start << blksize_bits;
@@ -3822,32 +3935,24 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
if (ex && ext4_ext_is_uninitialized(ex))
flags |= FIEMAP_EXTENT_UNWRITTEN;
- /*
- * If this extent reaches EXT_MAX_BLOCK, it must be last.
- *
- * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCK,
- * this also indicates no more allocated blocks.
- *
- * XXX this might miss a single-block extent at EXT_MAX_BLOCK
- */
- if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
- newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
- loff_t size = i_size_read(inode);
+ if (logical + length >= size) {
loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
-
- flags |= FIEMAP_EXTENT_LAST;
- if ((flags & FIEMAP_EXTENT_DELALLOC) &&
- logical+length > size)
+ flags |= FIEMAP_EXTENT_LAST;
+ if (flags & FIEMAP_EXTENT_DELALLOC)
length = (size - logical + bs - 1) & ~(bs-1);
}
- error = fiemap_fill_next_extent(fieinfo, logical, physical,
+ ret = fiemap_fill_next_extent(fieinfo, logical, physical,
length, flags);
- if (error < 0)
- return error;
- if (error == 1)
+ if (ret < 0)
+ return ret;
+ if (ret == 1)
return EXT_BREAK;
+out:
+ for (index =0; index < ret; index ++)
+ page_cache_release(pages[index]);
+
return EXT_CONTINUE;
}
--
1.7.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together.
2011-02-24 16:15 [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together Yongqiang Yang
@ 2011-02-24 17:17 ` Eric Sandeen
2011-02-24 21:14 ` Andreas Dilger
2011-02-24 22:23 ` Eric Sandeen
2 siblings, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2011-02-24 17:17 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4
On 02/24/2011 10:15 AM, Yongqiang Yang wrote:
> 1. lookup dirty pages with specified range in pagecache.
> If no page is got, then there is no delayed-extent and
> return with EXT_CONTINUE.
> 2. find the 1st mapped buffer,
> 3. check if the mapped buffer is both in the request range
> and a delayed buffer. If not, there is no delayed-extent,
> then return.
> 4. a delayed-extent is found, the extent will be collected.
...
looking better, I think - please run this through scripts/checkpatch.pl,
there are a lot of whitespace issues etc.
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> ---
> fs/ext4/extents.c | 183 +++++++++++++++++++++++++++++++++++++++++-----------
> 1 files changed, 144 insertions(+), 39 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ccce8a7..710d46e 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3778,42 +3778,155 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
> /*
> * Callback function called for each extent to gather FIEMAP information.
> */
> +#define NR_PAGES_PER_TIME 128
> +
> static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> struct ext4_ext_cache *newex, struct ext4_extent *ex,
> void *data)
> {
> + struct page *pages[NR_PAGES_PER_TIME];
This is going to be a problem... stack usage of this function has jumped
from 96 bytes to 1K with this change. I don't think this
can live on the stack.
I'll take some time to read over the rest of this, though, thanks.
It does pass xfstests 225, which exercises fiemap, but with the sync flag.
-Eric
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together.
2011-02-24 16:15 [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together Yongqiang Yang
2011-02-24 17:17 ` Eric Sandeen
@ 2011-02-24 21:14 ` Andreas Dilger
2011-02-25 8:05 ` Yongqiang Yang
2011-02-24 22:23 ` Eric Sandeen
2 siblings, 1 reply; 5+ messages in thread
From: Andreas Dilger @ 2011-02-24 21:14 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4, sandeen
On 2011-02-24, at 9:15 AM, Yongqiang Yang wrote:
> static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> struct ext4_ext_cache *newex, struct ext4_extent *ex,
> void *data)
> {
> + struct page *pages[NR_PAGES_PER_TIME];
I agree with Eric that this cannot live on the stack (or alternately that we need better stack handling in the kernel :-). Fortunately, the kmalloc() of this data only needs to happen in rare cases, so it shouldn't impact performance very much. It might be possible to cache this buffer in newex, to limit the allocation to once per file, instead of once per hole, but I'm not sure it is worthwhile.
The benefit of kmalloc() is that it could be a full page or two, and allow far fewer calls into pagevec_lookup_tag(), which I suspect is more of a performance win than the cost of calling kmalloc().
> + __u64 logical;
> + __u64 physical;
> + __u64 length;
> +
> + loff_t size = i_size_read(inode);
> + pgoff_t index = 0;
> + ext4_lblk_t end = 0;
> +
> + __u32 flags = 0;
> + int ret = 0;
(style) please align variable declarations in some consistent manner
> struct fiemap_extent_info *fieinfo = data;
> +
> unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
Many of these are specific to the "if (newex->ec_start == 0)" context, so should be allocated there to make this more clear.
(style) there shouldn't be blank lines between the variable declarations. There are also many other CodingStyle violations that should be cleaned up.
> if (newex->ec_start == 0) {
> + /*
> + * No extent in extent-tree contains block @newex->ec_start,
> + * then the block may stay in 1)a hole or 2)delayed-extent.
> + *
> + * Holes or delayed-extents are processed as follows.
> + * 1. lookup dirty pages with specified range in pagecache.
> + * If no page is got, then there is no delayed-extent and
> + * return with EXT_CONTINUE.
> + * 2. find the 1st mapped buffer,
> + * 3. check if the mapped buffer is both in the request range
> + * and a delayed buffer. If not, there is no delayed-extent,
> + * then return.
> + * 4. a delayed-extent is found, the extent will be collected.
> + */
Good comment.
> + pgoff_t last_offset;
> pgoff_t offset;
> struct buffer_head *bh = NULL;
> + struct buffer_head *head = NULL;
> + unsigned int nr_pages = NR_PAGES_PER_TIME;
>
> offset = logical >> PAGE_SHIFT;
>
> +repeat:
> + index = 0;
> + last_offset = offset;
> + head = NULL;
> + ret = find_get_pages_tag(inode->i_mapping, &offset,
> + PAGECACHE_TAG_DIRTY, nr_pages, pages);
> +
> + if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
> + /* First time, try to find a mapped buffer. */
> + if (ret == 0)
> + /* just a hole. */
> + return EXT_CONTINUE;
This could contain the "out:" label, and the page_cache_release() loop would be skipped because ret == 0. That would avoid the need to use the pages[] and index values outside of the scope of the "if (ec_start == 0)" clause, like:
if (ret == 0) {
out:
/* just a hole. */
for (index = 0; index < ret; index++)
page_cache_release(pages[index]);
return EXT_CONTINUE;
}
> + /* Try to find the 1st mapped buffer. */
> + end = ((__u64)pages[0]->index << PAGE_SHIFT) >> blksize_bits;
> + for (index = 0; index < ret; index ++) {
> + if (!page_has_buffers(pages[index]))
> + goto out;
> + head = page_buffers(pages[index]);
> + if (!head)
> + goto out;
> +
> + bh = head;
> + do {
> + if (buffer_mapped(bh)) {
> + /* get the 1st mapped buffer. */
> + if (end > newex->ec_block + newex->ec_len)
> + /* The buffer is out of the request range. */
> + goto out;
> + goto found_mapped_buffer;
> + }
> + bh = bh->b_this_page;
> + end ++;
> + } while (bh != head);
> + }
> +
> + if (index == ret)
> + /* No mapped buffer found? */
> + BUG();
Hmm, this should be BUG_ON(index == ret), but even so I'm not so fond of BUG() as a form of error handling. I'd rather just return at this point without an extent.
> } else {
> + /*Find contiguous delayed buffers. */
> + if (ret > 0 && pages[0]->index == last_offset)
> + head = page_buffers(pages[0]);
> + bh = head;
> + }
> +
> +found_mapped_buffer:
> + if (bh && buffer_delay(bh)) {
> + /* 1st or contiguous delayed buffer found. */
> + if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
> + /* 1st delayed buffer found, record the start of extent. */
> + flags |= FIEMAP_EXTENT_DELALLOC;
> + newex->ec_block = end;
> + logical = (__u64)end << blksize_bits;
> + }
> + /* Find contiguous delayed buffers. */
> + do {
> + if (!buffer_delay(bh))
> + goto found_delayed_extent;
> + bh = bh->b_this_page;
> + end ++;
> + } while (bh != head);
> +
> + for (index ++; index < ret; index ++) {
> + if (!page_has_buffers(pages[index])) {
> + bh = NULL;
> + break;
> + }
> + head = page_buffers(pages[index]);
> + if (!head) {
> + bh = NULL;
> + break;
> + }
> + if (pages[index]->index != pages[0]->index + index) {
> + /* Blocks are not contiguous. */
> + bh = NULL;
> + break;
> + }
> + bh = head;
> + do {
> + if (!buffer_delay(bh))
> + /* Delayed-extent ends. */
> + goto found_delayed_extent;
> + bh = bh->b_this_page;
> + end ++;
> + } while (bh != head);
> + }
> + } else if (!(flags & FIEMAP_EXTENT_DELALLOC))
> + /* a hole found. */
> + goto out;
> +
> +found_delayed_extent:
> + newex->ec_len = min(end - newex->ec_block,
> + (ext4_lblk_t)EXT_INIT_MAX_LEN);
> + if (ret == nr_pages && newex->ec_len < EXT_INIT_MAX_LEN
> + && bh && buffer_delay(bh)) {
> + /* Have not collected an extent and continue. */
> + for (index =0; index < ret; index ++)
> + page_cache_release(pages[index]);
> + goto repeat;
> }
> +
> + for (index =0; index < ret; index ++)
> + page_cache_release(pages[index]);
> }
>
> physical = (__u64)newex->ec_start << blksize_bits;
> @@ -3822,32 +3935,24 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> if (ex && ext4_ext_is_uninitialized(ex))
> flags |= FIEMAP_EXTENT_UNWRITTEN;
>
>
> + if (logical + length >= size) {
> loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
> + flags |= FIEMAP_EXTENT_LAST;
> + if (flags & FIEMAP_EXTENT_DELALLOC)
> length = (size - logical + bs - 1) & ~(bs-1);
This should probably set "physical = 0" just so that we don't return random garbage in that field.
> }
>
> - error = fiemap_fill_next_extent(fieinfo, logical, physical,
> + ret = fiemap_fill_next_extent(fieinfo, logical, physical,
> length, flags);
> - if (error < 0)
> - return error;
> - if (error == 1)
> + if (ret < 0)
> + return ret;
> + if (ret == 1)
> return EXT_BREAK;
>
> +out:
> + for (index =0; index < ret; index ++)
> + page_cache_release(pages[index]);
> +
> return EXT_CONTINUE;
> }
>
> --
> 1.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together.
2011-02-24 16:15 [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together Yongqiang Yang
2011-02-24 17:17 ` Eric Sandeen
2011-02-24 21:14 ` Andreas Dilger
@ 2011-02-24 22:23 ` Eric Sandeen
2 siblings, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2011-02-24 22:23 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4
On 02/24/2011 10:15 AM, Yongqiang Yang wrote:
> 1. lookup dirty pages with specified range in pagecache.
> If no page is got, then there is no delayed-extent and
> return with EXT_CONTINUE.
> 2. find the 1st mapped buffer,
> 3. check if the mapped buffer is both in the request range
> and a delayed buffer. If not, there is no delayed-extent,
> then return.
> 4. a delayed-extent is found, the extent will be collected.
This does seem to work now; I patched xfstests to do a run
with and without sync, and both pass.
The patches for xfstests are here:
http://oss.sgi.com/pipermail/xfs/2011-February/049525.html
http://oss.sgi.com/pipermail/xfs/2011-February/049527.html
-Eric
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together.
2011-02-24 21:14 ` Andreas Dilger
@ 2011-02-25 8:05 ` Yongqiang Yang
0 siblings, 0 replies; 5+ messages in thread
From: Yongqiang Yang @ 2011-02-25 8:05 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4, sandeen
On Fri, Feb 25, 2011 at 5:14 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-02-24, at 9:15 AM, Yongqiang Yang wrote:
>> static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> struct ext4_ext_cache *newex, struct ext4_extent *ex,
>> void *data)
>> {
>> + struct page *pages[NR_PAGES_PER_TIME];
>
> I agree with Eric that this cannot live on the stack (or alternately that we need better stack handling in the kernel :-). Fortunately, the kmalloc() of this data only needs to happen in rare cases, so it shouldn't impact performance very much. It might be possible to cache this buffer in newex, to limit the allocation to once per file, instead of once per hole, but I'm not sure it is worthwhile.
>
> The benefit of kmalloc() is that it could be a full page or two, and allow far fewer calls into pagevec_lookup_tag(), which I suspect is more of a performance win than the cost of calling kmalloc().
>
>> + __u64 logical;
>> + __u64 physical;
>> + __u64 length;
>> +
>> + loff_t size = i_size_read(inode);
>> + pgoff_t index = 0;
>> + ext4_lblk_t end = 0;
>> +
>> + __u32 flags = 0;
>> + int ret = 0;
>
> (style) please align variable declarations in some consistent manner
>
>> struct fiemap_extent_info *fieinfo = data;
>> +
>> unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
>
> Many of these are specific to the "if (newex->ec_start == 0)" context, so should be allocated there to make this more clear.
>
> (style) there shouldn't be blank lines between the variable declarations. There are also many other CodingStyle violations that should be cleaned up.
>
>> if (newex->ec_start == 0) {
>> + /*
>> + * No extent in extent-tree contains block @newex->ec_start,
>> + * then the block may stay in 1)a hole or 2)delayed-extent.
>> + *
>> + * Holes or delayed-extents are processed as follows.
>> + * 1. lookup dirty pages with specified range in pagecache.
>> + * If no page is got, then there is no delayed-extent and
>> + * return with EXT_CONTINUE.
>> + * 2. find the 1st mapped buffer,
>> + * 3. check if the mapped buffer is both in the request range
>> + * and a delayed buffer. If not, there is no delayed-extent,
>> + * then return.
>> + * 4. a delayed-extent is found, the extent will be collected.
>> + */
>
> Good comment.
>
>> + pgoff_t last_offset;
>> pgoff_t offset;
>> struct buffer_head *bh = NULL;
>> + struct buffer_head *head = NULL;
>> + unsigned int nr_pages = NR_PAGES_PER_TIME;
>>
>> offset = logical >> PAGE_SHIFT;
>>
>> +repeat:
>> + index = 0;
>> + last_offset = offset;
>> + head = NULL;
>> + ret = find_get_pages_tag(inode->i_mapping, &offset,
>> + PAGECACHE_TAG_DIRTY, nr_pages, pages);
>> +
>> + if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
>> + /* First time, try to find a mapped buffer. */
>> + if (ret == 0)
>> + /* just a hole. */
>> + return EXT_CONTINUE;
>
> This could contain the "out:" label, and the page_cache_release() loop would be skipped because ret == 0. That would avoid the need to use the pages[] and index values outside of the scope of the "if (ec_start == 0)" clause, like:
>
> if (ret == 0) {
> out:
> /* just a hole. */
> for (index = 0; index < ret; index++)
> page_cache_release(pages[index]);
> return EXT_CONTINUE;
> }
>
>> + /* Try to find the 1st mapped buffer. */
>> + end = ((__u64)pages[0]->index << PAGE_SHIFT) >> blksize_bits;
>> + for (index = 0; index < ret; index ++) {
>> + if (!page_has_buffers(pages[index]))
>> + goto out;
>> + head = page_buffers(pages[index]);
>> + if (!head)
>> + goto out;
>> +
>> + bh = head;
>> + do {
>> + if (buffer_mapped(bh)) {
>> + /* get the 1st mapped buffer. */
>> + if (end > newex->ec_block + newex->ec_len)
>> + /* The buffer is out of the request range. */
>> + goto out;
>> + goto found_mapped_buffer;
>> + }
>> + bh = bh->b_this_page;
>> + end ++;
>> + } while (bh != head);
>> + }
>> +
>> + if (index == ret)
>> + /* No mapped buffer found? */
>> + BUG();
>
> Hmm, this should be BUG_ON(index == ret), but even so I'm not so fond of BUG() as a form of error handling. I'd rather just return at this point without an extent.
>
>> } else {
>> + /*Find contiguous delayed buffers. */
>> + if (ret > 0 && pages[0]->index == last_offset)
>> + head = page_buffers(pages[0]);
>> + bh = head;
>> + }
>> +
>> +found_mapped_buffer:
>> + if (bh && buffer_delay(bh)) {
>> + /* 1st or contiguous delayed buffer found. */
>> + if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
>> + /* 1st delayed buffer found, record the start of extent. */
>> + flags |= FIEMAP_EXTENT_DELALLOC;
>> + newex->ec_block = end;
>> + logical = (__u64)end << blksize_bits;
>> + }
>> + /* Find contiguous delayed buffers. */
>> + do {
>> + if (!buffer_delay(bh))
>> + goto found_delayed_extent;
>> + bh = bh->b_this_page;
>> + end ++;
>> + } while (bh != head);
>> +
>> + for (index ++; index < ret; index ++) {
>> + if (!page_has_buffers(pages[index])) {
>> + bh = NULL;
>> + break;
>> + }
>> + head = page_buffers(pages[index]);
>> + if (!head) {
>> + bh = NULL;
>> + break;
>> + }
>> + if (pages[index]->index != pages[0]->index + index) {
>> + /* Blocks are not contiguous. */
>> + bh = NULL;
>> + break;
>> + }
>> + bh = head;
>> + do {
>> + if (!buffer_delay(bh))
>> + /* Delayed-extent ends. */
>> + goto found_delayed_extent;
>> + bh = bh->b_this_page;
>> + end ++;
>> + } while (bh != head);
>> + }
>> + } else if (!(flags & FIEMAP_EXTENT_DELALLOC))
>> + /* a hole found. */
>> + goto out;
>> +
>> +found_delayed_extent:
>> + newex->ec_len = min(end - newex->ec_block,
>> + (ext4_lblk_t)EXT_INIT_MAX_LEN);
>> + if (ret == nr_pages && newex->ec_len < EXT_INIT_MAX_LEN
>> + && bh && buffer_delay(bh)) {
>> + /* Have not collected an extent and continue. */
>> + for (index =0; index < ret; index ++)
>> + page_cache_release(pages[index]);
>> + goto repeat;
>> }
>> +
>> + for (index =0; index < ret; index ++)
>> + page_cache_release(pages[index]);
>> }
>>
>> physical = (__u64)newex->ec_start << blksize_bits;
>> @@ -3822,32 +3935,24 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> if (ex && ext4_ext_is_uninitialized(ex))
>> flags |= FIEMAP_EXTENT_UNWRITTEN;
>>
>>
>> + if (logical + length >= size) {
>> loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
>> + flags |= FIEMAP_EXTENT_LAST;
>> + if (flags & FIEMAP_EXTENT_DELALLOC)
>> length = (size - logical + bs - 1) & ~(bs-1);
>
> This should probably set "physical = 0" just so that we don't return random garbage in that field.
physical has equaled 0 here because newex->ec_start equals 0 for delayed-extent.
>
>> }
>>
>> - error = fiemap_fill_next_extent(fieinfo, logical, physical,
>> + ret = fiemap_fill_next_extent(fieinfo, logical, physical,
>> length, flags);
>> - if (error < 0)
>> - return error;
>> - if (error == 1)
>> + if (ret < 0)
>> + return ret;
>> + if (ret == 1)
>> return EXT_BREAK;
>>
>> +out:
>> + for (index =0; index < ret; index ++)
>> + page_cache_release(pages[index]);
>> +
>> return EXT_CONTINUE;
>> }
>>
>> --
>> 1.7.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-02-25 8:05 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-24 16:15 [PATCH v1] ext4:Make FIEMAP and delayed allocation play well together Yongqiang Yang
2011-02-24 17:17 ` Eric Sandeen
2011-02-24 21:14 ` Andreas Dilger
2011-02-25 8:05 ` Yongqiang Yang
2011-02-24 22:23 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).