* [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
@ 2011-02-23 15:59 Yongqiang Yang
2011-02-23 16:41 ` Eric Sandeen
2011-02-23 23:35 ` Andreas Dilger
0 siblings, 2 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-02-23 15:59 UTC (permalink / raw)
To: linux-ext4; +Cc: sandeen, Yongqiang Yang
1] Delayed extents after a hole are neglected.
By using find_get_pages() instead of find_get_page() to
lookup pagecache, delayed extents can be found, because
find_get_pages() with nr_pages=1 will return the next page
in pagecache.
2] Extents after a delayed extent or a hole are neglected as well.
Fix it by correcting the request range using the result of
ext4_ext_next_allocated_block().
Reported by Chris Mason <chris.mason@oracle.com>:
We've had reports on btrfs that cp is giving us files full of zeros
instead of actually copying them. It was tracked down to a bug with
the btrfs fiemap implementation where it was returning holes for
delalloc ranges.
Newer versions of cp are trusting fiemap to tell it where the holes
are, which does seem like a pretty neat trick.
I decided to give xfs and ext4 a shot with a few test cases too; xfs
passed with all the ones btrfs was getting wrong, and ext4 got the basic
delalloc case right.
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1
$ fiemap-test foo
ext: 0 logical: [ 0.. 255] phys: 0.. 255
flags: 0x007 tot: 256
Hooray! But once we throw a hole in, things go bad:
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
$ fiemap-test foo
< no output >
We've got a delalloc extent after the hole and ext4 fiemap didn't find
it. If I run sync to kick the delalloc out:
$ sync
$ fiemap-test foo
ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
flags: 0x001 tot: 256
fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
got there. It's full of pretty comments so I know it isn't mine, but
you can grab it here:
http://oss.oracle.com/~mason/fiemap-test.c
xfsqa has a fiemap program too.
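For readers without fiemap-test at hand, a query of this kind can be approximated with the FIEMAP ioctl directly. This is a minimal sketch, not the fiemap-test program linked above: dump_extents() and MAX_EXTENTS are hypothetical names, offsets are printed in bytes rather than blocks, and only the first 32 extents are fetched.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_EXTENTS 32 /* hypothetical cap chosen for this sketch */

/* Print the extents of `path`; return how many were mapped, or -1 on error. */
static int dump_extents(const char *path)
{
	assert(path != NULL);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	size_t sz = sizeof(struct fiemap) +
		    MAX_EXTENTS * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	if (!fm) {
		close(fd);
		return -1;
	}

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET; /* map the whole file */
	fm->fm_flags = 0; /* no FIEMAP_FLAG_SYNC: leave delalloc data unflushed */
	fm->fm_extent_count = MAX_EXTENTS;

	int n = -1;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
		n = (int)fm->fm_mapped_extents;
		for (int i = 0; i < n; i++) {
			const struct fiemap_extent *e = &fm->fm_extents[i];

			printf("ext: %d logical: %llu len: %llu phys: %llu flags: 0x%03x\n",
			       i,
			       (unsigned long long)e->fe_logical,
			       (unsigned long long)e->fe_length,
			       (unsigned long long)e->fe_physical,
			       e->fe_flags);
		}
	}

	free(fm);
	close(fd);
	return n;
}
```

Calling dump_extents("/mnt/foo") right after the dd above and again after sync would be one way to compare the delalloc and allocated views of the same file.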
After the fix, test results are as follows:
ext: 0 logical: [ 256.. 511] phys: 0.. 255
flags: 0x007 tot: 256
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x001 tot: 256
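The numbers in these outputs can be cross-checked by hand. The sketch below assumes a 4 KiB filesystem block size (which the 256-block count for a 1 MiB write implies) and the flag bit values from the kernel's fiemap header (LAST = 0x1, UNKNOWN = 0x2, DELALLOC = 0x4). The macro and helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values as in the kernel's fiemap header, repeated locally so the
 * sketch builds without kernel headers. */
#define EX_LAST     0x00000001u
#define EX_UNKNOWN  0x00000002u
#define EX_DELALLOC 0x00000004u

/* Hypothetical helper: render the three flag bits above as text. */
static const char *describe_flags(uint32_t flags, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s%s",
		 (flags & EX_LAST) ? "LAST " : "",
		 (flags & EX_UNKNOWN) ? "UNKNOWN " : "",
		 (flags & EX_DELALLOC) ? "DELALLOC " : "");
	return buf;
}

/* Hypothetical helper: logical block range written by
 * dd bs=<bs> seek=<seek> count=<count>, in units of fs_block bytes. */
static void dd_block_range(uint64_t bs, uint64_t seek, uint64_t count,
			   uint64_t fs_block, uint64_t *first, uint64_t *last)
{
	assert(fs_block != 0 && count != 0);
	*first = seek * bs / fs_block;
	*last = (seek + count) * bs / fs_block - 1;
}
```

With these definitions, 0x007 decodes to LAST, UNKNOWN and DELALLOC (the extent is still delayed), 0x001 decodes to LAST alone (the extent has been allocated), and dd bs=1M seek=1 count=1 maps to blocks 256..511.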
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
fs/ext4/extents.c | 26 +++++++++++++++++++++++---
mm/filemap.c | 1 +
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ccce8a7..ad455a0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
__u64 physical;
__u64 length;
__u32 flags = 0;
+ ext4_lblk_t end;
int error;
logical = (__u64)newex->ec_block << blksize_bits;
- if (newex->ec_start == 0) {
+ if (!newex->ec_start) {
+ /*
+ * There is no extent contains @newex->ec_block block.
+ * It implies that @newex->ec_block block lies 1)a hole
+ * or 2)delayed-allocated blocks that has not been
+ * allocated, so pagecache is needed to lookup.
+ *
+ * And if it is case 2, @newex->ec_len needs to be corrected.
+ *
+ */
pgoff_t offset;
struct page *page;
struct buffer_head *bh = NULL;
offset = logical >> PAGE_SHIFT;
- page = find_get_page(inode->i_mapping, offset);
+ (void)find_get_pages(inode->i_mapping, offset, 1, &page);
if (!page || !page_has_buffers(page))
return EXT_CONTINUE;
@@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
if (!bh)
return EXT_CONTINUE;
+ /* Assume block-size equals page-size. */
if (buffer_delay(bh)) {
flags |= FIEMAP_EXTENT_DELALLOC;
+ if (page->index > offset) {
+ logical = ((__u64)page->index << PAGE_SHIFT);
+ newex->ec_block = logical >> blksize_bits;
+ }
page_cache_release(page);
} else {
page_cache_release(page);
@@ -3830,7 +3845,8 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
*
* XXX this might miss a single-block extent at EXT_MAX_BLOCK
*/
- if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
+ end = ext4_ext_next_allocated_block(path);
+ if (end == EXT_MAX_BLOCK ||
newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
loff_t size = i_size_read(inode);
loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
@@ -3839,8 +3855,12 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
if ((flags & FIEMAP_EXTENT_DELALLOC) &&
logical+length > size)
length = (size - logical + bs - 1) & ~(bs-1);
+ } else {
+ newex->ec_len = end - newex->ec_block;
+ length = (__u64)newex->ec_len << blksize_bits;
}
+
error = fiemap_fill_next_extent(fieinfo, logical, physical,
length, flags);
if (error < 0)
diff --git a/mm/filemap.c b/mm/filemap.c
index 83a45d3..1c01ffc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -803,6 +803,7 @@ repeat:
rcu_read_unlock();
return ret;
}
+EXPORT_SYMBOL(find_get_pages);
/**
* find_get_pages_contig - gang contiguous pagecache lookup
--
1.5.6.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-23 15:59 [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb() Yongqiang Yang
@ 2011-02-23 16:41 ` Eric Sandeen
2011-02-24 0:04 ` Dave Chinner
` (2 more replies)
2011-02-23 23:35 ` Andreas Dilger
1 sibling, 3 replies; 10+ messages in thread
From: Eric Sandeen @ 2011-02-23 16:41 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4
On 2/23/11 9:59 AM, Yongqiang Yang wrote:
> 1] Delayed extents after a hole are neglected.
>
> By using find_get_pages() instead of find_get_page() to
> lookup pagecache, delayed extents can be found, because
> find_get_pages() with nr_pages=1 will return the next page
> in pagecache.
>
> 2] Extents after a delayed extent or a hole are neglected as well.
>
> Fix it by correcting the request range using the result of
> ext4_ext_next_allocated_block().
>
> Reported by Chris Mason <chris.mason@oracle.com>:
> We've had reports on btrfs that cp is giving us files full of zeros
> instead of actually copying them. It was tracked down to a bug with
> the btrfs fiemap implementation where it was returning holes for
> delalloc ranges.
>
> Newer versions of cp are trusting fiemap to tell it where the holes
> are, which does seem like a pretty neat trick.
>
> I decided to give xfs and ext4 a shot with a few test cases too; xfs
> passed with all the ones btrfs was getting wrong, and ext4 got the basic
> delalloc case right.
> $ mkfs.ext4 /dev/xxx
> $ mount /dev/xxx /mnt
> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1
> $ fiemap-test foo
> ext: 0 logical: [ 0.. 255] phys: 0.. 255
> flags: 0x007 tot: 256
>
> Hooray! But once we throw a hole in, things go bad:
> $ mkfs.ext4 /dev/xxx
> $ mount /dev/xxx /mnt
> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
> $ fiemap-test foo
> < no output >
>
> We've got a delalloc extent after the hole and ext4 fiemap didn't find
> it. If I run sync to kick the delalloc out:
> $ sync
> $ fiemap-test foo
> ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
> flags: 0x001 tot: 256
>
> fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
> got there. It's full of pretty comments so I know it isn't mine, but
> you can grab it here:
>
> http://oss.oracle.com/~mason/fiemap-test.c
>
> xfsqa has a fiemap program too.
>
> After the fix, test results are as follows:
> ext: 0 logical: [ 256.. 511] phys: 0.. 255
> flags: 0x007 tot: 256
> ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
> flags: 0x001 tot: 256
>
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> ---
> fs/ext4/extents.c | 26 +++++++++++++++++++++++---
> mm/filemap.c | 1 +
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ccce8a7..ad455a0 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> __u64 physical;
> __u64 length;
> __u32 flags = 0;
> + ext4_lblk_t end;
> int error;
>
> logical = (__u64)newex->ec_block << blksize_bits;
>
> - if (newex->ec_start == 0) {
> + if (!newex->ec_start) {
> + /*
> + * There is no extent contains @newex->ec_block block.
> + * It implies that @newex->ec_block block lies 1)a hole
> + * or 2)delayed-allocated blocks that has not been
> + * allocated, so pagecache is needed to lookup.
> + *
> + * And if it is case 2, @newex->ec_len needs to be corrected.
> + *
> + */
> pgoff_t offset;
> struct page *page;
> struct buffer_head *bh = NULL;
>
> offset = logical >> PAGE_SHIFT;
> - page = find_get_page(inode->i_mapping, offset);
> + (void)find_get_pages(inode->i_mapping, offset, 1, &page);
> if (!page || !page_has_buffers(page))
> return EXT_CONTINUE;
>
> @@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> if (!bh)
> return EXT_CONTINUE;
>
> + /* Assume block-size equals page-size. */
> if (buffer_delay(bh)) {
> flags |= FIEMAP_EXTENT_DELALLOC;
> + if (page->index > offset) {
> + logical = ((__u64)page->index << PAGE_SHIFT);
> + newex->ec_block = logical >> blksize_bits;
> + }
> page_cache_release(page);
> } else {
> page_cache_release(page);
> @@ -3830,7 +3845,8 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> *
> * XXX this might miss a single-block extent at EXT_MAX_BLOCK
> */
> - if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
> + end = ext4_ext_next_allocated_block(path);
I think this will fall down if you have:
[ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] won't it?
i.e. your "end" will be the first block of "allocated" right?
-Eric
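The layout above can be modelled in user space to see the effect. This is a simplified sketch, not kernel code: the per-block state array and the helper names are assumptions made for illustration, and next_allocated() only imitates the role ext4_ext_next_allocated_block() plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum blk_state { HOLE, DELALLOC, ALLOCATED };

/* Stand-in for ext4_ext_next_allocated_block(): index of the first
 * ALLOCATED block at or after `start`, or `n` when there is none. */
static size_t next_allocated(const enum blk_state *map, size_t n, size_t start)
{
	for (size_t i = start; i < n; i++)
		if (map[i] == ALLOCATED)
			return i;
	return n;
}

/* Length the patch would report for a delalloc extent beginning at
 * `start`: everything up to the next allocated block. */
static size_t patched_len(const enum blk_state *map, size_t n, size_t start)
{
	assert(start < n);
	return next_allocated(map, n, start) - start;
}

/* Length of the contiguous DELALLOC run that actually begins at `start`. */
static size_t delalloc_run(const enum blk_state *map, size_t n, size_t start)
{
	size_t i = start;

	while (i < n && map[i] == DELALLOC)
		i++;
	return i - start;
}
```

For a map of two blocks each of HOLE, DELALLOC, HOLE, ALLOCATED, patched_len() at block 2 gives 4 while delalloc_run() gives 2, so the second hole would be reported as part of the delalloc extent.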
> + if (end == EXT_MAX_BLOCK ||
> newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
> loff_t size = i_size_read(inode);
> loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
> @@ -3839,8 +3855,12 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> if ((flags & FIEMAP_EXTENT_DELALLOC) &&
> logical+length > size)
> length = (size - logical + bs - 1) & ~(bs-1);
> + } else {
> + newex->ec_len = end - newex->ec_block;
> + length = (__u64)newex->ec_len << blksize_bits;
> }
>
> +
> error = fiemap_fill_next_extent(fieinfo, logical, physical,
> length, flags);
> if (error < 0)
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 83a45d3..1c01ffc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -803,6 +803,7 @@ repeat:
> rcu_read_unlock();
> return ret;
> }
> +EXPORT_SYMBOL(find_get_pages);
>
> /**
> * find_get_pages_contig - gang contiguous pagecache lookup
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-23 15:59 [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb() Yongqiang Yang
2011-02-23 16:41 ` Eric Sandeen
@ 2011-02-23 23:35 ` Andreas Dilger
2011-02-24 0:37 ` Yongqiang Yang
1 sibling, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2011-02-23 23:35 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4, sandeen
On 2011-02-23, at 8:59 AM, Yongqiang Yang wrote:
> @@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode,
> - if (newex->ec_start == 0) {
> + if (!newex->ec_start) {
(style) the original code is actually correct. ec_start is not a boolean, so comparing it == 0 is actually the right thing to do.
> @@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
> if (!bh)
> return EXT_CONTINUE;
>
> + /* Assume block-size equals page-size. */
This is not a valid assumption.
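To illustrate why: with a block size smaller than the page size, one page carries several buffers, and the delayed block need not be the first one in the page. A user-space sketch under assumed 4 KiB pages follows. MODEL_PAGE_SHIFT and first_delayed_block() are hypothetical names, and the bool array stands in for walking the page's buffer_heads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Return the logical block number of the first delayed block in the page
 * at `page_index`, or UINT64_MAX when no block in the page is delayed.
 * `delayed` holds one flag per block in the page. */
static uint64_t first_delayed_block(uint64_t page_index, unsigned blksize_bits,
				    const bool *delayed)
{
	assert(blksize_bits <= MODEL_PAGE_SHIFT);

	unsigned shift = MODEL_PAGE_SHIFT - blksize_bits;
	unsigned per_page = 1u << shift;       /* blocks per page */
	uint64_t first = page_index << shift;  /* first block in the page */

	for (unsigned i = 0; i < per_page; i++)
		if (delayed[i])
			return first + i;
	return UINT64_MAX;
}
```

With 1 KiB blocks (blksize_bits = 10), page 64 covers blocks 256..259. If only the last two are delayed, the extent starts at block 258, not at 256 as a check of the first buffer alone would suggest.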
> if (buffer_delay(bh)) {
> flags |= FIEMAP_EXTENT_DELALLOC;
> + if (page->index > offset) {
> + logical = ((__u64)page->index << PAGE_SHIFT);
> + newex->ec_block = logical >> blksize_bits;
> + }
So, this assumes that the entire unmapped extent is described by the first page, but doesn't actually check whether all of the pages exist. For the purpose of cp it might be OK, since at worst it means that cp will be reading from a hole in the source file. However, I wonder if other applications will depend on the allocated extent being more accurate?
> +EXPORT_SYMBOL(find_get_pages);
Eric had also suggested pagevec_lookup_tag(), which is already exported.
Cheers, Andreas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-23 16:41 ` Eric Sandeen
@ 2011-02-24 0:04 ` Dave Chinner
2011-02-24 16:34 ` Eric Sandeen
2011-02-24 0:33 ` Yongqiang Yang
2011-02-24 0:40 ` Yongqiang Yang
2 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2011-02-24 0:04 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Yongqiang Yang, linux-ext4
On Wed, Feb 23, 2011 at 10:41:42AM -0600, Eric Sandeen wrote:
> On 2/23/11 9:59 AM, Yongqiang Yang wrote:
> > 1] Delayed extents after a hole are neglected.
> >
> > By using find_get_pages() instead of find_get_page() to
> > lookup pagecache, delayed extents can be found, because
> > find_get_pages() with nr_pages=1 will return the next page
> > in pagecache.
> >
> > 2] Extents after a delayed extent or a hole are neglected as well.
> >
> > Fix it by correcting the request range using the result of
> > ext4_ext_next_allocated_block().
> >
> > Reported by Chris Mason <chris.mason@oracle.com>:
> > We've had reports on btrfs that cp is giving us files full of zeros
> > instead of actually copying them. It was tracked down to a bug with
> > the btrfs fiemap implementation where it was returning holes for
> > delalloc ranges.
> >
> > Newer versions of cp are trusting fiemap to tell it where the holes
> > are, which does seem like a pretty neat trick.
> >
> > I decided to give xfs and ext4 a shot with a few test cases too; xfs
> > passed with all the ones btrfs was getting wrong, and ext4 got the basic
> > delalloc case right.
> > $ mkfs.ext4 /dev/xxx
> > $ mount /dev/xxx /mnt
> > $ dd if=/dev/zero of=/mnt/foo bs=1M count=1
> > $ fiemap-test foo
> > ext: 0 logical: [ 0.. 255] phys: 0.. 255
> > flags: 0x007 tot: 256
> >
> > Hooray! But once we throw a hole in, things go bad:
> > $ mkfs.ext4 /dev/xxx
> > $ mount /dev/xxx /mnt
> > $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
> > $ fiemap-test foo
> > < no output >
> >
> > We've got a delalloc extent after the hole and ext4 fiemap didn't find
> > it. If I run sync to kick the delalloc out:
> > $ sync
> > $ fiemap-test foo
> > ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
> > flags: 0x001 tot: 256
> >
> > fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
> > got there. It's full of pretty comments so I know it isn't mine, but
> > you can grab it here:
> >
> > http://oss.oracle.com/~mason/fiemap-test.c
> >
> > xfsqa has a fiemap program too.
.....
> I think this will fall down if you have:
>
> [ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] won't it?
>
> i.e. your "end" will be the first block of "allocated" right?
This sort of problem indicates to me that we need a generic
fiemap test in xfstests that exercises all these corner cases.
Perhaps something similar to the way I tested all the
XFS_IOC_ZERO_RANGE corner cases in test 242 by setting up all the
different hole/delalloc/unwritten/allocated combinations using
xfs_io and used the fiemap output as the golden output?
That would catch all of these problems in the different filesystems
that implement fiemap and make sure we notice regressions pretty
quickly.....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-23 16:41 ` Eric Sandeen
2011-02-24 0:04 ` Dave Chinner
@ 2011-02-24 0:33 ` Yongqiang Yang
2011-02-24 16:36 ` Eric Sandeen
2011-02-24 0:40 ` Yongqiang Yang
2 siblings, 1 reply; 10+ messages in thread
From: Yongqiang Yang @ 2011-02-24 0:33 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-ext4
On Thu, Feb 24, 2011 at 12:41 AM, Eric Sandeen <sandeen@redhat.com> wrote:
> On 2/23/11 9:59 AM, Yongqiang Yang wrote:
>> 1] Delayed extents after a hole are neglected.
>>
>> By using find_get_pages() instead of find_get_page() to
>> lookup pagecache, delayed extents can be found, because
>> find_get_pages() with nr_pages=1 will return the next page
>> in pagecache.
>>
>> 2] Extents after a delayed extent or a hole are neglected as well.
>>
>> Fix it by correcting the request range using the result of
>> ext4_ext_next_allocated_block().
>>
>> Reported by Chris Mason <chris.mason@oracle.com>:
>> We've had reports on btrfs that cp is giving us files full of zeros
>> instead of actually copying them. It was tracked down to a bug with
>> the btrfs fiemap implementation where it was returning holes for
>> delalloc ranges.
>>
>> Newer versions of cp are trusting fiemap to tell it where the holes
>> are, which does seem like a pretty neat trick.
>>
>> I decided to give xfs and ext4 a shot with a few test cases too; xfs
>> passed with all the ones btrfs was getting wrong, and ext4 got the basic
>> delalloc case right.
>> $ mkfs.ext4 /dev/xxx
>> $ mount /dev/xxx /mnt
>> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1
>> $ fiemap-test foo
>> ext: 0 logical: [ 0.. 255] phys: 0.. 255
>> flags: 0x007 tot: 256
>>
>> Hooray! But once we throw a hole in, things go bad:
>> $ mkfs.ext4 /dev/xxx
>> $ mount /dev/xxx /mnt
>> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>> $ fiemap-test foo
>> < no output >
>>
>> We've got a delalloc extent after the hole and ext4 fiemap didn't find
>> it. If I run sync to kick the delalloc out:
>> $ sync
>> $ fiemap-test foo
>> ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
>> flags: 0x001 tot: 256
>>
>> fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
>> got there. It's full of pretty comments so I know it isn't mine, but
>> you can grab it here:
>>
>> http://oss.oracle.com/~mason/fiemap-test.c
>>
>> xfsqa has a fiemap program too.
>>
>> After the fix, test results are as follows:
>> ext: 0 logical: [ 256.. 511] phys: 0.. 255
>> flags: 0x007 tot: 256
>> ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
>> flags: 0x001 tot: 256
>>
>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
>> ---
>> fs/ext4/extents.c | 26 +++++++++++++++++++++++---
>> mm/filemap.c | 1 +
>> 2 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index ccce8a7..ad455a0 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> __u64 physical;
>> __u64 length;
>> __u32 flags = 0;
>> + ext4_lblk_t end;
>> int error;
>>
>> logical = (__u64)newex->ec_block << blksize_bits;
>>
>> - if (newex->ec_start == 0) {
>> + if (!newex->ec_start) {
>> + /*
>> + * There is no extent contains @newex->ec_block block.
>> + * It implies that @newex->ec_block block lies 1)a hole
>> + * or 2)delayed-allocated blocks that has not been
>> + * allocated, so pagecache is needed to lookup.
>> + *
>> + * And if it is case 2, @newex->ec_len needs to be corrected.
>> + *
>> + */
>> pgoff_t offset;
>> struct page *page;
>> struct buffer_head *bh = NULL;
>>
>> offset = logical >> PAGE_SHIFT;
>> - page = find_get_page(inode->i_mapping, offset);
>> + (void)find_get_pages(inode->i_mapping, offset, 1, &page);
>> if (!page || !page_has_buffers(page))
>> return EXT_CONTINUE;
>>
>> @@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> if (!bh)
>> return EXT_CONTINUE;
>>
>> + /* Assume block-size equals page-size. */
>> if (buffer_delay(bh)) {
>> flags |= FIEMAP_EXTENT_DELALLOC;
>> + if (page->index > offset) {
>> + logical = ((__u64)page->index << PAGE_SHIFT);
>> + newex->ec_block = logical >> blksize_bits;
>> + }
>> page_cache_release(page);
>> } else {
>> page_cache_release(page);
>> @@ -3830,7 +3845,8 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> *
>> * XXX this might miss a single-block extent at EXT_MAX_BLOCK
>> */
>> - if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
>> + end = ext4_ext_next_allocated_block(path);
>
> I think this will fall down if you have:
>
> [ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] won't it?
>
> i.e. your "end" will be the first block of "allocated" right?
Yes, but it neglects nothing. If we want to handle this layout, we need to
look up dirty pages in the specified range.
>
> -Eric
>
>> + if (end == EXT_MAX_BLOCK ||
>> newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
>> loff_t size = i_size_read(inode);
>> loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
>> @@ -3839,8 +3855,12 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> if ((flags & FIEMAP_EXTENT_DELALLOC) &&
>> logical+length > size)
>> length = (size - logical + bs - 1) & ~(bs-1);
>> + } else {
>> + newex->ec_len = end - newex->ec_block;
>> + length = (__u64)newex->ec_len << blksize_bits;
>> }
>>
>> +
>> error = fiemap_fill_next_extent(fieinfo, logical, physical,
>> length, flags);
>> if (error < 0)
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 83a45d3..1c01ffc 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -803,6 +803,7 @@ repeat:
>> rcu_read_unlock();
>> return ret;
>> }
>> +EXPORT_SYMBOL(find_get_pages);
>>
>> /**
>> * find_get_pages_contig - gang contiguous pagecache lookup
>
>
--
Best Wishes
Yongqiang Yang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-23 23:35 ` Andreas Dilger
@ 2011-02-24 0:37 ` Yongqiang Yang
0 siblings, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-02-24 0:37 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4, sandeen
On Thu, Feb 24, 2011 at 7:35 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-02-23, at 8:59 AM, Yongqiang Yang wrote:
>> @@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode,
>> - if (newex->ec_start == 0) {
>> + if (!newex->ec_start) {
>
> (style) the original code is actually correct. ec_start is not a boolean, so comparing it == 0 is actually the right thing to do.
>
>> @@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> if (!bh)
>> return EXT_CONTINUE;
>>
>> + /* Assume block-size equals page-size. */
>
> This is not a valid assumption.
>
>> if (buffer_delay(bh)) {
>> flags |= FIEMAP_EXTENT_DELALLOC;
>> + if (page->index > offset) {
>> + logical = ((__u64)page->index << PAGE_SHIFT);
>> + newex->ec_block = logical >> blksize_bits;
>> + }
>
> So, this assumes that the entire unmapped extent is described by the first page, but doesn't actually check whether all of the pages exist. For the purpose of cp it might be OK, since at worst it means that cp will be reading from a hole in the source file. However, I wonder if other applications will depend on the allocated extent being more accurate?
>
>> +EXPORT_SYMBOL(find_get_pages);
>
> Eric had also suggested pagevec_lookup_tag(), which is already exported.
Yes, we can use this function. I will make a new patch.
>
> Cheers, Andreas
>
--
Best Wishes
Yongqiang Yang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-23 16:41 ` Eric Sandeen
2011-02-24 0:04 ` Dave Chinner
2011-02-24 0:33 ` Yongqiang Yang
@ 2011-02-24 0:40 ` Yongqiang Yang
2011-02-24 0:56 ` Yongqiang Yang
2 siblings, 1 reply; 10+ messages in thread
From: Yongqiang Yang @ 2011-02-24 0:40 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-ext4
On Thu, Feb 24, 2011 at 12:41 AM, Eric Sandeen <sandeen@redhat.com> wrote:
> On 2/23/11 9:59 AM, Yongqiang Yang wrote:
>> 1] Delayed extents after a hole are neglected.
>>
>> By using find_get_pages() instead of find_get_page() to
>> lookup pagecache, delayed extents can be found, because
>> find_get_pages() with nr_pages=1 will return the next page
>> in pagecache.
>>
>> 2] Extents after a delayed extent or a hole are neglected as well.
>>
>> Fix it by correcting the request range using the result of
>> ext4_ext_next_allocated_block().
>>
>> Reported by Chris Mason <chris.mason@oracle.com>:
>> We've had reports on btrfs that cp is giving us files full of zeros
>> instead of actually copying them. It was tracked down to a bug with
>> the btrfs fiemap implementation where it was returning holes for
>> delalloc ranges.
>>
>> Newer versions of cp are trusting fiemap to tell it where the holes
>> are, which does seem like a pretty neat trick.
>>
>> I decided to give xfs and ext4 a shot with a few test cases too; xfs
>> passed with all the ones btrfs was getting wrong, and ext4 got the basic
>> delalloc case right.
>> $ mkfs.ext4 /dev/xxx
>> $ mount /dev/xxx /mnt
>> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1
>> $ fiemap-test foo
>> ext: 0 logical: [ 0.. 255] phys: 0.. 255
>> flags: 0x007 tot: 256
>>
>> Hooray! But once we throw a hole in, things go bad:
>> $ mkfs.ext4 /dev/xxx
>> $ mount /dev/xxx /mnt
>> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>> $ fiemap-test foo
>> < no output >
>>
>> We've got a delalloc extent after the hole and ext4 fiemap didn't find
>> it. If I run sync to kick the delalloc out:
>> $ sync
>> $ fiemap-test foo
>> ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
>> flags: 0x001 tot: 256
>>
>> fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
>> got there. It's full of pretty comments so I know it isn't mine, but
>> you can grab it here:
>>
>> http://oss.oracle.com/~mason/fiemap-test.c
>>
>> xfsqa has a fiemap program too.
>>
>> After the fix, test results are as follows:
>> ext: 0 logical: [ 256.. 511] phys: 0.. 255
>> flags: 0x007 tot: 256
>> ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
>> flags: 0x001 tot: 256
>>
>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
>> ---
>> fs/ext4/extents.c | 26 +++++++++++++++++++++++---
>> mm/filemap.c | 1 +
>> 2 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index ccce8a7..ad455a0 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> __u64 physical;
>> __u64 length;
>> __u32 flags = 0;
>> + ext4_lblk_t end;
>> int error;
>>
>> logical = (__u64)newex->ec_block << blksize_bits;
>>
>> - if (newex->ec_start == 0) {
>> + if (!newex->ec_start) {
>> + /*
>> + * There is no extent contains @newex->ec_block block.
>> + * It implies that @newex->ec_block block lies 1)a hole
>> + * or 2)delayed-allocated blocks that has not been
>> + * allocated, so pagecache is needed to lookup.
>> + *
>> + * And if it is case 2, @newex->ec_len needs to be corrected.
>> + *
>> + */
>> pgoff_t offset;
>> struct page *page;
>> struct buffer_head *bh = NULL;
>>
>> offset = logical >> PAGE_SHIFT;
>> - page = find_get_page(inode->i_mapping, offset);
>> + (void)find_get_pages(inode->i_mapping, offset, 1, &page);
>> if (!page || !page_has_buffers(page))
>> return EXT_CONTINUE;
>>
>> @@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> if (!bh)
>> return EXT_CONTINUE;
>>
>> + /* Assume block-size equals page-size. */
>> if (buffer_delay(bh)) {
>> flags |= FIEMAP_EXTENT_DELALLOC;
>> + if (page->index > offset) {
>> + logical = ((__u64)page->index << PAGE_SHIFT);
>> + newex->ec_block = logical >> blksize_bits;
>> + }
>> page_cache_release(page);
>> } else {
>> page_cache_release(page);
>> @@ -3830,7 +3845,8 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> *
>> * XXX this might miss a single-block extent at EXT_MAX_BLOCK
>> */
>> - if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
>> + end = ext4_ext_next_allocated_block(path);
>
> I think this will fall down if you have:
>
> [ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] won't it?
>
> i.e. your "end" will be the first block of "allocated" right?
We could use pagevec_lookup_tag() instead of find_get_page() and check
BH_Delay on contiguous pages. Then we can handle this layout.
What do you think?
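The idea can be sketched in user space. This models only the scan logic, under the assumption that the lookup hands back pages sorted by index. struct model_page and find_delalloc_run() are hypothetical, and no real pagevec_lookup_tag() call is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page returned by a tagged lookup. */
struct model_page {
	uint64_t index; /* page index within the file */
	bool delayed;   /* models a buffer with BH_Delay set */
};

/* Find the first run of index-contiguous delayed pages at or after
 * `offset`. `pages` must be sorted by index. On success, store the run's
 * first index and length (in pages) and return true. */
static bool find_delalloc_run(const struct model_page *pages, size_t n,
			      uint64_t offset, uint64_t *start, uint64_t *len)
{
	size_t i = 0;

	/* Skip pages before the requested offset and pages that are not delayed. */
	while (i < n && (pages[i].index < offset || !pages[i].delayed))
		i++;
	if (i == n)
		return false;

	*start = pages[i].index;
	*len = 1;

	/* Extend the run only while the next page is delayed and adjacent. */
	while (i + 1 < n && pages[i + 1].delayed &&
	       pages[i + 1].index == pages[i].index + 1) {
		i++;
		(*len)++;
	}
	return true;
}
```

Because the run stops at the first gap in page indices, a hole between a delalloc range and a later allocated extent is not folded into the reported length.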
>
> -Eric
>
>> + if (end == EXT_MAX_BLOCK ||
>> newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
>> loff_t size = i_size_read(inode);
>> loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
>> @@ -3839,8 +3855,12 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>> if ((flags & FIEMAP_EXTENT_DELALLOC) &&
>> logical+length > size)
>> length = (size - logical + bs - 1) & ~(bs-1);
>> + } else {
>> + newex->ec_len = end - newex->ec_block;
>> + length = (__u64)newex->ec_len << blksize_bits;
>> }
>>
>> +
>> error = fiemap_fill_next_extent(fieinfo, logical, physical,
>> length, flags);
>> if (error < 0)
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 83a45d3..1c01ffc 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -803,6 +803,7 @@ repeat:
>> rcu_read_unlock();
>> return ret;
>> }
>> +EXPORT_SYMBOL(find_get_pages);
>>
>> /**
>> * find_get_pages_contig - gang contiguous pagecache lookup
>
>
--
Best Wishes
Yongqiang Yang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-24 0:40 ` Yongqiang Yang
@ 2011-02-24 0:56 ` Yongqiang Yang
0 siblings, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-02-24 0:56 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-ext4
On Thu, Feb 24, 2011 at 8:40 AM, Yongqiang Yang <xiaoqiangnk@gmail.com> wrote:
> On Thu, Feb 24, 2011 at 12:41 AM, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 2/23/11 9:59 AM, Yongqiang Yang wrote:
>>> 1] Delayed extents after a hole are neglected.
>>>
>>> By using find_get_pages() instead of find_get_page() to
>>> lookup pagecache, delayed extents can be found, because
>>> find_get_pages() with nr_pages=1 will return the next page
>>> in pagecache.
>>>
>>> 2] Extents after a delayed extent or a hole are neglected as well.
>>>
>>> Fix it by correcting the request range using the result of
>>> ext4_ext_next_allocated_block().
>>>
>>> Reported by Chris Mason <chris.mason@oracle.com>:
>>> We've had reports on btrfs that cp is giving us files full of zeros
>>> instead of actually copying them. It was tracked down to a bug with
>>> the btrfs fiemap implementation where it was returning holes for
>>> delalloc ranges.
>>>
>>> Newer versions of cp are trusting fiemap to tell it where the holes
>>> are, which does seem like a pretty neat trick.
>>>
>>> I decided to give xfs and ext4 a shot with a few test cases too; xfs
>>> passed with all the ones btrfs was getting wrong, and ext4 got the basic
>>> delalloc case right.
>>> $ mkfs.ext4 /dev/xxx
>>> $ mount /dev/xxx /mnt
>>> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1
>>> $ fiemap-test foo
>>> ext: 0 logical: [ 0.. 255] phys: 0.. 255
>>> flags: 0x007 tot: 256
>>>
>>> Hooray! But once we throw a hole in, things go bad:
>>> $ mkfs.ext4 /dev/xxx
>>> $ mount /dev/xxx /mnt
>>> $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>>> $ fiemap-test foo
>>> < no output >
>>>
>>> We've got a delalloc extent after the hole and ext4 fiemap didn't find
>>> it. If I run sync to kick the delalloc out:
>>> $ sync
>>> $ fiemap-test foo
>>> ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
>>> flags: 0x001 tot: 256
>>>
>>> fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
>>> got there. It's full of pretty comments so I know it isn't mine, but
>>> you can grab it here:
>>>
>>> http://oss.oracle.com/~mason/fiemap-test.c
>>>
>>> xfsqa has a fiemap program too.
>>>
>>> After Fix, test results are as follows:
>>> ext: 0 logical: [ 256.. 511] phys: 0.. 255
>>> flags: 0x007 tot: 256
>>> ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
>>> flags: 0x001 tot: 256
>>>
>>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
>>> ---
>>> fs/ext4/extents.c | 26 +++++++++++++++++++++++---
>>> mm/filemap.c | 1 +
>>> 2 files changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>> index ccce8a7..ad455a0 100644
>>> --- a/fs/ext4/extents.c
>>> +++ b/fs/ext4/extents.c
>>> @@ -3788,17 +3788,27 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>>> __u64 physical;
>>> __u64 length;
>>> __u32 flags = 0;
>>> + ext4_lblk_t end;
>>> int error;
>>>
>>> logical = (__u64)newex->ec_block << blksize_bits;
>>>
>>> - if (newex->ec_start == 0) {
>>> + if (!newex->ec_start) {
>>> + /*
>>> + * No extent contains block @newex->ec_block.
>>> + * This implies that block @newex->ec_block lies in 1) a hole
>>> + * or 2) delayed-allocated blocks that have not yet been
>>> + * allocated, so a pagecache lookup is needed.
>>> + *
>>> + * In case 2, @newex->ec_len needs to be corrected.
>>> + *
>>> + */
>>> pgoff_t offset;
>>> struct page *page;
>>> struct buffer_head *bh = NULL;
>>>
>>> offset = logical >> PAGE_SHIFT;
>>> - page = find_get_page(inode->i_mapping, offset);
>>> + (void)find_get_pages(inode->i_mapping, offset, 1, &page);
>>> if (!page || !page_has_buffers(page))
>>> return EXT_CONTINUE;
>>>
>>> @@ -3807,8 +3817,13 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>>> if (!bh)
>>> return EXT_CONTINUE;
>>>
>>> + /* Assume block-size equals page-size. */
>>> if (buffer_delay(bh)) {
>>> flags |= FIEMAP_EXTENT_DELALLOC;
>>> + if (page->index > offset) {
>>> + logical = ((__u64)page->index << PAGE_SHIFT);
>>> + newex->ec_block = logical >> blksize_bits;
>>> + }
>>> page_cache_release(page);
>>> } else {
>>> page_cache_release(page);
>>> @@ -3830,7 +3845,8 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>>> *
>>> * XXX this might miss a single-block extent at EXT_MAX_BLOCK
>>> */
>>> - if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
>>> + end = ext4_ext_next_allocated_block(path);
>>
>> I think this will fall down if you have:
>>
>> [ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] won't it?
>>
>> i.e. your "end" will be the first block of "allocated" right?
> We could use pagevec_lookup_tag() instead of find_get_page() and check
> BH_Delay on contiguous pages. Then we can handle this layout.
>
> What do you think?
If we had a function that could return contiguous pages with a specified
tag, that would be even better! I am not sure whether adding such a
function is allowed.
>>
>> -Eric
>>
>>> + if (end == EXT_MAX_BLOCK ||
>>> newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
>>> loff_t size = i_size_read(inode);
>>> loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);
>>> @@ -3839,8 +3855,12 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>>> if ((flags & FIEMAP_EXTENT_DELALLOC) &&
>>> logical+length > size)
>>> length = (size - logical + bs - 1) & ~(bs-1);
>>> + } else {
>>> + newex->ec_len = end - newex->ec_block;
>>> + length = (__u64)newex->ec_len << blksize_bits;
>>> }
>>>
>>> +
>>> error = fiemap_fill_next_extent(fieinfo, logical, physical,
>>> length, flags);
>>> if (error < 0)
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 83a45d3..1c01ffc 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -803,6 +803,7 @@ repeat:
>>> rcu_read_unlock();
>>> return ret;
>>> }
>>> +EXPORT_SYMBOL(find_get_pages);
>>>
>>> /**
>>> * find_get_pages_contig - gang contiguous pagecache lookup
>>
>>
>
>
>
> --
> Best Wishes
> Yongqiang Yang
>
--
Best Wishes
Yongqiang Yang
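
A note on the pagecache lookup in the patch quoted above: find_get_pages()
returns the number of pages it found and fills the array only for those
entries, so a caller that discards the return value cannot tell "no page at
or after offset" from a pointer that was never written. The following is a
minimal userspace sketch of that difference, not kernel code. toy_cache,
toy_lookup_exact() and toy_lookup_next() are hypothetical stand-ins for the
pagecache, find_get_page() and find_get_pages() with nr_pages=1.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Hypothetical stand-in for a file's pagecache: present[i] != 0 means page i is cached. */
struct toy_cache {
	unsigned char present[TOY_NPAGES];
};

/* Models find_get_page(): only an exact hit at @index counts. Returns 1 on a hit, else 0. */
static int toy_lookup_exact(const struct toy_cache *c, size_t index)
{
	return index < TOY_NPAGES && c->present[index];
}

/*
 * Models find_get_pages() with nr_pages=1: finds the first cached page at
 * or after @start. Returns the number of pages found (0 or 1). *found is
 * written only when the return value is 1, so callers must check it.
 */
static unsigned int toy_lookup_next(const struct toy_cache *c, size_t start,
				    size_t *found)
{
	assert(found != NULL);
	for (size_t i = start; i < TOY_NPAGES; i++) {
		if (c->present[i]) {
			*found = i;
			return 1;
		}
	}
	return 0;
}
```

With only page 5 cached, the exact lookup at index 0 misses while the
"next" lookup from 0 reports page 5. A "next" lookup from 6 returns 0 and
leaves the output untouched, which is the case the return value has to be
checked for.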
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-24 0:04 ` Dave Chinner
@ 2011-02-24 16:34 ` Eric Sandeen
0 siblings, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2011-02-24 16:34 UTC (permalink / raw)
To: Dave Chinner; +Cc: Yongqiang Yang, linux-ext4
On 02/23/2011 06:04 PM, Dave Chinner wrote:
> This sort of problem indicates to me that we need a generic
> fiemap test in xfstests that exercises all these corner cases.
> Perhaps something similar to the way I tested all the
> XFS_IOC_ZERO_RANGE corner cases in test 242 by setting up all the
> different hole/delalloc/unwritten/allocated combinations using
> xfs_io and used the fiemap output as the golden output?
>
> That would catch all of these problems in the different filesystems
> that implement fiemap and make sure we notice regressions pretty
> quickly.....
>
> Cheers,
>
> Dave.
Or, remove the "sync" calls from josef's fiemap tester:
fiemap->fm_flags = FIEMAP_FLAG_SYNC;
More random but would quickly blow up I bet.
-Eric
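
A minimal sketch of what a FIEMAP request without the sync flag might look
like, assuming Linux with <linux/fiemap.h> and <linux/fs.h> available.
fiemap_alloc_request() and fiemap_query() are hypothetical helper names and
are not taken from josef's tester. With FIEMAP_FLAG_SYNC unset the kernel
does not sync the file before mapping, so delalloc ranges stay delalloc when
the mapping is read.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

/*
 * Allocate a zeroed FIEMAP request covering the whole file, with room for
 * @max_extents extents. If @sync is zero, FIEMAP_FLAG_SYNC is left unset.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct fiemap *fiemap_alloc_request(unsigned int max_extents, int sync)
{
	size_t size = sizeof(struct fiemap) +
		      (size_t)max_extents * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, size);

	if (!fm)
		return NULL;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_flags = sync ? FIEMAP_FLAG_SYNC : 0;
	fm->fm_extent_count = max_extents;
	return fm;
}

/* Issue the ioctl on an open descriptor. Returns 0 on success, -1 with errno set on failure. */
static int fiemap_query(int fd, struct fiemap *fm)
{
	assert(fm != NULL);
	return ioctl(fd, FS_IOC_FIEMAP, fm);
}
```

On success, fm->fm_mapped_extents holds the number of entries filled in
fm->fm_extents[], and each entry's fe_flags can be checked for
FIEMAP_EXTENT_DELALLOC.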
* Re: [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb().
2011-02-24 0:33 ` Yongqiang Yang
@ 2011-02-24 16:36 ` Eric Sandeen
0 siblings, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2011-02-24 16:36 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4
On 02/23/2011 06:33 PM, Yongqiang Yang wrote:
> On Thu, Feb 24, 2011 at 12:41 AM, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 2/23/11 9:59 AM, Yongqiang Yang wrote:
...
>>> @@ -3830,7 +3845,8 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
>>> *
>>> * XXX this might miss a single-block extent at EXT_MAX_BLOCK
>>> */
>>> - if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
>>> + end = ext4_ext_next_allocated_block(path);
>>
>> I think this will fall down if you have:
>>
>> [ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] won't it?
>>
>> i.e. your "end" will be the first block of "allocated" right?
> Yes, but it neglects nothing. If we want to handle this layout, we need to
> look up dirty pages in the specified range.
>
I think it's clearly a bug to return a delalloc range when in fact it's
a hole...
> We could use pagevec_lookup_tag() instead of find_get_page() and check
> BH_Delay on contiguous pages. Then we can handle this layout.
Yes, that's how I was going to go about it before you jumped right in,
thanks! :)
-Eric
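
To make the [ HOLE ][ DELALLOC ][ HOLE ][ ALLOCATED ] case concrete, here is
a small userspace model, offered only as a sketch. The toy_* names and the
per-block state array are hypothetical and are not ext4 data structures. It
compares a length taken from the next allocated block, as in the patch, with
a length obtained by checking each block's delalloc state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block states for a toy file. */
enum toy_state { TOY_HOLE, TOY_DELALLOC, TOY_ALLOCATED };

/* First allocated block at or after @from, or @n if there is none. */
static size_t toy_next_allocated(const enum toy_state *blk, size_t n,
				 size_t from)
{
	size_t i = from;

	while (i < n && blk[i] != TOY_ALLOCATED)
		i++;
	return i;
}

/* Length as the patch computes it: everything up to the next allocated block. */
static size_t toy_len_to_next_allocated(const enum toy_state *blk, size_t n,
					size_t start)
{
	assert(start < n && blk[start] == TOY_DELALLOC);
	return toy_next_allocated(blk, n, start) - start;
}

/* Length from per-block checks: stop at the first block that is not delalloc. */
static size_t toy_len_contiguous_delalloc(const enum toy_state *blk, size_t n,
					  size_t start)
{
	size_t i = start;

	assert(start < n && blk[start] == TOY_DELALLOC);
	while (i < n && blk[i] == TOY_DELALLOC)
		i++;
	return i - start;
}
```

For a layout of two hole blocks, two delalloc blocks, two hole blocks and
two allocated blocks, the first helper reports a delalloc length of 4 from
block 2, which covers the second hole, while the per-block scan reports 2.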
Thread overview: 10+ messages
2011-02-23 15:59 [PATCH] ext4:Fix a bug in ext4_ext_fiemap_cb() Yongqiang Yang
2011-02-23 16:41 ` Eric Sandeen
2011-02-24 0:04 ` Dave Chinner
2011-02-24 16:34 ` Eric Sandeen
2011-02-24 0:33 ` Yongqiang Yang
2011-02-24 16:36 ` Eric Sandeen
2011-02-24 0:40 ` Yongqiang Yang
2011-02-24 0:56 ` Yongqiang Yang
2011-02-23 23:35 ` Andreas Dilger
2011-02-24 0:37 ` Yongqiang Yang