public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* donor file data inconsistent after EXT4_IOC_MOVE_EXT
@ 2009-10-18  7:03 Peng Tao
  2009-10-19  0:40 ` Akira Fujita
  0 siblings, 1 reply; 5+ messages in thread
From: Peng Tao @ 2009-10-18  7:03 UTC (permalink / raw)
  To: ext4 development; +Cc: Akira Fujita, Kazuya Mio, Theodore Ts'o

[-- Attachment #1: Type: text/plain, Size: 3407 bytes --]

Hi,

As I am looking more closely to the EXT4_IOC_MOVE_EXT ioctl, I found a
problem. The iotcl exchanges the block layout of the orig file and donor file
and then writes out orig file data to orig file's new blocks.
After the ioctl, the donor file would have the blocks previously owned by the
orig file. But it turns out inconsistent.

A simple test case for revealing the bug:
The program a.out is calling EXT4_IOC_MOVE_EXT against argv[1] (as orig file)
and argv[2] (as donor file) and move_data.len = argv[1]'s block count.

And I am running mainline kernel 2.6.32-rc3 and the ext4 partition is mounted
in ordered mode with default settings, if you are interested.

[bergwolf@move_extent]$sh test-5.sh 
make full-img
========create full.img========
dd if=/home/bergwolf/vm/OpenSolaris200805.iso of=full-1.img bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0.0847457 s, 371 MB/s
dd if="/home/bergwolf/vm/WINXP_EN_PRO_SP3_MSDN/WinXp+Sp3 enu.iso" of=full-2.img bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0.0664263 s, 474 MB/s
md5sum full-1.img full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-1.img
9e35330146a610d0aa2fab1d16aa2b09  full-2.img
./a.out full-1.img full-2.img
md5sum full-1.img full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-1.img
9e35330146a610d0aa2fab1d16aa2b09  full-2.img		<---- wrong content
[bergwolf@move_extent]$cd
[bergwolf@~]$sudo umount /other/
[bergwolf@~]$sudo mount /other/
[bergwolf@~]$cd -
/other/test/move_extent
[bergwolf@move_extent]$md5sum full-1.img full-2.img 
4f47bee75290d094c94f8a7cb2075c69  full-1.img
4f47bee75290d094c94f8a7cb2075c69  full-2.img		<---- right result

I verified that the bug is because of the pagecache hit in the  vfs_read(), 
via the following test case:

[bergwolf@move_extent]$sudo sh test-4.sh 
make full-img
========create full.img========
dd if=/home/bergwolf/vm/OpenSolaris200805.iso of=full-1.img bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0.115624 s, 272 MB/s
dd if="/home/bergwolf/vm/WINXP_EN_PRO_SP3_MSDN/WinXp+Sp3 enu.iso" of=full-2.img bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1.16482 s, 27.0 MB/s
md5sum full-1.img full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-1.img
9e35330146a610d0aa2fab1d16aa2b09  full-2.img
sync
echo 1 > /proc/sys/vm/drop_caches	<------- this drops all pagecaches, FYI
./a.out full-1.img full-2.img
md5sum full-1.img full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-1.img
4f47bee75290d094c94f8a7cb2075c69  full-2.img

IIUC, this is because pagecache not uptodate.  FWIW, EXT4_IOC_MOVE_EXT
calls ext4_ext_invalidate_cache() to prevent later access to donor file reading
old data. But if the data is already in the pagecache (in which case,
ext4_get_blocks() won't be called), vfs_read will still read the old data.
But I don't know if there is a way to discard all pagecache for a specific
inode. I tried to write something similar to ext4_da_block_invalidatepages()
and ClearPageUptodate() on each page found in the mapping address,
but it didn't work.

So am I missing anything? And any hints how to force the following vfs_read()
to read from disk?

-- 
Best Regards,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: donor file data inconsistent after EXT4_IOC_MOVE_EXT
  2009-10-18  7:03 donor file data inconsistent after EXT4_IOC_MOVE_EXT Peng Tao
@ 2009-10-19  0:40 ` Akira Fujita
  2009-10-19  2:28   ` Peng Tao
  0 siblings, 1 reply; 5+ messages in thread
From: Akira Fujita @ 2009-10-19  0:40 UTC (permalink / raw)
  To: Peng Tao; +Cc: ext4 development, Kazuya Mio, Theodore Ts'o

Hi Peng,

This is a known issue, and I sent a patch to linux-ext4 2 weeks ago.
Unfortunately it is not included in the ext4 patch queue yet.

http://marc.info/?l=linux-ext4&m=125447192709338&w=2

Would you retry your test case with above my patch?

Regards,
Akira Fujita

Peng Tao wrote:
> Hi,
> 
> As I am looking more closely to the EXT4_IOC_MOVE_EXT ioctl, I found a
> problem. The iotcl exchanges the block layout of the orig file and donor file
> and then writes out orig file data to orig file's new blocks.
> After the ioctl, the donor file would have the blocks previously owned by the
> orig file. But it turns out inconsistent.
> 
> A simple test case for revealing the bug:
> The program a.out is calling EXT4_IOC_MOVE_EXT against argv[1] (as orig file)
> and argv[2] (as donor file) and move_data.len = argv[1]'s block count.
> 
> And I am running mainline kernel 2.6.32-rc3 and the ext4 partition is mounted
> in ordered mode with default settings, if you are interested.
> 
> [bergwolf@move_extent]$sh test-5.sh 
> make full-img
> ========create full.img========
> dd if=/home/bergwolf/vm/OpenSolaris200805.iso of=full-1.img bs=1M count=30
> 30+0 records in
> 30+0 records out
> 31457280 bytes (31 MB) copied, 0.0847457 s, 371 MB/s
> dd if="/home/bergwolf/vm/WINXP_EN_PRO_SP3_MSDN/WinXp+Sp3 enu.iso" of=full-2.img bs=1M count=30
> 30+0 records in
> 30+0 records out
> 31457280 bytes (31 MB) copied, 0.0664263 s, 474 MB/s
> md5sum full-1.img full-2.img
> 4f47bee75290d094c94f8a7cb2075c69  full-1.img
> 9e35330146a610d0aa2fab1d16aa2b09  full-2.img
> ./a.out full-1.img full-2.img
> md5sum full-1.img full-2.img
> 4f47bee75290d094c94f8a7cb2075c69  full-1.img
> 9e35330146a610d0aa2fab1d16aa2b09  full-2.img		<---- wrong content
> [bergwolf@move_extent]$cd
> [bergwolf@~]$sudo umount /other/
> [bergwolf@~]$sudo mount /other/
> [bergwolf@~]$cd -
> /other/test/move_extent
> [bergwolf@move_extent]$md5sum full-1.img full-2.img 
> 4f47bee75290d094c94f8a7cb2075c69  full-1.img
> 4f47bee75290d094c94f8a7cb2075c69  full-2.img		<---- right result
> 
> I verified that the bug is because of the pagecache hit in the  vfs_read(), 
> via the following test case:
> 
> [bergwolf@move_extent]$sudo sh test-4.sh 
> make full-img
> ========create full.img========
> dd if=/home/bergwolf/vm/OpenSolaris200805.iso of=full-1.img bs=1M count=30
> 30+0 records in
> 30+0 records out
> 31457280 bytes (31 MB) copied, 0.115624 s, 272 MB/s
> dd if="/home/bergwolf/vm/WINXP_EN_PRO_SP3_MSDN/WinXp+Sp3 enu.iso" of=full-2.img bs=1M count=30
> 30+0 records in
> 30+0 records out
> 31457280 bytes (31 MB) copied, 1.16482 s, 27.0 MB/s
> md5sum full-1.img full-2.img
> 4f47bee75290d094c94f8a7cb2075c69  full-1.img
> 9e35330146a610d0aa2fab1d16aa2b09  full-2.img
> sync
> echo 1 > /proc/sys/vm/drop_caches	<------- this drops all pagecaches, FYI
> ./a.out full-1.img full-2.img
> md5sum full-1.img full-2.img
> 4f47bee75290d094c94f8a7cb2075c69  full-1.img
> 4f47bee75290d094c94f8a7cb2075c69  full-2.img
> 
> IIUC, this is because pagecache not uptodate.  FWIW, EXT4_IOC_MOVE_EXT
> calls ext4_ext_invalidate_cache() to prevent later access to donor file reading
> old data. But if the data is already in the pagecache (in which case,
> ext4_get_blocks() won't be called), vfs_read will still read the old data.
> But I don't know if there is a way to discard all pagecache for a specific
> inode. I tried to write something similar to ext4_da_block_invalidatepages()
> and ClearPageUptodate() on each page found in the mapping address,
> but it didn't work.
> 
> So am I missing anything? And any hints how to force the following vfs_read()
> to read from disk?
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: donor file data inconsistent after EXT4_IOC_MOVE_EXT
  2009-10-19  0:40 ` Akira Fujita
@ 2009-10-19  2:28   ` Peng Tao
  2009-10-19  5:42     ` Peng Tao
  0 siblings, 1 reply; 5+ messages in thread
From: Peng Tao @ 2009-10-19  2:28 UTC (permalink / raw)
  To: Akira Fujita; +Cc: ext4 development, Kazuya Mio, Theodore Ts'o

Hi, Akira,
Akira Fujita wrote:
> Hi Peng,
> 
> This is a known issue, and I sent a patch to linux-ext4 2 weeks ago.
> Unfortunately it is not included in the ext4 patch queue yet.
> 
> http://marc.info/?l=linux-ext4&m=125447192709338&w=2
> 
> Would you retry your test case with above my patch?
It didn't work. I still got the old donor file data.

I applied the two patches to avoid conflicts:
http://marc.info/?l=linux-ext4&m=125447192609335&w=2
http://marc.info/?l=linux-ext4&m=125447192709338&w=2

[bergwolf@move_extent]$sh test-5.sh 
make full-img
========create full.img========
dd if=/home/bergwolf/vm/OpenSolaris200805.iso of=full-1.img bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0.907358 s, 34.7 MB/s
dd if="/home/bergwolf/vm/WINXP_EN_PRO_SP3_MSDN/WinXp+Sp3 enu.iso" of=full-2.img bs=1M count=30
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 2.09212 s, 15.0 MB/s
md5sum full-1.img full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-1.img
9e35330146a610d0aa2fab1d16aa2b09  full-2.img
./a.out full-1.img full-2.img
7680
7680
md5sum full-1.img full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-1.img
9e35330146a610d0aa2fab1d16aa2b09  full-2.img	<---- wrong pagecache hit here
[bergwolf@move_extent]$cd
[bergwolf@~]$sudo umount /other/
[bergwolf@~]$sudo mount /other/
[bergwolf@~]$cd -
/other/test/move_extent
[bergwolf@move_extent]$md5sum full-2.img
4f47bee75290d094c94f8a7cb2075c69  full-2.img

<snip>

-- 
Best Regards,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: donor file data inconsistent after EXT4_IOC_MOVE_EXT
  2009-10-19  2:28   ` Peng Tao
@ 2009-10-19  5:42     ` Peng Tao
  2009-10-19  8:28       ` Akira Fujita
  0 siblings, 1 reply; 5+ messages in thread
From: Peng Tao @ 2009-10-19  5:42 UTC (permalink / raw)
  To: Akira Fujita; +Cc: ext4 development, Kazuya Mio, Theodore Ts'o

Hi, Akira,

Peng Tao wrote:
> Hi, Akira,
> Akira Fujita wrote:
>> Hi Peng,
>>
>> This is a known issue, and I sent a patch to linux-ext4 2 weeks ago.
>> Unfortunately it is not included in the ext4 patch queue yet.
>>
>> http://marc.info/?l=linux-ext4&m=125447192709338&w=2
>>
>> Would you retry your test case with above my patch?
> It didn't work. I still got the old donor file data.
invalidate_mapping_pages() in your patch won't invalidate the locked page returned by
grab_cache_page(). The following patch addresses the bug by calling
invalidate_inode_pages2_range() instead.

commit e88c6fa454c38039e9f342196129f8ef782f5edb
Author: Peng Tao <bergwolf@gmail.com>
Date:   Mon Oct 19 10:57:51 2009 +0800

    ext4: Invalidate donor file pagecache in EXT4_IOC_MOVE_EXT
    
    After calling EXT4_IOC_MOVE_EXT, the donor file will have data blocks previously
    owned by the orig file. But the pagecache for donor file still refers to old
    blocks in memory. The patch invalidates all pagecaches for donor file corresponding
    to replaced blocks. So later reading donor file won't hit the wrong pagecache.
    
    Signed-off-by: Peng Tao <bergwolf@gmail.com>

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 25b6b14..dcdfec8 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -826,6 +826,7 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 	const struct address_space_operations *a_ops = mapping->a_ops;
 	handle_t *handle;
 	ext4_lblk_t orig_blk_offset;
+	pgoff_t donor_page_offset = orig_page_offset;
 	long long offs = orig_page_offset << PAGE_CACHE_SHIFT;
 	unsigned long blocksize = orig_inode->i_sb->s_blocksize;
 	unsigned int w_flags = 0;
@@ -920,6 +921,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 	ext4_ext_invalidate_cache(orig_inode);
 	ext4_ext_invalidate_cache(donor_inode);
 
+	invalidate_inode_pages2_range(donor_inode->i_mapping, donor_page_offset,
+					donor_page_offset);
 	if (!page_has_buffers(page))
 		create_empty_buffers(page, 1 << orig_inode->i_blkbits, 0);
 




^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: donor file data inconsistent after EXT4_IOC_MOVE_EXT
  2009-10-19  5:42     ` Peng Tao
@ 2009-10-19  8:28       ` Akira Fujita
  0 siblings, 0 replies; 5+ messages in thread
From: Akira Fujita @ 2009-10-19  8:28 UTC (permalink / raw)
  To: Peng Tao; +Cc: ext4 development, Kazuya Mio, Theodore Ts'o

Hi Peng,
Peng Tao wrote:
> Hi, Akira,
> 
> Peng Tao wrote:
>> Hi, Akira,
>> Akira Fujita wrote:
>>> Hi Peng,
>>>
>>> This is a known issue, and I sent a patch to linux-ext4 2 weeks ago.
>>> Unfortunately it is not included in the ext4 patch queue yet.
>>>
>>> http://marc.info/?l=linux-ext4&m=125447192709338&w=2
>>>
>>> Would you retry your test case with above my patch?
>> It didn't work. I still got the old donor file data.
> invalidate_mapping_pages() in your patch won't invalidate the locked page returned by
> grab_cache_page(). The following patch addresses the bug by calling
> invalidate_inode_pages2_range() instead.

Though I tested my patch, I seem to have made a mistake.
Anyway, I confirmed your patch works fine, thank you.

Regards,
Akira Fujita

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-10-19  8:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-18  7:03 donor file data inconsistent after EXT4_IOC_MOVE_EXT Peng Tao
2009-10-19  0:40 ` Akira Fujita
2009-10-19  2:28   ` Peng Tao
2009-10-19  5:42     ` Peng Tao
2009-10-19  8:28       ` Akira Fujita

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox