[PATCH] ext4: avoid exposure of stale data in ext4_punch_hole()

linux-ext4.vger.kernel.org archive mirror

* [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole()
@ 2013-09-26 17:32 Maxim Patlasov
  2013-09-26 18:53 ` Jan Kara
  2013-09-27 15:54 ` [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2 Maxim Patlasov
  0 siblings, 2 replies; 9+ messages in thread
From: Maxim Patlasov @ 2013-09-26 17:32 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4, adilger.kernel, linux-kernel

While handling punch-hole fallocate, it's useless to truncate page cache
before removing the range from extent tree (or block map in indirect case)
because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
read) immediately after truncating page cache, but before updating extent
tree (or block map). In that case the user will see stale data even after
fallocate is completed.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/ext4/inode.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d424d7..6b71116 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
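@@ -3564,14 +3564,6 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 
 	}
 
-	first_block_offset = round_up(offset, sb->s_blocksize);
-	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
-
-	/* Now release the pages and zero block aligned part of pages*/
-	if (last_block_offset > first_block_offset)
-		truncate_pagecache_range(inode, first_block_offset,
-					 last_block_offset);
-
 	/* Wait all existing dio workers, newcomers will block on i_mutex */
 	ext4_inode_block_unlocked_dio(inode);
 	inode_dio_wait(inode);
@@ -3621,6 +3613,15 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
+
+	first_block_offset = round_up(offset, sb->s_blocksize);
+	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
+
+	/* Now release the pages and zero block aligned part of pages */
+	if (last_block_offset > first_block_offset)
+		truncate_pagecache_range(inode, first_block_offset,
+					 last_block_offset);
+
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 out_stop:
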

* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole()
  2013-09-26 17:32 [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() Maxim Patlasov
@ 2013-09-26 18:53 ` Jan Kara
  2013-09-27 13:05   ` Maxim Patlasov
  2013-09-27 15:54 ` [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2 Maxim Patlasov
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-09-26 18:53 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: tytso, linux-ext4, adilger.kernel, linux-kernel

  Hello,

On Thu 26-09-13 21:32:07, Maxim Patlasov wrote:
> While handling punch-hole fallocate, it's useless to truncate page cache
> before removing the range from extent tree (or block map in indirect case)
> because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> read) immediately after truncating page cache, but before updating extent
> tree (or block map). In that case the user will see stale data even after
> fallocate is completed.
  Yes, this is a known problem. The trouble is there isn't a reliable fix
currently possible. If we don't truncate page cache before removing blocks,
we will have pages in memory being backed by already freed blocks - not
good as that can lead to data corruption. So you shouldn't really remove the
truncation from before we remove the blocks.

You are right that if punch hole races with page fault or read, we can
again create pages whose block mapping will soon become stale, and the
same problem as I wrote above applies. Truncating pagecache after we
removed blocks only narrows the race window but doesn't really fix the
problem.

Properly fixing the problem requires significant overhaul in how mmap_sem
is used in page fault. I'm working on patches to do that but it will take
some time.

								Honza
 
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/ext4/inode.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0d424d7..6b71116 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3564,14 +3564,6 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
>  
>  	}
>  
> -	first_block_offset = round_up(offset, sb->s_blocksize);
> -	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
> -
> -	/* Now release the pages and zero block aligned part of pages*/
> -	if (last_block_offset > first_block_offset)
> -		truncate_pagecache_range(inode, first_block_offset,
> -					 last_block_offset);
> -
>  	/* Wait all existing dio workers, newcomers will block on i_mutex */
>  	ext4_inode_block_unlocked_dio(inode);
>  	inode_dio_wait(inode);
> @@ -3621,6 +3613,15 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
>  	up_write(&EXT4_I(inode)->i_data_sem);
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
> +
> +	first_block_offset = round_up(offset, sb->s_blocksize);
> +	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
> +
> +	/* Now release the pages and zero block aligned part of pages */
> +	if (last_block_offset > first_block_offset)
> +		truncate_pagecache_range(inode, first_block_offset,
> +					 last_block_offset);
> +
>  	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
>  	ext4_mark_inode_dirty(handle, inode);
>  out_stop:
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole()
  2013-09-26 18:53 ` Jan Kara
@ 2013-09-27 13:05   ` Maxim Patlasov
  2013-09-27 14:43     ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Maxim Patlasov @ 2013-09-27 13:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, linux-ext4, adilger.kernel, linux-kernel

Hi Jan,

On 09/26/2013 10:53 PM, Jan Kara wrote:
>    Hello,
>
> On Thu 26-09-13 21:32:07, Maxim Patlasov wrote:
>> While handling punch-hole fallocate, it's useless to truncate page cache
>> before removing the range from extent tree (or block map in indirect case)
>> because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
>> read) immediately after truncating page cache, but before updating extent
>> tree (or block map). In that case the user will see stale data even after
>> fallocate is completed.
>    Yes, this is a known problem. The trouble is there isn't a reliable fix
> currently possible. If we don't truncate page cache before removing blocks,
> we will have pages in memory being backed by already freed blocks - not
> good as that can lead to data corruption. So you shouldn't really remove the
> truncation from before we remove the blocks.

I'd like to understand the problem better. Could you please provide any
details about that data corruption? And if it was already discussed
somewhere, please point me to it.

>
> You are right that if punch hole races with page fault or read, we can
> create again pages with block mapping which will become stale soon and the
> same problem as I wrote above applies. Truncating pagecache after we
> removed blocks only narrows the race window but doesn't really fix the
> problem.

There seem to be two different problems: 1) pages backed by already
freed blocks; 2) keeping page-cache populated by pages with stale data
after fallocate completes. Your concerns refer to the first problem. My
patch was intended to resolve the second. It seems to me that my patch
really fixes the second problem and it doesn't make things worse w.r.t.
the first problem. Am I missing something?

Thanks,
Maxim

>
> Properly fixing the problem requires significant overhaul in how mmap_sem
> is used in page fault. I'm working on patches to do that but it will take
> some time.
>
> 								Honza
>   

* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole()
  2013-09-27 13:05   ` Maxim Patlasov
@ 2013-09-27 14:43     ` Jan Kara
  2013-09-27 15:16       ` Maxim Patlasov
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-09-27 14:43 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: Jan Kara, tytso, linux-ext4, adilger.kernel, linux-kernel

  Hi,

On Fri 27-09-13 17:05:18, Maxim Patlasov wrote:
> On 09/26/2013 10:53 PM, Jan Kara wrote:
> >   Hello,
> >
> >On Thu 26-09-13 21:32:07, Maxim Patlasov wrote:
> >>While handling punch-hole fallocate, it's useless to truncate page cache
> >>before removing the range from extent tree (or block map in indirect case)
> >>because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> >>read) immediately after truncating page cache, but before updating extent
> >>tree (or block map). In that case the user will see stale data even after
> >>fallocate is completed.
> >   Yes, this is a known problem. The trouble is there isn't a reliable fix
> >currently possible. If we don't truncate page cache before removing blocks,
> >we will have pages in memory being backed by already freed blocks - not
> >good as that can lead to data corruption. So you shouldn't really remove the
> >truncation from before we remove the blocks.
> 
> I'd like to understand the problem better. Could you please provide
> any details about that data corruption? And if it was already
> discussed somewhere, please point me to there.
  It was discussed at linux-ext4 / linux-fsdevel sometime in spring but I'm
not sure to which extent we covered details of the races. Anyway if you
have blocks in pagecache which point to already freed blocks (let's say they
belong to file foo), the following can happen:
  1) Someone reallocates the freed blocks for another file bar and writes
     new data to them. Writeback flushes the data to disk.
  2) Someone dirties pages with stale mapping data in file foo.
  3) Writeback writes dirty pages of foo, overwriting data in bar.
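
The three steps above can be replayed in a toy userspace model. This is
only a sketch under stated assumptions: the in-memory disk array,
struct toy_page and toy_writeback() are hypothetical stand-ins invented for
illustration, not kernel interfaces, and the "race" is replayed
sequentially rather than with real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ   8

/* Toy "disk": NBLOCKS blocks of BLKSZ bytes each (hypothetical model). */
static char disk[NBLOCKS][BLKSZ];

/* Toy cached page: its data plus the disk block it believes backs it. */
struct toy_page {
	char data[BLKSZ];
	int  mapped_block;
	bool dirty;
};

/* Writeback trusts the block mapping remembered by the page. */
static void toy_writeback(struct toy_page *pg)
{
	if (pg->dirty) {
		memcpy(disk[pg->mapped_block], pg->data, BLKSZ);
		pg->dirty = false;
	}
}

/*
 * Replays steps 1) to 3) for a page of foo that stayed cached after its
 * block was freed. Returns true if bar's data survived on disk.
 */
static bool replay_stale_mapping_race(void)
{
	struct toy_page foo_page, bar_page;
	const int blk = 2;

	/* foo has a cached page mapped to block 2. */
	memset(&foo_page, 0, sizeof(foo_page));
	memcpy(foo_page.data, "foo-old", BLKSZ);
	foo_page.mapped_block = blk;
	memcpy(disk[blk], "foo-old", BLKSZ);

	/* Punch hole frees block 2, but foo's page is still in the cache. */

	/* 1) Block 2 is reallocated to bar, written, and flushed to disk. */
	memset(&bar_page, 0, sizeof(bar_page));
	memcpy(bar_page.data, "bar-new", BLKSZ);
	bar_page.mapped_block = blk;
	bar_page.dirty = true;
	toy_writeback(&bar_page);

	/* 2) Someone dirties foo's page, which still carries the old mapping. */
	memcpy(foo_page.data, "foo-dty", BLKSZ);
	foo_page.dirty = true;

	/* 3) Writeback of foo follows the stale mapping into bar's block. */
	toy_writeback(&foo_page);

	return memcmp(disk[blk], "bar-new", BLKSZ) == 0;
}
```

Calling replay_stale_mapping_race() returns false in this model: block 2
ends up holding foo's bytes although it now belongs to bar.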

> >You are right that if punch hole races with page fault or read, we can
> >create again pages with block mapping which will become stale soon and the
> >same problem as I wrote above applies. Truncating pagecache after we
> >removed blocks only narrows the race window but doesn't really fix the
> >problem.
> 
> There seems to be two different problems: 1) pages backed by already
> freed blocks; 2) keeping page-cache populated by pages with stale
> data after fallocate completes. Your concerns refer to the first
> problem. My patch was intended to resolve the second. It seems to me
> that my patch really fixes the second problem and it doesn't make
> things worse w.r.t. the first problem. Do I miss something?
  IMHO these are different aspects of the same problem but that's not
important. Your patch actually makes things worse because currently if the
file isn't written to via mmap while punch_hole is running, everything is
fine. After your patch, writeback of old data could race with punch hole
freeing blocks, resulting in the data corruption I have described above. The
'no-harm' solution would be to add another truncation of pagecache after
punch hole is done. I think that would be a good way to reduce the race
window before the problem gets fixed properly.

								Honza
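
The effect of the 'no-harm' ordering can be sketched with a small
sequential model. Everything here is an assumption made for illustration:
struct toy_file, toy_read() and toy_punch_hole() are hypothetical names,
a single flag stands in for the whole punched range, and the racing reader
is replayed at one fixed point instead of running concurrently. The model
therefore shows only why a second truncation helps a later reader; it
cannot show the window that remains open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-block file: mapping state plus page cache state. */
struct toy_file {
	bool block_allocated;   /* extent tree (or block map) maps the block */
	bool page_cached;       /* page cache holds a page for the range */
	bool page_has_old_data; /* cached page was filled from the old block */
};

static void toy_truncate_pagecache(struct toy_file *f)
{
	f->page_cached = false;
	f->page_has_old_data = false;
}

/* A read fills the cache from disk if mapped, or with zeros for a hole. */
static void toy_read(struct toy_file *f)
{
	if (f->page_cached)
		return;
	f->page_cached = true;
	f->page_has_old_data = f->block_allocated;
}

/*
 * Returns true if a read issued after punch hole has completed still
 * sees the old data.
 */
static bool toy_punch_hole(bool racing_read, bool truncate_again)
{
	struct toy_file f = {
		.block_allocated = true,
		.page_cached = true,
		.page_has_old_data = true,
	};

	toy_truncate_pagecache(&f);         /* truncation before freeing blocks */
	if (racing_read)
		toy_read(&f);               /* reader slips into the window */
	f.block_allocated = false;          /* range removed from the mapping */
	if (truncate_again)
		toy_truncate_pagecache(&f); /* extra truncation after punch hole */

	toy_read(&f);                       /* read after fallocate returned */
	return f.page_has_old_data;
}
```

In this model toy_punch_hole(true, false) returns true (the re-populated
page survives), while toy_punch_hole(true, true) and
toy_punch_hole(false, false) return false.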

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole()
  2013-09-27 14:43     ` Jan Kara
@ 2013-09-27 15:16       ` Maxim Patlasov
  0 siblings, 0 replies; 9+ messages in thread
From: Maxim Patlasov @ 2013-09-27 15:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, linux-ext4, adilger.kernel, linux-kernel

Hi,

On 09/27/2013 06:43 PM, Jan Kara wrote:
> On Fri 27-09-13 17:05:18, Maxim Patlasov wrote:
>> On 09/26/2013 10:53 PM, Jan Kara wrote:
>>>    Hello,
>>>
>>> On Thu 26-09-13 21:32:07, Maxim Patlasov wrote:
>>>> While handling punch-hole fallocate, it's useless to truncate page cache
>>>> before removing the range from extent tree (or block map in indirect case)
>>>> because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
>>>> read) immediately after truncating page cache, but before updating extent
>>>> tree (or block map). In that case the user will see stale data even after
>>>> fallocate is completed.
>>>    Yes, this is a known problem. The trouble is there isn't a reliable fix
>>> currently possible. If we don't truncate page cache before removing blocks,
>>> we will have pages in memory being backed by already freed blocks - not
>>> good as that can lead to data corruption. So you shouldn't really remove the
>>> truncation from before we remove the blocks.
>> I'd like to understand the problem better. Could you please provide
>> any details about that data corruption? And if it was already
>> discussed somewhere, please point me to there.
>    It was discussed at linux-ext4 / linux-fsdevel sometime in spring but I'm
> not sure to which extent we covered details of the races. Anyway if you
> have blocks in pagecache which point to already freed blocks (let's say they
> belong to file foo), the following
> can happen:
>    1) Someone reallocates freed blocks for another file bar. And writes new
>       data to them. Writeback flushes the data to disk.
>    2) Someone dirties pages with stale mapping data in file foo.
>    3) Writeback writes dirty pages of foo, overwriting data in bar.

That's clear now. Thanks a lot for the explanation.

>>> You are right that if punch hole races with page fault or read, we can
>>> create again pages with block mapping which will become stale soon and the
>>> same problem as I wrote above applies. Truncating pagecache after we
>>> removed blocks only narrows the race window but doesn't really fix the
>>> problem.
>> There seems to be two different problems: 1) pages backed by already
>> freed blocks; 2) keeping page-cache populated by pages with stale
>> data after fallocate completes. Your concerns refer to the first
>> problem. My patch was intended to resolve the second. It seems to me
>> that my patch really fixes the second problem and it doesn't make
>> things worse w.r.t. the first problem. Do I miss something?
>    IMHO these are different aspects of the same problem but that's not
> important. Your patch actually makes things worse because currently if the
> file isn't written to via mmap while punch_hole is running, everything is
> fine. After your patch writeback of old data could race with punch hole
> freeing blocks resulting in data corruption I have described above. The
> 'no-harm' solution would be to add another truncation of pagecache after
> punch hole is done. I think that would be a good way to reduce the race
> window before the problem gets fixed properly.

Yes, I agree. I also think there is another reason that makes the
'no-harm' patch worthwhile: currently, even users who write nothing suffer
(any read may re-populate the page cache in the window). I'll resend a
corrected patch in case somebody else is interested.

Thanks,
Maxim
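
For reference, the userspace operation under discussion can be sketched
as follows. This is a minimal single-threaded illustration, so it cannot
trigger the race; it only shows the result a reader expects once
fallocate(2) with FALLOC_FL_PUNCH_HOLE has returned, namely zeros in the
punched range. The function name punch_hole_demo(), the 4096-byte block
size and the scratch file under /tmp are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEMO_BLK 4096 /* assumed block size for the example */

/*
 * Writes three blocks of 0xAA, punches out the middle one and reads it
 * back. Returns 0 if the punched range reads as zeros, 1 if the
 * filesystem does not support punch hole, -1 on any other failure.
 */
static int punch_hole_demo(void)
{
	char path[] = "/tmp/punch_demo_XXXXXX"; /* hypothetical scratch file */
	char buf[DEMO_BLK];
	int fd, ret = -1;
	size_t i;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path); /* file disappears once fd is closed */

	memset(buf, 0xAA, sizeof(buf));
	for (i = 0; i < 3; i++) {
		if (pwrite(fd, buf, sizeof(buf), (off_t)i * DEMO_BLK) !=
		    (ssize_t)sizeof(buf))
			goto out;
	}

	/* Punch out the middle block; KEEP_SIZE is mandatory with PUNCH_HOLE. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      DEMO_BLK, DEMO_BLK) != 0) {
		ret = (errno == EOPNOTSUPP) ? 1 : -1;
		goto out;
	}

	/* After fallocate returns, the punched range should read as zeros. */
	if (pread(fd, buf, sizeof(buf), DEMO_BLK) != (ssize_t)sizeof(buf))
		goto out;
	for (i = 0; i < sizeof(buf); i++) {
		if (buf[i] != 0)
			goto out;
	}
	ret = 0;
out:
	close(fd);
	return ret;
}
```

A reader that raced with the punch and re-populated the page cache in the
window described above is the case where this check could observe the old
0xAA bytes instead of zeros.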


* [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2
  2013-09-26 17:32 [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() Maxim Patlasov
  2013-09-26 18:53 ` Jan Kara
@ 2013-09-27 15:54 ` Maxim Patlasov
  2013-09-27 16:05   ` Jan Kara
  1 sibling, 1 reply; 9+ messages in thread
From: Maxim Patlasov @ 2013-09-27 15:54 UTC (permalink / raw)
  To: tytso; +Cc: adilger.kernel, linux-ext4, jack, linux-kernel

While handling punch-hole fallocate, it's useless to truncate page cache
before removing the range from extent tree (or block map in indirect case)
because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
read) immediately after truncating page cache, but before updating extent
tree (or block map). In that case the user will see stale data even after
fallocate is completed.

Changed in v2 (Thanks to Jan Kara):
 - Until the problem of data corruption resulting from pages backed by
   already freed blocks is fully resolved, the simple thing we can do now
   is to add another truncation of pagecache after punch hole is done.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/ext4/inode.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d424d7..2984ddf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3621,6 +3621,12 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
+
+	/* Now release the pages again to reduce race window */
+	if (last_block_offset > first_block_offset)
+		truncate_pagecache_range(inode, first_block_offset,
+					 last_block_offset);
+
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 out_stop:
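
The guard in the hunk above reuses first_block_offset and
last_block_offset, which the v1 diff earlier in this thread shows being
computed with round_up() and round_down(). A worked userspace sketch of
that arithmetic follows. The helpers toy_round_up(), toy_round_down() and
whole_block_range() are hypothetical stand-ins written for this example
(valid only for power-of-two block sizes), not the kernel macros
themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for round_up()/round_down(); align must be 2^n. */
static int64_t toy_round_up(int64_t x, int64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

static int64_t toy_round_down(int64_t x, int64_t align)
{
	return x & ~(align - 1);
}

struct block_range {
	int64_t first;  /* first byte of the first whole block in the hole */
	int64_t last;   /* last byte of the last whole block in the hole */
	int nonempty;   /* mirrors the "last > first" guard in the patch */
};

/* Computes the block-aligned part of [offset, offset + length). */
static struct block_range whole_block_range(int64_t offset, int64_t length,
					    int64_t blocksize)
{
	struct block_range r;

	r.first = toy_round_up(offset, blocksize);
	r.last = toy_round_down(offset + length, blocksize) - 1;
	r.nonempty = r.last > r.first;
	return r;
}
```

With a 4096-byte block size, whole_block_range(1000, 10000, 4096) yields
first = 4096 and last = 8191, so exactly one whole block is truncated from
the page cache. For whole_block_range(1000, 2000, 4096) the hole lies
inside a single block, last becomes -1, and the guard skips the
truncation.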


* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2
  2013-09-27 15:54 ` [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2 Maxim Patlasov
@ 2013-09-27 16:05   ` Jan Kara
  2014-02-21  0:21     ` Theodore Ts'o
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2013-09-27 16:05 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: tytso, adilger.kernel, linux-ext4, jack, linux-kernel

On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> While handling punch-hole fallocate, it's useless to truncate page cache
> before removing the range from extent tree (or block map in indirect case)
> because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> read) immediately after truncating page cache, but before updating extent
> tree (or block map). In that case the user will see stale data even after
> fallocate is completed.
> 
> Changed in v2 (Thanks to Jan Kara):
>  - Until the problem of data corruption resulting from pages backed by
>    already freed blocks is fully resolved, the simple thing we can do now
>    is to add another truncation of pagecache after punch hole is done.
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2
  2013-09-27 16:05   ` Jan Kara
@ 2014-02-21  0:21     ` Theodore Ts'o
  2014-02-21  9:45       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Theodore Ts'o @ 2014-02-21  0:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: Maxim Patlasov, adilger.kernel, linux-ext4, linux-kernel

On Fri, Sep 27, 2013 at 06:05:17PM +0200, Jan Kara wrote:
> On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> > While handling punch-hole fallocate, it's useless to truncate page cache
> > before removing the range from extent tree (or block map in indirect case)
> > because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> > read) immediately after truncating page cache, but before updating extent
> > tree (or block map). In that case the user will see stale data even after
> > fallocate is completed.
> > 
> > Changed in v2 (Thanks to Jan Kara):
> >  - Until the problem of data corruption resulting from pages backed by
> >    already freed blocks is fully resolved, the simple thing we can do now
> >    is to add another truncation of pagecache after punch hole is done.
>   The patch looks good. You can add:
> Reviewed-by: Jan Kara <jack@suse.cz>

I was going through old patches, and it looks like this one got
dropped.  My apologies.

As far as I can tell, the underlying problem in the VFS/MM layer
hasn't been solved yet (Jan, can you confirm?), so I've queued this
patch for the next merge window.

					- Ted


* Re: [PATCH] ext4: avoid exposure of stale data in ext4_punch_hole() -v2
  2014-02-21  0:21     ` Theodore Ts'o
@ 2014-02-21  9:45       ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2014-02-21  9:45 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Maxim Patlasov, adilger.kernel, linux-ext4,
	linux-kernel

On Thu 20-02-14 19:21:07, Ted Tso wrote:
> On Fri, Sep 27, 2013 at 06:05:17PM +0200, Jan Kara wrote:
> > On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> > > While handling punch-hole fallocate, it's useless to truncate page cache
> > > before removing the range from extent tree (or block map in indirect case)
> > > because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> > > read) immediately after truncating page cache, but before updating extent
> > > tree (or block map). In that case the user will see stale data even after
> > > fallocate is completed.
> > > 
> > > Changed in v2 (Thanks to Jan Kara):
> > >  - Until the problem of data corruption resulting from pages backed by
> > >    already freed blocks is fully resolved, the simple thing we can do now
> > >    is to add another truncation of pagecache after punch hole is done.
> >   The patch looks good. You can add:
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> I was going through old patches, and it looks like this one got
> dropped.  My apologies.
> 
> As far as I can tell, the underlying problem in the VFS/MM layer
> hasn't been solved yet (Jan, can you confirm?), so I've queued this
> patch for the next merge window.
  Yes, we didn't solve it yet. Thanks for queueing the patch!

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
