[PATCH] Btrfs: fix race condition between writting and scrubing supers

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] Btrfs: fix race condition between writting and scrubing supers
@ 2013-10-19  4:17 Wang Shilong
  2013-10-19  8:50 ` Stefan Behrens
  0 siblings, 1 reply; 14+ messages in thread
From: Wang Shilong @ 2013-10-19  4:17 UTC (permalink / raw)
  To: linux-btrfs

From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>

Scrubing supers is not in a transaction context, when trying to
write supers to disk, we should check if we are trying to
scrub supers.Fix it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c     | 2 ++
 fs/btrfs/transaction.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 419968e..0debb19 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
 		return ret;
 	}
 
+	btrfs_scrub_pause_super(root);
 	ret = write_ctree_super(NULL, root, 0);
+	btrfs_scrub_continue_super(root);
 	return ret;
 }
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 277fe81..3ebcbbd 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 		goto cleanup_transaction;
 	}
 
+	btrfs_scrub_pause_super(root);
 	ret = write_ctree_super(trans, root, 0);
+	btrfs_scrub_continue_super(root);
 	if (ret) {
 		mutex_unlock(&root->fs_info->tree_log_mutex);
 		goto cleanup_transaction;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-19  4:17 [PATCH] Btrfs: fix race condition between writting and scrubing supers Wang Shilong
@ 2013-10-19  8:50 ` Stefan Behrens
  2013-10-19 10:32   ` Shilong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Behrens @ 2013-10-19  8:50 UTC (permalink / raw)
  To: Wang Shilong, linux-btrfs

On 10/19/2013 06:17, Wang Shilong wrote:
> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>
> Scrubing supers is not in a transaction context, when trying to
> write supers to disk, we should check if we are trying to
> scrub supers.Fix it.
>
> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/disk-io.c     | 2 ++
>   fs/btrfs/transaction.c | 2 ++
>   2 files changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 419968e..0debb19 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>   		return ret;
>   	}
>
> +	btrfs_scrub_pause_super(root);
>   	ret = write_ctree_super(NULL, root, 0);
> +	btrfs_scrub_continue_super(root);
>   	return ret;
>   }
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 277fe81..3ebcbbd 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>   		goto cleanup_transaction;
>   	}
>
> +	btrfs_scrub_pause_super(root);
>   	ret = write_ctree_super(trans, root, 0);
> +	btrfs_scrub_continue_super(root);
>   	if (ret) {
>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>   		goto cleanup_transaction;
>

What kind of race do you see between writing the 4K superblock and scrub 
checking its checksum? Or in other words, what could happen?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-19  8:50 ` Stefan Behrens
@ 2013-10-19 10:32   ` Shilong Wang
  2013-10-19 14:03     ` Stefan Behrens
  0 siblings, 1 reply; 14+ messages in thread
From: Shilong Wang @ 2013-10-19 10:32 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: linux-btrfs

Yeah, it did not hurt. but it may  output checksum mismatch.  For example:
Writing 4k superblock is not totally finished, but we are trying to scrub it.

Thanks,
Wang

2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
> On 10/19/2013 06:17, Wang Shilong wrote:
>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>
>> Scrubing supers is not in a transaction context, when trying to
>> write supers to disk, we should check if we are trying to
>> scrub supers.Fix it.
>>
>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/disk-io.c     | 2 ++
>>   fs/btrfs/transaction.c | 2 ++
>>   2 files changed, 4 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 419968e..0debb19 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>>   		return ret;
>>   	}
>>
>> +	btrfs_scrub_pause_super(root);
>>   	ret = write_ctree_super(NULL, root, 0);
>> +	btrfs_scrub_continue_super(root);
>>   	return ret;
>>   }
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 277fe81..3ebcbbd 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>> btrfs_trans_handle *trans,
>>   		goto cleanup_transaction;
>>   	}
>>
>> +	btrfs_scrub_pause_super(root);
>>   	ret = write_ctree_super(trans, root, 0);
>> +	btrfs_scrub_continue_super(root);
>>   	if (ret) {
>>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>>   		goto cleanup_transaction;
>>
>
> What kind of race do you see between writing the 4K superblock and scrub
> checking its checksum? Or in other words, what could happen?
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-19 10:32   ` Shilong Wang
@ 2013-10-19 14:03     ` Stefan Behrens
  2013-10-19 14:34       ` Wang Shilong
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Stefan Behrens @ 2013-10-19 14:03 UTC (permalink / raw)
  To: Shilong Wang; +Cc: linux-btrfs

On 10/19/2013 12:32, Shilong Wang wrote:
> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>> On 10/19/2013 06:17, Wang Shilong wrote:
>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>
>>> Scrubing supers is not in a transaction context, when trying to
>>> write supers to disk, we should check if we are trying to
>>> scrub supers.Fix it.
>>>
>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>> ---
>>>    fs/btrfs/disk-io.c     | 2 ++
>>>    fs/btrfs/transaction.c | 2 ++
>>>    2 files changed, 4 insertions(+)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 419968e..0debb19 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>>>    		return ret;
>>>    	}
>>>
>>> +	btrfs_scrub_pause_super(root);
>>>    	ret = write_ctree_super(NULL, root, 0);
>>> +	btrfs_scrub_continue_super(root);
>>>    	return ret;
>>>    }
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 277fe81..3ebcbbd 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans,
>>>    		goto cleanup_transaction;
>>>    	}
>>>
>>> +	btrfs_scrub_pause_super(root);
>>>    	ret = write_ctree_super(trans, root, 0);
>>> +	btrfs_scrub_continue_super(root);
>>>    	if (ret) {
>>>    		mutex_unlock(&root->fs_info->tree_log_mutex);
>>>    		goto cleanup_transaction;
>>>
>>
>> What kind of race do you see between writing the 4K superblock and scrub
>> checking its checksum? Or in other words, what could happen?

 > Yeah, it did not hurt. but it may  output checksum mismatch.  For 
example:
 > Writing 4k superblock is not totally finished, but we are trying to 
scrub it.

Have you ever seen this issue?

If yes, let's find a different solution. You scrub, let's say, once a 
week. Scrubbing the superblock takes, let's say, 100ms, then it's 
finished. This short race doesn't justify to add such code to 
btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a 
transaction is synchronized to scrub already when the commit root is 
updated.

If this is really an issue and these 4K disk writes and reads interfere, 
let's find a better solution please.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-19 14:03     ` Stefan Behrens
@ 2013-10-19 14:34       ` Wang Shilong
  2013-10-20  4:03       ` Wang Shilong
  2013-10-20  7:28       ` Bob Marley
  2 siblings, 0 replies; 14+ messages in thread
From: Wang Shilong @ 2013-10-19 14:34 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: linux-btrfs

> On 10/19/2013 12:32, Shilong Wang wrote:
>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>>> On 10/19/2013 06:17, Wang Shilong wrote:
>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> 
>>>> Scrubing supers is not in a transaction context, when trying to
>>>> write supers to disk, we should check if we are trying to
>>>> scrub supers.Fix it.
>>>> 
>>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> ---
>>>>   fs/btrfs/disk-io.c     | 2 ++
>>>>   fs/btrfs/transaction.c | 2 ++
>>>>   2 files changed, 4 insertions(+)
>>>> 
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 419968e..0debb19 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>>>>   		return ret;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(NULL, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	return ret;
>>>>   }
>>>> 
>>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>>> index 277fe81..3ebcbbd 100644
>>>> --- a/fs/btrfs/transaction.c
>>>> +++ b/fs/btrfs/transaction.c
>>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>>>> btrfs_trans_handle *trans,
>>>>   		goto cleanup_transaction;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(trans, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	if (ret) {
>>>>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>>>>   		goto cleanup_transaction;
>>>> 
>>> 
>>> What kind of race do you see between writing the 4K superblock and scrub
>>> checking its checksum? Or in other words, what could happen?
> 
> > Yeah, it did not hurt. but it may  output checksum mismatch.  For example:
> > Writing 4k superblock is not totally finished, but we are trying to scrub it.
> 
> Have you ever seen this issue?

No, just noticing it by accident.

> 
> If yes, let's find a different solution. You scrub, let's say, once a week. Scrubbing the superblock takes, let's say, 100ms, then it's finished. This short race doesn't justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated.

Agree with this point.

> 
> If this is really an issue and these 4K disk writes and reads interfere, let's find a better solution please.

>From now, scrubing supers don't involve writing,  so it doesn't hurt even if this short race really happens.
Anyway, thanks for comments .

Thanks,
Wang
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-19 14:03     ` Stefan Behrens
  2013-10-19 14:34       ` Wang Shilong
@ 2013-10-20  4:03       ` Wang Shilong
  2013-10-22  8:37         ` Stefan Behrens
  2013-10-20  7:28       ` Bob Marley
  2 siblings, 1 reply; 14+ messages in thread
From: Wang Shilong @ 2013-10-20  4:03 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: linux-btrfs


> On 10/19/2013 12:32, Shilong Wang wrote:
>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>>> On 10/19/2013 06:17, Wang Shilong wrote:
>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> 
>>>> Scrubing supers is not in a transaction context, when trying to
>>>> write supers to disk, we should check if we are trying to
>>>> scrub supers.Fix it.
>>>> 
>>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> ---
>>>>   fs/btrfs/disk-io.c     | 2 ++
>>>>   fs/btrfs/transaction.c | 2 ++
>>>>   2 files changed, 4 insertions(+)
>>>> 
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 419968e..0debb19 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>>>>   		return ret;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(NULL, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	return ret;
>>>>   }
>>>> 
>>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>>> index 277fe81..3ebcbbd 100644
>>>> --- a/fs/btrfs/transaction.c
>>>> +++ b/fs/btrfs/transaction.c
>>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>>>> btrfs_trans_handle *trans,
>>>>   		goto cleanup_transaction;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(trans, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	if (ret) {
>>>>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>>>>   		goto cleanup_transaction;
>>>> 
>>> 
>>> What kind of race do you see between writing the 4K superblock and scrub
>>> checking its checksum? Or in other words, what could happen?
> 
> > Yeah, it did not hurt. but it may  output checksum mismatch.  For example:
> > Writing 4k superblock is not totally finished, but we are trying to scrub it.
> 
> Have you ever seen this issue?
> 
> If yes, let's find a different solution. You scrub, let's say, once a week. Scrubbing the superblock takes, let's say, 100ms, then it's finished. This short race doesn't justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated.
> 
> If this is really an issue and these 4K disk writes and reads interfere, let's find a better solution please.

How about this approach?

We let scrub_supers in a transaction context.

btrfs_join_transaction()

scrub_supers

btrfs_commit_transaction().

This is not elegant, but we can remove scrub_lock with supers(Notice, there is another place that have used
this lock).

Thanks,
Wang
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-19 14:03     ` Stefan Behrens
  2013-10-19 14:34       ` Wang Shilong
  2013-10-20  4:03       ` Wang Shilong
@ 2013-10-20  7:28       ` Bob Marley
  2 siblings, 0 replies; 14+ messages in thread
From: Bob Marley @ 2013-10-20  7:28 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: Shilong Wang, linux-btrfs

On 19/10/2013 16:03, Stefan Behrens wrote:
> On 10/19/2013 12:32, Shilong Wang wrote:
>
> > Yeah, it did not hurt. but it may  output checksum mismatch. For 
> example:
> > Writing 4k superblock is not totally finished, but we are trying to 
> scrub it.
>
> Have you ever seen this issue?
>

...

> If this is really an issue and these 4K disk writes and reads 
> interfere, let's find a better solution please.
>


Why don't you scrub optimistically as is now, and then just in case of 
checksum mismatch you re-scrub in transaction context?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-20  4:03       ` Wang Shilong
@ 2013-10-22  8:37         ` Stefan Behrens
  2013-10-22 16:55           ` Bob Marley
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Behrens @ 2013-10-22  8:37 UTC (permalink / raw)
  To: Wang Shilong; +Cc: linux-btrfs

On Sun, 20 Oct 2013 12:03:01 +0800, Wang Shilong wrote:
>> On 10/19/2013 12:32, Shilong Wang wrote:
>>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>>>> On 10/19/2013 06:17, Wang Shilong wrote:
>>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>>>
>>>>> Scrubing supers is not in a transaction context, when trying to
>>>>> write supers to disk, we should check if we are trying to
>>>>> scrub supers.Fix it.
>>>>>
>>>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
[...]
>>>> What kind of race do you see between writing the 4K superblock and scrub
>>>> checking its checksum? Or in other words, what could happen?
>>
>>> Yeah, it did not hurt. but it may  output checksum mismatch.  For example:
>>> Writing 4k superblock is not totally finished, but we are trying to scrub it.
>>
>> Have you ever seen this issue?

You replied with "No, just noticing it by accident" in the mail before.

I don't believe that this issue can ever happen. I don't believe that
somewhere on the path to the flash memory, to the magnetic disc or to
the drive's cache memory, someone interrupts a 4KB write in the middle
of operation to read from this 4KB area. This is not an issue IMHO.


>> If yes, let's find a different solution. You scrub, let's say, once a week. Scrubbing the superblock takes, let's say, 100ms, then it's finished. This short race doesn't justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated.
>>
>> If this is really an issue and these 4K disk writes and reads interfere, let's find a better solution please.
> 
> How about this approach?
> 
> We let scrub_supers in a transaction context.
> 
> btrfs_join_transaction()
> 
> scrub_supers
> 
> btrfs_commit_transaction().
> 
> This is not elegant, but we can remove scrub_lock with supers(Notice, there is another place that have used
> this lock).



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-22  8:37         ` Stefan Behrens
@ 2013-10-22 16:55           ` Bob Marley
  2013-10-23 17:21             ` Stefan Behrens
  0 siblings, 1 reply; 14+ messages in thread
From: Bob Marley @ 2013-10-22 16:55 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: Wang Shilong, linux-btrfs

On 22/10/2013 10:37, Stefan Behrens wrote:
> I don't believe that this issue can ever happen. I don't believe that
> somewhere on the path to the flash memory, to the magnetic disc or to
> the drive's cache memory, someone interrupts a 4KB write in the middle
> of operation to read from this 4KB area. This is not an issue IMHO.

I think I have read that unfortunately it can happen.
SAS and SATA specs for disks do not mandate that if a write is in-flight 
but still not completed, reads from the same sector should return the 
value it is being written; they can return the old value.
I also think that Linux does not check either.
Much worse, I think I have even read that two simultaneous in-flight 
writes to the same sector can be completed in any order by the disk, and 
since the write which wins is the latter being completed, this results 
in an indeterminate value persisting on that sector at the end. One 
needs to synchronize cache between the two writes to guarantee the 
outcome. Way worse is when the drives also cheat on synchronize cache, 
and that one is impossible to fix I believe.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-22 16:55           ` Bob Marley
@ 2013-10-23 17:21             ` Stefan Behrens
  2013-10-24 10:08               ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Behrens @ 2013-10-23 17:21 UTC (permalink / raw)
  To: Bob Marley; +Cc: Wang Shilong, linux-btrfs

On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
> On 22/10/2013 10:37, Stefan Behrens wrote:
>> I don't believe that this issue can ever happen. I don't believe that
>> somewhere on the path to the flash memory, to the magnetic disc or to
>> the drive's cache memory, someone interrupts a 4KB write in the middle
>> of operation to read from this 4KB area. This is not an issue IMHO.
> 
> I think I have read that unfortunately it can happen.
> SAS and SATA specs for disks do not mandate that if a write is in-flight
> but still not completed, reads from the same sector should return the
> value it is being written; they can return the old value.
> I also think that Linux does not check either.

If the _old_ 4KB block is returned, that's fine and won't cause a
checksum error.

The patch in question addresses the case that Btrfs submits a write
request for a 4KB block, and a concurrent read request for that 4KB
block reads partially the old block and partially the new block,
resulting in a checksum error reported in the scrub statistic counters.


> Much worse, I think I have even read that two simultaneous in-flight
> writes to the same sector can be completed in any order by the disk, and
> since the write which wins is the latter being completed, this results
> in an indeterminate value persisting on that sector at the end. One
> needs to synchronize cache between the two writes to guarantee the
> outcome. Way worse is when the drives also cheat on synchronize cache,
> and that one is impossible to fix I believe.

Two simultaneous in-flight writes to the same superblock cannot happen
in Btrfs.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-23 17:21             ` Stefan Behrens
@ 2013-10-24 10:08               ` Chris Mason
  2013-10-24 10:42                 ` Miao Xie
  2013-10-24 11:32                 ` Wang Shilong
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Mason @ 2013-10-24 10:08 UTC (permalink / raw)
  To: Stefan Behrens, Bob Marley; +Cc: Wang Shilong, linux-btrfs

Quoting Stefan Behrens (2013-10-23 13:21:34)
> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
> > On 22/10/2013 10:37, Stefan Behrens wrote:
> >> I don't believe that this issue can ever happen. I don't believe that
> >> somewhere on the path to the flash memory, to the magnetic disc or to
> >> the drive's cache memory, someone interrupts a 4KB write in the middle
> >> of operation to read from this 4KB area. This is not an issue IMHO.
> > 
> > I think I have read that unfortunately it can happen.
> > SAS and SATA specs for disks do not mandate that if a write is in-flight
> > but still not completed, reads from the same sector should return the
> > value it is being written; they can return the old value.
> > I also think that Linux does not check either.
> 
> If the _old_ 4KB block is returned, that's fine and won't cause a
> checksum error.
> 
> The patch in question addresses the case that Btrfs submits a write
> request for a 4KB block, and a concurrent read request for that 4KB
> block reads partially the old block and partially the new block,
> resulting in a checksum error reported in the scrub statistic counters.

Concurrent reads and writes to the device are completely undefined, and
Any combination of old, new, random memory corruption wouldn't
surprise me...I'd rather avoid them ;)

Doing the transaction join during the super read is probably the least
complex choice.

-chris

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-24 10:08               ` Chris Mason
@ 2013-10-24 10:42                 ` Miao Xie
  2013-10-24 11:32                 ` Wang Shilong
  1 sibling, 0 replies; 14+ messages in thread
From: Miao Xie @ 2013-10-24 10:42 UTC (permalink / raw)
  To: Chris Mason, Stefan Behrens, Bob Marley; +Cc: Wang Shilong, linux-btrfs

On thu, 24 Oct 2013 06:08:42 -0400, Chris Mason wrote:
> Quoting Stefan Behrens (2013-10-23 13:21:34)
>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>> I don't believe that this issue can ever happen. I don't believe that
>>>> somewhere on the path to the flash memory, to the magnetic disc or to
>>>> the drive's cache memory, someone interrupts a 4KB write in the middle
>>>> of operation to read from this 4KB area. This is not an issue IMHO.
>>>
>>> I think I have read that unfortunately it can happen.
>>> SAS and SATA specs for disks do not mandate that if a write is in-flight
>>> but still not completed, reads from the same sector should return the
>>> value it is being written; they can return the old value.
>>> I also think that Linux does not check either.
>>
>> If the _old_ 4KB block is returned, that's fine and won't cause a
>> checksum error.
>>
>> The patch in question addresses the case that Btrfs submits a write
>> request for a 4KB block, and a concurrent read request for that 4KB
>> block reads partially the old block and partially the new block,
>> resulting in a checksum error reported in the scrub statistic counters.
> 
> Concurrent reads and writes to the device are completely undefined, and
> Any combination of old, new, random memory corruption wouldn't
> surprise me...I'd rather avoid them ;)
> 
> Doing the transaction join during the super read is probably the least
> complex choice.

But it can not block the log tree sync, I think using device_list_mutex is
better since we should acquire this mutex when writing the super blocks and
we are sure that the super blocks are on non-volatile media on completion
after we unlock the mutex.

Thanks
Miao

> 
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-24 10:08               ` Chris Mason
  2013-10-24 10:42                 ` Miao Xie
@ 2013-10-24 11:32                 ` Wang Shilong
  2013-10-25  2:14                   ` Miao Xie
  1 sibling, 1 reply; 14+ messages in thread
From: Wang Shilong @ 2013-10-24 11:32 UTC (permalink / raw)
  To: Chris Mason; +Cc: Stefan Behrens, Bob Marley, Wang Shilong, linux-btrfs

On 10/24/2013 06:08 PM, Chris Mason wrote:
> Quoting Stefan Behrens (2013-10-23 13:21:34)
>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>> I don't believe that this issue can ever happen. I don't believe that
>>>> somewhere on the path to the flash memory, to the magnetic disc or to
>>>> the drive's cache memory, someone interrupts a 4KB write in the middle
>>>> of operation to read from this 4KB area. This is not an issue IMHO.
>>> I think I have read that unfortunately it can happen.
>>> SAS and SATA specs for disks do not mandate that if a write is in-flight
>>> but still not completed, reads from the same sector should return the
>>> value it is being written; they can return the old value.
>>> I also think that Linux does not check either.
>> If the _old_ 4KB block is returned, that's fine and won't cause a
>> checksum error.
>>
>> The patch in question addresses the case that Btrfs submits a write
>> request for a 4KB block, and a concurrent read request for that 4KB
>> block reads partially the old block and partially the new block,
>> resulting in a checksum error reported in the scrub statistic counters.
> Concurrent reads and writes to the device are completely undefined, and
> Any combination of old, new, random memory corruption wouldn't
> surprise me...I'd rather avoid them ;)
>
> Doing the transaction join during the super read is probably the least
> complex choice.
Yeah, by joining transaction we can solve this problem, but it is a 
little confused,
because we don't involve writting in scrubing supers.

And the only race condition happens in commiting transaction, Miao also 
pointed out that
maybe the best way is to move btrfs_scrub_continue after 
write_ctree_super().

Thanks,
Wang
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
  2013-10-24 11:32                 ` Wang Shilong
@ 2013-10-25  2:14                   ` Miao Xie
  0 siblings, 0 replies; 14+ messages in thread
From: Miao Xie @ 2013-10-25  2:14 UTC (permalink / raw)
  To: Wang Shilong, Chris Mason
  Cc: Stefan Behrens, Bob Marley, Wang Shilong, linux-btrfs

On 	thu, 24 Oct 2013 19:32:15 +0800, Wang Shilong wrote:
> On 10/24/2013 06:08 PM, Chris Mason wrote:
>> Quoting Stefan Behrens (2013-10-23 13:21:34)
>>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>>> I don't believe that this issue can ever happen. I don't believe that
>>>>> somewhere on the path to the flash memory, to the magnetic disc or to
>>>>> the drive's cache memory, someone interrupts a 4KB write in the middle
>>>>> of operation to read from this 4KB area. This is not an issue IMHO.
>>>> I think I have read that unfortunately it can happen.
>>>> SAS and SATA specs for disks do not mandate that if a write is in-flight
>>>> but still not completed, reads from the same sector should return the
>>>> value it is being written; they can return the old value.
>>>> I also think that Linux does not check either.
>>> If the _old_ 4KB block is returned, that's fine and won't cause a
>>> checksum error.
>>>
>>> The patch in question addresses the case that Btrfs submits a write
>>> request for a 4KB block, and a concurrent read request for that 4KB
>>> block reads partially the old block and partially the new block,
>>> resulting in a checksum error reported in the scrub statistic counters.
>> Concurrent reads and writes to the device are completely undefined, and
>> Any combination of old, new, random memory corruption wouldn't
>> surprise me...I'd rather avoid them ;)
>>
>> Doing the transaction join during the super read is probably the least
>> complex choice.
> Yeah, by joining transaction we can solve this problem, but it is a little confused,
> because we don't involve writting in scrubing supers.
> 
> And the only race condition happens in commiting transaction, Miao also pointed out that
> maybe the best way is to move btrfs_scrub_continue after write_ctree_super().

Sorry, My miss.

btrfs_scrub_continue() is behind write_ctree_super() all the while, so the above problem
doesn't exist.

Thanks
Miao

> 
> Thanks,
> Wang
>> -chris
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-10-25  2:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-19  4:17 [PATCH] Btrfs: fix race condition between writting and scrubing supers Wang Shilong
2013-10-19  8:50 ` Stefan Behrens
2013-10-19 10:32   ` Shilong Wang
2013-10-19 14:03     ` Stefan Behrens
2013-10-19 14:34       ` Wang Shilong
2013-10-20  4:03       ` Wang Shilong
2013-10-22  8:37         ` Stefan Behrens
2013-10-22 16:55           ` Bob Marley
2013-10-23 17:21             ` Stefan Behrens
2013-10-24 10:08               ` Chris Mason
2013-10-24 10:42                 ` Miao Xie
2013-10-24 11:32                 ` Wang Shilong
2013-10-25  2:14                   ` Miao Xie
2013-10-20  7:28       ` Bob Marley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).