* [PATCH] bcache: lock in btree_flush_write() to avoid races
@ 2018-01-24 6:54 tang.junhui
2018-01-24 9:13 ` Coly Li
2018-01-24 10:30 ` Nikolay Borisov
0 siblings, 2 replies; 8+ messages in thread
From: tang.junhui @ 2018-01-24 6:54 UTC (permalink / raw)
To: colyli, mlyle; +Cc: linux-bcache, linux-block, tang.junhui
From: Tang Junhui <tang.junhui@zte.com.cn>
In btree_flush_write(), two places need to take a locker to
avoid races:
Firstly, we need take rcu read locker to protect the bucket_hash
traverse, since hlist_for_each_entry_rcu() must be called under
the protection of rcu read locker.
Secondly, we need take b->write_lock locker to protect journal
of the btree node, otherwise, the btree node may have been
written, and the journal have been assign to NULL.
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
---
drivers/md/bcache/journal.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 02a98dd..505f9f3 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
- for_each_cached_btree(b, c, i)
+ rcu_read_lock();
+ for_each_cached_btree(b, c, i) {
+ mutex_lock(&b->write_lock);
if (btree_current_write(b)->journal) {
if (!best)
best = b;
@@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
best = b;
}
}
+ mutex_unlock(&b->write_lock);
+ }
+ rcu_read_unlock();
b = best;
if (b) {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
2018-01-24 6:54 tang.junhui
@ 2018-01-24 9:13 ` Coly Li
2018-01-24 10:30 ` Nikolay Borisov
1 sibling, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-01-24 9:13 UTC (permalink / raw)
To: tang.junhui; +Cc: mlyle, linux-bcache, linux-block
On 24/01/2018 2:54 PM, tang.junhui@zte.com.cn wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> In btree_flush_write(), two places need to take a locker to
> avoid races:
>
> Firstly, we need take rcu read locker to protect the bucket_hash
> traverse, since hlist_for_each_entry_rcu() must be called under
> the protection of rcu read locker.
>
> Secondly, we need take b->write_lock locker to protect journal
> of the btree node, otherwise, the btree node may have been
> written, and the journal have been assign to NULL.
>
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> ---
> drivers/md/bcache/journal.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> index 02a98dd..505f9f3 100644
> --- a/drivers/md/bcache/journal.c
> +++ b/drivers/md/bcache/journal.c
> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
> retry:
> best = NULL;
>
> - for_each_cached_btree(b, c, i)
> + rcu_read_lock();
> + for_each_cached_btree(b, c, i) {
> + mutex_lock(&b->write_lock);
> if (btree_current_write(b)->journal) {
> if (!best)
> best = b;
> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
> best = b;
> }
> }
> + mutex_unlock(&b->write_lock);
> + }
> + rcu_read_unlock();
>
> b = best;
> if (b) {
>
Hi Junhui,
Do you run into some real problem ? The patch looks good to me at first
glance, but I need time to dig it more before I provide my response.
Thanks.
Coly Li
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
@ 2018-01-24 9:45 tang.junhui
2018-01-24 9:49 ` Coly Li
0 siblings, 1 reply; 8+ messages in thread
From: tang.junhui @ 2018-01-24 9:45 UTC (permalink / raw)
To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui
From: Tang Junhui <tang.junhui@zte.com.cn>
>On 24/01/2018 2:54 PM, tang.junhui@zte.com.cn wrote:
>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>
>> In btree_flush_write(), two places need to take a locker to
>> avoid races:
>>
>> Firstly, we need take rcu read locker to protect the bucket_hash
>> traverse, since hlist_for_each_entry_rcu() must be called under
>> the protection of rcu read locker.
>>
>> Secondly, we need take b->write_lock locker to protect journal
>> of the btree node, otherwise, the btree node may have been
>> written, and the journal have been assign to NULL.
>>
>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>> ---
>> drivers/md/bcache/journal.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>> index 02a98dd..505f9f3 100644
>> --- a/drivers/md/bcache/journal.c
>> +++ b/drivers/md/bcache/journal.c
>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
>> retry:
>> best = NULL;
>>
>> - for_each_cached_btree(b, c, i)
>> + rcu_read_lock();
>> + for_each_cached_btree(b, c, i) {
>> + mutex_lock(&b->write_lock);
>> if (btree_current_write(b)->journal) {
>> if (!best)
>> best = b;
>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
>> best = b;
>> }
>> }
>> + mutex_unlock(&b->write_lock);
>> + }
>> + rcu_read_unlock();
>>
>> b = best;
>> if (b) {
>>
>
>Hi Junhui,
>
>Do you run into some real problem ? The patch looks good to me at first
>glance, but I need time to dig it more before I provide my response.
>
>Thanks.
No real problem run out, I find it by code reading, I think we'd better
lock it for safety.
Thanks,
Tang Junhui
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
2018-01-24 9:45 tang.junhui
@ 2018-01-24 9:49 ` Coly Li
0 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2018-01-24 9:49 UTC (permalink / raw)
To: tang.junhui; +Cc: mlyle, linux-bcache, linux-block
On 24/01/2018 5:45 PM, tang.junhui@zte.com.cn wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
>> On 24/01/2018 2:54 PM, tang.junhui@zte.com.cn wrote:
>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>
>>> In btree_flush_write(), two places need to take a locker to
>>> avoid races:
>>>
>>> Firstly, we need take rcu read locker to protect the bucket_hash
>>> traverse, since hlist_for_each_entry_rcu() must be called under
>>> the protection of rcu read locker.
>>>
>>> Secondly, we need take b->write_lock locker to protect journal
>>> of the btree node, otherwise, the btree node may have been
>>> written, and the journal have been assign to NULL.
>>>
>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>> ---
>>> drivers/md/bcache/journal.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>>> index 02a98dd..505f9f3 100644
>>> --- a/drivers/md/bcache/journal.c
>>> +++ b/drivers/md/bcache/journal.c
>>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
>>> retry:
>>> best = NULL;
>>>
>>> - for_each_cached_btree(b, c, i)
>>> + rcu_read_lock();
>>> + for_each_cached_btree(b, c, i) {
>>> + mutex_lock(&b->write_lock);
>>> if (btree_current_write(b)->journal) {
>>> if (!best)
>>> best = b;
>>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
>>> best = b;
>>> }
>>> }
>>> + mutex_unlock(&b->write_lock);
>>> + }
>>> + rcu_read_unlock();
>>>
>>> b = best;
>>> if (b) {
>>>
>>
>> Hi Junhui,
>>
>> Do you run into some real problem ? The patch looks good to me at first
>> glance, but I need time to dig it more before I provide my response.
>>
>> Thanks.
>
> No real problem run out, I find it by code reading, I think we'd better
> lock it for safety.
I see, then I will take it as a lower priority and finish my current
task firstly. I will provide my review comment 1 week later, if no one
else does it then.
Thanks.
Coly Li
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
2018-01-24 6:54 tang.junhui
2018-01-24 9:13 ` Coly Li
@ 2018-01-24 10:30 ` Nikolay Borisov
1 sibling, 0 replies; 8+ messages in thread
From: Nikolay Borisov @ 2018-01-24 10:30 UTC (permalink / raw)
To: tang.junhui, colyli, mlyle; +Cc: linux-bcache, linux-block
On 24.01.2018 08:54, tang.junhui@zte.com.cn wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> In btree_flush_write(), two places need to take a locker to
> avoid races:
>
> Firstly, we need take rcu read locker to protect the bucket_hash
> traverse, since hlist_for_each_entry_rcu() must be called under
> the protection of rcu read locker.
>
> Secondly, we need take b->write_lock locker to protect journal
> of the btree node, otherwise, the btree node may have been
> written, and the journal have been assign to NULL.
>
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> ---
> drivers/md/bcache/journal.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> index 02a98dd..505f9f3 100644
> --- a/drivers/md/bcache/journal.c
> +++ b/drivers/md/bcache/journal.c
> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
> retry:
> best = NULL;
>
> - for_each_cached_btree(b, c, i)
> + rcu_read_lock();
> + for_each_cached_btree(b, c, i) {
> + mutex_lock(&b->write_lock);
You can't sleep in rcu read critical section, yet here you take mutex
which can sleep under rcu_read_lock.
> if (btree_current_write(b)->journal) {
> if (!best)
> best = b;
> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
> best = b;
> }
> }
> + mutex_unlock(&b->write_lock);
> + }
> + rcu_read_unlock();
>
> b = best;
> if (b) {
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
@ 2018-01-25 1:50 tang.junhui
2018-01-25 1:58 ` Kent Overstreet
0 siblings, 1 reply; 8+ messages in thread
From: tang.junhui @ 2018-01-25 1:50 UTC (permalink / raw)
To: kent.overstreet, nix
Cc: colyli, mlyle, linux-bcache, linux-block, tang.junhui
From: Tang Junhui <tang.junhui@zte.com.cn>
Hello Kent && Nix
>
>neither of those locks are needed - rcu_read_lock() isn't needed because we never
>free struct btree (except at shutdown), and we're not derefing journal there
__bch_btree_node_write() is called in many places, in __bch_btree_node_write(),
before node writing over, it has changed the write buff index to another, but the
journal of changed write buff maybe still NULL if no journal writing occurred. But
luckily we only use the NULL as ZERO to calculate the fifo_idx, the result is a very
big value, so it does no harm. And since we cannot take mutex under rcu_read_lock,
we can ignore it.
As to the rcu lock, though the btree is not freed, but we often delete it from one
list (such as one bucket_hash) and move it to another list (such as btree_cache_freed),
shouldn't it be protected by rcu locker?
>On Wed, Jan 24, 2018 at 5:30 AM, Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>On 24.01.2018 08:54, tang.junhui@zte.com.cn wrote:
>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>
>> In btree_flush_write(), two places need to take a locker to
>> avoid races:
>>
>> Firstly, we need take rcu read locker to protect the bucket_hash
>> traverse, since hlist_for_each_entry_rcu() must be called under
>> the protection of rcu read locker.
>>
>> Secondly, we need take b->write_lock locker to protect journal
>> of the btree node, otherwise, the btree node may have been
>> written, and the journal have been assign to NULL.
>>
>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>> ---
>> drivers/md/bcache/journal.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>> index 02a98dd.505f9f3 100644
>> --- a/drivers/md/bcache/journal.c
>> +++ b/drivers/md/bcache/journal.c
>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
>> retry:
>> best = NULL;
>>
>> - for_each_cached_btree(b, c, i)
>> + rcu_read_lock();
>> + for_each_cached_btree(b, c, i) {
>> + mutex_lock(&b->write_lock);
>
>You can't sleep in rcu read critical section, yet here you take mutex
>which can sleep under rcu_read_lock.
To Nix: Yes, you are write. Good catch.
>
>> if (btree_current_write(b)->journal) {
>> if (!best)
>> best = b;
>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
>> best = b;
>> }
>> }
>> + mutex_unlock(&b->write_lock);
>> + }
>> + rcu_read_unlock();
>>
>> b = best;
>> if (b) {
Thanks,
Tang Junhui
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
2018-01-25 1:50 tang.junhui
@ 2018-01-25 1:58 ` Kent Overstreet
0 siblings, 0 replies; 8+ messages in thread
From: Kent Overstreet @ 2018-01-25 1:58 UTC (permalink / raw)
To: Junhui Tang; +Cc: Nix, Coly Li, Michael Lyle, linux-bcache, linux-block
The only purpose of rcu_read_lock() would be to ensure the object
isn't freed out from under you. That's not an issue here.
Racing with other writers isn't a correctness issue here, and if it
was - just throwing write_lock around where you have it wouldn't be
sufficient, you'd need something much more complicated.
On Wed, Jan 24, 2018 at 8:50 PM, <tang.junhui@zte.com.cn> wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
>
> Hello Kent && Nix
>
>>
>>neither of those locks are needed - rcu_read_lock() isn't needed because we never
>>free struct btree (except at shutdown), and we're not derefing journal there
>
> __bch_btree_node_write() is called in many places, in __bch_btree_node_write(),
> before node writing over, it has changed the write buff index to another, but the
> journal of changed write buff maybe still NULL if no journal writing occurred. But
> luckily we only use the NULL as ZERO to calculate the fifo_idx, the result is a very
> big value, so it does no harm. And since we cannot take mutex under rcu_read_lock,
> we can ignore it.
>
> As to the rcu lock, though the btree is not freed, but we often delete it from one
> list (such as one bucket_hash) and move it to another list (such as btree_cache_freed),
> shouldn't it be protected by rcu locker?
>
>
>>On Wed, Jan 24, 2018 at 5:30 AM, Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>On 24.01.2018 08:54, tang.junhui@zte.com.cn wrote:
>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>
>>> In btree_flush_write(), two places need to take a locker to
>>> avoid races:
>>>
>>> Firstly, we need take rcu read locker to protect the bucket_hash
>>> traverse, since hlist_for_each_entry_rcu() must be called under
>>> the protection of rcu read locker.
>>>
>>> Secondly, we need take b->write_lock locker to protect journal
>>> of the btree node, otherwise, the btree node may have been
>>> written, and the journal have been assign to NULL.
>>>
>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>> ---
>>> drivers/md/bcache/journal.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>>> index 02a98dd.505f9f3 100644
>>> --- a/drivers/md/bcache/journal.c
>>> +++ b/drivers/md/bcache/journal.c
>>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
>>> retry:
>>> best = NULL;
>>>
>>> - for_each_cached_btree(b, c, i)
>>> + rcu_read_lock();
>>> + for_each_cached_btree(b, c, i) {
>>> + mutex_lock(&b->write_lock);
>>
>
>>You can't sleep in rcu read critical section, yet here you take mutex
>>which can sleep under rcu_read_lock.
> To Nix: Yes, you are write. Good catch.
>
>>
>>> if (btree_current_write(b)->journal) {
>>> if (!best)
>>> best = b;
>>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
>>> best = b;
>>> }
>>> }
>>> + mutex_unlock(&b->write_lock);
>>> + }
>>> + rcu_read_unlock();
>>>
>>> b = best;
>>> if (b) {
>
> Thanks,
> Tang Junhui
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] bcache: lock in btree_flush_write() to avoid races
@ 2018-01-25 2:30 tang.junhui
0 siblings, 0 replies; 8+ messages in thread
From: tang.junhui @ 2018-01-25 2:30 UTC (permalink / raw)
To: kent.overstreet
Cc: nix, colyli, mlyle, linux-bcache, linux-block, tang.junhui
From: Tang Junhui <tang.junhui@zte.com.cn>
Hello Kent
>The only purpose of rcu_read_lock() would be to ensure the object
>isn't freed out from under you. That's not an issue here.
>
I do not think so. In for_each_cached_btree(), we traverse all
btrees by hlist_for_each_entry_rcu(), and the hlist_for_each_entry_rcu()
should be protect by rcu lock, as the API describle bellow:
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
#define hlist_for_each_entry_rcu(pos, head, member) \
for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
>Racing with other writers isn't a correctness issue here, and if it
>was - just throwing write_lock around where you have it wouldn't be
>sufficient, you'd need something much more complicated.
>
since we only read by hlist_for_each_entry_rcu(), so I think we does not
need any more write locker.
>On Wed, Jan 24, 2018 at 8:50 PM, <tang.junhui@zte.com.cn> wrote:
>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>
>> Hello Kent && Nix
>>
>>>
>>>neither of those locks are needed - rcu_read_lock() isn't needed because we never
>>>free struct btree (except at shutdown), and we're not derefing journal there
>>
>> __bch_btree_node_write() is called in many places, in __bch_btree_node_write(),
>> before node writing over, it has changed the write buff index to another, but the
>> journal of changed write buff maybe still NULL if no journal writing occurred. But
>> luckily we only use the NULL as ZERO to calculate the fifo_idx, the result is a very
>> big value, so it does no harm. And since we cannot take mutex under rcu_read_lock,
>> we can ignore it.
>>
>> As to the rcu lock, though the btree is not freed, but we often delete it from one
>> list (such as one bucket_hash) and move it to another list (such as btree_cache_freed),
>> shouldn't it be protected by rcu locker?
>>
>>
>>>On Wed, Jan 24, 2018 at 5:30 AM, Nikolay Borisov <nborisov@suse.com> wrote:
>>>
>>>
>>>On 24.01.2018 08:54, tang.junhui@zte.com.cn wrote:
>>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>>
>>>> In btree_flush_write(), two places need to take a locker to
>>>> avoid races:
>>>>
>>>> Firstly, we need take rcu read locker to protect the bucket_hash
>>>> traverse, since hlist_for_each_entry_rcu() must be called under
>>>> the protection of rcu read locker.
>>>>
>>>> Secondly, we need take b->write_lock locker to protect journal
>>>> of the btree node, otherwise, the btree node may have been
>>>> written, and the journal have been assign to NULL.
>>>>
>>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>>> ---
>>>> drivers/md/bcache/journal.c | 7 ++++++-
>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>>>> index 02a98dd.505f9f3 100644
>>>> --- a/drivers/md/bcache/journal.c
>>>> +++ b/drivers/md/bcache/journal.c
>>>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
>>>> retry:
>>>> best = NULL;
>>>>
>>>> - for_each_cached_btree(b, c, i)
>>>> + rcu_read_lock();
>>>> + for_each_cached_btree(b, c, i) {
>>>> + mutex_lock(&b->write_lock);
>>>
>>
>>>You can't sleep in rcu read critical section, yet here you take mutex
>>>which can sleep under rcu_read_lock.
>> To Nix: Yes, you are write. Good catch.
>>
>>>
>>>> if (btree_current_write(b)->journal) {
>>>> if (!best)
>>>> best = b;
>>>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
>>>> best = b;
>>>> }
>>>> }
>>>> + mutex_unlock(&b->write_lock);
>>>> + }
>>>> + rcu_read_unlock();
>>>>
>>>> b = best;
>>>> if (b) {
>>
>> Thanks,
>> Tang Junhui
>>
Thanks,
Tang Junhui
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-01-25 2:29 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-25 2:30 [PATCH] bcache: lock in btree_flush_write() to avoid races tang.junhui
-- strict thread matches above, loose matches on Subject: below --
2018-01-25 1:50 tang.junhui
2018-01-25 1:58 ` Kent Overstreet
2018-01-24 9:45 tang.junhui
2018-01-24 9:49 ` Coly Li
2018-01-24 6:54 tang.junhui
2018-01-24 9:13 ` Coly Li
2018-01-24 10:30 ` Nikolay Borisov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).