From: Andrew Price
Date: Fri, 13 Nov 2015 11:48:04 +0000
Subject: [Cluster-devel] [PATCH] GFS2: Hold gl_hash_table bucket locks in glock_hash_walk
In-Reply-To: <5644F8D7.4000808@redhat.com>
References: <1447348800-18177-1-git-send-email-anprice@redhat.com> <812913157.9626150.1447359075574.JavaMail.zimbra@redhat.com> <5644F8D7.4000808@redhat.com>
Message-ID: <5645CDF4.9030205@redhat.com>
To: cluster-devel.redhat.com

On 12/11/15 20:38, Steven Whitehouse wrote:
> Hi,
>
> On 12/11/15 20:11, Bob Peterson wrote:
>> ----- Original Message -----
>>> This lockdep splat was being triggered on umount:
>>>
>>> [55715.973122] ===============================
>>> [55715.980169] [ INFO: suspicious RCU usage. ]
>>> [55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G W
>>> [55715.982353] -------------------------------
>>> [55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!
>>>
>>> The code it refers to is the rht_for_each_entry_safe() usage in
>>> glock_hash_walk(). The condition that triggers the warning is
>>> lockdep_rht_bucket_is_held(tbl, hash), which is checked in the
>>> __rcu_dereference_protected macro.
>>>
>>> lockdep_rht_bucket_is_held() returns false when rht_bucket_lock(tbl, hash)
>>> is not held, which suggests that glock_hash_walk() should be holding the
>>> lock for each rhashtable bucket it looks at. Holding those locks clears
>>> up the warning.
>>>
>>> Signed-off-by: Andrew Price
>>> ---
>>>  fs/gfs2/glock.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
>>> index 32e7471..7384cbb 100644
>>> --- a/fs/gfs2/glock.c
>>> +++ b/fs/gfs2/glock.c
>>> @@ -1424,11 +1424,13 @@ static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
>>>  	rcu_read_lock();
>>>  	tbl = rht_dereference_rcu(gl_hash_table.tbl, &gl_hash_table);
>>>  	for (i = 0; i < tbl->size; i++) {
>>> +		spin_lock(rht_bucket_lock(tbl, i));
>>>  		rht_for_each_entry_safe(gl, pos, next, tbl, i, gl_node) {
>>>  			if ((gl->gl_name.ln_sbd == sdp) &&
>>>  			    lockref_get_not_dead(&gl->gl_lockref))
>>>  				examiner(gl);
>>>  		}
>>> +		spin_unlock(rht_bucket_lock(tbl, i));
>>>  	}
>>>  	rcu_read_unlock();
>>>  	cond_resched();
>>> --
> Not sure that this is the correct fix. There is nothing changing in the
> examiner function that I can see which would require the bucket lock,
> and we already have this with an rcu_read_lock, which is correct since
> we are only reading the hash lists, not changing them.

Good point. I wasn't sure whether it was safe to assume that the
examiner functions don't change the buckets, so I erred on the side of
caution.

rht_for_each_entry_safe() does an unconditional

  rcu_dereference_protected(p, lockdep_rht_bucket_is_held(tbl, hash))

so we're likely using the wrong rht_for_each_entry* macro here. I'm
currently testing a version which uses rht_for_each_entry_rcu()
instead. That one uses

  rcu_dereference_check(p, lockdep_rht_bucket_is_held(tbl, hash))

which is defined as

  #define rcu_dereference_check(p, c) \
          __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)

so it is satisfied by the rcu_read_lock() we already hold, and looks
like the right one to use. I'll send that patch shortly; a rough sketch
of the change is below my signature.

Andy
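
(Untested sketch only, to show what I mean: it just swaps the iteration
macro and drops the now-unused 'next' variable. The real patch may
differ.)

	rcu_read_lock();
	tbl = rht_dereference_rcu(gl_hash_table.tbl, &gl_hash_table);
	for (i = 0; i < tbl->size; i++) {
		/* rht_for_each_entry_rcu() only requires the RCU read
		 * lock we already hold, not the bucket lock, so no
		 * spin_lock(rht_bucket_lock(...)) is needed here */
		rht_for_each_entry_rcu(gl, pos, tbl, i, gl_node) {
			if ((gl->gl_name.ln_sbd == sdp) &&
			    lockref_get_not_dead(&gl->gl_lockref))
				examiner(gl);
		}
	}
	rcu_read_unlock();
	cond_resched();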