All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
@ 2017-10-07  1:59 ` Jia-Ju Bai
  0 siblings, 0 replies; 6+ messages in thread
From: Jia-Ju Bai @ 2017-10-07  1:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
and the function call path is:
dlm_master_lookup (acquire the spinlock)
   dlm_send_rcom_lookup_dump
     create_rcom
       dlm_lowcomms_get_buffer
         nodeid2con
           mutex_lock --> may sleep

This bug is found by my static analysis tool and my code review.

Thanks,
Jia-Ju Bai



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
@ 2017-10-07  1:59 ` Jia-Ju Bai
  0 siblings, 0 replies; 6+ messages in thread
From: Jia-Ju Bai @ 2017-10-07  1:59 UTC (permalink / raw)
  To: ccaulfie, teigland, viro; +Cc: cluster-devel, linux-kernel

According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
and the function call path is:
dlm_master_lookup (acquire the spinlock)
   dlm_send_rcom_lookup_dump
     create_rcom
       dlm_lowcomms_get_buffer
         nodeid2con
           mutex_lock --> may sleep

This bug is found by my static analysis tool and my code review.

Thanks,
Jia-Ju Bai

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
  2017-10-07  1:59 ` Jia-Ju Bai
@ 2017-10-07  2:26   ` Al Viro
  -1 siblings, 0 replies; 6+ messages in thread
From: Al Viro @ 2017-10-07  2:26 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> and the function call path is:
> dlm_master_lookup (acquire the spinlock)
>   dlm_send_rcom_lookup_dump
>     create_rcom
>       dlm_lowcomms_get_buffer
>         nodeid2con
>           mutex_lock --> may sleep
> 
> This bug is found by my static analysis tool and my code review.

Umm...  dlm_master_lookup() locking is not nice, but to trigger that
you would need a combination of

* from_nodeid != our_nodeid (or we would've buggered off long before that point)
* dir_nodeid == our_nodeid
* failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
(success would have the lock dropped)
* succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
* from_master being true
* r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
(the former is follows from the latter, actually)

The last one might or might not be impossible - I'm not familiar with dlm
guts, but it does have
                        log_error(ls, "from_master %d our_master", from_nodeid);
just before that call, so it's worth a further look.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
@ 2017-10-07  2:26   ` Al Viro
  0 siblings, 0 replies; 6+ messages in thread
From: Al Viro @ 2017-10-07  2:26 UTC (permalink / raw)
  To: Jia-Ju Bai; +Cc: ccaulfie, teigland, cluster-devel, linux-kernel

On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> and the function call path is:
> dlm_master_lookup (acquire the spinlock)
>   dlm_send_rcom_lookup_dump
>     create_rcom
>       dlm_lowcomms_get_buffer
>         nodeid2con
>           mutex_lock --> may sleep
> 
> This bug is found by my static analysis tool and my code review.

Umm...  dlm_master_lookup() locking is not nice, but to trigger that
you would need a combination of

* from_nodeid != our_nodeid (or we would've buggered off long before that point)
* dir_nodeid == our_nodeid
* failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
(success would have the lock dropped)
* succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
* from_master being true
* r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
(the former is follows from the latter, actually)

The last one might or might not be impossible - I'm not familiar with dlm
guts, but it does have
                        log_error(ls, "from_master %d our_master", from_nodeid);
just before that call, so it's worth a further look.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
  2017-10-07  2:26   ` Al Viro
@ 2017-10-09 14:37     ` David Teigland
  -1 siblings, 0 replies; 6+ messages in thread
From: David Teigland @ 2017-10-09 14:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> > According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> > and the function call path is:
> > dlm_master_lookup (acquire the spinlock)
> >   dlm_send_rcom_lookup_dump
> >     create_rcom
> >       dlm_lowcomms_get_buffer
> >         nodeid2con
> >           mutex_lock --> may sleep
> > 
> > This bug is found by my static analysis tool and my code review.
> 
> Umm...  dlm_master_lookup() locking is not nice, but to trigger that
> you would need a combination of
> 
> * from_nodeid != our_nodeid (or we would've buggered off long before that point)
> * dir_nodeid == our_nodeid
> * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
> (success would have the lock dropped)
> * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
> * from_master being true
> * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
> (the former is follows from the latter, actually)
> 
> The last one might or might not be impossible - I'm not familiar with dlm
> guts, but it does have
>                         log_error(ls, "from_master %d our_master", from_nodeid);
> just before that call, so it's worth a further look.

dlm_send_rcom_lookup_dump() was for debugging and can be removed.  It's a
condition that shouldn't happen, and I'm guessing I added that to catch
any evidence if it did.  I'm surprised it wasn't removed in the final
version of the patch, but after 5 years I don't remember what I was
thinking.  I've pushed a commit dropping it to linux-dlm.git next.

Thanks,
Dave



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
@ 2017-10-09 14:37     ` David Teigland
  0 siblings, 0 replies; 6+ messages in thread
From: David Teigland @ 2017-10-09 14:37 UTC (permalink / raw)
  To: Al Viro; +Cc: Jia-Ju Bai, ccaulfie, cluster-devel, linux-kernel

On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> > According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> > and the function call path is:
> > dlm_master_lookup (acquire the spinlock)
> >   dlm_send_rcom_lookup_dump
> >     create_rcom
> >       dlm_lowcomms_get_buffer
> >         nodeid2con
> >           mutex_lock --> may sleep
> > 
> > This bug is found by my static analysis tool and my code review.
> 
> Umm...  dlm_master_lookup() locking is not nice, but to trigger that
> you would need a combination of
> 
> * from_nodeid != our_nodeid (or we would've buggered off long before that point)
> * dir_nodeid == our_nodeid
> * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
> (success would have the lock dropped)
> * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
> * from_master being true
> * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
> (the former is follows from the latter, actually)
> 
> The last one might or might not be impossible - I'm not familiar with dlm
> guts, but it does have
>                         log_error(ls, "from_master %d our_master", from_nodeid);
> just before that call, so it's worth a further look.

dlm_send_rcom_lookup_dump() was for debugging and can be removed.  It's a
condition that shouldn't happen, and I'm guessing I added that to catch
any evidence if it did.  I'm surprised it wasn't removed in the final
version of the patch, but after 5 years I don't remember what I was
thinking.  I've pushed a commit dropping it to linux-dlm.git next.

Thanks,
Dave

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-10-09 14:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-07  1:59 [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup Jia-Ju Bai
2017-10-07  1:59 ` Jia-Ju Bai
2017-10-07  2:26 ` [Cluster-devel] " Al Viro
2017-10-07  2:26   ` Al Viro
2017-10-09 14:37   ` [Cluster-devel] " David Teigland
2017-10-09 14:37     ` David Teigland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.