* [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
@ 2017-10-07 1:59 ` Jia-Ju Bai
0 siblings, 0 replies; 6+ messages in thread
From: Jia-Ju Bai @ 2017-10-07 1:59 UTC (permalink / raw)
To: cluster-devel.redhat.com
According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
and the function call path is:
dlm_master_lookup (acquire the spinlock)
dlm_send_rcom_lookup_dump
create_rcom
dlm_lowcomms_get_buffer
nodeid2con
mutex_lock --> may sleep
This bug is found by my static analysis tool and my code review.
Thanks,
Jia-Ju Bai
^ permalink raw reply [flat|nested] 6+ messages in thread* [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup @ 2017-10-07 1:59 ` Jia-Ju Bai 0 siblings, 0 replies; 6+ messages in thread From: Jia-Ju Bai @ 2017-10-07 1:59 UTC (permalink / raw) To: ccaulfie, teigland, viro; +Cc: cluster-devel, linux-kernel According to fs/dlm/lock.c, the kernel may sleep under a spinlock, and the function call path is: dlm_master_lookup (acquire the spinlock) dlm_send_rcom_lookup_dump create_rcom dlm_lowcomms_get_buffer nodeid2con mutex_lock --> may sleep This bug is found by my static analysis tool and my code review. Thanks, Jia-Ju Bai ^ permalink raw reply [flat|nested] 6+ messages in thread
* [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup 2017-10-07 1:59 ` Jia-Ju Bai @ 2017-10-07 2:26 ` Al Viro -1 siblings, 0 replies; 6+ messages in thread From: Al Viro @ 2017-10-07 2:26 UTC (permalink / raw) To: cluster-devel.redhat.com On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > and the function call path is: > dlm_master_lookup (acquire the spinlock) > dlm_send_rcom_lookup_dump > create_rcom > dlm_lowcomms_get_buffer > nodeid2con > mutex_lock --> may sleep > > This bug is found by my static analysis tool and my code review. Umm... dlm_master_lookup() locking is not nice, but to trigger that you would need a combination of * from_nodeid != our_nodeid (or we would've buggered off long before that point) * dir_nodeid == our_nodeid * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r) (success would have the lock dropped) * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r) * from_master being true * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid (the former is follows from the latter, actually) The last one might or might not be impossible - I'm not familiar with dlm guts, but it does have log_error(ls, "from_master %d our_master", from_nodeid); just before that call, so it's worth a further look. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup @ 2017-10-07 2:26 ` Al Viro 0 siblings, 0 replies; 6+ messages in thread From: Al Viro @ 2017-10-07 2:26 UTC (permalink / raw) To: Jia-Ju Bai; +Cc: ccaulfie, teigland, cluster-devel, linux-kernel On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > and the function call path is: > dlm_master_lookup (acquire the spinlock) > dlm_send_rcom_lookup_dump > create_rcom > dlm_lowcomms_get_buffer > nodeid2con > mutex_lock --> may sleep > > This bug is found by my static analysis tool and my code review. Umm... dlm_master_lookup() locking is not nice, but to trigger that you would need a combination of * from_nodeid != our_nodeid (or we would've buggered off long before that point) * dir_nodeid == our_nodeid * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r) (success would have the lock dropped) * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r) * from_master being true * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid (the former is follows from the latter, actually) The last one might or might not be impossible - I'm not familiar with dlm guts, but it does have log_error(ls, "from_master %d our_master", from_nodeid); just before that call, so it's worth a further look. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup 2017-10-07 2:26 ` Al Viro @ 2017-10-09 14:37 ` David Teigland -1 siblings, 0 replies; 6+ messages in thread From: David Teigland @ 2017-10-09 14:37 UTC (permalink / raw) To: cluster-devel.redhat.com On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote: > On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > > and the function call path is: > > dlm_master_lookup (acquire the spinlock) > > dlm_send_rcom_lookup_dump > > create_rcom > > dlm_lowcomms_get_buffer > > nodeid2con > > mutex_lock --> may sleep > > > > This bug is found by my static analysis tool and my code review. > > Umm... dlm_master_lookup() locking is not nice, but to trigger that > you would need a combination of > > * from_nodeid != our_nodeid (or we would've buggered off long before that point) > * dir_nodeid == our_nodeid > * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r) > (success would have the lock dropped) > * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r) > * from_master being true > * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid > (the former is follows from the latter, actually) > > The last one might or might not be impossible - I'm not familiar with dlm > guts, but it does have > log_error(ls, "from_master %d our_master", from_nodeid); > just before that call, so it's worth a further look. dlm_send_rcom_lookup_dump() was for debugging and can be removed. It's a condition that shouldn't happen, and I'm guessing I added that to catch any evidence if it did. I'm surprised it wasn't removed in the final version of the patch, but after 5 years I don't remember what I was thinking. I've pushed a commit dropping it to linux-dlm.git next. Thanks, Dave ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup @ 2017-10-09 14:37 ` David Teigland 0 siblings, 0 replies; 6+ messages in thread From: David Teigland @ 2017-10-09 14:37 UTC (permalink / raw) To: Al Viro; +Cc: Jia-Ju Bai, ccaulfie, cluster-devel, linux-kernel On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote: > On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > > and the function call path is: > > dlm_master_lookup (acquire the spinlock) > > dlm_send_rcom_lookup_dump > > create_rcom > > dlm_lowcomms_get_buffer > > nodeid2con > > mutex_lock --> may sleep > > > > This bug is found by my static analysis tool and my code review. > > Umm... dlm_master_lookup() locking is not nice, but to trigger that > you would need a combination of > > * from_nodeid != our_nodeid (or we would've buggered off long before that point) > * dir_nodeid == our_nodeid > * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r) > (success would have the lock dropped) > * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r) > * from_master being true > * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid > (the former is follows from the latter, actually) > > The last one might or might not be impossible - I'm not familiar with dlm > guts, but it does have > log_error(ls, "from_master %d our_master", from_nodeid); > just before that call, so it's worth a further look. dlm_send_rcom_lookup_dump() was for debugging and can be removed. It's a condition that shouldn't happen, and I'm guessing I added that to catch any evidence if it did. I'm surprised it wasn't removed in the final version of the patch, but after 5 years I don't remember what I was thinking. I've pushed a commit dropping it to linux-dlm.git next. Thanks, Dave ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-10-09 14:37 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-10-07 1:59 [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup Jia-Ju Bai 2017-10-07 1:59 ` Jia-Ju Bai 2017-10-07 2:26 ` [Cluster-devel] " Al Viro 2017-10-07 2:26 ` Al Viro 2017-10-09 14:37 ` [Cluster-devel] " David Teigland 2017-10-09 14:37 ` David Teigland
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.