From mboxrd@z Thu Jan 1 00:00:00 1970
From: Al Viro
Date: Sat, 7 Oct 2017 03:26:11 +0100
Subject: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
In-Reply-To: <58efdfb6-d18d-cb45-ecd4-4c9b680d7595@163.com>
References: <58efdfb6-d18d-cb45-ecd4-4c9b680d7595@163.com>
Message-ID: <20171007022611.GO21978@ZenIV.linux.org.uk>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> and the function call path is:
> dlm_master_lookup (acquire the spinlock)
> dlm_send_rcom_lookup_dump
> create_rcom
> dlm_lowcomms_get_buffer
> nodeid2con
> mutex_lock --> may sleep
>
> This bug is found by my static analysis tool and my code review.

Umm... dlm_master_lookup() locking is not nice, but to trigger that
you would need a combination of
	* from_nodeid != our_nodeid (or we would've buggered off long
	  before that point)
	* dir_nodeid == our_nodeid
	* failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
	  (success would have the lock dropped)
	* succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
	* from_master being true
	* r->res_master_nodeid != from_nodeid and
	  r->res_master_nodeid == our_nodeid (the former follows from
	  the latter, actually)

The last one might or might not be impossible - I'm not familiar with
dlm guts, but it does have
	log_error(ls, "from_master %d our_master", from_nodeid);
just before that call, so it's worth a further look.
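
For anyone who wants to poke at that combination, the conditions listed
above can be written down as a single predicate. This is only a
userspace sketch, not kernel code: struct lookup_state and
reaches_lookup_dump_under_spinlock() are hypothetical names made up for
illustration, and the fields simply mirror the variables named in the
list. It assumes the list is the complete set of conditions and says
nothing about whether the last one can actually happen in dlm.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the values dlm_master_lookup() tests,
 * as named in the list above.  Not a kernel structure.
 */
struct lookup_state {
	int our_nodeid;
	int from_nodeid;
	int dir_nodeid;
	bool found_on_keep;	/* search of the .keep tree succeeded */
	bool found_on_toss;	/* search of the .toss tree succeeded */
	bool from_master;
	int res_master_nodeid;	/* r->res_master_nodeid */
};

/*
 * Returns true when every condition from the list holds, i.e. when
 * dlm_send_rcom_lookup_dump() would be reached with the rsbtbl
 * spinlock still held.
 */
static bool reaches_lookup_dump_under_spinlock(const struct lookup_state *s)
{
	if (s->from_nodeid == s->our_nodeid)
		return false;	/* rejected long before that point */
	if (s->dir_nodeid != s->our_nodeid)
		return false;	/* we are not the directory node */
	if (s->found_on_keep)
		return false;	/* success here has the lock dropped */
	if (!s->found_on_toss)
		return false;	/* nothing found at all */
	if (!s->from_master)
		return false;
	/*
	 * res_master_nodeid == our_nodeid implies
	 * res_master_nodeid != from_nodeid, since from_nodeid != our_nodeid
	 * was already established above.
	 */
	return s->res_master_nodeid == s->our_nodeid;
}
```

Filling in a state with our_nodeid = 1, from_nodeid = 2, dir_nodeid = 1,
found only on toss, from_master set and res_master_nodeid = 1 makes the
predicate true; flipping any single condition makes it false.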