From: David Teigland <teigland@redhat.com>
To: cluster-devel@redhat.com
Subject: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup
Date: Mon, 9 Oct 2017 09:37:29 -0500
Message-ID: <20171009143729.GA8549@redhat.com>
In-Reply-To: <20171007022611.GO21978@ZenIV.linux.org.uk>

On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote:
> > According to fs/dlm/lock.c, the kernel may sleep under a spinlock,
> > and the function call path is:
> > dlm_master_lookup (acquire the spinlock)
> >   dlm_send_rcom_lookup_dump
> >     create_rcom
> >       dlm_lowcomms_get_buffer
> >         nodeid2con
> >           mutex_lock --> may sleep
> > 
> > This bug is found by my static analysis tool and my code review.
> 
> Umm...  dlm_master_lookup() locking is not nice, but to trigger that
> you would need a combination of
> 
> * from_nodeid != our_nodeid (or we would've buggered off long before that point)
> * dir_nodeid == our_nodeid
> * failing dlm_search_rsb_tree(&ls->ls_rsbtbl[b].keep, name, len, &r)
> (success would have the lock dropped)
> * succeeding dlm_search_rsb_tree(&ls->ls_rsbtbl[b].toss, name, len, &r)
> * from_master being true
> * r->res_master_nodeid != from_nodeid and r->res_master_nodeid == our_nodeid
> (the former follows from the latter, actually)
> 
> The last one might or might not be impossible - I'm not familiar with dlm
> guts, but it does have
>                         log_error(ls, "from_master %d our_master", from_nodeid);
> just before that call, so it's worth a further look.

dlm_send_rcom_lookup_dump() was for debugging and can be removed.  It's a
condition that shouldn't happen, and I'm guessing I added that to catch
any evidence if it did.  I'm surprised it wasn't removed in the final
version of the patch, but after 5 years I don't remember what I was
thinking.  I've pushed a commit dropping it to linux-dlm.git next.

Thanks,
Dave
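
[Editorial illustration, not part of the original message.] The
sleep-in-atomic pattern reported above can be sketched with a small
user-space model. Everything here is an assumption made for illustration:
the model_* names are hypothetical and are not kernel APIs, and the model
only records whether a call that may sleep ran while a spinlock was held.
It does not reproduce the real fs/dlm code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_ctx {
	int spin_depth;     /* number of model spinlocks currently held */
	bool slept_atomic;  /* set when a sleeping call runs with a lock held */
};

static void model_spin_lock(struct model_ctx *c)   { c->spin_depth++; }
static void model_spin_unlock(struct model_ctx *c) { c->spin_depth--; }

/* Stand-in for mutex_lock(): it may sleep, so flag it under a spinlock. */
static void model_mutex_lock(struct model_ctx *c)
{
	if (c->spin_depth > 0)
		c->slept_atomic = true;
}

/* Stand-in for the reported chain that ends in mutex_lock(). */
static void model_send_dump(struct model_ctx *c)
{
	model_mutex_lock(c);
}

/*
 * Model of the lookup: take the spinlock, optionally call the debugging
 * dump while it is held, then release it. Returns true if a sleeping call
 * happened in atomic context.
 */
bool model_lookup(bool call_dump)
{
	struct model_ctx c = { 0, false };

	model_spin_lock(&c);
	if (call_dump)
		model_send_dump(&c);
	model_spin_unlock(&c);

	return c.slept_atomic;
}

/* Small usage example for the two cases discussed in the thread. */
void model_lookup_example(void)
{
	assert(model_lookup(true));   /* dump under the lock: violation */
	assert(!model_lookup(false)); /* dump call dropped: no violation */
}
```

In this model, dropping the dump call, as the reply describes, removes the
only sleeping call made under the lock.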


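[Editorial illustration, not part of the original message.] The combination
of conditions Al Viro lists can be written as a single predicate. This is a
sketch under stated assumptions: struct lookup_state and its fields are a
hypothetical flattened view of the values he names, not a structure from
fs/dlm, and the two found_in_* flags stand in for the results of the two
dlm_search_rsb_tree() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the values named in the analysis. */
struct lookup_state {
	int our_nodeid;
	int from_nodeid;
	int dir_nodeid;
	bool found_in_keep;    /* search of the .keep tree succeeded */
	bool found_in_toss;    /* search of the .toss tree succeeded */
	bool from_master;
	int res_master_nodeid;
};

/* True only when every listed condition holds at once. */
bool reaches_dump_under_lock(const struct lookup_state *s)
{
	if (s->from_nodeid == s->our_nodeid)
		return false;  /* would have returned long before this point */
	if (s->dir_nodeid != s->our_nodeid)
		return false;
	if (s->found_in_keep)
		return false;  /* success would have the lock dropped */
	if (!s->found_in_toss)
		return false;
	if (!s->from_master)
		return false;
	/*
	 * res_master_nodeid != from_nodeid needs no separate check: it
	 * follows from res_master_nodeid == our_nodeid together with
	 * from_nodeid != our_nodeid.
	 */
	return s->res_master_nodeid == s->our_nodeid;
}

/* Small usage example: one state that qualifies, one that does not. */
void reaches_dump_example(void)
{
	struct lookup_state hit  = { 1, 2, 1, false, true, true, 1 };
	struct lookup_state miss = { 1, 2, 1, true,  true, true, 1 };

	assert(reaches_dump_under_lock(&hit));
	assert(!reaches_dump_under_lock(&miss));
}
```

Whether the last condition can actually occur is the open question in the
thread; the predicate only restates the list and does not answer it.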


