All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Simon Kirby <sim@hostway.ca>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Waiman Long <Waiman.Long@hp.com>,
	Ian Applegate <ia@cloudflare.com>,
	Christoph Lameter <cl@gentwo.org>,
	Pekka Enberg <penberg@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chris Mason <chris.mason@fusionio.com>,
	Steven Whitehouse <swhiteho@redhat.com>
Subject: gfs2 deadlock (was Re: Found it)
Date: Thu, 5 Dec 2013 08:12:01 +0000	[thread overview]
Message-ID: <20131205081201.GA17203@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20131203042830.GI10323@ZenIV.linux.org.uk>

On Tue, Dec 03, 2013 at 04:28:30AM +0000, Al Viro wrote:

> These should be safe, but damnit, we really need the lifecycle documented for
> all objects - the above is only a part of it (note that for e.g. superblocks
> we have additional rules re "->s_active can't be incremented for any reason
> once it drops to zero, it can't be incremented until superblock had been
> marked 'born' and it crosses over to zero only with ->s_umount held"; there's
> 6 stages in life cycle of struct super_block and we had interesting bugs due
> to messing the transitions up).  The trouble is, attempt to write those down
> tends to stray into massive grep session, with usual results - some other
> crap gets found (e.g. in some odd driver) and needs to be dealt with ;-/
> Sigh...

... and sure enough, this time is no different - gfs2 sysfs-related code
cheerfully violates lifetime rules for superblocks, which would've
caused a major mess later, if it had not immediately caused a deadlock
on the same superblock ;-/

Watch: gfs2 creates a bunch of files in sysfs (/sys/fs/gfs2/<devname>/*).
Said bunch gets removed from ->put_super().  Which is called under
->s_umount.  Guess what happens if somebody tries to write "1" to
/sys/fs/gfs2/.../freeze just as we enter that ->put_super() (or at any
point starting from the moment when deactivate_locked_super() has dropped
the last active reference)?  This:
static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
{
        int error;
        int n = simple_strtol(buf, NULL, 0);

        if (!capable(CAP_SYS_ADMIN))
                return -EPERM;

        switch (n) {
[snip]
        case 1:
                error = freeze_super(sdp->sd_vfs);

And freeze_super(sb) assumes that caller has an active reference to
sb:
int freeze_super(struct super_block *sb)
{
        int ret;

        atomic_inc(&sb->s_active);

... which is not legitimate when ->s_active has already reached zero.
And right after that we hit this:
        down_write(&sb->s_umount);

Voila - write(2) is waiting for ->s_umount, while umount(2) is holding
->s_umount and waits for write(2) to get past freeze_store().

Hell knows what to do here - atomic_inc_not_zero() in freeze_super()
(and failing if it fails) would've worked, but it doesn't help with
the deadlock - just write "0" instead and we hit thaw_super(), which
starts with grabbing ->s_umount.  atomic_inc_not_zero()/deactivate_super()
around that call of thaw_super() would probably work, but I'll need to
look at that after I get some sleep...

Why bother with sysfs, anyway?  What's wrong with putting those same files
on gfs2meta, seeing that _this_ would have no problems with object lifetimes?
Too late by now, of course, but...

  reply	other threads:[~2013-12-05  8:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-02 16:00 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Linus Torvalds
2013-12-02 16:27 ` Ingo Molnar
2013-12-02 16:46   ` Al Viro
2013-12-02 17:05     ` Ingo Molnar
2013-12-02 17:06 ` Al Viro
2013-12-03  2:58 ` Linus Torvalds
2013-12-03  4:28   ` Al Viro
2013-12-05  8:12     ` Al Viro [this message]
2013-12-05 10:19       ` [Cluster-devel] gfs2 deadlock (was Re: Found it) Steven Whitehouse
2013-12-05 10:19         ` Steven Whitehouse
2013-12-03  8:52   ` [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races Ingo Molnar
2013-12-03 18:10     ` Linus Torvalds
2013-12-04  9:19       ` Simon Kirby
2013-12-04 21:14         ` Linus Torvalds
2013-12-05  8:06           ` Simon Kirby
2013-12-05  6:57     ` Simon Kirby
2013-12-11 15:03     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131205081201.GA17203@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=Waiman.Long@hp.com \
    --cc=chris.mason@fusionio.com \
    --cc=cl@gentwo.org \
    --cc=ia@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=penberg@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sim@hostway.ca \
    --cc=swhiteho@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.