All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Simon Kirby <sim@hostway.ca>, Ian Applegate <ia@cloudflare.com>,
	Christoph Lameter <cl@gentwo.org>,
	Pekka Enberg <penberg@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chris Mason <chris.mason@fusionio.com>
Subject: Re: Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds())
Date: Mon, 2 Dec 2013 17:06:21 +0000	[thread overview]
Message-ID: <20131202170621.GG10323@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFyez7uZ6LeXrBRCqJzgJ=w4Xv+CV6QSYp5NkOJ2R9Xang@mail.gmail.com>

On Mon, Dec 02, 2013 at 08:00:38AM -0800, Linus Torvalds wrote:
> On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I still don't see what could be wrong with the pipe_inode_info thing,
> > but the fact that it's been so consistent in your traces does make me
> > suspect it really is *that* particular slab.
> 
> I think I finally found it.
> 
> I've spent waaayy too much time looking at and thinking about that
> code without seeing anything wrong, but this morning I woke up and
> thought to myself "What if.."
> 
> And looking at the code again, I went "BINGO".
> 
> All our reference counting etc seems right, but we have one very
> subtle bug: on the freeing path, we have a pattern like this:
> 
>         spin_lock(&inode->i_lock);
>         if (!--pipe->files) {
>                 inode->i_pipe = NULL;
>                 kill = 1;
>         }
>         spin_unlock(&inode->i_lock);
>         __pipe_unlock(pipe);
>         if (kill)
>                 free_pipe_info(pipe);
> 
> which on the face of it is trying to be very careful in not accessing
> the pipe-info after it is released by having that "kill" flag, and
> doing the release last.
> 
> And it's complete garbage.
> 
> Why?
> 
> Because the thread that decrements "pipe->files" *without* releasing
> it, will very much access it after it has been dropped: that
> "__pipe_unlock(pipe)" happens *after* we've decremented the pipe
> reference count and dropped the inode lock. So another CPU can come in
> and free the structure concurrently with that __pipe_unlock(pipe).
> 
> This happens in two places, and we don't actually need or want the
> pipe lock for the pipe->files accesses (since pipe->files is protected
> by inode->i_lock, not the pipe lock), so the solution is to just do
> the __pipe_unlock() before the whole dance about the pipe->files
> reference count.
> 
> Patch appended. And no wonder nobody has ever seen it, because the
> race is unlikely as hell to ever happen. Simon, I assume it will be
> another few months before we can say "yeah, that fixed it", but I
> really think this is it. It explains all the symptoms, including
> "DEBUG_PAGEALLOC didn't catch it" (because the access happens just as
> it is released, and DEBUG_PAGEALLOC takes too long to actually free
> unmap the page etc).

Nice catch.  I'd probably add

/* we are holding inode->i_pipe->files */
static void drop_pipe(struct inode *inode, struct pipe_inode_info *pipe)
{
	int kill = 0;
        spin_lock(&inode->i_lock);
	WARN_ON(pipe != inode->i_pipe);
        if (!--pipe->files) {
                inode->i_pipe = NULL;
                kill = 1;
        }
        spin_unlock(&inode->i_lock);
        if (kill)
                free_pipe_info(pipe);
}

and use it in both places...

  parent reply	other threads:[~2013-12-02 17:06 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-02 16:00 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Linus Torvalds
2013-12-02 16:27 ` Ingo Molnar
2013-12-02 16:46   ` Al Viro
2013-12-02 17:05     ` Ingo Molnar
2013-12-02 17:06 ` Al Viro [this message]
2013-12-03  2:58 ` Linus Torvalds
2013-12-03  4:28   ` Al Viro
2013-12-05  8:12     ` gfs2 deadlock (was Re: Found it) Al Viro
2013-12-05 10:19       ` [Cluster-devel] " Steven Whitehouse
2013-12-05 10:19         ` Steven Whitehouse
2013-12-03  8:52   ` [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races Ingo Molnar
2013-12-03 18:10     ` Linus Torvalds
2013-12-04  9:19       ` Simon Kirby
2013-12-04 21:14         ` Linus Torvalds
2013-12-05  8:06           ` Simon Kirby
2013-12-05  6:57     ` Simon Kirby
2013-12-11 15:03     ` Waiman Long
  -- strict thread matches above, loose matches on Subject: below --
2025-10-07  4:03 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Steven Paul Jobs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131202170621.GG10323@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=chris.mason@fusionio.com \
    --cc=cl@gentwo.org \
    --cc=ia@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penberg@kernel.org \
    --cc=sim@hostway.ca \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.