From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Simon Kirby <sim@hostway.ca>, Ian Applegate <ia@cloudflare.com>,
Christoph Lameter <cl@gentwo.org>,
Pekka Enberg <penberg@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Chris Mason <chris.mason@fusionio.com>
Subject: Re: Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds())
Date: Mon, 2 Dec 2013 17:06:21 +0000 [thread overview]
Message-ID: <20131202170621.GG10323@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFyez7uZ6LeXrBRCqJzgJ=w4Xv+CV6QSYp5NkOJ2R9Xang@mail.gmail.com>
On Mon, Dec 02, 2013 at 08:00:38AM -0800, Linus Torvalds wrote:
> On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I still don't see what could be wrong with the pipe_inode_info thing,
> > but the fact that it's been so consistent in your traces does make me
> > suspect it really is *that* particular slab.
>
> I think I finally found it.
>
> I've spent waaayy too much time looking at and thinking about that
> code without seeing anything wrong, but this morning I woke up and
> thought to myself "What if.."
>
> And looking at the code again, I went "BINGO".
>
> All our reference counting etc seems right, but we have one very
> subtle bug: on the freeing path, we have a pattern like this:
>
> spin_lock(&inode->i_lock);
> if (!--pipe->files) {
> inode->i_pipe = NULL;
> kill = 1;
> }
> spin_unlock(&inode->i_lock);
> __pipe_unlock(pipe);
> if (kill)
> free_pipe_info(pipe);
>
> which on the face of it is trying to be very careful in not accessing
> the pipe-info after it is released by having that "kill" flag, and
> doing the release last.
>
> And it's complete garbage.
>
> Why?
>
> Because the thread that decrements "pipe->files" *without* releasing
> it, will very much access it after it has been dropped: that
> "__pipe_unlock(pipe)" happens *after* we've decremented the pipe
> reference count and dropped the inode lock. So another CPU can come in
> and free the structure concurrently with that __pipe_unlock(pipe).
>
> This happens in two places, and we don't actually need or want the
> pipe lock for the pipe->files accesses (since pipe->files is protected
> by inode->i_lock, not the pipe lock), so the solution is to just do
> the __pipe_unlock() before the whole dance about the pipe->files
> reference count.
>
> Patch appended. And no wonder nobody has ever seen it, because the
> race is unlikely as hell to ever happen. Simon, I assume it will be
> another few months before we can say "yeah, that fixed it", but I
> really think this is it. It explains all the symptoms, including
> "DEBUG_PAGEALLOC didn't catch it" (because the access happens just as
> it is released, and DEBUG_PAGEALLOC takes too long to actually free
> unmap the page etc).
Nice catch. I'd probably add
/* we are holding inode->i_pipe->files */
static void drop_pipe(struct inode *inode, struct pipe_inode_info *pipe)
{
int kill = 0;
spin_lock(&inode->i_lock);
WARN_ON(pipe != inode->i_pipe);
if (!--pipe->files) {
inode->i_pipe = NULL;
kill = 1;
}
spin_unlock(&inode->i_lock);
if (kill)
free_pipe_info(pipe);
}
and use it in both places...
next prev parent reply other threads:[~2013-12-02 17:06 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-02 16:00 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Linus Torvalds
2013-12-02 16:27 ` Ingo Molnar
2013-12-02 16:46 ` Al Viro
2013-12-02 17:05 ` Ingo Molnar
2013-12-02 17:06 ` Al Viro [this message]
2013-12-03 2:58 ` Linus Torvalds
2013-12-03 4:28 ` Al Viro
2013-12-05 8:12 ` gfs2 deadlock (was Re: Found it) Al Viro
2013-12-05 10:19 ` Steven Whitehouse
2013-12-03 8:52 ` [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races Ingo Molnar
2013-12-03 18:10 ` Linus Torvalds
2013-12-04 9:19 ` Simon Kirby
2013-12-04 21:14 ` Linus Torvalds
2013-12-05 8:06 ` Simon Kirby
2013-12-05 6:57 ` Simon Kirby
2013-12-11 15:03 ` Waiman Long
-- strict thread matches above, loose matches on Subject: below --
2025-10-07 4:03 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Steven Paul Jobs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131202170621.GG10323@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=chris.mason@fusionio.com \
--cc=cl@gentwo.org \
--cc=ia@cloudflare.com \
--cc=linux-kernel@vger.kernel.org \
--cc=penberg@kernel.org \
--cc=sim@hostway.ca \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox