public inbox for linux-kernel@vger.kernel.org
* Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds())
@ 2013-12-02 16:00 Linus Torvalds
  2013-12-02 16:27 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Linus Torvalds @ 2013-12-02 16:00 UTC
  To: Simon Kirby
  Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML,
	Chris Mason

On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I still don't see what could be wrong with the pipe_inode_info thing,
> but the fact that it's been so consistent in your traces does make me
> suspect it really is *that* particular slab.

I think I finally found it.

I've spent waaayy too much time looking at and thinking about that
code without seeing anything wrong, but this morning I woke up and
thought to myself "What if.."

And looking at the code again, I went "BINGO".

All our reference counting etc seems right, but we have one very
subtle bug: on the freeing path, we have a pattern like this:

        spin_lock(&inode->i_lock);
        if (!--pipe->files) {
                inode->i_pipe = NULL;
                kill = 1;
        }
        spin_unlock(&inode->i_lock);
        __pipe_unlock(pipe);
        if (kill)
                free_pipe_info(pipe);

which on the face of it is trying to be very careful in not accessing
the pipe-info after it is released by having that "kill" flag, and
doing the release last.

And it's complete garbage.

Why?

Because the thread that decrements "pipe->files" *without* freeing the
pipe will very much access it after its own reference has been dropped:
that "__pipe_unlock(pipe)" happens *after* we've decremented the pipe
reference count and dropped the inode lock. So another CPU can come in
and free the structure concurrently with that __pipe_unlock(pipe).

This happens in two places, and we don't actually need or want the
pipe lock for the pipe->files accesses (since pipe->files is protected
by inode->i_lock, not the pipe lock), so the solution is to just do
the __pipe_unlock() before the whole dance about the pipe->files
reference count.
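
As a rough illustration of that ordering, here is a minimal userspace
sketch, not the kernel code. It assumes pthread mutexes as stand-ins for
both the pipe lock and inode->i_lock, and the names fake_pipe,
fake_inode, user_arg, put_pipe(), pipe_user() and run_demo() are all
hypothetical. Each user drops the pipe lock while its own reference
still pins the structure, and only then touches the count:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel objects in the mail. */
struct fake_pipe {
	pthread_mutex_t mutex;	/* plays the role of the pipe lock */
	int files;		/* protected by fake_inode.lock, not by mutex */
};

struct fake_inode {
	pthread_mutex_t lock;	/* plays the role of inode->i_lock */
	struct fake_pipe *pipe;	/* plays the role of inode->i_pipe */
	int frees;		/* demo bookkeeping: number of frees seen */
};

struct user_arg {
	struct fake_inode *inode;
	struct fake_pipe *pipe;
};

/* Called with pipe->mutex held; drops the lock, then the reference. */
static void put_pipe(struct fake_inode *inode, struct fake_pipe *pipe)
{
	int kill = 0;

	/* Unlock first: our reference still keeps the pipe alive here. */
	pthread_mutex_unlock(&pipe->mutex);

	pthread_mutex_lock(&inode->lock);
	if (!--pipe->files) {
		inode->pipe = NULL;
		kill = 1;
	}
	pthread_mutex_unlock(&inode->lock);

	/* Past this point only the thread that saw zero may touch the pipe. */
	if (kill) {
		pthread_mutex_destroy(&pipe->mutex);
		free(pipe);
		pthread_mutex_lock(&inode->lock);
		inode->frees++;
		pthread_mutex_unlock(&inode->lock);
	}
}

static void *pipe_user(void *p)
{
	struct user_arg *arg = p;

	pthread_mutex_lock(&arg->pipe->mutex);
	/* ... work on the pipe under its lock would go here ... */
	put_pipe(arg->inode, arg->pipe);
	return NULL;
}

/* Runs two users against one pipe; returns how often it was freed. */
int run_demo(void)
{
	struct fake_inode inode = { .pipe = NULL, .frees = 0 };
	struct fake_pipe *pipe = malloc(sizeof(*pipe));
	struct user_arg arg;
	pthread_t tid[2];
	int started[2];
	int i;

	if (!pipe)
		return -1;
	pthread_mutex_init(&inode.lock, NULL);
	pthread_mutex_init(&pipe->mutex, NULL);
	pipe->files = 2;	/* one reference per user below */
	inode.pipe = pipe;
	arg.inode = &inode;
	arg.pipe = pipe;

	for (i = 0; i < 2; i++) {
		started[i] = pthread_create(&tid[i], NULL, pipe_user, &arg) == 0;
		if (!started[i])
			pipe_user(&arg);	/* fall back to running inline */
	}
	for (i = 0; i < 2; i++)
		if (started[i])
			pthread_join(tid[i], NULL);

	assert(inode.pipe == NULL);
	pthread_mutex_destroy(&inode.lock);
	return inode.frees;
}
```

Built with -pthread, run_demo() should report exactly one free. Moving
the pthread_mutex_unlock(&pipe->mutex) call to after the inode lock is
dropped would recreate the window described above, although a short
test like this is unlikely to hit it.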

Patch appended. And no wonder nobody has ever seen it, because the
race is unlikely as hell to ever happen. Simon, I assume it will be
another few months before we can say "yeah, that fixed it", but I
really think this is it. It explains all the symptoms, including
"DEBUG_PAGEALLOC didn't catch it" (because the access happens just as
it is released, and DEBUG_PAGEALLOC takes too long to actually free and
unmap the page etc).

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 859 bytes --]

 fs/pipe.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index d2c45e14e6d8..18f1a4b2dbbc 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -743,13 +743,14 @@ pipe_release(struct inode *inode, struct file *file)
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
 	}
+	__pipe_unlock(pipe);
+
 	spin_lock(&inode->i_lock);
 	if (!--pipe->files) {
 		inode->i_pipe = NULL;
 		kill = 1;
 	}
 	spin_unlock(&inode->i_lock);
-	__pipe_unlock(pipe);
 
 	if (kill)
 		free_pipe_info(pipe);
@@ -1130,13 +1131,14 @@ err_wr:
 	goto err;
 
 err:
+	__pipe_unlock(pipe);
+
 	spin_lock(&inode->i_lock);
 	if (!--pipe->files) {
 		inode->i_pipe = NULL;
 		kill = 1;
 	}
 	spin_unlock(&inode->i_lock);
-	__pipe_unlock(pipe);
 	if (kill)
 		free_pipe_info(pipe);
 	return ret;

Thread overview: 17+ messages
2013-12-02 16:00 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Linus Torvalds
2013-12-02 16:27 ` Ingo Molnar
2013-12-02 16:46   ` Al Viro
2013-12-02 17:05     ` Ingo Molnar
2013-12-02 17:06 ` Al Viro
2013-12-03  2:58 ` Linus Torvalds
2013-12-03  4:28   ` Al Viro
2013-12-05  8:12     ` gfs2 deadlock (was Re: Found it) Al Viro
2013-12-05 10:19       ` Steven Whitehouse
2013-12-03  8:52   ` [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races Ingo Molnar
2013-12-03 18:10     ` Linus Torvalds
2013-12-04  9:19       ` Simon Kirby
2013-12-04 21:14         ` Linus Torvalds
2013-12-05  8:06           ` Simon Kirby
2013-12-05  6:57     ` Simon Kirby
2013-12-11 15:03     ` Waiman Long