From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Simon Kirby <sim@hostway.ca>, Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Waiman Long <Waiman.Long@hp.com>,
Ian Applegate <ia@cloudflare.com>,
Christoph Lameter <cl@gentwo.org>,
Pekka Enberg <penberg@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Chris Mason <chris.mason@fusionio.com>
Subject: Re: Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds())
Date: Tue, 3 Dec 2013 04:28:30 +0000 [thread overview]
Message-ID: <20131203042830.GI10323@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFyhzvJrCiJOqUgL-98=MVutWePToQXaVfS2j6e5eetJSw@mail.gmail.com>
On Mon, Dec 02, 2013 at 06:58:57PM -0800, Linus Torvalds wrote:
> In other words, it's unsafe to protect reference counts inside objects
> with anything but spinlocks and/or atomic refcounts. Or you have to
> have the lock *outside* the object you're protecting (which is often
> what you want for other reasons anyway, notably lookup).
>
> So having a non-atomic refcount protected inside a sleeping lock is a
> bug, and that's really the bug that ended up biting us for pipes.
>
> Now, the question is: do we have other such cases? How do we document
> this? Do we try to make mutexes and other locks safe to use for things
> like this?
Umm... AFAICS, in VFS proper we have
files_struct - atomic_dec_and_test
fs_struct - spinlock + int
file - atomic_long_dec_and_test (with delays after that, including
RCU).
super_block - global spinlock + int (s_count); the mutex in there
(->s_umount) can be taken by anybody who holds an active ref *or* has
bumped ->s_count while holding sb_lock. Exactly to prevent that kind of
unpleasantness. Freeing RCU-delayed.
vfsmount - percpu counter + flag + global seqlock, with quite a bit of
contortions for the sake of avoiding cross-CPU stores on fastpath; discussed
back in October, concluded to be safe. Freeing RCU-delayed.
dentry - lockref, with RCU-delayed actual freeing.
file_system_type, nls_table, linux_binfmt - module refcount of "owner";
search structures protected by global spinlocks or rwlocks, exiting module
is responsible for unregistering first.
inode - atomic_dec_and_lock, with actual freeing RCU-delayed (and
evicting code waiting for pagecache references to be gone, with the rest
being responsibility of fs method called before we free the sucker)
block_device - part of bdevfs inode
These should be safe, but damnit, we really need the lifecycle documented for
all objects - the above is only a part of it (note that for e.g. superblocks
we have additional rules re "->s_active can't be incremented for any reason
once it drops to zero, it can't be incremented until superblock had been
marked 'born' and it crosses over to zero only with ->s_umount held"; there's
6 stages in life cycle of struct super_block and we had interesting bugs due
to messing the transitions up). The trouble is, attempt to write those down
tends to stray into massive grep session, with usual results - some other
crap gets found (e.g. in some odd driver) and needs to be dealt with ;-/
Sigh...
next prev parent reply other threads:[~2013-12-03 4:28 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-02 16:00 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Linus Torvalds
2013-12-02 16:27 ` Ingo Molnar
2013-12-02 16:46 ` Al Viro
2013-12-02 17:05 ` Ingo Molnar
2013-12-02 17:06 ` Al Viro
2013-12-03 2:58 ` Linus Torvalds
2013-12-03 4:28 ` Al Viro [this message]
2013-12-05 8:12 ` gfs2 deadlock (was Re: Found it) Al Viro
2013-12-05 10:19 ` [Cluster-devel] " Steven Whitehouse
2013-12-05 10:19 ` Steven Whitehouse
2013-12-03 8:52 ` [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races Ingo Molnar
2013-12-03 18:10 ` Linus Torvalds
2013-12-04 9:19 ` Simon Kirby
2013-12-04 21:14 ` Linus Torvalds
2013-12-05 8:06 ` Simon Kirby
2013-12-05 6:57 ` Simon Kirby
2013-12-11 15:03 ` Waiman Long
-- strict thread matches above, loose matches on Subject: below --
2025-10-07 4:03 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Steven Paul Jobs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131203042830.GI10323@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=Waiman.Long@hp.com \
--cc=chris.mason@fusionio.com \
--cc=cl@gentwo.org \
--cc=ia@cloudflare.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=penberg@kernel.org \
--cc=peterz@infradead.org \
--cc=sim@hostway.ca \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.