public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Simon Kirby <sim@hostway.ca>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Waiman Long <Waiman.Long@hp.com>,
	Ian Applegate <ia@cloudflare.com>,
	Christoph Lameter <cl@gentwo.org>,
	Pekka Enberg <penberg@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chris Mason <chris.mason@fusionio.com>
Subject: Re: Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds())
Date: Tue, 3 Dec 2013 04:28:30 +0000	[thread overview]
Message-ID: <20131203042830.GI10323@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFyhzvJrCiJOqUgL-98=MVutWePToQXaVfS2j6e5eetJSw@mail.gmail.com>

On Mon, Dec 02, 2013 at 06:58:57PM -0800, Linus Torvalds wrote:

> In other words, it's unsafe to protect reference counts inside objects
> with anything but spinlocks and/or atomic refcounts. Or you have to
> have the lock *outside* the object you're protecting (which is often
> what you want for other reasons anyway, notably lookup).
> 
> So having a non-atomic refcount protected inside a sleeping lock is a
> bug, and that's really the bug that ended up biting us for pipes.
> 
> Now, the question is: do we have other such cases? How do we document
> this? Do we try to make mutexes and other locks safe to use for things
> like this?

Umm...  AFAICS, in VFS proper we have
	files_struct - atomic_dec_and_test
	fs_struct - spinlock + int
	file - atomic_long_dec_and_test (with delays after that, including
RCU).
	super_block - global spinlock + int (s_count); the mutex in there
(->s_umount) can be taken by anybody who holds an active ref *or* has
bumped ->s_count while holding sb_lock.  Exactly to prevent that kind of
unpleasantness.  Freeing RCU-delayed.
	vfsmount - percpu counter + flag + global seqlock, with quite a bit of
contortions for the sake of avoiding cross-CPU stores on fastpath; discussed
back in October, concluded to be safe.  Freeing RCU-delayed.
	dentry - lockref, with RCU-delayed actual freeing.
	file_system_type, nls_table, linux_binfmt - module refcount of "owner";
search structures protected by global spinlocks or rwlocks, exiting module
is responsible for unregistering first.
	inode - atomic_dec_and_lock, with actual freeing RCU-delayed (and
evicting code waiting for pagecache references to be gone, with the rest
being responsibility of fs method called before we free the sucker)
	block_device - part of bdevfs inode

These should be safe, but damnit, we really need the lifecycle documented for
all objects - the above is only a part of it (note that for e.g. superblocks
we have additional rules re "->s_active can't be incremented for any reason
once it drops to zero, it can't be incremented until superblock had been
marked 'born' and it crosses over to zero only with ->s_umount held"; there's
6 stages in life cycle of struct super_block and we had interesting bugs due
to messing the transitions up).  The trouble is, attempt to write those down
tends to stray into massive grep session, with usual results - some other
crap gets found (e.g. in some odd driver) and needs to be dealt with ;-/
Sigh...

  reply	other threads:[~2013-12-03  4:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-02 16:00 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Linus Torvalds
2013-12-02 16:27 ` Ingo Molnar
2013-12-02 16:46   ` Al Viro
2013-12-02 17:05     ` Ingo Molnar
2013-12-02 17:06 ` Al Viro
2013-12-03  2:58 ` Linus Torvalds
2013-12-03  4:28   ` Al Viro [this message]
2013-12-05  8:12     ` gfs2 deadlock (was Re: Found it) Al Viro
2013-12-05 10:19       ` Steven Whitehouse
2013-12-03  8:52   ` [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races Ingo Molnar
2013-12-03 18:10     ` Linus Torvalds
2013-12-04  9:19       ` Simon Kirby
2013-12-04 21:14         ` Linus Torvalds
2013-12-05  8:06           ` Simon Kirby
2013-12-05  6:57     ` Simon Kirby
2013-12-11 15:03     ` Waiman Long
  -- strict thread matches above, loose matches on Subject: below --
2025-10-07  4:03 Found it! (was Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()) Steven Paul Jobs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131203042830.GI10323@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=Waiman.Long@hp.com \
    --cc=chris.mason@fusionio.com \
    --cc=cl@gentwo.org \
    --cc=ia@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=penberg@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sim@hostway.ca \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox