linux-fsdevel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Al Viro <viro@ZenIV.linux.org.uk>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Chris Mason <clm@fb.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	David Howells <dhowells@redhat.com>,
	elena.reshetova@intel.com, ishkamiel@gmail.com,
	dwindsor@gmail.com, gregkh@linuxfoundation.org,
	peterz@infradead.org
Subject: [RFC][PATCH 00/10] On inode::i_count and the usage vs reference count issue
Date: Fri, 24 Feb 2017 16:43:29 +0100
Message-ID: <20170224154329.478276481@infradead.org>

(my apologies if this arrives a second time; I seem to have
fat-fingered my send command the first time and things reached
neither me nor the list).

Hi all,

So I'm not entirely happy with these patches; but I don't really know
fs/inode.c as well as some of you and I figured I'd reached a point where I
need feedback (or maybe I'm well past that, we'll see).

So the kernel has recently grown a reference count type; this thing is fairly
strict with semantics, such that it can give 'helpful' warnings when people
'accidentally' violate the rules and create bugs.

The one at the core of this patch set is that refcount_t assumes 0 means 'free'
or 'freeing'.

The problem is that inode::i_count is _not_ a reference count, it is a usage
count (for lack of a better name), it counts how many active users of the inode
are out there. But 0 users is a perfectly fine state for an inode to be in,
it'll just sit in the cache waiting for a new user (or reclaim).

Now refcount_t has no operations to increment once we've hit 0, because if you
assume 0 means 'free', increment from 0 means use-after-free, and that's a bad
thing.
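
To make the rule concrete, here is a rough userspace sketch using C11
atomics. It is not the kernel's refcount_t; struct ref and the ref_*()
helpers are names made up for illustration. The point it shows is only that
an increment is refused once the count has reached 0, because 0 is taken to
mean 'free' or 'freeing'.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a strict reference count (not refcount_t). */
struct ref {
	atomic_uint count;
};

static void ref_init(struct ref *r, unsigned int n)
{
	atomic_init(&r->count, n);
}

/* Take a reference only while the object is live (count != 0). */
static bool ref_get_not_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* 0 means free/freeing: refuse */
		/* On CAS failure 'old' is reloaded and re-checked. */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

	return true;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool ref_put_and_test(struct ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	return old == 1;
}
```

With a usage count, by contrast, a get on an object whose count is 0 is the
normal cache-hit case, which is exactly what the helper above refuses.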

So what this patch-set attempts is doing a +1 bias on the usage-count to turn
it into an actual reference count, where the extra reference is the pointer the
cache itself has to the object.

This then results in the need to do something like: dec_and_lock at the 2->1
transition instead of the usual 1->0; for this purpose we introduce
refcount_dec_unless().
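
A minimal userspace sketch of the biased scheme, under stated assumptions:
struct cached_obj, count_dec_unless() and obj_put() are hypothetical names,
a pthread mutex stands in for the lock taken on the last put, and
count_dec_unless() is assumed to mean "decrement unless the count equals the
given value". The real refcount_dec_unless() is defined in patch 09/10 and
may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cached object; the stored count is (active users + 1). */
struct cached_obj {
	atomic_uint count;	/* 1 == cached but unused */
	pthread_mutex_t lock;	/* stand-in for the lock taken on last put */
	bool unused;		/* set once only the cache's reference remains */
};

static void obj_init(struct cached_obj *o, unsigned int users)
{
	atomic_init(&o->count, users + 1);	/* +1 bias for the cache */
	pthread_mutex_init(&o->lock, NULL);
	o->unused = (users == 0);
}

/* Assumed semantics: decrement unless count == value; true if decremented. */
static bool count_dec_unless(atomic_uint *c, unsigned int value)
{
	unsigned int old = atomic_load(c);

	do {
		if (old == value)
			return false;
	} while (!atomic_compare_exchange_weak(c, &old, old - 1));

	return true;
}

static void obj_put(struct cached_obj *o)
{
	/* Fast path: not the last user, so no lock is needed. */
	if (count_dec_unless(&o->count, 2))
		return;

	/* Slow path: this may be the 2->1 transition; do it under the lock. */
	pthread_mutex_lock(&o->lock);
	if (atomic_fetch_sub(&o->count, 1) == 2)
		o->unused = true;	/* only the cache's reference is left */
	pthread_mutex_unlock(&o->lock);
}
```

The re-check under the lock matters because another user may have taken a
reference between the failed fast path and the lock acquisition, in which
case the decrement is no longer the final one.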

So far, it sounds fairly sensible; _except_ for the wee little problem that a
fair amount of code looks at the value of i_count. Some of this is fine, eg.
the evict path verifies it is indeed 0. But other places look at !0 values and
those are suspect.

To make matters worse; once i_count is a refcount, it appears trivial to avoid
inode_hash_lock for lookups (yay RCU!) and looking at i_count becomes even more
of a problem because then holding i_lock will not in fact stabilize it anymore.

So I've 'ignored' (by assuming they were already broken) the i_count
observers and done that RCU conversion -- even though I have no idea what
workload would hit the global inode_hash_lock hard enough for it to matter
(see, maybe I'm well past the point where I could've used feedback).


There's a number of options here:

 - I'm not completely insane, and these patches can be made to work.

 - We decide usage-counts are useful and try and support them in refcount_t;
   this has the down-side that people can more easily write bad code (by doing
   increments from 0 that should not have happened).

 - We decide usage-counts need their own type (urgh, more...).

 - None of the above, we keep i_count as is and let people hunt and convert
   actual refcounts.


I'm ok with all those; I just figured it'd be 'fun' to convert something
non-trivial. FWIW, this boots and builds a kernel (but that's about all the
testing it's had).


Thread overview: 26+ messages
2017-02-24 15:43 Peter Zijlstra [this message]
2017-02-24 15:43 ` [RFC][PATCH 01/10] fs: Use lockdep_assert_held() instead of comments Peter Zijlstra
2017-02-24 15:43 ` [RFC][PATCH 02/10] fs: Avoid looking at i_count without i_lock held Peter Zijlstra
     [not found]   ` <CA+55aFxLw8FXf61rsGYDjA1tS=joDeaF7OSgaepLWwcz4zt=dg@mail.gmail.com>
2017-02-24 17:06     ` Peter Zijlstra
2017-02-24 15:43 ` [RFC][PATCH 03/10] fs: Introduce i_count() Peter Zijlstra
2017-02-24 15:43 ` [RFC][PATCH 04/10] fs: Restructure iput() Peter Zijlstra
2017-02-24 15:43 ` [RFC][PATCH 05/10] fs: Remove iput_final() Peter Zijlstra
2017-02-24 15:43 ` [RFC][PATCH 06/10] fs: Rework i_count Peter Zijlstra
2017-02-24 20:49   ` Al Viro
2017-02-24 15:43 ` [RFC][PATCH 07/10] orangefs: Use RCU for destroy_inode Peter Zijlstra
2017-02-24 20:52   ` Al Viro
2017-02-24 23:00     ` Mike Marshall
2017-02-25 20:31       ` Mike Marshall
2017-02-27  0:34         ` Mike Marshall
2017-02-27  1:20           ` Linus Torvalds
2017-02-27  8:44         ` David Howells
2017-02-27 14:44           ` Mike Marshall
2017-02-24 15:43 ` [RFC][PATCH 08/10] fs: Do RCU versions for find_inode() Peter Zijlstra
2017-02-24 15:43 ` [RFC][PATCH 09/10] locking/refcount: Provide refcount_dec_unless() Peter Zijlstra
2017-02-27  9:28   ` Reshetova, Elena
2017-02-24 15:43 ` [RFC][PATCH 10/10] fs: Convert i_count over to refcount_t Peter Zijlstra
2017-02-24 16:43 ` [RFC][PATCH 00/10] On inode::i_count and the usage vs reference count issue Christoph Hellwig
2017-02-24 17:07   ` Peter Zijlstra
2017-02-24 20:59   ` David Windsor
     [not found] ` <CA+55aFy1bNbsX_3T-s_EUwTP-r_SmJJMvB3=-2nffehFVP=EdQ@mail.gmail.com>
     [not found]   ` <CA+55aFz0DbAGZ8gc+s35nm1N5frXjK_NOh7QzuSfZeJbjsT6Sg@mail.gmail.com>
     [not found]     ` <CA+55aFyR8wkHps5_AqUqzx8MDMNxRZZ7+MYH9g=ZCUi=4Oey8w@mail.gmail.com>
2017-02-24 19:24       ` Fwd: " Linus Torvalds
2017-02-24 20:42 ` Al Viro
