public inbox for linux-ext4@vger.kernel.org
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Jan Kara <jack@suse.cz>
Cc: Qian Cai <quic_qiancai@quicinc.com>,
	Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.com>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	linux-ext4@vger.kernel.org, rcu@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
Date: Thu, 10 Feb 2022 12:06:34 -0800
Message-ID: <20220210200634.GO4285@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20220210191252.3cbe2sy3jmqul2mh@quack3.lan>

On Thu, Feb 10, 2022 at 08:12:52PM +0100, Jan Kara wrote:
> On Thu 10-02-22 06:57:13, Paul E. McKenney wrote:
> > On Thu, Feb 10, 2022 at 10:16:48AM +0100, Jan Kara wrote:
> > > On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> > > > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > > > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > > > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > > > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > > > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > > > > 
> > > > > > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> > > > > 
> > > > > No, the performance impact of this would be just horrible. Can you
> > > > > > elaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > > > how is zeroing here any different from the memory just being used?
> > > > 
> > > > Suppose a reader picks up a pointer to a memory block, then that memory
> > > > is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > > > so the memory won't be freed while the reader is accessing it.  But while
> > > > the reader is in the process of validating the block, it is zeroed.
> > > > 
> > > > How does the validation step handle this in all cases?
> > > > 
> > > > If you have a way of handling this, I will of course drop the patch.
> > > > And learn something new, which is always a good thing.  ;-)
> > > 
> > > So I maybe missed something when implementing the usage of journal_heads
> > > under RCU but let's have a look. An example of RCU user of journal heads
> > > is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:
> > > 
> > >         rcu_read_lock();
> > > 
> > > 	// This part fetches journal_head from buffer_head - not related to
> > > 	// our slab RCU discussion
> > > 
> > >         if (!buffer_jbd(bh))
> > >                 goto out;
> > >         /* This should be bh2jh() but that doesn't work with inline functions */
> > >         jh = READ_ONCE(bh->b_private);
> > >         if (!jh)
> > >                 goto out;
> > > 
> > > 	// The validation comes here
> > > 
> > >         /* For undo access buffer must have data copied */
> > >         if (undo && !jh->b_committed_data)
> > >                 goto out;
> > 
> > OK, so if *jh was freed and re-zallocated in the meantime, this test
> > should fail.  One concern would be if the zeroing was not at least eight
> > bytes at a time, maybe due to overly eager use of fancy SIMD hardware.
> > Though perhaps you also do something about ->b_committed_data on
> > the free path, the commit-done path, or whatever?  (I do see a
> > "jh->b_committed_data = NULL" on what might well be the commit-done path.)
> >
> > >         if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
> > >             READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
> > >                 goto out;
> > 
> > And same with these guys.
> 
> Yes, on commit-done path we zero out jh->b_transaction (or set
> jh->b_transaction = jh->b_next_transaction; jh->b_next_transaction = NULL).
> So these fields are actually guaranteed to be zero on free.

Very good, thank you!

One more thing...

This assumes that when the slab allocator gets a fresh slab from mm,
that slab has been zeroed, right?  Or is there some other trick that
you are using to somehow accommodate randomly initialized memory?

Or am I just blind today and missing where the zeroing always happens
(other than for pages destined for userspace)?

> > > 	// Then some more checks unrelated to the slab itself.
> > > 
> > >         /*
> > >          * There are two reasons for the barrier here:
> > >          * 1) Make sure to fetch b_bh after we did previous checks so that we
> > >          * detect when jh went through free, realloc, attach to transaction
> > >          * while we were checking. Paired with implicit barrier in that path.
> > >          * 2) So that access to bh done after jbd2_write_access_granted()
> > >          * doesn't get reordered and see inconsistent state of concurrent
> > >          * do_get_write_access().
> > >          */
> > >         smp_mb();
> > >         if (unlikely(jh->b_bh != bh))
> > >                 goto out;
> > > 
> > > 	// If all passed
> > > 
> > > 	rcu_read_unlock();
> > > 	return true;
> > > 
> > > So if we are going to return true from the function, we know that 'jh' was
> > > attached to handle->h_transaction at some point. And when 'jh' was attached
> > > to handle->h_transaction, the transaction was holding reference to the 'jh'
> > > and our 'handle' holds reference to the transaction so 'jh' could not be
> > > freed since that moment. I.e., we are sure our reference to the handle keeps
> > > 'jh' alive and we can safely use it.
> > > 
> > > I don't see how any amount of scribbling over 'jh' could break this
> > > validation. But maybe it is just a lack of my imagination :).
> > 
> > Regardless of whether you are suffering a lack of imagination, you
> > have clearly demonstrated that it is possible to correctly use the
> > SLAB_TYPESAFE_BY_RCU flag in conjunction with kmem_cache_alloc(), thus
> > demonstrating that I was suffering from a lack of imagination.  ;-)
> > 
> > I have therefore reverted my commit.  Please accept my apologies for
> > the hassle!
> 
> No problem. Thanks for reverting the patch. I can imagine that jbd2's use
> of SLAB_TYPESAFE_BY_RCU is an unusual one...

I do like the very lightweight validation checks!

My mental model of this sort of validation always involved expensive
atomic operations.  ;-)

							Thanx, Paul
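
The lightweight validation discussed above can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the jbd2 code: the
names struct owner, struct tracked, attach(), recycle() and access_granted()
are hypothetical stand-ins for buffer_head, journal_head and
jbd2_write_access_granted(), and C11 atomics stand in for READ_ONCE() and
smp_mb(). It assumes the object's memory stays type-stable (as
SLAB_TYPESAFE_BY_RCU guarantees within an RCU read-side critical section)
and that the caller's transaction pointer is never NULL.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct owner;

/* Hypothetical stand-in for journal_head; the memory is never unmapped. */
struct tracked {
	_Atomic(void *) txn;		/* like jh->b_transaction */
	_Atomic(void *) next_txn;	/* like jh->b_next_transaction */
	_Atomic(struct owner *) back;	/* like jh->b_bh */
};

/* Hypothetical stand-in for buffer_head. */
struct owner {
	_Atomic(struct tracked *) priv;	/* like bh->b_private */
};

/* Attach t to owner o under transaction txn, publishing the pointer last. */
static void attach(struct tracked *t, struct owner *o, void *txn)
{
	atomic_store(&t->txn, txn);
	atomic_store(&t->next_txn, NULL);
	atomic_store(&t->back, o);
	atomic_store(&o->priv, t);
}

/*
 * Model "free, then re-zallocate": every field reads as NULL afterwards.
 * A stale owner->priv pointer is deliberately left in place to model a
 * reader that fetched the pointer before the free.
 */
static void recycle(struct tracked *t)
{
	atomic_store(&t->txn, NULL);
	atomic_store(&t->next_txn, NULL);
	atomic_store(&t->back, NULL);
}

/* Return true only if t is attached to my_txn and still belongs to o. */
static bool access_granted(struct owner *o, void *my_txn)
{
	struct tracked *t = atomic_load(&o->priv);

	if (!t)
		return false;

	/* A zeroed or reassigned object cannot match a non-NULL my_txn. */
	if (atomic_load_explicit(&t->txn, memory_order_relaxed) != my_txn &&
	    atomic_load_explicit(&t->next_txn, memory_order_relaxed) != my_txn)
		return false;

	/* Order the checks above before the back-pointer check. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Catches reuse of the same memory for a different owner. */
	return atomic_load_explicit(&t->back, memory_order_relaxed) == o;
}
```

In this sketch, zeroing the object can only make the transaction comparison
fail, and reuse for a different owner is caught by the back-pointer check, so
no atomic read-modify-write operation is needed on the read side.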
