public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: JonasZhou-oc <JonasZhou-oc@zhaoxin.com>,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	CobeChen@zhaoxin.com, LouisQi@zhaoxin.com, JonasZhou@zhaoxin.com
Subject: Re: [PATCH] fs/address_space: move i_mmap_rwsem to mitigate a false sharing with i_mmap.
Date: Tue, 6 Feb 2024 23:33:50 +0000	[thread overview]
Message-ID: <ZcLB3pCSFRXRodzN@casper.infradead.org> (raw)
In-Reply-To: <ZcKmP3zVdBwJVxGd@dread.disaster.area>

On Wed, Feb 07, 2024 at 08:35:59AM +1100, Dave Chinner wrote:
> > The solution to this problem is to change the interval tree data structure
> > from an Red-Black tree to a B-tree, or something similar where we use
> > an array of pointers instead of a single pointer.
> 
> .... B-trees are not immune to pointer chasing problems, either.
> Most use binary searches within the nodes, and so we have the
> problem of unpredictable cacheline misses within the nodes as well
> as being unable to do depth based prefetch similar to rbtrees.
> 
> Perhaps we should be looking at something like this:
> 
> https://lore.kernel.org/linux-fsdevel/20240126220655.395093-2-kent.overstreet@linux.dev/

I need more data (and maybe Kent has it!)

Without any data, I believe that Eytzinger layout is an idea that was
a really good one in the 1990s/2000s and has now passed its expiration
date because of the changed nature of hardware.

In the mid-90s, we could do O(10) instructions in the time it took to
service a LLC miss.  Today, it's more like O(2500).  That means it is
far more important to be predictable than it is to execute the minimum
number of instructions.  If your B-tree nodes are 256kB in size (I believe
that's what bacachefs is using?), then Eytzinger layout may make some
sense, but if you're using smaller nodes (which I further believe is
appropriate for an in-memory B-tree), then a straight 'load each index
and compare it' may outperform a search that jumps around inside a node.

I'm pontificating and will happily yield to someone who has data.
I've tried to mark my assumptions clearly above.


Something else I'm not aware of (and filesystem B-trees will not have
any experience of) is what research exists on efficiently adding new
entries to a balanced tree so as to minimise rebalances.  Filesystems
are like the Maple Tree in that for every logical offset inside a file,
there is precisely one answer to "what physical block does this map to".

For the i_mmap tree, we want instead to answer the question "Which VMAs
have an intersection with this range of the file", and for the benchmark
in question, we will have a large number of entries that compare equal to
each other (they have the same start, and same end, but different values).
So they could be inserted at many different points in the tree.  We'd like
to choose the point which causes the least amount of rebalancing.

  reply	other threads:[~2024-02-06 23:33 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-02  9:34 [PATCH] fs/address_space: move i_mmap_rwsem to mitigate a false sharing with i_mmap JonasZhou-oc
2024-02-02 10:18 ` Christian Brauner
2024-02-02 15:03 ` Matthew Wilcox
2024-02-02 19:32   ` Matthew Wilcox
2024-02-05  3:22     ` Dave Chinner
2024-02-05 23:28       ` Matthew Wilcox
2024-02-06 21:35         ` Dave Chinner
2024-02-06 23:33           ` Matthew Wilcox [this message]
2024-02-05  6:22     ` JonasZhou
2024-02-05 23:08       ` Matthew Wilcox
2024-02-06 13:06         ` Christian Brauner
2024-02-05 23:15       ` Dave Chinner
2024-03-06  6:16         ` JonasZhou
  -- strict thread matches above, loose matches on Subject: below --
2024-02-02  8:33 JonasZhou-oc
2024-02-02 16:20 ` Al Viro
2024-02-05 11:56 ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZcLB3pCSFRXRodzN@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=CobeChen@zhaoxin.com \
    --cc=JonasZhou-oc@zhaoxin.com \
    --cc=JonasZhou@zhaoxin.com \
    --cc=LouisQi@zhaoxin.com \
    --cc=brauner@kernel.org \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox