public inbox for linux-bcachefs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [GIT PULL] bcachefs changes for 6.12-rc1
Date: Tue, 24 Sep 2024 13:04:29 +1000
Message-ID: <ZvIsPe4JbJ7HX2sQ@dread.disaster.area>
In-Reply-To: <CAHk-=whbD0zwn-0RMNdgAw-8wjVJFQh4o_hGqffazAiW7DwXSQ@mail.gmail.com>

On Mon, Sep 23, 2024 at 07:26:31PM -0700, Linus Torvalds wrote:
> On Mon, 23 Sept 2024 at 17:27, Dave Chinner <david@fromorbit.com> wrote:
> >
> > However, the problematic workload is cold cache operations where
> > the dentry cache repeatedly misses. This places all the operational
> > concurrency directly on the inode hash as new inodes are inserted
> > into the hash. Add memory reclaim and that adds contention as it
> > removes inodes from the hash on eviction.
> 
> Yeah, and then we spend all the time just adding the inodes to the
> hashes, and probably fairly seldom use them. Oh well.
> 
> And I had missed the issue with PREEMPT_RT and the fact that right now
> the inode hash lock is outside the inode lock, which is problematic.

*nod*

> So it's all a bit nasty.
> 
> But I also assume most of the bad issues end up mainly showing up on
> just fairly synthetic benchmarks with ramdisks, because even with a
> good SSD I suspect the IO for the cold cache would still dominate?

No, all the issues show up on consumer level NVMe SSDs - they have
more than enough IO concurrency to cause these CPU concurrency
related problems.

Keep in mind that when it comes to doing huge amounts of IO,
ramdisks are fundamentally flawed and don't scale.  That is, the IO
is synchronous and memcpy() based, so it consumes CPU time and both
read and write memory bandwidth, and concurrency is limited to the
number of CPUs in the system.

With NVMe SSDs, all the data movement is asynchronous and offloaded
to hardware with DMA engines that move the data. Those DMA engines
can often handle hundreds of concurrent IOs at once.

DMA sourced data is also only written to RAM once, and there are no
dependent data reads to slow down the DMA write to RAM like there are
with a data copy streamed through a CPU. IOWs, once the IO rates and
concurrency go up, it is generally much faster to use the CPU to
program the DMA engines to move the data than it is to move the data
with the CPU itself.
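
To make the concurrency argument concrete, here is a toy model (a
sketch only, not a measurement). The CPU count, the number of
submitters and the device slot count are made-up illustrative values;
the only thing taken from the text above is that a memcpy() based
ramdisk can have at most one IO in progress per CPU, while DMA
engines can handle hundreds at once.

```python
def max_in_flight(submitters: int, limit: int) -> int:
    """IOs that can make progress at once: capped by the narrower side."""
    return min(submitters, limit)


# All three values below are assumptions chosen for illustration.
NR_CPUS = 16              # a synchronous memcpy() occupies one CPU per IO
DEVICE_SLOTS = 256        # "hundreds of concurrent IOs" handled by DMA
SUBMITTERS = 512          # tasks trying to issue IO at the same time

ramdisk_in_flight = max_in_flight(SUBMITTERS, NR_CPUS)
nvme_in_flight = max_in_flight(SUBMITTERS, DEVICE_SLOTS)

print(f"ramdisk: {ramdisk_in_flight} IOs in progress")
print(f"nvme:    {nvme_in_flight} IOs in progress")
```

In this model the ramdisk stalls at the CPU count no matter how many
submitters there are, which is the scaling limit described above.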

The testing I did (and so the numbers in those benchmarks) was done
on 2018-era PCIe 3.0 enterprise NVMe SSDs that could do
approximately 400k 4kB random read IOPS. The latest consumer PCIe
5.0 NVMe SSDs are *way faster* than these drives when subjected to
highly concurrent IO requests...
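
For scale, the arithmetic behind the 400k IOPS figure can be sketched
as below. The IOPS and IO size come from the paragraph above (taking
4kB as 4096 bytes); the per-IO latency is purely an assumed value, used
only to show how Little's law turns an IO rate into the number of IOs
that must be in flight to sustain it.

```python
IOPS = 400_000                 # approximate 4kB random read rate, from above
IO_SIZE_BYTES = 4 * 1024       # 4kB taken as 4096 bytes

# Data rate the drive sustains at that IO rate.
bandwidth_bytes_per_s = IOPS * IO_SIZE_BYTES   # 1_638_400_000 bytes/s

# ASSUMPTION: 100 microseconds per IO. Not a figure from the testing above.
ASSUMED_LATENCY_US = 100

# Little's law: IOs in flight = arrival rate * time each IO spends in flight.
ios_in_flight = IOPS * ASSUMED_LATENCY_US / 1_000_000

print(f"bandwidth: {bandwidth_bytes_per_s / 1e9:.2f} GB/s")
print(f"IOs in flight at {ASSUMED_LATENCY_US}us latency: {ios_in_flight:.0f}")
```

With a different latency the in-flight count changes proportionally;
the point is only that sustaining this IO rate needs many concurrent
IOs, and every one of those completions lands on the CPU-side data
structures.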

-Dave.
-- 
Dave Chinner
david@fromorbit.com
