public inbox for linux-ext4@vger.kernel.org
From: "Theodore Ts'o" <tytso@mit.edu>
To: David Wang <00107082@163.com>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext4: use kmem_cache for short fname allocation in readdir
Date: Fri, 30 May 2025 00:35:16 -0400
Message-ID: <20250530043516.GD332467@mit.edu>
In-Reply-To: <20250529144256.4517-1-00107082@163.com>

On Thu, May 29, 2025 at 10:42:56PM +0800, David Wang wrote:
> When searching files, ext4_readdir would kzalloc() a fname
> object for each entry. It would be faster if a dedicated
> kmem_cache is used for fname.
> 
> But fnames are of variable length.
> 
> This patch suggests using kmem_cache for fname with short
> length, and resorting to kzalloc when fname needs larger buffer.
> Assuming long file names are not very common.
> 
> Profiling when searching files in kernel code base, with following
> command:
> 	# perf record -g -e cpu-clock --freq=max bash -c \
> 	"for i in {1..100}; do find ./linux -name notfoundatall > /dev/null; done"
> And using sample counts as indicator of performance improvement.

I would think a better indicator of performance improvement would be
to measure the system time when running the find commands (e.g.,
using getrusage(2) with RUSAGE_CHILDREN, or wait3(2) or wait4(2)).
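As a rough sketch of that measurement, assuming Linux with glibc (the
helper names measure_child() and tv_to_sec() are hypothetical, not
from the patch), the command can be run in a child process and reaped
with wait4(), whose struct rusage reports the child's system time in
ru_stime:

```c
#define _GNU_SOURCE /* wait4() is a BSD/GNU extension in glibc */
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a struct timeval to seconds. */
static double tv_to_sec(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical helper: run argv in a child, reap it with wait4(), and
 * report the user and system CPU time that child consumed.
 * Returns the child's exit status, or -1 on fork/wait failure or if
 * the child did not exit normally.
 */
static int measure_child(char *const argv[], double *user_s, double *sys_s)
{
	struct rusage ru;
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	if (wait4(pid, &status, 0, &ru) < 0)
		return -1;

	*user_s = tv_to_sec(ru.ru_utime);
	*sys_s = tv_to_sec(ru.ru_stime); /* kernel time, incl. readdir work */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing { "find", "./linux", "-name", "notfoundatall",
NULL } and comparing *sys_s on kernels with and without the patch
would give a system-time comparison rather than a sample count.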

We're trading some extra memory usage and code complexity for less
CPU time, because entries in the kmem_cache might be more TLB
friendly.  But this is only really going to be applicable if the
directory is large enough such that the cycles spent in readdir is
significant compared to the rest of the userspace program, *and* you
are reading the directory multiple times (e.g., calling find on a
directory hierarchy many, many times) such that the disk blocks are
cached and you don't need to read them from the storage device.
Otherwise the I/O costs will completely dominate and swamp the
marginal TLB cache savings.
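The cached-versus-uncached distinction can be built into the benchmark
itself.  A minimal sketch, assuming a POSIX system and hypothetical
helper names (run_once(), stime_sec(), warm_sys_time_per_run()): run
the command once untimed so the directory blocks are cached (if they
fit in memory), then take the getrusage(RUSAGE_CHILDREN) delta over N
further runs.  RUSAGE_CHILDREN only includes children that have
already been waited for, so taking the first snapshot after the
warm-up run excludes it from the measurement.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec argv, and wait for the child so that its usage is
 * added to RUSAGE_CHILDREN.  Returns 0 if the child exited with 0. */
static int run_once(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Cumulative system time of all waited-for children, in seconds. */
static double stime_sec(void)
{
	struct rusage ru;

	getrusage(RUSAGE_CHILDREN, &ru); /* cannot fail with these args */
	return ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Average system time per warm run, or a negative value on failure. */
static double warm_sys_time_per_run(char *const argv[], int runs)
{
	double before, after;

	assert(runs > 0);
	if (run_once(argv) != 0) /* cold run: warms caches, not measured */
		return -1.0;
	before = stime_sec();
	for (int i = 0; i < runs; i++)
		if (run_once(argv) != 0)
			return -1.0;
	after = stime_sec();
	return (after - before) / runs;
}
```

If the directory tree does not fit in memory, every run pays the I/O
cost again and the per-run figure is dominated by it, which is exactly
the case where the kmem_cache change would not be visible.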

Given that readdir() is really rarely the bottleneck in real-world
workloads, the question is: is it worth it?

						- Ted


Thread overview: 5+ messages
2025-05-29 14:42 [RFC] ext4: use kmem_cache for short fname allocation in readdir David Wang
2025-05-29 21:44 ` Andreas Dilger
2025-05-30  2:27   ` David Wang
2025-05-30  4:35 ` Theodore Ts'o [this message]
2025-05-30  6:12   ` David Wang
