public inbox for linux-ext4@vger.kernel.org
From: "David Wang" <00107082@163.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext4: use kmem_cache for short fname allocation in readdir
Date: Fri, 30 May 2025 14:12:27 +0800 (CST)
Message-ID: <2579f239.5415.1971fd2086a.Coremail.00107082@163.com>
In-Reply-To: <20250530043516.GD332467@mit.edu>


At 2025-05-30 12:35:16, "Theodore Ts'o" <tytso@mit.edu> wrote:
>On Thu, May 29, 2025 at 10:42:56PM +0800, David Wang wrote:
>> When searching files, ext4_readdir would kzalloc() a fname
>> object for each entry. It would be faster if a dedicated
>> kmem_cache is used for fname.
>> 
>> But fnames are of variable length.
>> 
>> This patch suggests using kmem_cache for fname with short
>> length, and resorting to kzalloc when fname needs larger buffer.
>> Assuming long file names are not very common.
>> 
>> Profiling when searching files in kernel code base, with following
>> command:
>> 	# perf record -g -e cpu-clock --freq=max bash -c \
>> 	"for i in {1..100}; do find ./linux -name notfoundatall > /dev/null; done"
>> And using sample counts as indicator of performance improvement.
>
>I would think a better indicator of performance improvement would be
>to measure the system time when running the find commands.  (i.e.,
>either using getrusage with RUSAGE_CHILDREN or wait3 or wait4).

I did use `time` to compare system time when searching files with find,
and I did see a slight improvement.
The standard deviation is quite high for the whole `find` process, though.
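
A minimal sketch of the measurement suggested above, assuming a Linux host
with Python 3: it reads getrusage(RUSAGE_CHILDREN) before and after each run
and reports the mean and standard deviation of the child's system time. The
helper names child_sys_time and summarize are hypothetical, chosen only for
this illustration.

```python
import resource
import statistics
import subprocess


def child_sys_time(cmd):
    """Run cmd to completion and return the system CPU time (seconds) it used."""
    # RUSAGE_CHILDREN accumulates over all waited-for children, so the
    # difference around one run isolates that run's system time.
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_stime
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_stime
    return after - before


def summarize(cmd, runs=10):
    """Return (mean, stdev, samples) of system time over `runs` runs (runs >= 2)."""
    samples = [child_sys_time(cmd) for _ in range(runs)]
    return statistics.mean(samples), statistics.stdev(samples), samples
```

For the workload in this thread one could call, for example,
summarize(["find", "./linux", "-name", "notfoundatall"], runs=100) on both
kernels after one warm-up run, so that every sample is taken cache-hot.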

>
>We're trading off some extra memory usage and code complexity with
>less CPU time because entries in the kmem_cache might be more TLB
>friendly.  But this is only really going to be applicable if the
>directory is large enough such that the cycles spent in readdir is
>significant compared to the rest of the userspace program, *and* you
>are reading the directory multiple times (e.g., calling find on a
>directory hierarchy many, many times) such that the disk blocks are
>cached and you don't need to read them from the storage device.
>Otherwise the I/O costs will completely dominate and swamp the
>marginal TLB cache savings.

Yes, the test was run cache-hot.
But searching files repeatedly is not an uncommon practice; `find` would run
cache-hot on every round except the first.
 
>
>Given that it's really rare for readdir() to be the bottleneck of many
>workloads, the question is, is it worth it?

That's the question I have been thinking about.
Besides the marginal improvement for readdir(), I would argue from the impact
on other parts of the system when searching files. Even when cache-hot,
searching a large directory involves a high frequency of malloc() calls over a
short interval. This might have a transient negative impact on others that
also request malloc(), but at low frequency. But I don't have a convincing
example of this; it's all theoretical.


Thanks
David

>
>						- Ted


Thread overview: 5+ messages
2025-05-29 14:42 [RFC] ext4: use kmem_cache for short fname allocation in readdir David Wang
2025-05-29 21:44 ` Andreas Dilger
2025-05-30  2:27   ` David Wang
2025-05-30  4:35 ` Theodore Ts'o
2025-05-30  6:12   ` David Wang [this message]
