Re: Optimizing small reads

linux-fsdevel.vger.kernel.org archive mirror

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 Linux-MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Optimizing small reads
Date: Fri, 10 Oct 2025 10:51:40 -0700
Message-ID: <CAHk-=wg0r_xsB0RQ+35WPHwPb9b9drJEfGL-hByBZRmPbSy0rQ@mail.gmail.com>
In-Reply-To: <qasdw5uxymstppbxvqrfs5nquf2rqczmzu5yhbvn6brqm5w6sw@ax6o4q2xkh3t>

On Fri, 10 Oct 2025 at 03:10, Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> > So honestly I'd be inclined to go back to "just deal with the
> > trivially small reads", and scratch this extra complexity.
>
> I will play with it a bit more, but, yes, this is my feeling too.

There are a couple of reasons I think this optimization ends up being
irrelevant for larger reads, with the obvious one being that once you
have bigger reads, the cost of the copy will swamp the other latency
issues.

But perhaps most importantly, if you do reads that are page-sized (or
bigger), you by definition are no longer doing the thing that started
this whole thing in the first place: hammering over and over on the
same page reference count in multiple threads.
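
As a rough userspace analogy only (this is not the kernel's page cache
code), the contention pattern described above can be sketched with one
shared atomic counter standing in for the page reference count. The
names fake_refcount and hammer_refcount are hypothetical and exist only
for this illustration:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for one page's reference count. */
static atomic_long fake_refcount = 1;

static void *hammer(void *arg)
{
	long iters = *(long *)arg;

	for (long i = 0; i < iters; i++) {
		/* Every thread writes the same cacheline twice per "read". */
		atomic_fetch_add(&fake_refcount, 1);	/* "get" before the copy */
		atomic_fetch_sub(&fake_refcount, 1);	/* "put" after the copy */
	}
	return NULL;
}

/*
 * Runs nthreads threads that each do iters get/put pairs.
 * Returns the final count (1 when gets and puts balance),
 * or -1 if the thread array cannot be allocated.
 */
long hammer_refcount(int nthreads, long iters)
{
	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	int started = 0;

	if (!tids)
		return -1;
	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, hammer, &iters))
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	free(tids);
	return atomic_load(&fake_refcount);
}
```

Timing hammer_refcount() with one thread versus many should give a feel
for how the shared counter scales, though the numbers will depend on the
machine.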

(Of course, you can do exactly one page read over and over again, but
at some point it just has to be called outright stupid and an
artificial load)

IOW, I think the only reasonable way that you actually get that
cacheline ping-pong case is that you have some load that really does
access some *small* data in a file from multiple threads, with a
pattern where lots of those small fields sit on the same page (eg
it's some metadata that everybody ends up accessing).

If I recall correctly, the case Willy had a customer doing was reading
64-byte entries in a page. And then I really can see multiple threads
just reading the same page concurrently.

But even if the entries are "just" one page in size, suddenly it makes
no sense for a half-way competent app to re-read the whole page over
and over and over again in threads. If an application really does
that, then it's on the application to fix its silly behavior, not the
kernel to try to optimize for that insanity.

So that's why I think that original "just 256 bytes" is likely
perfectly sufficient.

Bigger IO simply doesn't make much sense for this particular "avoid
reference counting" optimization, and while the RCU path is certainly
clever and low-latency, and avoiding atomics is always a good thing,
at the same time it's also a very limited thing that then can't do
some basic things (like the whole "mark page accessed" etc).

Anyway, I may well be wrong, but let's start out with a minimal patch.
I think your first version with just the sequence number fixed would
likely be perfectly fine for integration into 6.19 - possibly with any
tweaking you come up with.

And any benchmarking should probably do exactly that "threaded 64-byte
read" that Willy had a real use-case for.

Then, if people come up with actual real loads where it would make
sense to expand on this, we can do so later.

Sounds like a plan?

                Linus
