linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 Linux-MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Optimizing small reads
Date: Mon, 6 Oct 2025 08:50:55 -0700	[thread overview]
Message-ID: <CAHk-=wi4Cma0HL2DVLWRrvte5NDpcb2A6VZNwUc0riBr2=7TXw@mail.gmail.com> (raw)
In-Reply-To: <4bjh23pk56gtnhutt4i46magq74zx3nlkuo4ym2tkn54rv4gjl@rhxb6t6ncewp>

On Mon, 6 Oct 2025 at 04:45, Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> Below is my take on this. Lightly tested.

Thanks.

That looked even simpler than what I thought it would be, although I
worry a bit about the effect on page_cache_delete() now being much
more expensive with that spinlock.

And the spinlock actually looks unnecessary, since page_cache_delete()
already relies on being serialized by holding the i_pages lock.

So I think you can just change the seqcount_spinlock_t to a plain
seqcount_t with no locking at all, and document that external locking.

>  - Do we want a bounded retry on read_seqcount_retry()?
>    Maybe upto 3 iterations?

No., I don't think it ever triggers, and I really like how this looks.

And I'd go even further, and change that first

        seq = read_seqcount_begin(&mapping->i_pages_delete_seqcnt);

into a

        if (!raw_seqcount_try_begin(&mapping->i_pages_delete_seqcnt);
                return 0;

so that you don't even wait for any existing case.

That you could even do *outside* the RCU section, but I'm not sure
that buys us anything.

*If* somebody ever hits it we can revisit, but I really think the
whole point of this fast-path is to just deal with the common case
quickly.

There are going to be other things that are much more common and much
more realistic, like "this is the first read, so I need to set the
accessed bit".

>  - HIGHMEM support is trivial with memcpy_from_file_folio();

Good call. I didn't even want to think about it, and obviously never did.

>  - I opted for late partial read check. It would be nice allow to read
>    across PAGE_SIZE boundary as long as it is in the same folio

Sure,

When I wrote that patch, I actually worried more about the negative
overhead of it not hitting at all, so I tried very hard to minimize
the cases where we look up a folio speculatively only to then decide
we can't use it.

But as long as that

        if (iov_iter_count(iter) <= sizeof(area)) {

is there to protect the really basic rule, I guess it's not a huge deal.

>  - Move i_size check after uptodate check. It seems to be required
>    according to the comment in filemap_read(). But I cannot say I
>    understand i_size implications here.

I forget too, and it might be voodoo programming.

>  - Size of area is 256 bytes. I wounder if we want to get the fast read
>    to work on full page chunks. Can we dedicate a page per CPU for this?
>    I expect it to cover substantially more cases.

I guess a percpu page would be good, but I really liked using the
buffer we already ended up having for that page array.

Maybe worth playing around with.

> Any comments are welcome.

See above: the only think I think you should change - at least for a
first version - is to not even do the spinlock and just rely on the
locks we already hold in the removal path. That page_cache_delete()
already requires locks for the

        mapping->nrpages -= nr;

logic later (and for other reasons anyway).

And, obviously, this needs testing. I've never seen an issue with my
non-sequence case, so I think a lot of xfstests...

                     Linus

  reply	other threads:[~2025-10-06 15:51 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAHk-=wj00-nGmXEkxY=-=Z_qP6kiGUziSFvxHJ9N-cLWry5zpA@mail.gmail.com>
     [not found] ` <flg637pjmcnxqpgmsgo5yvikwznak2rl4il2srddcui2564br5@zmpwmxibahw2>
     [not found]   ` <CAHk-=wgy=oOSu+A3cMfVhBK66zdFsstDV3cgVO-=RF4cJ2bZ+A@mail.gmail.com>
     [not found]     ` <CAHk-=whThZaXqDdum21SEWXjKQXmBcFN8E5zStX8W-EMEhAFdQ@mail.gmail.com>
     [not found]       ` <a3nryktlvr6raisphhw56mdkvff6zr5athu2bsyiotrtkjchf3@z6rdwygtybft>
     [not found]         ` <CAHk-=wg-eq7s8UMogFCS8OJQt9hwajwKP6kzW88avbx+4JXhcA@mail.gmail.com>
2025-10-06 11:44           ` Optimizing small reads Kiryl Shutsemau
2025-10-06 15:50             ` Linus Torvalds [this message]
2025-10-06 18:04               ` Kiryl Shutsemau
2025-10-06 18:14                 ` Linus Torvalds
2025-10-07 21:47                 ` Linus Torvalds
2025-10-07 22:35                   ` Linus Torvalds
2025-10-07 22:54                     ` Linus Torvalds
2025-10-07 23:30                       ` Linus Torvalds
2025-10-08 14:54                         ` Kiryl Shutsemau
2025-10-08 16:27                           ` Linus Torvalds
2025-10-08 17:03                             ` Linus Torvalds
2025-10-09 16:22                               ` Kiryl Shutsemau
2025-10-09 17:29                                 ` Linus Torvalds
2025-10-10 10:10                                   ` Kiryl Shutsemau
2025-10-10 17:51                                     ` Linus Torvalds
2025-10-13 15:35                                       ` Kiryl Shutsemau
2025-10-13 15:39                                         ` Kiryl Shutsemau
2025-10-13 16:19                                           ` Linus Torvalds
2025-10-14 12:58                                             ` Kiryl Shutsemau
2025-10-14 16:41                                               ` Linus Torvalds
2025-10-13 16:06                                         ` Linus Torvalds
2025-10-13 17:26                                         ` Theodore Ts'o
2025-10-14  3:20                                           ` Theodore Ts'o
2025-10-08 10:28                       ` Kiryl Shutsemau
2025-10-08 16:24                         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wi4Cma0HL2DVLWRrvte5NDpcb2A6VZNwUc0riBr2=7TXw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).