From: "Ma, Yu" <yu.ma@intel.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
viro@zeniv.linux.org.uk, edumazet@google.com,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
pan.deng@intel.com, tianyou.li@intel.com, tim.c.chen@intel.com,
tim.c.chen@linux.intel.com, yu.ma@intel.com
Subject: Re: [PATCH v2 1/3] fs/file.c: add fast path in alloc_fd()
Date: Sat, 29 Jun 2024 22:23:23 +0800
Message-ID: <20f6b9aa-65e0-4e05-9d41-85e4a22b51c2@intel.com>
In-Reply-To: <CAGudoHE5ROsy_hZB9uZjcjko0+=DbsUtBkmX9D1K1RG1GWrNbg@mail.gmail.com>
On 6/28/2024 3:59 AM, Mateusz Guzik wrote:
> On Thu, Jun 27, 2024 at 8:27 PM Ma, Yu <yu.ma@intel.com> wrote:
>> 2. For fast path implementation, the essential and simple point is to
>> directly return an available bit if there is free bit in [0-63]. I'd
>> emphasize that it does not only improve low number of open fds (even it
>> is the majority case on system as Honza agreed), but also improve the
>> cases that lots of fds open/close frequently with short task (as per the
>> algorithm, lower bits will be prioritized to allocate after being
>> recycled). Not only blogbench, a synthetic benchmark, but also the
>> realistic scenario as claimed in f3f86e33dc3d("vfs: Fix pathological
>> performance case for __alloc_fd()"), which literally introduced this
>> 2-levels bitmap searching algorithm to vfs as we see now.
> I don't understand how using next_fd instead is supposed to be inferior.
>
> Maybe I should clarify that by API contract the kernel must return the
> lowest free fd it can find. To that end it maintains the next_fd field
> as a hint to hopefully avoid some of the search work.
>
> In the stock kernel the first thing done in alloc_fd is setting it as
> a starting point:
>         fdt = files_fdtable(files);
>         fd = start;
>         if (fd < files->next_fd)
>                 fd = files->next_fd;
>
> that is all the calls which come here with 0 start their search from
> next_fd position.
>
> Suppose you implemented the patch as suggested by me and next_fd fits
> the range of 0-63. Then you get the benefit of lower level bitmap
> check just like in the patch you submitted, but without having to
> first branch on whether you happen to be in that range.
>
> Suppose next_fd is somewhere higher up, say 80. With your general
> approach the optimization won't be done whatsoever or it will be
> attempted at the 0-63 range when it is an invariant it finds no free
> fds.
>
> With what I'm suggesting the general idea of taking a peek at the
> lower level bitmap can be applied across the entire fd space. Some
> manual mucking will be needed to make sure this never pulls more than
> one cacheline, easiest way out I see would be to align next_fd to
> BITS_PER_LONG for the bitmap search purposes.
There was some misunderstanding here, Guzik. I took your previous
feedback to mean that you did not think the fast path was worthwhile,
so my whole message was only meant to say that we still consider the
original idea reasonable. Back to the point: implementing it in
find_next_fd() by searching the word that contains next_fd makes sense
and is OK with me. It is efficient, concise, and should bring the
expected benefits. I'll re-measure the data for reference, based on the
code proposed by you and Honza.
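
For reference, the idea of checking the word that holds next_fd before
anything else can be sketched in user space roughly as below. This is
only an illustration under assumptions, not the kernel code: struct
fd_bitmap, scan_from() and find_next_fd_sketch() are hypothetical names,
the slow path is a plain linear scan standing in for the two-level
bitmap search, and __builtin_ctzl() assumes GCC or Clang.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical user-space stand-in for the open_fds part of the fd table. */
struct fd_bitmap {
	unsigned long *open_fds;	/* one bit per fd, set means in use */
	unsigned int max_fds;		/* assumed multiple of BITS_PER_LONG */
};

/* Slow path stand-in: linear scan for the first clear bit at or after start. */
static unsigned int scan_from(const struct fd_bitmap *t, unsigned int start)
{
	for (unsigned int fd = start; fd < t->max_fds; fd++)
		if (!(t->open_fds[fd / BITS_PER_LONG] &
		      (1UL << (fd % BITS_PER_LONG))))
			return fd;
	return t->max_fds;	/* no free fd in the table */
}

/* Fast path: inspect only the word that contains start (e.g. next_fd). */
static unsigned int find_next_fd_sketch(const struct fd_bitmap *t,
					unsigned int start)
{
	unsigned int word = start / BITS_PER_LONG;
	unsigned int bit = start % BITS_PER_LONG;
	unsigned long free_bits;

	assert(t->max_fds % BITS_PER_LONG == 0);
	if (start >= t->max_fds)
		return t->max_fds;

	/* Invert so free fds read as 1, then drop the bits below start. */
	free_bits = ~t->open_fds[word] & (~0UL << bit);
	if (free_bits)
		return word * BITS_PER_LONG +
		       (unsigned int)__builtin_ctzl(free_bits);

	/* Nothing free in this word: continue from the next word. */
	return scan_from(t, (word + 1) * BITS_PER_LONG);
}
```

With next_fd at 80 on a 64-bit build, the single-word check covers fds
64-127 rather than 0-63, which is the case described in the quoted text
above.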
> Outside of the scope of this patchset, but definitely worth
> considering, is an observation that this still pulls an entire
> cacheline worth of a bitmap (assuming it grew). If one could assume
> that the size is always a multiple of 64 bytes (which it is after
> first expansion) the 64 byte scan could be entirely inlined -- there
> is quite a bit of fd fields in this range we may as well scan in hopes
> of avoiding looking at the higher level bitmap, after all we already
> paid for fetching it. This would take the optimization to its logical
> conclusion.
>
> Perhaps it would be ok to special-case the lower bitmap to start with
> 64 bytes so that there would be no need to branch on it.
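
A rough user-space sketch of what such a 64-byte scan might look like
is below. It is an assumption-laden illustration, not a proposal for
the patchset: scan_cacheline(), NO_FREE_FD and the CACHELINE_BYTES
value are hypothetical, the bitmap is assumed to be a multiple of 64
bytes long and 64-byte aligned, and __builtin_ctzl() assumes GCC or
Clang.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG   ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define CACHELINE_BYTES 64u	/* assumed cacheline size */
#define WORDS_PER_LINE  (CACHELINE_BYTES / (unsigned int)sizeof(unsigned long))
#define BITS_PER_LINE   (CACHELINE_BYTES * CHAR_BIT)
#define NO_FREE_FD      UINT_MAX	/* hypothetical "not found" marker */

/*
 * Return the lowest free fd at or after start that lies in the same
 * 64-byte chunk of the bitmap as start, or NO_FREE_FD if that chunk is
 * full. The caller would fall back to the higher level bitmap only in
 * the NO_FREE_FD case.
 */
static unsigned int scan_cacheline(const unsigned long *open_fds,
				   unsigned int start)
{
	unsigned int w = start / BITS_PER_LONG;
	unsigned int line_end = (w / WORDS_PER_LINE + 1) * WORDS_PER_LINE;
	unsigned long free_bits;

	assert(open_fds);

	/* First word: ignore bits below start. Free fds read as 1. */
	free_bits = ~open_fds[w] & (~0UL << (start % BITS_PER_LONG));

	/* Remaining words of the chunk are checked in full. */
	while (!free_bits) {
		if (++w == line_end)
			return NO_FREE_FD;
		free_bits = ~open_fds[w];
	}
	return w * BITS_PER_LONG + (unsigned int)__builtin_ctzl(free_bits);
}
```

The loop bound is fixed by the chunk size, so a compiler could unroll
it; whether that pays off in practice would need measuring.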