From: Al Viro <viro@zeniv.linux.org.uk>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Kernel.org Bugbot" <bugbot@kernel.org>,
brauner@kernel.org, linux-fsdevel@vger.kernel.org,
bugs@lists.linux.dev
Subject: Re: large pause when opening file descriptor which is power of 2
Date: Wed, 26 Apr 2023 20:58:21 +0100
Message-ID: <20230426195821.GV3390869@ZenIV>
In-Reply-To: <20230426194628.GU3390869@ZenIV>

On Wed, Apr 26, 2023 at 08:46:28PM +0100, Al Viro wrote:
> On Wed, Apr 26, 2023 at 08:13:37PM +0100, Matthew Wilcox wrote:
> > On Wed, Apr 26, 2023 at 05:58:06PM +0000, Kernel.org Bugbot wrote:
> > > When running a threaded program, and opening a file descriptor that
> > > is a power of 2 (starting at 64), the call takes a very long time to
> > > complete. Normally such a call takes less than 2us. However with this
> > > issue, I've seen the call take up to around 50ms. Additionally this only
> > > happens the first time, and not subsequent times that file descriptor is
> > > used. I'm guessing there might be some expansion of some internal data
> > > structures going on. But I cannot see why this process would take so long.
> >
> > Because we allocate a new block of memory and then memcpy() the old
> > block of memory into it. This isn't surprising behaviour to me.
> > I don't think there's much we can do to change it (Allocating a
> > segmented array of file descriptors has previously been vetoed by
> > people who have programs with a million file descriptors). Is it
> > causing you problems?
>
> FWIW, I suspect that this is not so much allocation + memcpy.
>
> 	/* make sure all fd_install() have seen resize_in_progress
> 	 * or have finished their rcu_read_lock_sched() section.
> 	 */
> 	if (atomic_read(&files->count) > 1)
> 		synchronize_rcu();
>
> in expand_fdtable() is a likelier source of delays.

A bit more background: we want to avoid grabbing ->file_lock in
fd_install() if at all possible. After all, we have already claimed
the slot (back when we'd allocated the descriptor) and nobody else is
allowed to shove a file reference there.

Which is fine, except for the fact that expansion of the descriptor
table needs to allocate a new array, copy the old one into it and replace
the old one with it.
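
A rough userspace sketch of those three steps, for illustration only.
The struct toy_table type and toy_expand() are hypothetical names, not
the kernel's struct fdtable or expand_fdtable(), and the doubling policy
is an assumption chosen to match the power-of-2 pattern in the report.
There is no locking here at all, so this shows only the allocate, copy
and replace sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a descriptor table; not the kernel's struct fdtable. */
struct toy_table {
	size_t max_slots;	/* capacity of slots[] */
	void **slots;		/* slots[i] is the "file" for descriptor i, or NULL */
};

/*
 * Grow the table until it has at least `want` slots, doubling each time.
 * Returns 0 on success, -1 if allocation fails (table left unchanged).
 */
static int toy_expand(struct toy_table *t, size_t want)
{
	size_t new_max;
	void **new_slots;

	assert(t != NULL);
	new_max = t->max_slots ? t->max_slots : 1;
	while (new_max < want)
		new_max *= 2;
	if (new_max == t->max_slots)
		return 0;	/* already large enough */

	/* Step 1: allocate a new, zero-filled array. */
	new_slots = calloc(new_max, sizeof(*new_slots));
	if (!new_slots)
		return -1;

	/* Step 2: copy the old array into it. */
	if (t->max_slots)
		memcpy(new_slots, t->slots, t->max_slots * sizeof(*new_slots));

	/* Step 3: replace the old array with the new one. */
	free(t->slots);
	t->slots = new_slots;
	t->max_slots = new_max;
	return 0;
}
```

Anything stored into the old array between step 2 and step 3 would not
show up in the new one, which is the window the rest of this message is
about.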

A lockless fd_install() might overlap with the "copy" step in
the above, and its store would end up getting lost. So in fd_install()
we check if there's a resize in progress before deciding to go for the
lockless path. Which means that on the resize side we need to mark the
descriptor table as getting resized, then wait long enough for all
threads already in fd_install() to get through.
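
The handshake described above can be modelled in userspace roughly as
follows. This is a sketch under loud assumptions, not the kernel code:
every toy_* name is hypothetical, a counter of in-flight lockless
installers stands in for the rcu_read_lock_sched() section, spinning on
that counter stands in for synchronize_rcu(), and an atomic_flag
spinlock stands in for ->file_lock. The toy also holds its lock across
the wait purely for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; names do not match the kernel's structures. */
struct toy_files {
	atomic_flag lock;		/* stand-in for ->file_lock */
	atomic_bool resize_in_progress;	/* set while a resize is under way */
	atomic_int lockless_installers;	/* stand-in for the RCU read-side section */
	_Atomic(void **) slots;		/* current descriptor array */
	size_t max_slots;		/* its capacity; written only by a resizer */
};

static int toy_init(struct toy_files *f, size_t max_slots)
{
	void **slots;

	assert(max_slots > 0);
	slots = calloc(max_slots, sizeof(*slots));
	if (!slots)
		return -1;
	atomic_flag_clear(&f->lock);
	atomic_init(&f->resize_in_progress, false);
	atomic_init(&f->lockless_installers, 0);
	atomic_init(&f->slots, slots);
	f->max_slots = max_slots;
	return 0;
}

static void toy_lock(struct toy_files *f)
{
	while (atomic_flag_test_and_set(&f->lock))
		;	/* spin until the lock is free */
}

static void toy_unlock(struct toy_files *f)
{
	atomic_flag_clear(&f->lock);
}

/* Install `file` into slot `fd`; the caller has already claimed that slot. */
static void toy_install(struct toy_files *f, size_t fd, void *file)
{
	void **slots;

	atomic_fetch_add(&f->lockless_installers, 1);
	if (atomic_load(&f->resize_in_progress)) {
		/* Resize under way: leave the fast path, serialize on the lock. */
		atomic_fetch_sub(&f->lockless_installers, 1);
		toy_lock(f);
		slots = atomic_load(&f->slots);
		slots[fd] = file;
		toy_unlock(f);
		return;
	}
	/* Fast path: no resize was visible, so store without the lock. */
	slots = atomic_load(&f->slots);
	slots[fd] = file;
	atomic_fetch_sub(&f->lockless_installers, 1);
}

/* Grow the table to new_max slots. Returns 0 on success, -1 on failure. */
static int toy_resize(struct toy_files *f, size_t new_max)
{
	void **old, **new_slots = calloc(new_max, sizeof(*new_slots));

	if (!new_slots)
		return -1;
	toy_lock(f);
	if (new_max <= f->max_slots) {	/* nothing to do */
		toy_unlock(f);
		free(new_slots);
		return 0;
	}
	/* Mark the table as being resized ... */
	atomic_store(&f->resize_in_progress, true);
	/* ... then wait for installers already on the lockless path. */
	while (atomic_load(&f->lockless_installers) != 0)
		;
	old = atomic_load(&f->slots);
	memcpy(new_slots, old, f->max_slots * sizeof(*old));
	atomic_store(&f->slots, new_slots);
	f->max_slots = new_max;
	atomic_store(&f->resize_in_progress, false);
	toy_unlock(f);
	free(old);
	return 0;
}
```

With the default sequentially consistent atomics, an installer either
sees the flag and falls back to the lock, or the resizer sees the
installer's count and waits for it, so no store can land in the old
array after the copy. The wait is the price of the lockless fast path,
which fits the suspicion quoted above that the waiting, rather than the
allocation and copy, is the likelier source of the pause.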