From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, kernel test robot <oliver.sang@intel.com>,
Carel Si <beibei.si@intel.com>, Jann Horn <jannh@google.com>,
Miklos Szeredi <mszeredi@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Baokun Li <libaokun1@huawei.com>
Subject: [PATCH 4.19 20/34] fget: clarify and improve __fget_files() implementation
Date: Mon, 28 Feb 2022 18:24:26 +0100 [thread overview]
Message-ID: <20220228172210.065587470@linuxfoundation.org> (raw)
In-Reply-To: <20220228172207.090703467@linuxfoundation.org>
From: Linus Torvalds <torvalds@linux-foundation.org>
commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream.
Commit 054aa8d439b9 ("fget: check that the fd still exists after getting
a ref to it") fixed a race with getting a reference to a file just as it
was being closed. It was a fairly minimal patch, and I didn't think
re-checking the file pointer lookup would be a measurable overhead,
since it was all right there and cached.
But I was wrong, as pointed out by the kernel test robot.
The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
quite noticeably. Admittedly it seems to be a very artificial test:
doing "poll()" system calls on regular files in a very tight loop in
multiple threads.
That means that basically all the time is spent just looking up file
descriptors without ever doing anything useful with them (not that doing
'poll()' on a regular file is useful to begin with). And as a result it
shows the extra "re-check fd" cost as a sore thumb.
Happily, the regression is fixable by just writing the code to loook up
the fd to be better and clearer. There's still a cost to verify the
file pointer, but now it's basically in the noise even for that
benchmark that does nothing else - and the code is more understandable
and has better comments too.
[ Side note: this patch is also a classic case of one that looks very
messy with the default greedy Myers diff - it's much more legible with
either the patience of histogram diff algorithm ]
Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Carel Si <beibei.si@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/file.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 16 deletions(-)
--- a/fs/file.c
+++ b/fs/file.c
@@ -677,28 +677,69 @@ void do_close_on_exec(struct files_struc
spin_unlock(&files->file_lock);
}
-static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs)
+static inline struct file *__fget_files_rcu(struct files_struct *files,
+ unsigned int fd, fmode_t mask, unsigned int refs)
{
- struct files_struct *files = current->files;
- struct file *file;
+ for (;;) {
+ struct file *file;
+ struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+ struct file __rcu **fdentry;
- rcu_read_lock();
-loop:
- file = fcheck_files(files, fd);
- if (file) {
- /* File object ref couldn't be taken.
- * dup2() atomicity guarantee is the reason
- * we loop to catch the new file (or NULL pointer)
+ if (unlikely(fd >= fdt->max_fds))
+ return NULL;
+
+ fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+ file = rcu_dereference_raw(*fdentry);
+ if (unlikely(!file))
+ return NULL;
+
+ if (unlikely(file->f_mode & mask))
+ return NULL;
+
+ /*
+ * Ok, we have a file pointer. However, because we do
+ * this all locklessly under RCU, we may be racing with
+ * that file being closed.
+ *
+ * Such a race can take two forms:
+ *
+ * (a) the file ref already went down to zero,
+ * and get_file_rcu_many() fails. Just try
+ * again:
*/
- if (file->f_mode & mask)
- file = NULL;
- else if (!get_file_rcu_many(file, refs))
- goto loop;
- else if (__fcheck_files(files, fd) != file) {
+ if (unlikely(!get_file_rcu_many(file, refs)))
+ continue;
+
+ /*
+ * (b) the file table entry has changed under us.
+ * Note that we don't need to re-check the 'fdt->fd'
+ * pointer having changed, because it always goes
+ * hand-in-hand with 'fdt'.
+ *
+ * If so, we need to put our refs and try again.
+ */
+ if (unlikely(rcu_dereference_raw(files->fdt) != fdt) ||
+ unlikely(rcu_dereference_raw(*fdentry) != file)) {
fput_many(file, refs);
- goto loop;
+ continue;
}
+
+ /*
+ * Ok, we have a ref to the file, and checked that it
+ * still exists.
+ */
+ return file;
}
+}
+
+
+static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs)
+{
+ struct files_struct *files = current->files;
+ struct file *file;
+
+ rcu_read_lock();
+ file = __fget_files_rcu(files, fd, mask, refs);
rcu_read_unlock();
return file;
next prev parent reply other threads:[~2022-02-28 17:36 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-28 17:24 [PATCH 4.19 00/34] 4.19.232-rc1 review Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 01/34] cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug Greg Kroah-Hartman
2022-03-08 15:12 ` Michal Koutný
2022-03-14 8:06 ` Greg Kroah-Hartman
2022-03-14 9:11 ` Zhang Qiao
2022-03-14 11:19 ` Michal Koutný
2022-03-16 14:27 ` Greg Kroah-Hartman
2022-03-17 2:41 ` Zhang Qiao
2022-03-17 10:24 ` Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 02/34] vhost/vsock: dont check owner in vhost_vsock_stop() while releasing Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 03/34] parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 04/34] parisc/unaligned: Fix ldw() and stw() unalignment handlers Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 05/34] sr9700: sanity check for packet length Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 06/34] USB: zaurus: support another broken Zaurus Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 07/34] ping: remove pr_err from ping_lookup Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 08/34] net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 09/34] tipc: Fix end of loop tests for list_for_each_entry() Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 10/34] gso: do not skip outer ip header in case of ipip and net_failover Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 11/34] openvswitch: Fix setting ipv6 fields causing hw csum failure Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 12/34] drm/edid: Always set RGB444 Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 13/34] net/mlx5e: Fix wrong return value on ioctl EEPROM query failure Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 14/34] configfs: fix a race in configfs_{,un}register_subsystem() Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 15/34] RDMA/ib_srp: Fix a deadlock Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 16/34] tty: n_gsm: fix proper link termination after failed open Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 17/34] gpio: tegra186: Fix chip_data type confusion Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 18/34] Revert "drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR" Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 19/34] memblock: use kfree() to release kmalloced memblock regions Greg Kroah-Hartman
2022-02-28 17:24 ` Greg Kroah-Hartman [this message]
2022-02-28 17:24 ` [PATCH 4.19 21/34] tracing: Have traceon and traceoff trigger honor the instance Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 22/34] iio: adc: men_z188_adc: Fix a resource leak in an error handling path Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 23/34] ata: pata_hpt37x: disable primary channel on HPT371 Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 24/34] Revert "USB: serial: ch341: add new Product ID for CH341A" Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 25/34] usb: gadget: rndis: add spinlock for rndis response list Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 26/34] USB: gadget: validate endpoint index for xilinx udc Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 27/34] tracefs: Set the group ownership in apply_options() not parse_options() Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 28/34] USB: serial: option: add support for DW5829e Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 29/34] USB: serial: option: add Telit LE910R1 compositions Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 30/34] usb: dwc3: pci: Fix Bay Trail phy GPIO mappings Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 31/34] usb: dwc3: gadget: Let the interrupt handler disable bottom halves Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 32/34] xhci: re-initialize the HC during resume if HCE was set Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 33/34] xhci: Prevent futile URB re-submissions due to incorrect return value Greg Kroah-Hartman
2022-02-28 17:24 ` [PATCH 4.19 34/34] tty: n_gsm: fix encoding of control signal octet bit DV Greg Kroah-Hartman
2022-02-28 21:21 ` [PATCH 4.19 00/34] 4.19.232-rc1 review Pavel Machek
2022-02-28 21:42 ` Shuah Khan
2022-03-01 11:32 ` Sudip Mukherjee
2022-03-01 16:37 ` Naresh Kamboju
2022-03-01 18:23 ` Jeffrin Thalakkottoor
2022-03-01 19:10 ` Guenter Roeck
2022-03-01 19:36 ` Slade Watkins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220228172210.065587470@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=beibei.si@intel.com \
--cc=jannh@google.com \
--cc=libaokun1@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mszeredi@redhat.com \
--cc=oliver.sang@intel.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox