From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Zhihao Cheng <chengzhihao1@huawei.com>,
Zhang Yi <yi.zhang@huawei.com>, Brian Foster <bfoster@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Alexey Dobriyan <adobriyan@gmail.com>,
Eric Biederman <ebiederm@xmission.com>,
Matthew Wilcox <willy@infradead.org>, Baoquan He <bhe@redhat.com>,
Kalesh Singh <kaleshsingh@google.com>,
Yu Kuai <yukuai3@huawei.com>,
Andrew Morton <akpm@linux-foundation.org>,
Suraj Jitindar Singh <surajjs@amazon.com>
Subject: [PATCH 5.10 72/83] proc: fix a dentry lock race between release_task and lookup
Date: Wed, 20 Sep 2023 13:32:02 +0200 [thread overview]
Message-ID: <20230920112829.502498246@linuxfoundation.org> (raw)
In-Reply-To: <20230920112826.634178162@linuxfoundation.org>
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng <chengzhihao1@huawei.com>
commit d919a1e79bac890421537cf02ae773007bf55e6b upstream.
Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
moved proc_flush_task() behind __exit_signal(). Then, process systemd can
take long period high cpu usage during releasing task in following
concurrent processes:
systemd ps
kernel_waitid stat(/proc/tgid)
do_wait filename_lookup
wait_consider_task lookup_fast
release_task
__exit_signal
__unhash_process
detach_pid
__change_pid // remove task->pid_links
d_revalidate -> pid_revalidate // 0
d_invalidate(/proc/tgid)
shrink_dcache_parent(/proc/tgid)
d_walk(/proc/tgid)
spin_lock_nested(/proc/tgid/fd)
// iterating opened fd
proc_flush_pid |
d_invalidate (/proc/tgid/fd) |
shrink_dcache_parent(/proc/tgid/fd) |
shrink_dentry_list(subdirs) ↓
shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock
Function d_invalidate() will remove dentry from hash firstly, but why does
proc_flush_pid() process dentry '/proc/tgid/fd' before dentry
'/proc/tgid'? That's because proc_pid_make_inode() adds proc inode in
reverse order by invoking hlist_add_head_rcu(). But proc should not add
any inodes under '/proc/tgid' except '/proc/tgid/task/pid', fix it by
adding inode into 'pid->inodes' only if the inode is /proc/tgid or
/proc/tgid/task/pid.
Performance regression:
Create 200 tasks, each task open one file for 50,000 times. Kill all
tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).
Before fix:
$ time killall -wq aa
real 4m40.946s # During this period, we can see 'ps' and 'systemd'
taking high cpu usage.
After fix:
$ time killall -wq aa
real 1m20.732s # During this period, we can see 'systemd' taking
high cpu usage.
Link: https://lkml.kernel.org/r/20220713130029.4133533-1-chengzhihao1@huawei.com
Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Context adjustments ]
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/proc/base.c | 46 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 8 deletions(-)
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1881,7 +1881,7 @@ void proc_pid_evict_inode(struct proc_in
put_pid(pid);
}
-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block *sb,
struct task_struct *task, umode_t mode)
{
struct inode * inode;
@@ -1910,11 +1910,6 @@ struct inode *proc_pid_make_inode(struct
/* Let the pid remember us for quick removal */
ei->pid = pid;
- if (S_ISDIR(mode)) {
- spin_lock(&pid->lock);
- hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
- spin_unlock(&pid->lock);
- }
task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
@@ -1927,6 +1922,39 @@ out_unlock:
return NULL;
}
+/*
+ * Generating an inode and adding it into @pid->inodes, so that task will
+ * invalidate inode's dentry before being released.
+ *
+ * This helper is used for creating dir-type entries under '/proc' and
+ * '/proc/<tgid>/task'. Other entries(eg. fd, stat) under '/proc/<tgid>'
+ * can be released by invalidating '/proc/<tgid>' dentry.
+ * In theory, dentries under '/proc/<tgid>/task' can also be released by
+ * invalidating '/proc/<tgid>' dentry, we reserve it to handle single
+ * thread exiting situation: Any one of threads should invalidate its
+ * '/proc/<tgid>/task/<pid>' dentry before released.
+ */
+static struct inode *proc_pid_make_base_inode(struct super_block *sb,
+ struct task_struct *task, umode_t mode)
+{
+ struct inode *inode;
+ struct proc_inode *ei;
+ struct pid *pid;
+
+ inode = proc_pid_make_inode(sb, task, mode);
+ if (!inode)
+ return NULL;
+
+ /* Let proc_flush_pid find this directory inode */
+ ei = PROC_I(inode);
+ pid = ei->pid;
+ spin_lock(&pid->lock);
+ hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+ spin_unlock(&pid->lock);
+
+ return inode;
+}
+
int pid_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
@@ -3341,7 +3369,8 @@ static struct dentry *proc_pid_instantia
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
@@ -3637,7 +3666,8 @@ static struct dentry *proc_task_instanti
struct task_struct *task, const void *ptr)
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task,
+ S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
next prev parent reply other threads:[~2023-09-20 12:23 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-20 11:30 [PATCH 5.10 00/83] 5.10.196-rc1 review Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 01/83] autofs: fix memory leak of waitqueues in autofs_catatonic_mode Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 02/83] btrfs: output extra debug info if we failed to find an inline backref Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 03/83] locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 04/83] ACPICA: Add AML_NO_OPERAND_RESOLVE flag to Timer Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 05/83] kernel/fork: beware of __put_task_struct() calling context Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 06/83] rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle() Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 07/83] scftorture: Forgive memory-allocation failure if KASAN Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 08/83] ACPI: video: Add backlight=native DMI quirk for Lenovo Ideapad Z470 Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 09/83] perf/smmuv3: Enable HiSilicon Erratum 162001900 quirk for HIP08/09 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 10/83] ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 11/83] hw_breakpoint: fix single-stepping when using bpf_overflow_handler Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 12/83] devlink: remove reload failed checks in params get/set callbacks Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 13/83] crypto: lrw,xts - Replace strlcpy with strscpy Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 14/83] wifi: ath9k: fix fortify warnings Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 15/83] wifi: ath9k: fix printk specifier Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 16/83] wifi: mwifiex: fix fortify warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 17/83] wifi: wil6210: fix fortify warnings Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 18/83] crypto: lib/mpi - avoid null pointer deref in mpi_cmp_ui() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 19/83] tpm_tis: Resend command to recover from data transfer errors Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 20/83] mmc: sdhci-esdhc-imx: improve ESDHC_FLAG_ERR010450 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 21/83] alx: fix OOB-read compiler warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 22/83] netfilter: ebtables: fix fortify warnings in size_entry_mwt() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 23/83] wifi: mac80211_hwsim: drop short frames Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 24/83] libbpf: Free btf_vmlinux when closing bpf_object Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 25/83] drm/bridge: tc358762: Instruct DSI host to generate HSE packets Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 26/83] samples/hw_breakpoint: Fix kernel BUG invalid opcode: 0000 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 27/83] ASoC: Intel: sof_sdw: Update BT offload config for soundwire config Greg Kroah-Hartman
2023-09-20 20:13 ` Salvatore Bonaccorso
2023-09-22 4:39 ` Salvatore Bonaccorso
2023-09-23 8:23 ` Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 28/83] ALSA: hda: intel-dsp-cfg: add LunarLake support Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 29/83] drm/exynos: fix a possible null-pointer dereference due to data race in exynos_drm_crtc_atomic_disable() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 30/83] bus: ti-sysc: Configure uart quirks for k3 SoC Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 31/83] md: raid1: fix potential OOB in raid1_remove_disk() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 32/83] ext2: fix datatype of block number in ext2_xattr_set2() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 33/83] fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 34/83] jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 35/83] powerpc/pseries: fix possible memory leak in ibmebus_bus_init() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 36/83] media: dvb-usb-v2: af9035: Fix null-ptr-deref in af9035_i2c_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 37/83] media: dw2102: Fix null-ptr-deref in dw2102_i2c_transfer() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 38/83] media: af9005: Fix null-ptr-deref in af9005_i2c_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 39/83] media: anysee: fix null-ptr-deref in anysee_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 40/83] media: az6007: Fix null-ptr-deref in az6007_i2c_xfer() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 41/83] media: dvb-usb-v2: gl861: Fix null-ptr-deref in gl861_i2c_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 42/83] media: tuners: qt1010: replace BUG_ON with a regular error Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 43/83] media: pci: cx23885: replace BUG with error return Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 44/83] usb: gadget: fsl_qe_udc: validate endpoint index for ch9 udc Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 45/83] scsi: target: iscsi: Fix buffer overflow in lio_target_nacl_info_show() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 46/83] serial: cpm_uart: Avoid suspicious locking Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 47/83] media: pci: ipu3-cio2: Initialise timing struct to avoid a compiler warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 48/83] kobject: Add sanity check for kset->kobj.ktype in kset_register() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 49/83] interconnect: Fix locking for runpm vs reclaim Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 50/83] mtd: rawnand: brcmnand: Allow SoC to provide I/O operations Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 51/83] mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 52/83] perf jevents: Make build dependency on test JSONs Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 53/83] perf tools: Add an option to build without libbfd Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 54/83] perf jevents: Switch build to use jevents.py Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 55/83] perf build: Update build rule for generated files Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 56/83] btrfs: move btrfs_pinned_by_swapfile prototype into volumes.h Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 57/83] btrfs: add a helper to read the superblock metadata_uuid Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 58/83] btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 59/83] drm: gm12u320: Fix the timeout usage for usb_bulk_msg() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 60/83] scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 61/83] selftests: tracing: Fix to unmount tracefs for recovering environment Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 62/83] scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 63/83] x86/boot/compressed: Reserve more memory for page tables Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 64/83] samples/hw_breakpoint: fix building without module unloading Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 65/83] md/raid1: fix error: ISO C90 forbids mixed declarations Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 66/83] attr: block mode changes of symlinks Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 67/83] ovl: fix incorrect fdput() on aio completion Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 68/83] btrfs: fix lockdep splat and potential deadlock after failure running delayed items Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 69/83] btrfs: release path before inode lookup during the ino lookup ioctl Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 70/83] drm/amdgpu: fix amdgpu_cs_p1_user_fence Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 71/83] net/sched: Retire rsvp classifier Greg Kroah-Hartman
2023-09-20 11:32 ` Greg Kroah-Hartman [this message]
2023-09-20 11:32 ` [PATCH 5.10 73/83] mm/filemap: fix infinite loop in generic_file_buffered_read() Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 74/83] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 75/83] tracing: Have current_trace inc the trace array ref count Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 76/83] tracing: Have option files " Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 77/83] nfsd: fix change_info in NFSv4 RENAME replies Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 78/83] tracefs: Add missing lockdown check to tracefs_create_dir() Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 79/83] i2c: aspeed: Reset the i2c controller when timeout occurs Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 80/83] ata: libata: disallow dev-initiated LPM transitions to unsupported states Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 81/83] scsi: megaraid_sas: Fix deadlock on firmware crashdump Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 82/83] scsi: pm8001: Setup IRQs on resume Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 83/83] ext4: fix rec_len verify error Greg Kroah-Hartman
2023-09-20 18:13 ` [PATCH 5.10 00/83] 5.10.196-rc1 review Florian Fainelli
[not found] ` <CABbpPsz0Brmfw3zg2Y_r54Hx2mN1Ly=AtghKNE9chmtSP_-Y6A@mail.gmail.com>
2023-09-20 18:47 ` Florian Fainelli
2023-09-20 21:35 ` Shuah Khan
2023-09-21 6:07 ` Dominique Martinet
2023-09-21 13:32 ` Naresh Kamboju
2023-09-21 15:59 ` Guenter Roeck
2023-09-23 8:22 ` Greg Kroah-Hartman
2023-09-21 20:38 ` Joel Fernandes
2023-09-22 9:46 ` Jon Hunter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230920112829.502498246@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bfoster@redhat.com \
--cc=bhe@redhat.com \
--cc=chengzhihao1@huawei.com \
--cc=ebiederm@xmission.com \
--cc=kaleshsingh@google.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
--cc=surajjs@amazon.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).