patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Zhihao Cheng <chengzhihao1@huawei.com>,
	Zhang Yi <yi.zhang@huawei.com>, Brian Foster <bfoster@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Eric Biederman <ebiederm@xmission.com>,
	Matthew Wilcox <willy@infradead.org>, Baoquan He <bhe@redhat.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Yu Kuai <yukuai3@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Suraj Jitindar Singh <surajjs@amazon.com>
Subject: [PATCH 5.10 72/83] proc: fix a dentry lock race between release_task and lookup
Date: Wed, 20 Sep 2023 13:32:02 +0200	[thread overview]
Message-ID: <20230920112829.502498246@linuxfoundation.org> (raw)
In-Reply-To: <20230920112826.634178162@linuxfoundation.org>

5.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Zhihao Cheng <chengzhihao1@huawei.com>

commit d919a1e79bac890421537cf02ae773007bf55e6b upstream.

Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
moved proc_flush_task() behind __exit_signal().  Then, process systemd can
take long period high cpu usage during releasing task in following
concurrent processes:

  systemd                                 ps
kernel_waitid                 stat(/proc/tgid)
  do_wait                       filename_lookup
    wait_consider_task            lookup_fast
      release_task
        __exit_signal
          __unhash_process
            detach_pid
              __change_pid // remove task->pid_links
                                     d_revalidate -> pid_revalidate  // 0
                                     d_invalidate(/proc/tgid)
                                       shrink_dcache_parent(/proc/tgid)
                                         d_walk(/proc/tgid)
                                           spin_lock_nested(/proc/tgid/fd)
                                           // iterating opened fd
        proc_flush_pid                                    |
           d_invalidate (/proc/tgid/fd)                   |
              shrink_dcache_parent(/proc/tgid/fd)         |
                shrink_dentry_list(subdirs)               ↓
                  shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock

Function d_invalidate() will remove dentry from hash firstly, but why does
proc_flush_pid() process dentry '/proc/tgid/fd' before dentry
'/proc/tgid'?  That's because proc_pid_make_inode() adds proc inode in
reverse order by invoking hlist_add_head_rcu().  But proc should not add
any inodes under '/proc/tgid' except '/proc/tgid/task/pid', fix it by
adding inode into 'pid->inodes' only if the inode is /proc/tgid or
/proc/tgid/task/pid.

Performance regression:
Create 200 tasks, each task open one file for 50,000 times. Kill all
tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).

Before fix:
$ time killall -wq aa
  real    4m40.946s   # During this period, we can see 'ps' and 'systemd'
			taking high cpu usage.

After fix:
$ time killall -wq aa
  real    1m20.732s   # During this period, we can see 'systemd' taking
			high cpu usage.

Link: https://lkml.kernel.org/r/20220713130029.4133533-1-chengzhihao1@huawei.com
Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Context adjustments ]
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/proc/base.c |   46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1881,7 +1881,7 @@ void proc_pid_evict_inode(struct proc_in
 	put_pid(pid);
 }
 
-struct inode *proc_pid_make_inode(struct super_block * sb,
+struct inode *proc_pid_make_inode(struct super_block *sb,
 				  struct task_struct *task, umode_t mode)
 {
 	struct inode * inode;
@@ -1910,11 +1910,6 @@ struct inode *proc_pid_make_inode(struct
 
 	/* Let the pid remember us for quick removal */
 	ei->pid = pid;
-	if (S_ISDIR(mode)) {
-		spin_lock(&pid->lock);
-		hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
-		spin_unlock(&pid->lock);
-	}
 
 	task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
 	security_task_to_inode(task, inode);
@@ -1927,6 +1922,39 @@ out_unlock:
 	return NULL;
 }
 
+/*
+ * Generating an inode and adding it into @pid->inodes, so that task will
+ * invalidate inode's dentry before being released.
+ *
+ * This helper is used for creating dir-type entries under '/proc' and
+ * '/proc/<tgid>/task'. Other entries(eg. fd, stat) under '/proc/<tgid>'
+ * can be released by invalidating '/proc/<tgid>' dentry.
+ * In theory, dentries under '/proc/<tgid>/task' can also be released by
+ * invalidating '/proc/<tgid>' dentry, we reserve it to handle single
+ * thread exiting situation: Any one of threads should invalidate its
+ * '/proc/<tgid>/task/<pid>' dentry before released.
+ */
+static struct inode *proc_pid_make_base_inode(struct super_block *sb,
+				struct task_struct *task, umode_t mode)
+{
+	struct inode *inode;
+	struct proc_inode *ei;
+	struct pid *pid;
+
+	inode = proc_pid_make_inode(sb, task, mode);
+	if (!inode)
+		return NULL;
+
+	/* Let proc_flush_pid find this directory inode */
+	ei = PROC_I(inode);
+	pid = ei->pid;
+	spin_lock(&pid->lock);
+	hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+	spin_unlock(&pid->lock);
+
+	return inode;
+}
+
 int pid_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int query_flags)
 {
@@ -3341,7 +3369,8 @@ static struct dentry *proc_pid_instantia
 {
 	struct inode *inode;
 
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+	inode = proc_pid_make_base_inode(dentry->d_sb, task,
+					 S_IFDIR | S_IRUGO | S_IXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 
@@ -3637,7 +3666,8 @@ static struct dentry *proc_task_instanti
 	struct task_struct *task, const void *ptr)
 {
 	struct inode *inode;
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+	inode = proc_pid_make_base_inode(dentry->d_sb, task,
+					 S_IFDIR | S_IRUGO | S_IXUGO);
 	if (!inode)
 		return ERR_PTR(-ENOENT);
 



  parent reply	other threads:[~2023-09-20 12:23 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-20 11:30 [PATCH 5.10 00/83] 5.10.196-rc1 review Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 01/83] autofs: fix memory leak of waitqueues in autofs_catatonic_mode Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 02/83] btrfs: output extra debug info if we failed to find an inline backref Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 03/83] locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 04/83] ACPICA: Add AML_NO_OPERAND_RESOLVE flag to Timer Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 05/83] kernel/fork: beware of __put_task_struct() calling context Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 06/83] rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle() Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 07/83] scftorture: Forgive memory-allocation failure if KASAN Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 08/83] ACPI: video: Add backlight=native DMI quirk for Lenovo Ideapad Z470 Greg Kroah-Hartman
2023-09-20 11:30 ` [PATCH 5.10 09/83] perf/smmuv3: Enable HiSilicon Erratum 162001900 quirk for HIP08/09 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 10/83] ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 11/83] hw_breakpoint: fix single-stepping when using bpf_overflow_handler Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 12/83] devlink: remove reload failed checks in params get/set callbacks Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 13/83] crypto: lrw,xts - Replace strlcpy with strscpy Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 14/83] wifi: ath9k: fix fortify warnings Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 15/83] wifi: ath9k: fix printk specifier Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 16/83] wifi: mwifiex: fix fortify warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 17/83] wifi: wil6210: fix fortify warnings Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 18/83] crypto: lib/mpi - avoid null pointer deref in mpi_cmp_ui() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 19/83] tpm_tis: Resend command to recover from data transfer errors Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 20/83] mmc: sdhci-esdhc-imx: improve ESDHC_FLAG_ERR010450 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 21/83] alx: fix OOB-read compiler warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 22/83] netfilter: ebtables: fix fortify warnings in size_entry_mwt() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 23/83] wifi: mac80211_hwsim: drop short frames Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 24/83] libbpf: Free btf_vmlinux when closing bpf_object Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 25/83] drm/bridge: tc358762: Instruct DSI host to generate HSE packets Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 26/83] samples/hw_breakpoint: Fix kernel BUG invalid opcode: 0000 Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 27/83] ASoC: Intel: sof_sdw: Update BT offload config for soundwire config Greg Kroah-Hartman
2023-09-20 20:13   ` Salvatore Bonaccorso
2023-09-22  4:39     ` Salvatore Bonaccorso
2023-09-23  8:23       ` Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 28/83] ALSA: hda: intel-dsp-cfg: add LunarLake support Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 29/83] drm/exynos: fix a possible null-pointer dereference due to data race in exynos_drm_crtc_atomic_disable() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 30/83] bus: ti-sysc: Configure uart quirks for k3 SoC Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 31/83] md: raid1: fix potential OOB in raid1_remove_disk() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 32/83] ext2: fix datatype of block number in ext2_xattr_set2() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 33/83] fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 34/83] jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 35/83] powerpc/pseries: fix possible memory leak in ibmebus_bus_init() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 36/83] media: dvb-usb-v2: af9035: Fix null-ptr-deref in af9035_i2c_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 37/83] media: dw2102: Fix null-ptr-deref in dw2102_i2c_transfer() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 38/83] media: af9005: Fix null-ptr-deref in af9005_i2c_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 39/83] media: anysee: fix null-ptr-deref in anysee_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 40/83] media: az6007: Fix null-ptr-deref in az6007_i2c_xfer() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 41/83] media: dvb-usb-v2: gl861: Fix null-ptr-deref in gl861_i2c_master_xfer Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 42/83] media: tuners: qt1010: replace BUG_ON with a regular error Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 43/83] media: pci: cx23885: replace BUG with error return Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 44/83] usb: gadget: fsl_qe_udc: validate endpoint index for ch9 udc Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 45/83] scsi: target: iscsi: Fix buffer overflow in lio_target_nacl_info_show() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 46/83] serial: cpm_uart: Avoid suspicious locking Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 47/83] media: pci: ipu3-cio2: Initialise timing struct to avoid a compiler warning Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 48/83] kobject: Add sanity check for kset->kobj.ktype in kset_register() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 49/83] interconnect: Fix locking for runpm vs reclaim Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 50/83] mtd: rawnand: brcmnand: Allow SoC to provide I/O operations Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 51/83] mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 52/83] perf jevents: Make build dependency on test JSONs Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 53/83] perf tools: Add an option to build without libbfd Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 54/83] perf jevents: Switch build to use jevents.py Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 55/83] perf build: Update build rule for generated files Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 56/83] btrfs: move btrfs_pinned_by_swapfile prototype into volumes.h Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 57/83] btrfs: add a helper to read the superblock metadata_uuid Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 58/83] btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 59/83] drm: gm12u320: Fix the timeout usage for usb_bulk_msg() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 60/83] scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 61/83] selftests: tracing: Fix to unmount tracefs for recovering environment Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 62/83] scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file() Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 63/83] x86/boot/compressed: Reserve more memory for page tables Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 64/83] samples/hw_breakpoint: fix building without module unloading Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 65/83] md/raid1: fix error: ISO C90 forbids mixed declarations Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 66/83] attr: block mode changes of symlinks Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 67/83] ovl: fix incorrect fdput() on aio completion Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 68/83] btrfs: fix lockdep splat and potential deadlock after failure running delayed items Greg Kroah-Hartman
2023-09-20 11:31 ` [PATCH 5.10 69/83] btrfs: release path before inode lookup during the ino lookup ioctl Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 70/83] drm/amdgpu: fix amdgpu_cs_p1_user_fence Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 71/83] net/sched: Retire rsvp classifier Greg Kroah-Hartman
2023-09-20 11:32 ` Greg Kroah-Hartman [this message]
2023-09-20 11:32 ` [PATCH 5.10 73/83] mm/filemap: fix infinite loop in generic_file_buffered_read() Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 74/83] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 75/83] tracing: Have current_trace inc the trace array ref count Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 76/83] tracing: Have option files " Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 77/83] nfsd: fix change_info in NFSv4 RENAME replies Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 78/83] tracefs: Add missing lockdown check to tracefs_create_dir() Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 79/83] i2c: aspeed: Reset the i2c controller when timeout occurs Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 80/83] ata: libata: disallow dev-initiated LPM transitions to unsupported states Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 81/83] scsi: megaraid_sas: Fix deadlock on firmware crashdump Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 82/83] scsi: pm8001: Setup IRQs on resume Greg Kroah-Hartman
2023-09-20 11:32 ` [PATCH 5.10 83/83] ext4: fix rec_len verify error Greg Kroah-Hartman
2023-09-20 18:13 ` [PATCH 5.10 00/83] 5.10.196-rc1 review Florian Fainelli
     [not found]   ` <CABbpPsz0Brmfw3zg2Y_r54Hx2mN1Ly=AtghKNE9chmtSP_-Y6A@mail.gmail.com>
2023-09-20 18:47     ` Florian Fainelli
2023-09-20 21:35 ` Shuah Khan
2023-09-21  6:07 ` Dominique Martinet
2023-09-21 13:32 ` Naresh Kamboju
2023-09-21 15:59 ` Guenter Roeck
2023-09-23  8:22   ` Greg Kroah-Hartman
2023-09-21 20:38 ` Joel Fernandes
2023-09-22  9:46 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230920112829.502498246@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfoster@redhat.com \
    --cc=bhe@redhat.com \
    --cc=chengzhihao1@huawei.com \
    --cc=ebiederm@xmission.com \
    --cc=kaleshsingh@google.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=surajjs@amazon.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).