linux-fsdevel.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ikent@redhat.com, onestero@redhat.com
Subject: [PATCH 3/3] proc: use idr tgid tag hint to iterate pids in readdir
Date: Tue, 14 Jun 2022 14:09:49 -0400
Message-ID: <20220614180949.102914-4-bfoster@redhat.com>
In-Reply-To: <20220614180949.102914-1-bfoster@redhat.com>

The tgid pid/task scan in proc_pid_readdir() is rather inefficient.
It linearly walks the pid_namespace and checks each allocated pid
for an associated PIDTYPE_TGID task. This has been shown to impact
getdents() latency in environments that have processes with very
large thread counts.

For example, on a mostly idle 2.4GHz Intel Xeon running Fedora on
5.19.0-rc2, 'strace -T xfs_io -c readdir /proc' shows the following:

  getdents64(... /* 814 entries */, 32768) = 20624 <0.000568>

With the addition of a dummy (i.e. idle) process running that
creates an additional 100k threads, that latency increases to:

  getdents64(... /* 815 entries */, 32768) = 20656 <0.011315>
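
The dummy process is not part of this posting. A minimal sketch of
one, assuming POSIX threads, might park the extra threads on a
condition variable so that they stay idle. The helper names
spawn_idle_threads() and release_idle_threads() and the 256 KiB stack
size are illustrative assumptions, not taken from the original test:

```c
/* Hypothetical reproducer sketch; build with: cc -pthread ... */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t park_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t park_cond = PTHREAD_COND_INITIALIZER;
static bool park_release;

/* Each thread sleeps on the condition variable until released. */
static void *idle_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&park_lock);
	while (!park_release)
		pthread_cond_wait(&park_cond, &park_lock);
	pthread_mutex_unlock(&park_lock);
	return NULL;
}

/*
 * Start up to @count detached idle threads with small stacks.
 * Returns how many were created; stops at the first failure.
 */
size_t spawn_idle_threads(size_t count)
{
	pthread_attr_t attr;
	size_t created = 0;

	if (pthread_attr_init(&attr) != 0)
		return 0;
	/* Assumed stack size; the default is kept if this is rejected. */
	pthread_attr_setstacksize(&attr, 256 * 1024);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

	while (created < count) {
		pthread_t tid;

		if (pthread_create(&tid, &attr, idle_thread, NULL) != 0)
			break;
		created++;
	}
	pthread_attr_destroy(&attr);
	return created;
}

/* Wake every parked thread so that it can exit. */
void release_idle_threads(void)
{
	pthread_mutex_lock(&park_lock);
	park_release = true;
	pthread_cond_broadcast(&park_cond);
	pthread_mutex_unlock(&park_lock);
}
```

A driver would call spawn_idle_threads(100000), check the returned
count, and then block (e.g. in pause()) while /proc is scanned from
another shell. Reaching 100k threads may require raising limits such
as kernel.threads-max, kernel.pid_max and vm.max_map_count.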

While this may not be noticeable in one-off /proc scans or simple
usage of ps or top, we have users that report problems caused by
this latency increase in these sorts of scaled environments with
custom tooling that makes heavier use of task monitoring.

Optimize the tgid task scanning in proc_pid_readdir() by using
IDR_TGID tag lookups in the pid namespace tree. Tagged pids are not
guaranteed to have an associated PIDTYPE_TGID task, but pids that do
are always tagged. This significantly improves readdir() latency
when the pid namespace is populated with group leader tasks with
unusually large thread counts. For example, the above 100k idle task
test against a patched kernel now results in the following:

Idle:
  getdents64(... /* 861 entries */, 32768) = 21048 <0.000670>

Idle + 100k threads:
  getdents64(... /* 862 entries */, 32768) = 21096 <0.000959>

... which is a much smaller latency hit after the high thread count
task is started.
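
To illustrate why the tagged lookup helps, here is a toy userspace
model. It is not the kernel radix tree: a two-level bitmap stands in
for tag propagation, is_leader[] stands in for the
pid_task(pid, PIDTYPE_TGID) check, and every toy_* name is
hypothetical. It assumes a GCC or Clang compiler for
__builtin_ctzll().

```c
#include <stdbool.h>
#include <stdint.h>

#define LEAF_BITS 64
#define NR_LEAVES 64
#define MAX_IDS   (LEAF_BITS * NR_LEAVES)

struct toy_idr {
	uint64_t leaf_tags[NR_LEAVES]; /* bit set: id carries the tag */
	uint64_t summary;              /* bit set: leaf has a tagged id */
	bool is_leader[MAX_IDS];       /* stand-in for a TGID task check */
};

/* Tag @id and propagate the tag to the summary level. */
void toy_tag_set(struct toy_idr *idr, unsigned long id)
{
	if (id >= MAX_IDS)
		return;
	idr->leaf_tags[id / LEAF_BITS] |= UINT64_C(1) << (id % LEAF_BITS);
	idr->summary |= UINT64_C(1) << (id / LEAF_BITS);
}

/* Return the lowest tagged id >= @start, or -1 if there is none. */
long toy_find_tagged(const struct toy_idr *idr, unsigned long start)
{
	unsigned long leaf;
	uint64_t bits, later = 0;

	if (start >= MAX_IDS)
		return -1;

	leaf = start / LEAF_BITS;
	/* Ignore ids below @start within the first leaf. */
	bits = idr->leaf_tags[leaf] & (~UINT64_C(0) << (start % LEAF_BITS));
	if (!bits) {
		/* Skip untagged leaves wholesale via the summary bitmap. */
		if (leaf + 1 < NR_LEAVES)
			later = idr->summary & (~UINT64_C(0) << (leaf + 1));
		if (!later)
			return -1;
		leaf = (unsigned long)__builtin_ctzll(later);
		bits = idr->leaf_tags[leaf];
	}
	return (long)(leaf * LEAF_BITS + (unsigned long)__builtin_ctzll(bits));
}

/*
 * Caller-side loop: the tag is only a hint, so the association is
 * checked explicitly and the search resumes at id + 1 on a miss.
 */
long toy_next_tgid(const struct toy_idr *idr, unsigned long start)
{
	for (;;) {
		long id = toy_find_tagged(idr, start);

		if (id < 0)
			return -1;
		if (idr->is_leader[id])
			return id;
		start = (unsigned long)id + 1;
	}
}
```

In this model, a lookup skips whole untagged leaves through the
summary bitmap, so its cost does not grow with the number of untagged
ids in between. A tagged id without a leader is simply stepped over
by the caller.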

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/proc/base.c      |  2 +-
 include/linux/idr.h | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 8dfa36a99c74..fd3c8a5f8c2d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3436,7 +3436,7 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
 	rcu_read_lock();
 retry:
 	iter.task = NULL;
-	pid = find_ge_pid(iter.tgid, ns);
+	pid = find_tgid_pid(&ns->idr, iter.tgid);
 	if (pid) {
 		iter.tgid = pid_nr_ns(pid, ns);
 		iter.task = pid_task(pid, PIDTYPE_TGID);
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 11e0ccedfc92..5ef32311b232 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -185,6 +185,20 @@ static inline bool idr_is_group_lead(struct idr *idr, unsigned long id)
 	return radix_tree_tag_get(&idr->idr_rt, id, IDR_TGID);
 }
 
+/*
+ * Find the next id with a potentially associated TGID task using the internal
+ * tag. Task association is not guaranteed and must be checked explicitly.
+ */
+static inline struct pid *find_tgid_pid(struct idr *idr, unsigned long id)
+{
+	struct pid *pid;
+
+	if (radix_tree_gang_lookup_tag(&idr->idr_rt, (void **) &pid, id, 1,
+				       IDR_TGID) != 1)
+		return NULL;
+	return pid;
+}
+
 /**
  * idr_for_each_entry() - Iterate over an IDR's elements of a given type.
  * @idr: IDR handle.
-- 
2.34.1


Thread overview: 16+ messages
2022-06-14 18:09 [PATCH 0/3] proc: improve root readdir latency with many threads Brian Foster
2022-06-14 18:09 ` [PATCH 1/3] radix-tree: propagate all tags in idr tree Brian Foster
2022-06-15 11:12   ` Christoph Hellwig
2022-06-15 12:57     ` Brian Foster
2022-06-15 13:40       ` Matthew Wilcox
2022-06-15 14:43         ` Brian Foster
2022-06-15 16:34           ` Matthew Wilcox
2022-06-28 12:55             ` Christian Brauner
2022-06-28 14:53               ` Brian Foster
2022-06-29 19:13                 ` Brian Foster
2022-07-11 20:24                 ` Matthew Wilcox
2022-06-15 13:33     ` Ian Kent
2022-06-14 18:09 ` [PATCH 2/3] pid: use idr tag to hint pids associated with group leader tasks Brian Foster
2022-06-14 18:09 ` Brian Foster [this message]
2022-06-15 13:44   ` [PATCH 3/3] proc: use idr tgid tag hint to iterate pids in readdir Matthew Wilcox
2022-06-15 14:44     ` Brian Foster
