From: Brian Foster <bfoster@redhat.com>
To: Ivan Babrou <ivan@cloudflare.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-team@cloudflare.com, Alexey Dobriyan <adobriyan@gmail.com>,
Al Viro <viro@zeniv.linux.org.uk>, Theodore Ts'o <tytso@mit.edu>,
David Laight <David.Laight@aculab.com>,
Jonathan Corbet <corbet@lwn.net>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>,
Mike Rapoport <rppt@kernel.org>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
Kalesh Singh <kaleshsingh@google.com>
Subject: Re: [PATCH v4] proc: report open files as size in stat() for /proc/pid/fd
Date: Fri, 18 Nov 2022 14:10:51 -0500 [thread overview]
Message-ID: <Y3fYu2VCBgREBBau@bfoster> (raw)
In-Reply-To: <20221024173140.30673-1-ivan@cloudflare.com>
On Mon, Oct 24, 2022 at 10:31:40AM -0700, Ivan Babrou wrote:
> Many monitoring tools include open file count as a metric. Currently
> the only way to get this number is to enumerate the files in /proc/pid/fd.
>
> The problem with the current approach is that it does many things people
> generally don't care about when they need one number for a metric.
> In our tests for cadvisor, which reports open file counts per cgroup,
> we observed that reading the number of open files is slow. Out of 35.23%
> of CPU time spent in `proc_readfd_common`, we see 29.43% spent in
> `proc_fill_cache`, which is responsible for filling dentry info.
> Some of this extra time is spinlock contention, but it's a contention
> for the lock we don't want to take to begin with.
>
> We considered putting the number of open files in /proc/pid/status.
> Unfortunately, counting the number of fds involves iterating the open_files
> bitmap, which has a linear complexity in proportion with the number
> of open files (bitmap slots really, but it's close). We don't want
> to make /proc/pid/status any slower, so instead we put this info
> in /proc/pid/fd as a size member of the stat syscall result.
> Previously the reported number was zero, so there's very little
> risk of breaking anything, while still providing a somewhat logical
> way to count the open files with a fallback if it's zero.
>
> RFC for this patch included iterating open fds under RCU. Thanks
> to Frank Hofmann for the suggestion to use the bitmap instead.
>
> Previously:
>
> ```
> $ sudo stat /proc/1/fd | head -n2
> File: /proc/1/fd
> Size: 0 Blocks: 0 IO Block: 1024 directory
> ```
>
> With this patch:
>
> ```
> $ sudo stat /proc/1/fd | head -n2
> File: /proc/1/fd
> Size: 65 Blocks: 0 IO Block: 1024 directory
> ```
>
> Correctness check:
>
> ```
> $ sudo ls /proc/1/fd | wc -l
> 65
> ```
>
> I added the docs for /proc/<pid>/fd while I'm at it.
>
> Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
>
> ---
> v4: Return errno from proc_fd_getattr() instead of setting negative size.
> Added an explicit include for linux/bitmap.h.
> v3: Made use of bitmap_weight() to count the bits.
> v2: Added missing rcu_read_lock() / rcu_read_unlock(),
> task_lock() / task_unlock() and put_task_struct().
> ---
> Documentation/filesystems/proc.rst | 17 +++++++++++
> fs/proc/fd.c | 45 ++++++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 898c99eae8e4..ec6cfdf1796a 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009
> 3.10 /proc/<pid>/timerslack_ns - Task timerslack value
> 3.11 /proc/<pid>/patch_state - Livepatch patch operation state
> 3.12 /proc/<pid>/arch_status - Task architecture specific information
> + 3.13 /proc/<pid>/fd - List of symlinks to open files
>
> 4 Configuring procfs
> 4.1 Mount options
> @@ -2149,6 +2150,22 @@ AVX512_elapsed_ms
> the task is unlikely an AVX512 user, but depends on the workload and the
> scheduling scenario, it also could be a false negative mentioned above.
>
> +3.13 /proc/<pid>/fd - List of symlinks to open files
> +-------------------------------------------------------
> +This directory contains symbolic links which represent open files
> +the process is maintaining. Example output::
> +
> + lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null
> + l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null
> + lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]'
> + lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]'
> + lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]'
> +
> +The number of open files for the process is stored in 'size' member
> +of stat() output for /proc/<pid>/fd for fast access.
> +-------------------------------------------------------
> +
> +
> Chapter 4: Configuring procfs
> =============================
>
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 913bef0d2a36..fc46d6fe080c 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -7,6 +7,7 @@
> #include <linux/namei.h>
> #include <linux/pid.h>
> #include <linux/ptrace.h>
> +#include <linux/bitmap.h>
> #include <linux/security.h>
> #include <linux/file.h>
> #include <linux/seq_file.h>
> @@ -279,6 +280,30 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
> return 0;
> }
>
> +static int proc_readfd_count(struct inode *inode, loff_t *count)
> +{
> + struct task_struct *p = get_proc_task(inode);
> + struct fdtable *fdt;
> +
> + if (!p)
> + return -ENOENT;
> +
> + task_lock(p);
> + if (p->files) {
> + rcu_read_lock();
> +
> + fdt = files_fdtable(p->files);
> + *count = bitmap_weight(fdt->open_fds, fdt->max_fds);
> +
> + rcu_read_unlock();
> + }
> + task_unlock(p);
> +
> + put_task_struct(p);
> +
> + return 0;
> +}
> +
> static int proc_readfd(struct file *file, struct dir_context *ctx)
> {
> return proc_readfd_common(file, ctx, proc_fd_instantiate);
> @@ -319,9 +344,29 @@ int proc_fd_permission(struct user_namespace *mnt_userns,
> return rv;
> }
>
> +static int proc_fd_getattr(struct user_namespace *mnt_userns,
> + const struct path *path, struct kstat *stat,
> + u32 request_mask, unsigned int query_flags)
> +{
> + struct inode *inode = d_inode(path->dentry);
> + int rv = 0;
> +
> + generic_fillattr(&init_user_ns, inode, stat);
> +
Sorry I missed this on v3, but shouldn't this pass through the
mnt_userns parameter?
> + /* If it's a directory, put the number of open fds there */
> + if (S_ISDIR(inode->i_mode)) {
> + rv = proc_readfd_count(inode, &stat->size);
> + if (rv < 0)
> + return rv;
> + }
Also I suppose this could just do:
if (S_ISDIR(inode->i_mode))
rv = proc_readfd_count(inode, &stat->size);
return rv;
But that's a nit. Otherwise seems reasonable to me.
Brian
> +
> + return rv;
> +}
> +
> const struct inode_operations proc_fd_inode_operations = {
> .lookup = proc_lookupfd,
> .permission = proc_fd_permission,
> + .getattr = proc_fd_getattr,
> .setattr = proc_setattr,
> };
>
> --
> 2.37.3
>
next prev parent reply other threads:[~2022-11-18 19:11 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-24 17:31 [PATCH v4] proc: report open files as size in stat() for /proc/pid/fd Ivan Babrou
2022-11-18 19:10 ` Brian Foster [this message]
2022-11-18 19:18 ` Ivan Babrou
2022-11-18 19:33 ` Brian Foster
2022-11-19 12:01 ` Christian Brauner
2022-11-21 11:42 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3fYu2VCBgREBBau@bfoster \
--to=bfoster@redhat.com \
--cc=David.Laight@aculab.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=ivan@cloudflare.com \
--cc=kaleshsingh@google.com \
--cc=kernel-team@cloudflare.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mail@christoph.anton.mitterer.name \
--cc=paul.gortmaker@windriver.com \
--cc=rppt@kernel.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.