* [RFC] proc: report open files as size in stat() for /proc/pid/fd @ 2022-09-16 23:08 Ivan Babrou 2022-09-17 0:01 ` Andrew Morton 2022-09-17 15:31 ` Theodore Ts'o 0 siblings, 2 replies; 10+ messages in thread From: Ivan Babrou @ 2022-09-16 23:08 UTC (permalink / raw) To: linux-fsdevel Cc: linux-kernel, kernel-team, Ivan Babrou, Andrew Morton, Kalesh Singh Many monitoring tools include open file count as a metric. Currently the only way to get this number is to enumerate the files in /proc/pid/fd. The problem with the current approach is that it does many things people generally don't care about when they need one number for a metric. In our tests for cadvisor, which reports open file counts per cgroup, we observed that reading the number of open files is slow. Out of 35.23% of CPU time spent in `proc_readfd_common`, we see 29.43% spent in `proc_fill_cache`, which is responsible for filling dentry info. Some of this extra time is spinlock contention, but it's a contention for the lock we don't want to take to begin with. We considered putting the number of open files in /proc/pid/stat. Unfortunately, counting the number of fds involves iterating the fdtable, which means that it might slow down /proc/pid/stat for processes with many open files. Instead we opted to put this info in /proc/pid/fd as a size member of the stat syscall result. Previously the reported number was zero, so there's very little risk of breaking anything, while still providing a somewhat logical way to count the open files. Previously: ``` $ sudo stat /proc/1/fd | head -n2 File: /proc/1/fd Size: 0 Blocks: 0 IO Block: 1024 directory ``` With this patch: ``` $ sudo stat /proc/1/fd | head -n2 File: /proc/1/fd Size: 65 Blocks: 0 IO Block: 1024 directory ``` Correctness check: ``` $ sudo ls /proc/1/fd | wc -l 65 ``` There are two alternatives to this approach that I can see: * Expose /proc/pid/fd_count with a count there * Make fd count acces O(1) and expose it in /proc/pid/status I can probably figure out how to do the former, but the latter will require somebody with more experience in file code than myself. Signed-off-by: Ivan Babrou <ivan@cloudflare.com> --- fs/proc/fd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 913bef0d2a36..c7ac142500a8 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -279,6 +279,29 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, return 0; } +static int proc_readfd_count(struct inode *inode) +{ + struct task_struct *p = get_proc_task(inode); + unsigned int fd = 0, count = 0; + + if (!p) + return -ENOENT; + + rcu_read_lock(); + while (task_lookup_next_fd_rcu(p, &fd)) { + rcu_read_unlock(); + + count++; + fd++; + + cond_resched(); + rcu_read_lock(); + } + rcu_read_unlock(); + put_task_struct(p); + return count; +} + static int proc_readfd(struct file *file, struct dir_context *ctx) { return proc_readfd_common(file, ctx, proc_fd_instantiate); @@ -319,9 +342,33 @@ int proc_fd_permission(struct user_namespace *mnt_userns, return rv; } +int proc_fd_getattr(struct user_namespace *mnt_userns, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags) +{ + struct inode *inode = d_inode(path->dentry); + struct proc_dir_entry *de = PDE(inode); + + if (de) { + nlink_t nlink = READ_ONCE(de->nlink); + + if (nlink > 0) + set_nlink(inode, nlink); + } + + generic_fillattr(&init_user_ns, inode, stat); + + /* If it's a directory, put the number of open fds there */ + if (S_ISDIR(inode->i_mode)) + stat->size = proc_readfd_count(inode); + + return 0; +} + const struct inode_operations proc_fd_inode_operations = { .lookup = proc_lookupfd, .permission = proc_fd_permission, + .getattr = proc_fd_getattr, .setattr = proc_setattr, }; -- 2.37.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-16 23:08 [RFC] proc: report open files as size in stat() for /proc/pid/fd Ivan Babrou @ 2022-09-17 0:01 ` Andrew Morton 2022-09-17 0:28 ` Ivan Babrou 2022-09-17 7:15 ` Alexey Dobriyan 2022-09-17 15:31 ` Theodore Ts'o 1 sibling, 2 replies; 10+ messages in thread From: Andrew Morton @ 2022-09-17 0:01 UTC (permalink / raw) To: Ivan Babrou Cc: linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Alexey Dobriyan, Al Viro (cc's added) On Fri, 16 Sep 2022 16:08:52 -0700 Ivan Babrou <ivan@cloudflare.com> wrote: > Many monitoring tools include open file count as a metric. Currently > the only way to get this number is to enumerate the files in /proc/pid/fd. > > The problem with the current approach is that it does many things people > generally don't care about when they need one number for a metric. > In our tests for cadvisor, which reports open file counts per cgroup, > we observed that reading the number of open files is slow. Out of 35.23% > of CPU time spent in `proc_readfd_common`, we see 29.43% spent in > `proc_fill_cache`, which is responsible for filling dentry info. > Some of this extra time is spinlock contention, but it's a contention > for the lock we don't want to take to begin with. > > We considered putting the number of open files in /proc/pid/stat. > Unfortunately, counting the number of fds involves iterating the fdtable, > which means that it might slow down /proc/pid/stat for processes > with many open files. Instead we opted to put this info in /proc/pid/fd > as a size member of the stat syscall result. Previously the reported > number was zero, so there's very little risk of breaking anything, > while still providing a somewhat logical way to count the open files. Documentation/filesystems/proc.rst would be an appropriate place to document this ;) > Previously: > > ``` > $ sudo stat /proc/1/fd | head -n2 > File: /proc/1/fd > Size: 0 Blocks: 0 IO Block: 1024 directory > ``` > > With this patch: > > ``` > $ sudo stat /proc/1/fd | head -n2 > File: /proc/1/fd > Size: 65 Blocks: 0 IO Block: 1024 directory > ``` > > Correctness check: > > ``` > $ sudo ls /proc/1/fd | wc -l > 65 > ``` > > There are two alternatives to this approach that I can see: > > * Expose /proc/pid/fd_count with a count there > * Make fd count acces O(1) and expose it in /proc/pid/status > > I can probably figure out how to do the former, but the latter > will require somebody with more experience in file code than myself. > > Signed-off-by: Ivan Babrou <ivan@cloudflare.com> > --- > fs/proc/fd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/fs/proc/fd.c b/fs/proc/fd.c > index 913bef0d2a36..c7ac142500a8 100644 > --- a/fs/proc/fd.c > +++ b/fs/proc/fd.c > @@ -279,6 +279,29 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, > return 0; > } > > +static int proc_readfd_count(struct inode *inode) > +{ > + struct task_struct *p = get_proc_task(inode); > + unsigned int fd = 0, count = 0; > + > + if (!p) > + return -ENOENT; > + > + rcu_read_lock(); > + while (task_lookup_next_fd_rcu(p, &fd)) { > + rcu_read_unlock(); > + > + count++; > + fd++; > + > + cond_resched(); > + rcu_read_lock(); > + } > + rcu_read_unlock(); > + put_task_struct(p); > + return count; > +} > + > static int proc_readfd(struct file *file, struct dir_context *ctx) > { > return proc_readfd_common(file, ctx, proc_fd_instantiate); > @@ -319,9 +342,33 @@ int proc_fd_permission(struct user_namespace *mnt_userns, > return rv; > } > > +int proc_fd_getattr(struct user_namespace *mnt_userns, > + const struct path *path, struct kstat *stat, > + u32 request_mask, unsigned int query_flags) > +{ > + struct inode *inode = d_inode(path->dentry); > + struct proc_dir_entry *de = PDE(inode); > + > + if (de) { > + nlink_t nlink = READ_ONCE(de->nlink); > + > + if (nlink > 0) > + set_nlink(inode, nlink); > + } > + > + generic_fillattr(&init_user_ns, inode, stat); > + > + /* If it's a directory, put the number of open fds there */ > + if (S_ISDIR(inode->i_mode)) > + stat->size = proc_readfd_count(inode); > + > + return 0; > +} > + > const struct inode_operations proc_fd_inode_operations = { > .lookup = proc_lookupfd, > .permission = proc_fd_permission, > + .getattr = proc_fd_getattr, > .setattr = proc_setattr, > }; > > -- > 2.37.2 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-17 0:01 ` Andrew Morton @ 2022-09-17 0:28 ` Ivan Babrou 2022-09-17 7:15 ` Alexey Dobriyan 1 sibling, 0 replies; 10+ messages in thread From: Ivan Babrou @ 2022-09-17 0:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Alexey Dobriyan, Al Viro On Fri, Sep 16, 2022 at 5:01 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > (cc's added) > > On Fri, 16 Sep 2022 16:08:52 -0700 Ivan Babrou <ivan@cloudflare.com> wrote: > > > Many monitoring tools include open file count as a metric. Currently > > the only way to get this number is to enumerate the files in /proc/pid/fd. > > > > The problem with the current approach is that it does many things people > > generally don't care about when they need one number for a metric. > > In our tests for cadvisor, which reports open file counts per cgroup, > > we observed that reading the number of open files is slow. Out of 35.23% > > of CPU time spent in `proc_readfd_common`, we see 29.43% spent in > > `proc_fill_cache`, which is responsible for filling dentry info. > > Some of this extra time is spinlock contention, but it's a contention > > for the lock we don't want to take to begin with. > > > > We considered putting the number of open files in /proc/pid/stat. > > Unfortunately, counting the number of fds involves iterating the fdtable, > > which means that it might slow down /proc/pid/stat for processes > > with many open files. Instead we opted to put this info in /proc/pid/fd > > as a size member of the stat syscall result. Previously the reported > > number was zero, so there's very little risk of breaking anything, > > while still providing a somewhat logical way to count the open files. > > Documentation/filesystems/proc.rst would be an appropriate place to > document this ;) I am more than happy to add the docs after there's a confirmation that this is an appropriate approach to expose this information. I probably should've mentioned this explicitly, that's on me. There are two alternative approaches at the bottom of my original email that might be considered. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-17 0:01 ` Andrew Morton 2022-09-17 0:28 ` Ivan Babrou @ 2022-09-17 7:15 ` Alexey Dobriyan 2022-09-17 18:32 ` Ivan Babrou 2022-09-19 8:18 ` Christian Brauner 1 sibling, 2 replies; 10+ messages in thread From: Alexey Dobriyan @ 2022-09-17 7:15 UTC (permalink / raw) To: Andrew Morton Cc: Ivan Babrou, linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Al Viro On Fri, Sep 16, 2022 at 05:01:15PM -0700, Andrew Morton wrote: > (cc's added) > > On Fri, 16 Sep 2022 16:08:52 -0700 Ivan Babrou <ivan@cloudflare.com> wrote: > > > Many monitoring tools include open file count as a metric. Currently > > the only way to get this number is to enumerate the files in /proc/pid/fd. > > > > The problem with the current approach is that it does many things people > > generally don't care about when they need one number for a metric. > > In our tests for cadvisor, which reports open file counts per cgroup, > > we observed that reading the number of open files is slow. Out of 35.23% > > of CPU time spent in `proc_readfd_common`, we see 29.43% spent in > > `proc_fill_cache`, which is responsible for filling dentry info. > > Some of this extra time is spinlock contention, but it's a contention > > for the lock we don't want to take to begin with. > > > > We considered putting the number of open files in /proc/pid/stat. > > Unfortunately, counting the number of fds involves iterating the fdtable, > > which means that it might slow down /proc/pid/stat for processes > > with many open files. Instead we opted to put this info in /proc/pid/fd > > as a size member of the stat syscall result. Previously the reported > > number was zero, so there's very little risk of breaking anything, > > while still providing a somewhat logical way to count the open files. > > Documentation/filesystems/proc.rst would be an appropriate place to > document this ;) > > > Previously: > > > > ``` > > $ sudo stat /proc/1/fd | head -n2 > > File: /proc/1/fd > > Size: 0 Blocks: 0 IO Block: 1024 directory > > ``` > > > > With this patch: > > > > ``` > > $ sudo stat /proc/1/fd | head -n2 > > File: /proc/1/fd > > Size: 65 Blocks: 0 IO Block: 1024 directory Yes. This is natural place. > > ``` > > > > Correctness check: > > > > ``` > > $ sudo ls /proc/1/fd | wc -l > > 65 > > ``` > > > > There are two alternatives to this approach that I can see: > > > > * Expose /proc/pid/fd_count with a count there > > * Make fd count acces O(1) and expose it in /proc/pid/status This is doable, next to FDSize. Below is doable too. > > --- a/fs/proc/fd.c > > +++ b/fs/proc/fd.c > > @@ -279,6 +279,29 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, > > return 0; > > } > > > > +static int proc_readfd_count(struct inode *inode) > > +{ > > + struct task_struct *p = get_proc_task(inode); > > + unsigned int fd = 0, count = 0; > > + > > + if (!p) > > + return -ENOENT; > > + > > + rcu_read_lock(); > > + while (task_lookup_next_fd_rcu(p, &fd)) { > > + rcu_read_unlock(); > > + > > + count++; > > + fd++; > > + > > + cond_resched(); > > + rcu_read_lock(); > > + } > > + rcu_read_unlock(); > > + put_task_struct(p); > > + return count; > > +} > > + > > static int proc_readfd(struct file *file, struct dir_context *ctx) > > { > > return proc_readfd_common(file, ctx, proc_fd_instantiate); > > @@ -319,9 +342,33 @@ int proc_fd_permission(struct user_namespace *mnt_userns, > > return rv; > > } > > > > +int proc_fd_getattr(struct user_namespace *mnt_userns, > > + const struct path *path, struct kstat *stat, > > + u32 request_mask, unsigned int query_flags) > > +{ > > + struct inode *inode = d_inode(path->dentry); > > + struct proc_dir_entry *de = PDE(inode); > > + > > + if (de) { > > + nlink_t nlink = READ_ONCE(de->nlink); > > + > > + if (nlink > 0) > > + set_nlink(inode, nlink); > > + } > > + > > + generic_fillattr(&init_user_ns, inode, stat); ^^^^^^^^^^^^^ Is this correct? I'm not userns guy at all. > > + > > + /* If it's a directory, put the number of open fds there */ > > + if (S_ISDIR(inode->i_mode)) > > + stat->size = proc_readfd_count(inode); ENOENT can get there. In principle this is OK, userspace can live with it. > > const struct inode_operations proc_fd_inode_operations = { > > .lookup = proc_lookupfd, > > .permission = proc_fd_permission, > > + .getattr = proc_fd_getattr, > > .setattr = proc_setattr, ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-17 7:15 ` Alexey Dobriyan @ 2022-09-17 18:32 ` Ivan Babrou 2022-09-19 12:30 ` Alexey Dobriyan 2022-09-19 8:18 ` Christian Brauner 1 sibling, 1 reply; 10+ messages in thread From: Ivan Babrou @ 2022-09-17 18:32 UTC (permalink / raw) To: Alexey Dobriyan Cc: Andrew Morton, linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Al Viro > > > * Make fd count acces O(1) and expose it in /proc/pid/status > > This is doable, next to FDSize. It feels like a better solution, but maybe I'm missing some context here. Let me know whether this is preferred. That said, I've tried doing it, but failed. There's a noticeable mismatch in the numbers: * systemd: ivan@vm:~$ sudo ls -l /proc/1/fd | wc -l 66 ivan@vm:~$ cat /proc/1/status | fgrep FD FDSize: 256 FDUsed: 71 * journald: ivan@vm:~$ sudo ls -l /proc/803/fd | wc -l 29 ivan@vm:~$ cat /proc/803/status | fgrep FD FDSize: 128 FDUsed: 37 I'll see if I can make it work next week. I'm happy to receive tips as well. Below is my attempt (link in case gmail breaks patch formatting): * https://gist.githubusercontent.com/bobrik/acce40881d629d8cce2e55966b31a0a2/raw/716eb4724a8fe3afeeb76fd2a7a47ee13790a9e9/fdused.patch diff --git a/fs/file.c b/fs/file.c index 3bcc1ecc314a..8bc0741cabf1 100644 --- a/fs/file.c +++ b/fs/file.c @@ -85,6 +85,8 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt) memset((char *)nfdt->fd + cpy, 0, set); copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds); + + atomic_set(&nfdt->count, atomic_read(&ofdt->count)); } /* @@ -105,6 +107,7 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt) static struct fdtable * alloc_fdtable(unsigned int nr) { struct fdtable *fdt; + atomic_t count = ATOMIC_INIT(0); void *data; /* @@ -148,6 +151,7 @@ static struct fdtable * alloc_fdtable(unsigned int nr) fdt->close_on_exec = data; data += nr / BITS_PER_BYTE; fdt->full_fds_bits = data; + fdt->count = count; return fdt; @@ -399,6 +403,8 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int /* clear the remainder */ memset(new_fds, 0, (new_fdt->max_fds - open_files) * sizeof(struct file *)); + atomic_set(&new_fdt->count, atomic_read(&old_fdt->count)); + rcu_assign_pointer(newf->fdt, new_fdt); return newf; @@ -474,6 +480,7 @@ struct files_struct init_files = { .close_on_exec = init_files.close_on_exec_init, .open_fds = init_files.open_fds_init, .full_fds_bits = init_files.full_fds_bits_init, + .count = ATOMIC_INIT(0), }, .file_lock = __SPIN_LOCK_UNLOCKED(init_files.file_lock), .resize_wait = __WAIT_QUEUE_HEAD_INITIALIZER(init_files.resize_wait), @@ -613,6 +620,7 @@ void fd_install(unsigned int fd, struct file *file) BUG_ON(fdt->fd[fd] != NULL); rcu_assign_pointer(fdt->fd[fd], file); spin_unlock(&files->file_lock); + atomic_inc(&fdt->count); return; } /* coupled with smp_wmb() in expand_fdtable() */ @@ -621,6 +629,7 @@ void fd_install(unsigned int fd, struct file *file) BUG_ON(fdt->fd[fd] != NULL); rcu_assign_pointer(fdt->fd[fd], file); rcu_read_unlock_sched(); + atomic_inc(&fdt->count); } EXPORT_SYMBOL(fd_install); @@ -646,6 +655,7 @@ static struct file *pick_file(struct files_struct *files, unsigned fd) if (file) { rcu_assign_pointer(fdt->fd[fd], NULL); __put_unused_fd(files, fd); + atomic_dec(&fdt->count); } return file; } @@ -844,6 +854,7 @@ void do_close_on_exec(struct files_struct *files) filp_close(file, files); cond_resched(); spin_lock(&files->file_lock); + atomic_dec(&fdt->count); } } @@ -1108,6 +1119,7 @@ __releases(&files->file_lock) else __clear_close_on_exec(fd, fdt); spin_unlock(&files->file_lock); + atomic_inc(&fdt->count); if (tofree) filp_close(tofree, files); diff --git a/fs/proc/array.c b/fs/proc/array.c index 99fcbfda8e25..5847f077bfc3 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -153,7 +153,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, struct task_struct *tracer; const struct cred *cred; pid_t ppid, tpid = 0, tgid, ngid; - unsigned int max_fds = 0; + struct fdtable *fdt; + unsigned int max_fds = 0, open_fds = 0; rcu_read_lock(); ppid = pid_alive(p) ? @@ -170,8 +171,11 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, task_lock(p); if (p->fs) umask = p->fs->umask; - if (p->files) - max_fds = files_fdtable(p->files)->max_fds; + if (p->files) { + fdt = files_fdtable(p->files); + max_fds = fdt->max_fds; + open_fds = atomic_read(&fdt->count); + } task_unlock(p); rcu_read_unlock(); @@ -194,6 +198,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->sgid)); seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->fsgid)); seq_put_decimal_ull(m, "\nFDSize:\t", max_fds); + seq_put_decimal_ull(m, "\nFDUsed:\t", open_fds); seq_puts(m, "\nGroups:\t"); group_info = cred->group_info; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index e066816f3519..59aceb1e4bc6 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -31,6 +31,7 @@ struct fdtable { unsigned long *open_fds; unsigned long *full_fds_bits; struct rcu_head rcu; + atomic_t count; }; static inline bool close_on_exec(unsigned int fd, const struct fdtable *fdt) > > > + > > > + generic_fillattr(&init_user_ns, inode, stat); > ^^^^^^^^^^^^^ > > Is this correct? I'm not userns guy at all. I mostly copied from here: * https://elixir.bootlin.com/linux/v6.0-rc5/source/fs/proc/generic.c#L150 Maybe it can be simplified even further to match this one: * https://elixir.bootlin.com/linux/v6.0-rc5/source/fs/proc/root.c#L317 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-17 18:32 ` Ivan Babrou @ 2022-09-19 12:30 ` Alexey Dobriyan 2022-09-19 22:03 ` Ivan Babrou 0 siblings, 1 reply; 10+ messages in thread From: Alexey Dobriyan @ 2022-09-19 12:30 UTC (permalink / raw) To: Ivan Babrou Cc: Andrew Morton, linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Al Viro On Sat, Sep 17, 2022 at 11:32:02AM -0700, Ivan Babrou wrote: > > > > * Make fd count acces O(1) and expose it in /proc/pid/status > > > > This is doable, next to FDSize. > > It feels like a better solution, but maybe I'm missing some context > here. Let me know whether this is preferred. I don't know. I'd put it in st_size as you did initially. /proc/*/status should be slow. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-19 12:30 ` Alexey Dobriyan @ 2022-09-19 22:03 ` Ivan Babrou 0 siblings, 0 replies; 10+ messages in thread From: Ivan Babrou @ 2022-09-19 22:03 UTC (permalink / raw) To: Alexey Dobriyan Cc: Andrew Morton, linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Al Viro On Mon, Sep 19, 2022 at 5:30 AM Alexey Dobriyan <adobriyan@gmail.com> wrote: > > On Sat, Sep 17, 2022 at 11:32:02AM -0700, Ivan Babrou wrote: > > > > > * Make fd count acces O(1) and expose it in /proc/pid/status > > > > > > This is doable, next to FDSize. > > > > It feels like a better solution, but maybe I'm missing some context > > here. Let me know whether this is preferred. > > I don't know. I'd put it in st_size as you did initially. > /proc/*/status should be slow. Could you elaborate what you mean? * Are you saying that having FDUsed in /proc/*/status _would_ be slow? I would imagine that adding atomic_read() there shouldn't slow things down too much. * Are you saying that reading /proc/*/status is already slow and reading the number of open files from there would be inefficient? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-17 7:15 ` Alexey Dobriyan 2022-09-17 18:32 ` Ivan Babrou @ 2022-09-19 8:18 ` Christian Brauner 1 sibling, 0 replies; 10+ messages in thread From: Christian Brauner @ 2022-09-19 8:18 UTC (permalink / raw) To: Alexey Dobriyan Cc: Andrew Morton, Ivan Babrou, linux-fsdevel, linux-kernel, kernel-team, Kalesh Singh, Al Viro On Sat, Sep 17, 2022 at 10:15:13AM +0300, Alexey Dobriyan wrote: > On Fri, Sep 16, 2022 at 05:01:15PM -0700, Andrew Morton wrote: > > (cc's added) > > > > On Fri, 16 Sep 2022 16:08:52 -0700 Ivan Babrou <ivan@cloudflare.com> wrote: > > > > > Many monitoring tools include open file count as a metric. Currently > > > the only way to get this number is to enumerate the files in /proc/pid/fd. > > > > > > The problem with the current approach is that it does many things people > > > generally don't care about when they need one number for a metric. > > > In our tests for cadvisor, which reports open file counts per cgroup, > > > we observed that reading the number of open files is slow. Out of 35.23% > > > of CPU time spent in `proc_readfd_common`, we see 29.43% spent in > > > `proc_fill_cache`, which is responsible for filling dentry info. > > > Some of this extra time is spinlock contention, but it's a contention > > > for the lock we don't want to take to begin with. > > > > > > We considered putting the number of open files in /proc/pid/stat. > > > Unfortunately, counting the number of fds involves iterating the fdtable, > > > which means that it might slow down /proc/pid/stat for processes > > > with many open files. Instead we opted to put this info in /proc/pid/fd > > > as a size member of the stat syscall result. Previously the reported > > > number was zero, so there's very little risk of breaking anything, > > > while still providing a somewhat logical way to count the open files. > > > > Documentation/filesystems/proc.rst would be an appropriate place to > > document this ;) > > > > > Previously: > > > > > > ``` > > > $ sudo stat /proc/1/fd | head -n2 > > > File: /proc/1/fd > > > Size: 0 Blocks: 0 IO Block: 1024 directory > > > ``` > > > > > > With this patch: > > > > > > ``` > > > $ sudo stat /proc/1/fd | head -n2 > > > File: /proc/1/fd > > > Size: 65 Blocks: 0 IO Block: 1024 directory > > Yes. This is natural place. > > > > ``` > > > > > > Correctness check: > > > > > > ``` > > > $ sudo ls /proc/1/fd | wc -l > > > 65 > > > ``` > > > > > > There are two alternatives to this approach that I can see: > > > > > > * Expose /proc/pid/fd_count with a count there > > > > * Make fd count acces O(1) and expose it in /proc/pid/status > > This is doable, next to FDSize. > > Below is doable too. > > > > --- a/fs/proc/fd.c > > > +++ b/fs/proc/fd.c > > > @@ -279,6 +279,29 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, > > > return 0; > > > } > > > > > > +static int proc_readfd_count(struct inode *inode) > > > +{ > > > + struct task_struct *p = get_proc_task(inode); > > > + unsigned int fd = 0, count = 0; > > > + > > > + if (!p) > > > + return -ENOENT; > > > + > > > + rcu_read_lock(); > > > + while (task_lookup_next_fd_rcu(p, &fd)) { > > > + rcu_read_unlock(); > > > + > > > + count++; > > > + fd++; > > > + > > > + cond_resched(); > > > + rcu_read_lock(); > > > + } > > > + rcu_read_unlock(); > > > + put_task_struct(p); > > > + return count; > > > +} > > > + > > > static int proc_readfd(struct file *file, struct dir_context *ctx) > > > { > > > return proc_readfd_common(file, ctx, proc_fd_instantiate); > > > @@ -319,9 +342,33 @@ int proc_fd_permission(struct user_namespace *mnt_userns, > > > return rv; > > > } > > > > > > +int proc_fd_getattr(struct user_namespace *mnt_userns, > > > + const struct path *path, struct kstat *stat, > > > + u32 request_mask, unsigned int query_flags) > > > +{ > > > + struct inode *inode = d_inode(path->dentry); > > > + struct proc_dir_entry *de = PDE(inode); > > > + > > > + if (de) { > > > + nlink_t nlink = READ_ONCE(de->nlink); > > > + > > > + if (nlink > 0) > > > + set_nlink(inode, nlink); > > > + } > > > + > > > + generic_fillattr(&init_user_ns, inode, stat); > ^^^^^^^^^^^^^ > > Is this correct? I'm not userns guy at all. This is correct. :) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-16 23:08 [RFC] proc: report open files as size in stat() for /proc/pid/fd Ivan Babrou 2022-09-17 0:01 ` Andrew Morton @ 2022-09-17 15:31 ` Theodore Ts'o 2022-09-17 18:32 ` Ivan Babrou 1 sibling, 1 reply; 10+ messages in thread From: Theodore Ts'o @ 2022-09-17 15:31 UTC (permalink / raw) To: Ivan Babrou Cc: linux-fsdevel, linux-kernel, kernel-team, Andrew Morton, Kalesh Singh On Fri, Sep 16, 2022 at 04:08:52PM -0700, Ivan Babrou wrote: > We considered putting the number of open files in /proc/pid/stat. > Unfortunately, counting the number of fds involves iterating the fdtable, > which means that it might slow down /proc/pid/stat for processes > with many open files. Instead we opted to put this info in /proc/pid/fd > as a size member of the stat syscall result. Previously the reported > number was zero, so there's very little risk of breaking anything, > while still providing a somewhat logical way to count the open files. Instead of using the st_size of /proc/<pid>/fd, why not return that value in st_nlink? /proc/<pid>/fd is a directory, so having st_nlinks return number of fd's plus 2 (for . and ..) would be much more natural. Cheers, - Ted ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] proc: report open files as size in stat() for /proc/pid/fd 2022-09-17 15:31 ` Theodore Ts'o @ 2022-09-17 18:32 ` Ivan Babrou 0 siblings, 0 replies; 10+ messages in thread From: Ivan Babrou @ 2022-09-17 18:32 UTC (permalink / raw) To: Theodore Ts'o Cc: linux-fsdevel, linux-kernel, kernel-team, Andrew Morton, Kalesh Singh On Sat, Sep 17, 2022 at 8:31 AM Theodore Ts'o <tytso@mit.edu> wrote: > > On Fri, Sep 16, 2022 at 04:08:52PM -0700, Ivan Babrou wrote: > > We considered putting the number of open files in /proc/pid/stat. > > Unfortunately, counting the number of fds involves iterating the fdtable, > > which means that it might slow down /proc/pid/stat for processes > > with many open files. Instead we opted to put this info in /proc/pid/fd > > as a size member of the stat syscall result. Previously the reported > > number was zero, so there's very little risk of breaking anything, > > while still providing a somewhat logical way to count the open files. > > Instead of using the st_size of /proc/<pid>/fd, why not return that > value in st_nlink? /proc/<pid>/fd is a directory, so having st_nlinks > return number of fd's plus 2 (for . and ..) would be much more natural. From what I see, st_nlinks is used for the number of subdirectories and it doesn't include files. In /proc/fd we only have files (well, symlinks really). I'm still happy to use that instead if that's preferred. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-09-19 22:04 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-09-16 23:08 [RFC] proc: report open files as size in stat() for /proc/pid/fd Ivan Babrou 2022-09-17 0:01 ` Andrew Morton 2022-09-17 0:28 ` Ivan Babrou 2022-09-17 7:15 ` Alexey Dobriyan 2022-09-17 18:32 ` Ivan Babrou 2022-09-19 12:30 ` Alexey Dobriyan 2022-09-19 22:03 ` Ivan Babrou 2022-09-19 8:18 ` Christian Brauner 2022-09-17 15:31 ` Theodore Ts'o 2022-09-17 18:32 ` Ivan Babrou
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).