From: Alexey Dobriyan
Subject: Re: [PATCH v2 2/2] pidmap(2)
Date: Tue, 26 Sep 2017 21:46:43 +0300
Message-ID: <20170926184643.GC14724@avx2>
References: <20170924200620.GA24368@avx2> <20170924200822.GB24368@avx2>
To: Andy Lutomirski
Cc: Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linux API, Randy Dunlap, Thomas Gleixner, Djalal Harouni, Alexey Gladkov, Tatsiana Brouka, Aliaksandr Patseyenak
List-Id: linux-api@vger.kernel.org

On Sun, Sep 24, 2017 at 02:27:00PM -0700, Andy Lutomirski wrote:
> On Sun, Sep 24, 2017 at 1:08 PM, Alexey Dobriyan wrote:
> > From: Tatsiana Brouka
> >
> > Implement system call for bulk retrieving of pids in binary form.
> >
> > Using /proc is slower than necessary: 3 syscalls + another 3 for each thread +
> > converting with atoi() + instantiating dentries and inodes.
> >
> > /proc may be not mounted especially in containers. Natural extension of
> > hidepid=2 efforts is to not mount /proc at all.
> >
> > It could be used by programs like ps, top or CRIU. Speed increase will
> > become more drastic once combined with bulk retrieval of process statistics.
> >
> > Benchmark:
> >
> >     N=1<<16 times
> >     ~130 processes (~250 task_structs) on a regular desktop system
> >     opendir + readdir + closedir /proc + the same for every /proc/$PID/task
> >     (roughly what htop(1) does) vs pidmap
> >
> >     /proc   16.80 ± 0.73%
> >     pidmap   0.06 ± 0.31%
> >
> > PIDMAP_* flags are modelled after /proc/task_diag patchset.
> >
> >
> > PIDMAP(2)             Linux Programmer's Manual             PIDMAP(2)
> >
> > NAME
> >        pidmap - get allocated PIDs
> >
> > SYNOPSIS
> >        long pidmap(pid_t pid, int *pids, unsigned int count, unsigned int start, int flags);
>
> I think we will seriously regret a syscall that does this. Djalal is
> working on fixing the turd that is hidepid, and this syscall is
> basically incompatible with ever fixing hidepids. I think that, to
> make it less regrettable, it needs to take an fd to a proc mount as a
> parameter. This makes me wonder why it's a syscall at all -- why not
> just create a new file like /proc/pids?

See reply to fdmap(2).

pidmap(2) is indeed a more complex case, exactly because of
pid/tgid/tid/everything else + pid namespaces + ->hide_pid.

However the problem remains: how to query the task tree without all the
bullshit.

The C/R people settled for /proc/*/children; it was a mistake IMO.
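
For reference, the /proc walk that the quoted changelog benchmarks against
(opendir + readdir + closedir on /proc, then the same for every
/proc/$PID/task, with atoi() on each name) looks roughly like the sketch
below. It is an illustration only, not the benchmark code from the patch;
the helper names is_numeric(), count_numeric_entries() and walk_proc() are
made up here, and it assumes /proc is mounted at /proc.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if name is a non-empty string of decimal digits (a PID/TID entry). */
static int is_numeric(const char *name)
{
	if (*name == '\0')
		return 0;
	for (; *name; name++)
		if (!isdigit((unsigned char)*name))
			return 0;
	return 1;
}

/* Count numeric entries in dir. Returns -1 if dir cannot be opened. */
static long count_numeric_entries(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	long n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL)
		if (is_numeric(de->d_name))
			n++;
	closedir(d);
	return n;
}

/*
 * Walk /proc and every /proc/<pid>/task, counting processes and threads.
 * Returns 0 on success, -1 if /proc cannot be opened (e.g. not mounted).
 */
static int walk_proc(long *nr_procs, long *nr_threads)
{
	DIR *d = opendir("/proc");
	struct dirent *de;
	char path[64];

	if (!d)
		return -1;
	*nr_procs = 0;
	*nr_threads = 0;
	while ((de = readdir(d)) != NULL) {
		long n;

		if (!is_numeric(de->d_name))
			continue;
		(*nr_procs)++;
		/* Text-to-integer conversion is part of the per-entry cost. */
		snprintf(path, sizeof(path), "/proc/%d/task", atoi(de->d_name));
		n = count_numeric_entries(path);
		if (n > 0)	/* the process may have exited in the meantime */
			*nr_threads += n;
	}
	closedir(d);
	return 0;
}
```

Calling walk_proc(&procs, &threads) once corresponds to one iteration of
the /proc side of the benchmark: three syscalls for /proc itself plus
another three per process for its task directory, which is the overhead
the changelog is complaining about.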