All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael Aquini <aquini@redhat.com>,
	Joel Savitz <jsavitz@redhat.com>,
	linux-kernel@vger.kernel.org,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Huang Ying <ying.huang@intel.com>,
	Sandeep Patil <sspatil@android.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status
Date: Sun, 19 May 2019 14:31:29 -0700	[thread overview]
Message-ID: <20190519213129.GA32053@yury-thinkpad> (raw)
In-Reply-To: <87k1ettv91.fsf@concordia.ellerman.id.au>

On Tue, May 14, 2019 at 04:17:46PM +1000, Michael Ellerman wrote:
> Yury Norov <yury.norov@gmail.com> writes:
> > On Fri, May 10, 2019 at 01:32:22PM +1000, Michael Ellerman wrote:
> >> Yury Norov <yury.norov@gmail.com> writes:
> >> > On Tue, May 07, 2019 at 08:54:31AM -0400, Rafael Aquini wrote:
> >> >> On Mon, May 06, 2019 at 11:53:43AM -0400, Joel Savitz wrote:
> >> >> > There is currently no easy and architecture-independent way to find the
> >> >> > lowest unusable virtual address available to a process without
> >> >> > brute-force calculation. This patch allows a user to easily retrieve
> >> >> > this value via /proc/<pid>/status.
> >> >> > 
> >> >> > Using this patch, any program that previously needed to waste cpu cycles
> >> >> > recalculating a non-sensitive process-dependent value already known to
> >> >> > the kernel can now be optimized to use this mechanism.
> >> >> > 
> >> >> > Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> >> >> > ---
> >> >> >  Documentation/filesystems/proc.txt | 2 ++
> >> >> >  fs/proc/task_mmu.c                 | 2 ++
> >> >> >  2 files changed, 4 insertions(+)
> >> >> > 
> >> >> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> >> > index 66cad5c86171..1c6a912e3975 100644
> >> >> > --- a/Documentation/filesystems/proc.txt
> >> >> > +++ b/Documentation/filesystems/proc.txt
> >> >> > @@ -187,6 +187,7 @@ read the file /proc/PID/status:
> >> >> >    VmLib:      1412 kB
> >> >> >    VmPTE:        20 kb
> >> >> >    VmSwap:        0 kB
> >> >> > +  VmTaskSize:	137438953468 kB
> >> >> >    HugetlbPages:          0 kB
> >> >> >    CoreDumping:    0
> >> >> >    THP_enabled:	  1
> >> >> > @@ -263,6 +264,7 @@ Table 1-2: Contents of the status files (as of 4.19)
> >> >> >   VmPTE                       size of page table entries
> >> >> >   VmSwap                      amount of swap used by anonymous private data
> >> >> >                               (shmem swap usage is not included)
> >> >> > + VmTaskSize                  lowest unusable address in process virtual memory
> >> >> 
> >> >> Can we change this help text to "size of process' virtual address space memory" ?
> >> >
> >> > Agree. Or go in other direction and make it VmEnd
> >> 
> >> Yeah I think VmEnd would be clearer to folks who aren't familiar with
> >> the kernel's usage of the TASK_SIZE terminology.
> >> 
> >> >> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> >> > index 95ca1fe7283c..0af7081f7b19 100644
> >> >> > --- a/fs/proc/task_mmu.c
> >> >> > +++ b/fs/proc/task_mmu.c
> >> >> > @@ -74,6 +74,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
> >> >> >  	seq_put_decimal_ull_width(m,
> >> >> >  		    " kB\nVmPTE:\t", mm_pgtables_bytes(mm) >> 10, 8);
> >> >> >  	SEQ_PUT_DEC(" kB\nVmSwap:\t", swap);
> >> >> > +	seq_put_decimal_ull_width(m,
> >> >> > +		    " kB\nVmTaskSize:\t", mm->task_size >> 10, 8);
> >> >> >  	seq_puts(m, " kB\n");
> >> >> >  	hugetlb_report_usage(m, mm);
> >> >> >  }
> >> >
> >> > I'm OK with technical part, but I still have questions not answered
> >> > (or wrongly answered) in v1 and v2. Below is the very detailed
> >> > description of the concerns I have.
> >> >
> >> > 1. What is the exact reason for it? Original version tells about some
> >> > test that takes so much time that you were able to drink a cup of
> >> > coffee before it was done. The test as you said implements linear
> >> > search to find the last page and so is of O(n). If it's only for some
> >> > random test, I think the kernel can survive without it. Do you have a
> >> > real example of useful programs that suffer without this information?
> >> >
> >> >
> >> > 2. I have nothing against taking breaks and see nothing weird if
> >> > ineffective algorithms take time. On my system (x86, Ubuntu) the last
> >> > mapped region according to /proc/<pid>/maps is:
> >> > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0     [vsyscall]
> >> > So to find the required address, we have to inspect 2559 pages. With a
> >> > binary search it would take 12 iterations at max. If my calculation is
> >> > wrong or your environment is completely different - please elaborate.
> >> 
> >> I agree it should not be hard to calculate, but at the same time it's
> >> trivial for the kernel to export the information so I don't see why the
> >> kernel shouldn't.
> >
> > Kernel shouldn't do it unless there will be real users of the feature.
> > Otherwise it's pure bloating.
> 
> A single line or two of code to print a value that's useful information
> for userspace is hardly "bloat".
> 
> I agree it's good to have users for things, but this seems like it's so
> trivial that we should just add it and someone will find a use for it.

Little bloat is still bloat. Trivial useless code is still useless.

If someone finds a use of VmEnd, it should be thoroughly reviewed for
better alternatives.
 
> > One possible user of it that I can imagine is mmap(MAP_FIXED). The
> > documentation is very clear about it:
> >
> >    Furthermore,  this  option  is  extremely  hazardous (when used on its own),
> >    because it forcibly removes preexisting mappings, making it easy for a 
> >    multithreaded  process  to corrupt its own address space.
> >
> > VmEnd provided by kernel may encourage people to solve their problems
> > by using MAP_FIXED which is potentially dangerous.
> 
> There's MAP_FIXED_NOREPLACE now which is not dangerous.

MAP_FIXED_NOREPLACE is still not supported by glibc and not
documented. (Glibc doesn't use mman-common.h that comes from kernel,
and defines all mmap-related stuff in its own bits/mman.h). Therefore
from the point of view of 99% users MAP_FIXED_NOREPLACE doesn't exist.
Bionic defines MAP_FIXED_NOREPLACE but does not document it and doesn't
use.

> Using MAX_FIXED_NOREPLACE and VmEnd would make it relatively easy to do
> a userspace ASLR implementation, so that actually is an argument in
> favour IMHO.

Kernel-supported ASLR works well since 2.6.12. Do you see any
downside of using it?

MAP_RANDOM would be even more handy for userspace ASLR.

VmEnd in current form would break certain userspace programs that
has DEFAULT_MAP_WINDOW != TASK_SIZE. This is the case for 48-bit VA
programs running on 52-bits VA ARM kernel. See 363524d2b1227
(arm64: mm: Introduce DEFAULT_MAP_WINDOW).

> > Another scenario of VmEnd is to understand how many top bits of address will
> > be always zero to allocate them for user's purpose, like smart pointers. It
> > worth to discuss this usecase with compiler people. If they have interest,
> > I think it's more straightforward to give them something like:
> >    int preserve_top_bits(int nbits);
> 
> You mean a syscall?
> 
> With things like hardware pointer tagging / colouring coming along I
> think you're right that using VmEnd and assuming the top bits are never
> used is a bad idea, an explicit interface would be better.
> 
> cheers

      reply	other threads:[~2019-05-19 21:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-06 15:53 [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status Joel Savitz
2019-05-07 12:54 ` Rafael Aquini
2019-05-08  6:37   ` Yury Norov
2019-05-08 13:47     ` Rafael Aquini
2019-05-10  3:32     ` Michael Ellerman
2019-05-10  7:25       ` Yury Norov
2019-05-14  6:17         ` Michael Ellerman
2019-05-19 21:31           ` Yury Norov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190519213129.GA32053@yury-thinkpad \
    --to=yury.norov@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=aquini@redhat.com \
    --cc=jsavitz@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxram@us.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=sspatil@android.com \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.