Re: /proc/kcore reads 0's for vmap_block

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: Uladzislau Rezki <urezki@gmail.com>
To: Baoquan He <bhe@redhat.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: /proc/kcore reads 0's for vmap_block
Date: Fri, 28 Oct 2022 09:52:42 +0200	[thread overview]
Message-ID: <Y1uKSmgURNEa3nQu@pc636> (raw)
In-Reply-To: <Y1pHTj2wuhoWmeV3@MiWiFi-R3L-srv>

On Thu, Oct 27, 2022 at 04:54:38PM +0800, Baoquan He wrote:
> On 10/26/22 at 04:15pm, Stephen Brennan wrote:
> > Hi all,
> > 
> > The /proc/kcore interface uses vread() to read memory addresses which
> > are in the vmalloc region, and it seems that vread() doesn't support
> > vmap_areas which are associated with a vmap_block. As far as I can tell,
> > those have vmap_area.vm == NULL, and get skipped during vread()'s
> > iteration. So the end result is that the read simply returns 0's.
> 
> Hmm, with my understanding, it should be vm_map_ram() which is called
> without a struct vm_struct associated, and it's the only interface do
> to so. vmap_block is the optimization way based on percpu to reduce vmap
> lock race.
> 
> > 
> > This impacts live debuggers like gdb and drgn, which is how I stumbled
> > upon it[1]. It looks like crash avoids the issue by doing a page table
> > walk and reading the physical address.
> > 
> > I'm wondering if there's any rationale for this omission from vread():
> > is it a simple oversight, or was it omitted due to the difficulty? Is
> > it possible for /proc/kcore to simply take the page faults when it reads
> > unmapped memory and handle them? (I'm sure that's already discussed or
> > is obviously infeasible for some reason beyond me.)
> 
> From git history, the vmlist iterating was taken in vread() at the
> first place. Later, in below commit, when people changed to iterate
> vmap_area_list instead, they just inherited the old code logic. I guess
> that's why vmap with NULL ->vm is skipped.
> 
> commit e81ce85f960c ("mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()")
> 
> > 
> > Ideally, I'm just looking for a way forward that allows the debugger to
> > *work* as expected, meaning either that /proc/kcore always reads the
> > correct data, or that the debugger can know ahead of time that it will
> > need to do some processing (like a page table walk) first. 
> 
> I think we can adjust vread() to allow those vmap_area with NULL ->vm
> being read out? Made a draft patch at below, please feel free to have a
> test. Not sure if there's any risk.
> 
> From 9f1b786730f3ee0a8d5b48a94dbefa674102d7b9 Mon Sep 17 00:00:00 2001
> From: Baoquan He <bhe@redhat.com>
> Date: Thu, 27 Oct 2022 16:20:26 +0800
> Subject: [PATCH] mm/vmalloc.c: allow to read out vm_map_ram() areas in vread()
> Content-type: text/plain
> 
> Currently, vread can read out vmalloc areas who is associated with
> a vm_struct. While this doesn't work for areas created by vm_map_ram()
> interface because it doesn't allocate a vm_struct. Then in vread(),
> these areas will be skipped.
> 
> Pages are passed into vm_map_ram() and mapped onto frea vmap area,
> it should be safe to read them out. Change code to allow to read
> out these vm_map_ram() areas in vread().
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ccaa461998f3..f899ab784671 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3526,7 +3526,7 @@ long vread(char *buf, char *addr, unsigned long count)
>  	struct vm_struct *vm;
>  	char *vaddr, *buf_start = buf;
>  	unsigned long buflen = count;
> -	unsigned long n;
> +	unsigned long n, size;
>  
>  	addr = kasan_reset_tag(addr);
>  
> @@ -3547,12 +3547,11 @@ long vread(char *buf, char *addr, unsigned long count)
>  		if (!count)
>  			break;
>  
> -		if (!va->vm)
> -			continue;
> -
>  		vm = va->vm;
> -		vaddr = (char *) vm->addr;
> -		if (addr >= vaddr + get_vm_area_size(vm))
> +		vaddr = (char *) va->va_start;
> +		size = vm ? get_vm_area_size(vm) : va_size(va);
> +
> +		if (addr >= vaddr + size)
>  			continue;
>  		while (addr < vaddr) {
>  			if (count == 0)
> @@ -3562,10 +3561,10 @@ long vread(char *buf, char *addr, unsigned long count)
>  			addr++;
>  			count--;
>  		}
> -		n = vaddr + get_vm_area_size(vm) - addr;
> +		n = vaddr + size - addr;
>  		if (n > count)
>  			n = count;
> -		if (!(vm->flags & VM_IOREMAP))
> +		if (!vm || !(vm->flags & VM_IOREMAP))
>  			aligned_vread(buf, addr, n);
>
What happens if during the read a page is unmapped by the
vm_unamp_ram()? I see that it concurrently can occur.

--
Uladzislau Rezki

next prev parent reply	other threads:[~2022-10-28  7:52 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-26 23:15 /proc/kcore reads 0's for vmap_block Stephen Brennan
2022-10-27  8:54 ` Baoquan He
2022-10-27 18:07   ` Stephen Brennan
2022-10-28  7:52   ` Uladzislau Rezki [this message]
2022-10-31  7:12     ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y1uKSmgURNEa3nQu@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=stephen.s.brennan@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox