All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Baoquan He <bhe@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, stephen.s.brennan@oracle.com,
	urezki@gmail.com, hch@infradead.org
Subject: Re: [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
Date: Wed, 30 Nov 2022 14:06:34 +0100	[thread overview]
Message-ID: <Y4dVWsmEUcjPBeYc@pc636> (raw)
In-Reply-To: <Y38+4tspZxRJj73p@MiWiFi-R3L-srv>

On Thu, Nov 24, 2022 at 05:52:34PM +0800, Baoquan He wrote:
> On 11/23/22 at 01:24pm, Matthew Wilcox wrote:
> > On Wed, Nov 23, 2022 at 11:38:54AM +0800, Baoquan He wrote:
> > > On 11/18/22 at 08:01am, Matthew Wilcox wrote:
> > > > On Wed, Nov 09, 2022 at 11:35:34AM +0800, Baoquan He wrote:
> > > > > Currently, vread() can read out vmalloc areas which is associated with
> > > > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > > > interface because it doesn't allocate a vm_struct. Then in vread(),
> > > > > these areas will be skipped.
> > > > > 
> > > > > Here, add a new function vb_vread() to read out areas managed by
> > > > > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > > > > and handle  them respectively.
> > > > 
> > > > i don't understand how this deals with the original problem identified,
> > > > that the vread() can race with an unmap.
> > > 
> > > Thanks for checking.
> > > 
> > > I wrote a paragraph, then realized I misunderstood your concern. You are
> > > saying the comment from Uladzislau about my original draft patch, right?
> > > Paste the link of Uladzislau's reply here in case other people want to
> > > know the background:
> > > https://lore.kernel.org/all/Y1uKSmgURNEa3nQu@pc636/T/#u
> > > 
> > > When Stephen raised the issue originally, I posted a draft patch as
> > > below trying to fix it:
> > > https://lore.kernel.org/all/Y1pHTj2wuhoWmeV3@MiWiFi-R3L-srv/T/#u
> > > 
> > > In above draft patch, I tried to differentiate normal vmalloc area and
> > > vm_map_ram area with the fact that vmalloc area is associated with a
> > > vm_struct, while vm_map_ram area has ->vm as NULL. And I thought their
> > > only difference is normal vmalloc area has guard page, so its size need
> > > consider the guard page; while vm_map_ram area has no guard page, only
> > > consider its own actual size. Uladzislau's comment reminded me I was
> > > wrong. And the things we need handle are beyond that.
> > > 
> > > Currently there are three kinds of vmalloc areas in kernel:
> > > 
> > > 1) normal vmalloc areas, associated with a vm_struct, this is allocated 
> > > in __get_vm_area_node(). When freeing, it set ->vm to NULL
> > > firstly, then unmap and free vmap_area, see remove_vm_area().
> > > 
> > > 2) areas allocated via vm_map_ram() and size is larger than
> > > VMAP_MAX_ALLOC. The entire area is not associated with vm_struct, and
> > > freed at one time in vm_unmap_ram() with unmapping and freeing vmap_area;
> > > 
> > > 3) areas allocated via vm_map_ram(), then delegate to vb_alloc() when
> > > size <= VMAP_MAX_ALLOC. Its vmap_area is allocated at one time with
> > > VMAP_BLOCK_SIZE big, and split and used later through vb_alloc(), freed
> > > via vb_free(). When the entire area is dirty, it will be unmapped and
> > > freed.
> > > 
> > > Based on above facts, we need add flags to differentiate the normal
> > > vmalloc area from the vm_map_ram area, namely area 1) and 2). And we
> > > also need flags to differentiate the area 2) and 3). Because area 3) are
> > > pieces of a entire vmap_area, vb_free() will unmap the piece of area and
> > > set the part dirty, but the entire vmap_area will kept there. So when we
> > > will read area 3), we need take vb->lock and only read out the still
> > > mapped part, but not dirty or free part of the vmap_area.
> > 
> > I don't think you understand the problem.
> > 
> > Task A:			Task B:		Task C:
> > p = vm_map_ram()
> > 			vread(p);
> > 			... preempted ...
> > vm_unmap_ram(p);
> > 					q = vm_map_ram();
> > 			vread continues
> 
> 
> 
> > 
> > If C has reused the address space allocated by A, task B is now reading
> > the memory mapped by task C instead of task A.  If it hasn't, it's now
> > trying to read from unmapped, and quite possibly freed memory.  Which
> > might have been allocated by task D.
> 
> Hmm, it may not be like that.
> 
> Firstly, I would remind that vread() takes vmap_area_lock during the
> whole reading process. Before this patchset, the vread() and other area
> manipulation will have below status:
> 1) __get_vm_area_node() could only finish insert_vmap_area(), then free
> the vmap_area_lock, then warting;
> 2) __get_vm_area_node() finishs setup_vmalloc_vm()
>   2.1) doing mapping but not finished;
>   2.2) clear_vm_uninitialized_flag() is called after mapping is done;
> 3) remove_vm_area() is called to set -> = NULL, then free vmap_area_lock;
> 
> Task A:			   Task B:		     Task C:
> p = __get_vm_area_node()
> remove_vm_area(p);
> 			   vread(p);
> 
> 			   vread end 
> 					     q = __get_vm_area_node();
> 
> So, as you can see, the checking "if (!va->vm)" in vread() will filter
> out vmap_area:
> a) areas if only insert_vmap_area() is called, but ->vm is still NULL; 
> b) areas if remove_vm_area() is called to clear ->vm to NULL;
> c) areas created through vm_map_ram() since its ->vm is always NULL;
> 
> Means vread() will read out vmap_area:
> d) areas if setup_vmalloc_vm() is called;
>   1) mapping is done on areas, e.g clear_vm_uninitialized_flag() is
>        called;
>   2) mapping is being handled, just after returning from setup_vmalloc_vm();
> 
> 
> ******* after this patchset applied:
> 
> Task A:			Task B:		Task C:
> p = vm_map_ram()
> vm_unmap_ram(p);
> 			vread(p);
>                          vb_vread()
> 			vread end 
> 
> 					q = vm_map_ram();
> 
> With this patchset applied, other than normal areas, for the
> vm_map_ram() areas:
> 1) In vm_map_ram(), set vmap_area->flags = VMAP_RAM when vmap_area_lock
>    is taken; In vm_unmap_ram(), clear it wiht "va->flags &= ~VMAP_RAM"
>    when vmap_area_lock is taken;
> 2) If vmap_block, set va->flags = VMAP_RAM|VMAP_BLOCK; And set
>    vmap_block->used_map to track the used region, filter out the dirty
>    and free region;
> 3) In vb_vread(), we take vb->lock to avoid reading out dirty regions.
> 
> Please help point out what is wrong or I missed.
> 
One thing is we still can read-out un-mapped pages, i.e. a text instead:

<snip>
static void vb_free(unsigned long addr, unsigned long size)
{
	unsigned long offset;
	unsigned int order;
	struct vmap_block *vb;

	BUG_ON(offset_in_page(size));
	BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);

	flush_cache_vunmap(addr, addr + size);

	order = get_order(size);
	offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
	vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));

	vunmap_range_noflush(addr, addr + size);

	if (debug_pagealloc_enabled_static())
		flush_tlb_kernel_range(addr, addr + size);

	spin_lock(&vb->lock);
...
<snip>

or am i missing something? Is it a problem? It might be. Another thing
it would be good if you upload a new patchset so it is easier to review
it.

Thanks!

--
Uladzislau Rezki


  reply	other threads:[~2022-11-30 13:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-09  3:35 [PATCH RFC 0/3] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
2022-11-09  3:35 ` [PATCH RFC 1/3] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
2022-11-09  3:35 ` [PATCH RFC 2/3] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
2022-11-09  3:35 ` [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
2022-11-10  0:59   ` Stephen Brennan
2022-11-10 10:23     ` Baoquan He
2022-11-10 18:48       ` Stephen Brennan
2022-11-14 10:06         ` Baoquan He
2022-11-11  9:48   ` kernel test robot
2022-11-18  8:01   ` Matthew Wilcox
2022-11-23  3:38     ` Baoquan He
2022-11-23 13:24       ` Matthew Wilcox
2022-11-24  9:52         ` Baoquan He
2022-11-30 13:06           ` Uladzislau Rezki [this message]
2022-12-01  4:46             ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4dVWsmEUcjPBeYc@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=stephen.s.brennan@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.