All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Baoquan He <bhe@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Subject: Re: [PATCH v3 1/2] mm: vmalloc: Remove a global vmap_blocks xarray
Date: Tue, 28 Mar 2023 14:51:40 +0200	[thread overview]
Message-ID: <ZCLi3K839DFzWFW8@pc636> (raw)
In-Reply-To: <132e2d5c-0c1f-4fff-850c-b3fb084455bb@lucifer.local>

On Mon, Mar 27, 2023 at 09:09:32PM +0100, Lorenzo Stoakes wrote:
> On Mon, Mar 27, 2023 at 07:01:25PM +0200, Uladzislau Rezki (Sony) wrote:
> > A global vmap_blocks-xarray array can be contented under
> > heavy usage of the vm_map_ram()/vm_unmap_ram() APIs. The
> > lock_stat shows that a "vmap_blocks.xa_lock" lock is a
> > second in a top-list when it comes to contentions:
> >
> > <snip>
> > ----------------------------------------
> > class name con-bounces contentions ...
> > ----------------------------------------
> > vmap_area_lock:         2554079 2554276 ...
> >   --------------
> >   vmap_area_lock        1297948  [<00000000dd41cbaa>] alloc_vmap_area+0x1c7/0x910
> >   vmap_area_lock        1256330  [<000000009d927bf3>] free_vmap_block+0x4a/0xe0
> >   vmap_area_lock              1  [<00000000c95c05a7>] find_vm_area+0x16/0x70
> >   --------------
> >   vmap_area_lock        1738590  [<00000000dd41cbaa>] alloc_vmap_area+0x1c7/0x910
> >   vmap_area_lock         815688  [<000000009d927bf3>] free_vmap_block+0x4a/0xe0
> >   vmap_area_lock              1  [<00000000c1d619d7>] __get_vm_area_node+0xd2/0x170
> >
> > vmap_blocks.xa_lock:    862689  862698 ...
> >   -------------------
> >   vmap_blocks.xa_lock   378418    [<00000000625a5626>] vm_map_ram+0x359/0x4a0
> >   vmap_blocks.xa_lock   484280    [<00000000caa2ef03>] xa_erase+0xe/0x30
> >   -------------------
> >   vmap_blocks.xa_lock   576226    [<00000000caa2ef03>] xa_erase+0xe/0x30
> >   vmap_blocks.xa_lock   286472    [<00000000625a5626>] vm_map_ram+0x359/0x4a0
> > ...
> > <snip>
> >
> > that is a result of running vm_map_ram()/vm_unmap_ram() in
> > a loop. The test creates 64(on 64 CPUs system) threads and
> > each one maps/unmaps 1 page.
> >
> > After this change the "xa_lock" can be considered as a noise
> > in the same test condition:
> >
> > <snip>
> > ...
> > &xa->xa_lock#1:         10333 10394 ...
> >   --------------
> >   &xa->xa_lock#1        5349      [<00000000bbbc9751>] xa_erase+0xe/0x30
> >   &xa->xa_lock#1        5045      [<0000000018def45d>] vm_map_ram+0x3a4/0x4f0
> >   --------------
> >   &xa->xa_lock#1        7326      [<0000000018def45d>] vm_map_ram+0x3a4/0x4f0
> >   &xa->xa_lock#1        3068      [<00000000bbbc9751>] xa_erase+0xe/0x30
> > ...
> > <snip>
> >
> > This patch does not fix vmap_area_lock/free_vmap_area_lock and
> > purge_vmap_area_lock bottle-necks, it is rather a separate rework.
> >
> > v1 - v2:
> >    - Add more comments(Andrew Morton req.)
> >    - Switch to WARN_ON_ONCE(Lorenzo Stoakes req.)
> >
> > v2 -> v3:
> >    - Fix a kernel-doc complain(Matthew Wilcox)
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 85 +++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 64 insertions(+), 21 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 978194dc2bb8..821256ecf81c 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1908,9 +1908,22 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
> >  #define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
> >  #define VMAP_FLAGS_MASK		0x3
> >
> > +/*
> > + * We should probably have a fallback mechanism to allocate virtual memory
> > + * out of partially filled vmap blocks. However vmap block sizing should be
> > + * fairly reasonable according to the vmalloc size, so it shouldn't be a
> > + * big problem.
> > + */
> >  struct vmap_block_queue {
> >  	spinlock_t lock;
> >  	struct list_head free;
> > +
> > +	/*
> > +	 * An xarray requires an extra memory dynamically to
> > +	 * be allocated. If it is an issue, we can use rb-tree
> > +	 * instead.
> > +	 */
> > +	struct xarray vmap_blocks;
> >  };
> >
> >  struct vmap_block {
> > @@ -1928,24 +1941,46 @@ struct vmap_block {
> >  static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> >
> >  /*
> > - * XArray of vmap blocks, indexed by address, to quickly find a vmap block
> > - * in the free path. Could get rid of this if we change the API to return a
> > - * "cookie" from alloc, to be passed to free. But no big deal yet.
> > + * In order to fast access to any "vmap_block" associated with a
> > + * specific address, we store them into a per-cpu xarray. A hash
> > + * function is addr_to_vbq() whereas a key is a vb->va->va_start
> > + * value.
> > + *
> > + * Please note, a vmap_block_queue, which is a per-cpu, is not
> > + * serialized by a raw_smp_processor_id() current CPU, instead
> > + * it is chosen based on a CPU-index it belongs to, i.e. it is
> > + * a hash-table.
> > + *
> > + * An example:
> > + *
> > + *  CPU_1  CPU_2  CPU_0
> > + *    |      |      |
> > + *    V      V      V
> > + * 0     10     20     30     40     50     60
> > + * |------|------|------|------|------|------|...<vmap address space>
> > + *   CPU0   CPU1   CPU2   CPU0   CPU1   CPU2
> > + *
> > + * - CPU_1 invokes vm_unmap_ram(6), 6 belongs to CPU0 zone, thus
> > + *   it access: CPU0/INDEX0 -> vmap_blocks -> xa_lock;
> > + *
> > + * - CPU_2 invokes vm_unmap_ram(11), 11 belongs to CPU1 zone, thus
> > + *   it access: CPU1/INDEX1 -> vmap_blocks -> xa_lock;
> > + *
> > + * - CPU_0 invokes vm_unmap_ram(20), 20 belongs to CPU2 zone, thus
> > + *   it access: CPU2/INDEX2 -> vmap_blocks -> xa_lock.
> >   */
> 
> OK so if I understand this correctly, you're overloading the per-CPU
> vmap_block_queue array to use as a simple hash based on the address and
> relying on the xa_lock() in xa_insert() to serialise in case of contention?
> 
> I like the general heft of your comment but I feel this could be spelled
> out a little more clearly, something like:-
> 
>   In order to have fast access to any vmap_block object associated with a
>   specific address, we use a hash.
> 
>   Rather than waste space on defining a new hash table  we take advantage
>   of the fact we already have a static per-cpu array vmap_block_queue.
> 
>   This is already used for per-CPU access to the block queue, however we
>   overload this to _also_ act as a vmap_block hash. The hash function is
>   addr_to_vbq() which hashes on vb->va->va_start.
> 
>   This then uses per_cpu() to lookup the _index_ rather than the
>   _cpu_. Each vmap_block_queue contains an xarray of vmap blocks which are
>   indexed on the same key as the hash (vb->va->va_start).
> 
>   xarray read acceses are protected by RCU lock and inserts are protected
>   by a spin lock so there is no risk of a race here.
> 
>   An example:
> 
>   ...
> 
> Feel free to cut this down as needed :) but I do feel it's important to
> _explicitly_ point out that we're overloading this as it's quite confusing
> at face value.
> 
> > -static DEFINE_XARRAY(vmap_blocks);
> > +static struct vmap_block_queue *
> > +addr_to_vbq(unsigned long addr)
> > +{
> > +	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> >
> > -/*
> > - * We should probably have a fallback mechanism to allocate virtual memory
> > - * out of partially filled vmap blocks. However vmap block sizing should be
> > - * fairly reasonable according to the vmalloc size, so it shouldn't be a
> > - * big problem.
> > - */
> > +	return &per_cpu(vmap_block_queue, index);
> > +}
> >
> > -static unsigned long addr_to_vb_idx(unsigned long addr)
> > +static unsigned long
> > +addr_to_vb_va_start(unsigned long addr)
> >  {
> > -	addr -= VMALLOC_START & ~(VMAP_BLOCK_SIZE-1);
> > -	addr /= VMAP_BLOCK_SIZE;
> > -	return addr;
> > +	return rounddown(addr, VMAP_BLOCK_SIZE);
> >  }
> >
> >  static void *vmap_block_vaddr(unsigned long va_start, unsigned long pages_off)
> > @@ -1953,7 +1988,7 @@ static void *vmap_block_vaddr(unsigned long va_start, unsigned long pages_off)
> >  	unsigned long addr;
> >
> >  	addr = va_start + (pages_off << PAGE_SHIFT);
> > -	BUG_ON(addr_to_vb_idx(addr) != addr_to_vb_idx(va_start));
> > +	WARN_ON_ONCE(addr_to_vb_va_start(addr) != va_start);
> >  	return (void *)addr;
> >  }
> >
> > @@ -1970,7 +2005,6 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> >  	struct vmap_block_queue *vbq;
> >  	struct vmap_block *vb;
> >  	struct vmap_area *va;
> > -	unsigned long vb_idx;
> >  	int node, err;
> >  	void *vaddr;
> >
> > @@ -2003,8 +2037,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> >  	bitmap_set(vb->used_map, 0, (1UL << order));
> >  	INIT_LIST_HEAD(&vb->free_list);
> >
> > -	vb_idx = addr_to_vb_idx(va->va_start);
> > -	err = xa_insert(&vmap_blocks, vb_idx, vb, gfp_mask);
> > +	vbq = addr_to_vbq(va->va_start);
> > +	err = xa_insert(&vbq->vmap_blocks, va->va_start, vb, gfp_mask);
> 
> I might be being pedantic here, but shortly after this code you reassign vbq:-
> 
> 	vbq = addr_to_vbq(va->va_start);
> 	err = xa_insert(&vbq->vmap_blocks, va->va_start, vb, gfp_mask);
> 	if (err) {
> 		kfree(vb);
> 		free_vmap_area(va);
> 		return ERR_PTR(err);
> 	}
> 
> 	vbq = raw_cpu_ptr(&vmap_block_queue);
> 
> Which is confusing at a glance, as you're using it once as a hash lookup
> and again for its 'true purpose'.
> 
> I wonder whether it would be better overall, since you always follow a vbq
> lookup explicitly with an operation on vmap_blocks, to just add a helper
> that returned a pointer to the xarray? e.g. (untested code here :):-
> 
> static struct xarray *get_vblock_array(unsigned long addr)
> {
> 	struct vmap_block_queue *vbq;
> 	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> 
> 	vbq = &per_cpu(vmap_block_queue, index);
> 	return &vbq->vblocks;
> }
> 
> And replace addr_to_vbq() with this. That'd also make the mechanism of this
> hash lookup super explicit.
> 
Thank you for the comments. I will go through all of them and fix
accordingly. At lease i see that i have to update the documentation in
more better way!

Thanks!

--
Uladzislau Rezki


  reply	other threads:[~2023-03-28 12:51 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-27 17:01 [PATCH v3 1/2] mm: vmalloc: Remove a global vmap_blocks xarray Uladzislau Rezki (Sony)
2023-03-27 17:01 ` [PATCH v3 2/2] lib/test_vmalloc.c: Add vm_map_ram()/vm_unmap_ram() test case Uladzislau Rezki (Sony)
2023-03-27 20:28   ` Lorenzo Stoakes
2023-03-28 12:29     ` Uladzislau Rezki
2023-03-27 20:09 ` [PATCH v3 1/2] mm: vmalloc: Remove a global vmap_blocks xarray Lorenzo Stoakes
2023-03-28 12:51   ` Uladzislau Rezki [this message]
2023-03-28 16:37   ` Uladzislau Rezki
2023-03-29 15:01   ` Uladzislau Rezki
2023-03-29 16:23     ` Lorenzo Stoakes
2023-03-29 17:50       ` Uladzislau Rezki
2023-03-28  3:25 ` Baoquan He
2023-03-28 12:34   ` Uladzislau Rezki
2023-03-29  4:33     ` Baoquan He
2023-03-29  6:54       ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCLi3K839DFzWFW8@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=oleksiy.avramchenko@sony.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.