All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>, Baoquan He <bhe@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Dave Chinner <david@fromorbit.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Subject: Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree
Date: Mon, 22 Jan 2024 18:44:34 +0100	[thread overview]
Message-ID: <Za6pgt2j1n4PTcGI@pc636> (raw)
In-Reply-To: <2c318a40-9e0f-4d24-b5cc-e712f7b2c334@lucifer.local>

On Sat, Jan 20, 2024 at 12:55:10PM +0000, Lorenzo Stoakes wrote:
> On Thu, Jan 18, 2024 at 02:15:31PM +0100, Uladzislau Rezki wrote:
> 
> [snip]
> 
> >
> > > > +	struct rb_root root;
> > > > +	struct list_head head;
> > > > +	spinlock_t lock;
> > > > +};
> > > > +
> > > > +static struct vmap_node {
> > > > +	/* Bookkeeping data of this node. */
> > > > +	struct rb_list busy;
> > > > +} single;
> > >
> > > This may be a thing about encapsulation/naming or similar, but I'm a little
> > > confused as to why the rb_list type is maintained as a field rather than
> > > its fields embedded?
> > >
> > The "struct vmap_node" will be extended by the following patches in the
> > series.
> >
> 
> Yeah sorry I missed this, only realising after I sent...!
> 
> > > > +
> > > > +static struct vmap_node *vmap_nodes = &single;
> > > > +static __read_mostly unsigned int nr_vmap_nodes = 1;
> > > > +static __read_mostly unsigned int vmap_zone_size = 1;
> > >
> > > It might be worth adding a comment here explaining that we're binding to a
> > > single node for now to maintain existing behaviour (and a brief description
> > > of what these values mean - for instance what unit vmap_zone_size is
> > > expressed in?)
> > >
> > Right. Agree on it :)
> >
> 
> Indeed :)
> 
> [snip]
> 
> > > >  /* Look up the first VA which satisfies addr < va_end, NULL if none. */
> > > > -static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr)
> > > > +static struct vmap_area *
> > > > +find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root)
> > > >  {
> > > >  	struct vmap_area *va = NULL;
> > > > -	struct rb_node *n = vmap_area_root.rb_node;
> > > > +	struct rb_node *n = root->rb_node;
> > > >
> > > >  	addr = (unsigned long)kasan_reset_tag((void *)addr);
> > > >
> > > > @@ -1552,12 +1583,14 @@ __alloc_vmap_area(struct rb_root *root, struct list_head *head,
> > > >   */
> > > >  static void free_vmap_area(struct vmap_area *va)
> > > >  {
> > > > +	struct vmap_node *vn = addr_to_node(va->va_start);
> > > > +
> > >
> > > I'm being nitty here, and while I know it's a vmalloc convention to use
> > > 'va' and 'vm', perhaps we can break away from the super short variable name
> > > convention and use 'vnode' or something for these values?
> > >
> > > I feel people might get confused between 'vm' and 'vn' for instance.
> > >
> > vnode, varea?
> 
> I think 'vm' and 'va' are fine, just scanning through easy to mistake 'vn'
> and 'vm'. Obviously a litle nitpicky! You could replace all but a bit
> churny, so I think vn -> vnode works best imo.
> 
> [snip]
> 
> > > >  struct vmap_area *find_vmap_area(unsigned long addr)
> > > >  {
> > > > +	struct vmap_node *vn;
> > > >  	struct vmap_area *va;
> > > > +	int i, j;
> > > >
> > > > -	spin_lock(&vmap_area_lock);
> > > > -	va = __find_vmap_area(addr, &vmap_area_root);
> > > > -	spin_unlock(&vmap_area_lock);
> > > > +	/*
> > > > +	 * An addr_to_node_id(addr) converts an address to a node index
> > > > +	 * where a VA is located. If VA spans several zones and passed
> > > > +	 * addr is not the same as va->va_start, what is not common, we
> > > > +	 * may need to scan an extra nodes. See an example:
> > >
> > > For my understading when you say 'scan an extra nodes' do you mean scan
> > > just 1 extra node, or multiple? If the former I'd replace this with 'may
> > > need to scan an extra node' if the latter then 'may ened to scan extra
> > > nodes'.
> > >
> > > It's a nitty language thing, but also potentially changes the meaning of
> > > this!
> > >
> > Typo, i should replace it to: scan extra nodes.
> 
> Thanks.
> 
> >
> > > > +	 *
> > > > +	 *      <--va-->
> > > > +	 * -|-----|-----|-----|-----|-
> > > > +	 *     1     2     0     1
> > > > +	 *
> > > > +	 * VA resides in node 1 whereas it spans 1 and 2. If passed
> > > > +	 * addr is within a second node we should do extra work. We
> > > > +	 * should mention that it is rare and is a corner case from
> > > > +	 * the other hand it has to be covered.
> > >
> > > A very minor language style nit, but you've already said this is not
> > > common, I don't think you need this 'We should mention...' bit. It's not a
> > > big deal however!
> > >
> > No problem. We can remove it!
> 
> Thanks.
> 
> >
> > > > +	 */
> > > > +	i = j = addr_to_node_id(addr);
> > > > +	do {
> > > > +		vn = &vmap_nodes[i];
> > > >
> > > > -	return va;
> > > > +		spin_lock(&vn->busy.lock);
> > > > +		va = __find_vmap_area(addr, &vn->busy.root);
> > > > +		spin_unlock(&vn->busy.lock);
> > > > +
> > > > +		if (va)
> > > > +			return va;
> > > > +	} while ((i = (i + 1) % nr_vmap_nodes) != j);
> > >
> > > If you comment above suggests that only 1 extra node might need to be
> > > scanned, should we stop after one iteration?
> > >
> > Not really. Though we can improve it further to scan backward.
> 
> I think it'd be good to clarify in the comment above that the VA could span
> more than 1 node then, as the diagram seems to imply only 1 (I think just
> simply because of the example you were showing).
> 
> [snip]
> 
> > > >  static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
> > > >  {
> > > > +	struct vmap_node *vn;
> > > >  	struct vmap_area *va;
> > > > +	int i, j;
> > > >
> > > > -	spin_lock(&vmap_area_lock);
> > > > -	va = __find_vmap_area(addr, &vmap_area_root);
> > > > -	if (va)
> > > > -		unlink_va(va, &vmap_area_root);
> > > > -	spin_unlock(&vmap_area_lock);
> > > > +	i = j = addr_to_node_id(addr);
> > > > +	do {
> > > > +		vn = &vmap_nodes[i];
> > > >
> > > > -	return va;
> > > > +		spin_lock(&vn->busy.lock);
> > > > +		va = __find_vmap_area(addr, &vn->busy.root);
> > > > +		if (va)
> > > > +			unlink_va(va, &vn->busy.root);
> > > > +		spin_unlock(&vn->busy.lock);
> > > > +
> > > > +		if (va)
> > > > +			return va;
> > > > +	} while ((i = (i + 1) % nr_vmap_nodes) != j);
> > >
> > > Maybe worth adding a comment saying to refer to the comment in
> > > find_vmap_area() to see why this loop is necessary.
> > >
> > OK. We can do it to make it better for reading.
> 
> Thanks!
> 
> [snip]
> 
> > > > @@ -3728,8 +3804,11 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
> > >
> > > Unrelated to your change but makes me feel a little unwell to see 'const
> > > char *addr'! Can we change this at some point? Or maybe I can :)
> > >
> > You are welcome :)
> 
> Haha ;) yes I think I might tbh, I have noted it down.
> 
> >
> > > >
> > > >  	remains = count;
> > > >
> > > > -	spin_lock(&vmap_area_lock);
> > > > -	va = find_vmap_area_exceed_addr((unsigned long)addr);
> > > > +	/* Hooked to node_0 so far. */
> > > > +	vn = addr_to_node(0);
> > >
> > > Why can't we use addr for this call? We already enforce the node-0 only
> > > thing by setting nr_vmap_nodes to 1 right? And won't this be potentially
> > > subtly wrong when we later increase this?
> > >
> > I used to have 0 here. But please note, it is changed by the next patch in
> > this series.
> 
> Yeah sorry, again hadn't noticed this.
> 
> [snip]
> 
> > > > +		spin_lock(&vn->busy.lock);
> > > > +		insert_vmap_area(vas[area], &vn->busy.root, &vn->busy.head);
> > > >  		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
> > > >  				 pcpu_get_vm_areas);
> > > > +		spin_unlock(&vn->busy.lock);
> > >
> > > Hmm, before we were locking/unlocking once before the loop, now we're
> > > locking on each iteration, this seems inefficient.
> > >
> > > Seems like we need logic like:
> > >
> > > /* ... something to check nr_vms > 0 ... */
> > > struct vmap_node *last_node = NULL;
> > >
> > > for (...) {
> > > 	struct vmap_node *vnode = addr_to_node(vas[area]->va_start);
> > >
> > > 	if (vnode != last_node) {
> > > 		spin_unlock(last_node->busy.lock);
> > > 		spin_lock(vnode->busy.lock);
> > > 		last_node = vnode;
> > > 	}
> > >
> > > 	...
> > > }
> > >
> > > if (last_node)
> > > 	spin_unlock(last_node->busy.lock);
> > >
> > > To minimise the lock twiddling. What do you think?
> > >
> > This per-cpu-allocator prefetches several VA units per-cpu. I do not
> > find it as critical because it is not a hot path for the per-cpu allocator.
> > When its buffers are exhausted it does an extra prefetch. So it is not
> > frequent.
> 
> OK, sure I mean this is simpler and more readable so if not a huge perf
> concern then not a big deal.
> 
> >
> > >
> > > >  	}
> > > > -	spin_unlock(&vmap_area_lock);
> > > >
> > > >  	/*
> > > >  	 * Mark allocated areas as accessible. Do it now as a best-effort
> > > > @@ -4253,55 +4333,57 @@ bool vmalloc_dump_obj(void *object)
> > > >  {
> > > >  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > >  	const void *caller;
> > > > -	struct vm_struct *vm;
> > > >  	struct vmap_area *va;
> > > > +	struct vmap_node *vn;
> > > >  	unsigned long addr;
> > > >  	unsigned int nr_pages;
> > > > +	bool success = false;
> > > >
> > > > -	if (!spin_trylock(&vmap_area_lock))
> > > > -		return false;
> > >
> > > Nitpick on style for this, I really don't know why you are removing this
> > > early exit? It's far neater to have a guard clause than to nest a whole
> > > bunch of code below.
> > >
> > Hm... I can return back as it used to be. I do not have a strong opinion here.
> 
> Yeah that'd be ideal just for readability.
> 
> [snip the rest as broadly fairly trivial comment stuff on which we agree]
> 
> >
> > Thank you for the review! I can fix the comments as separate patches if
> > no objections.
> 
> Yes, overall it's style/comment improvement stuff nothing major, feel free
> to send as follow-up patches.
> 
> I don't want to hold anything up here so for the rest, feel free to add:
> 
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
> 
Appreciate! I will go through again and send out the patch that adds
more detailed explanation as requested in this review.

Again, thank you!

--
Uladzislau Rezki


  reply	other threads:[~2024-01-22 17:44 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-02 18:46 [PATCH v3 00/11] Mitigate a vmap lock contention v3 Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 01/11] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 02/11] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 03/11] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree Uladzislau Rezki (Sony)
2024-01-05  8:10   ` Wen Gu
2024-01-05 10:50     ` Uladzislau Rezki
2024-01-06  9:17       ` Wen Gu
2024-01-06 16:36         ` Uladzislau Rezki
2024-01-07  6:59           ` Hillf Danton
2024-01-08  7:45             ` Wen Gu
2024-01-08 18:37               ` Uladzislau Rezki
2024-01-16 23:25   ` Lorenzo Stoakes
2024-01-18 13:15     ` Uladzislau Rezki
2024-01-20 12:55       ` Lorenzo Stoakes
2024-01-22 17:44         ` Uladzislau Rezki [this message]
2024-01-02 18:46 ` [PATCH v3 05/11] mm/vmalloc: remove vmap_area_list Uladzislau Rezki (Sony)
2024-01-16 23:36   ` Lorenzo Stoakes
2024-01-02 18:46 ` [PATCH v3 06/11] mm: vmalloc: Remove global purge_vmap_area_root rb-tree Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 07/11] mm: vmalloc: Offload free_vmap_area_lock lock Uladzislau Rezki (Sony)
2024-01-03 11:08   ` Hillf Danton
2024-01-03 15:47     ` Uladzislau Rezki
2024-01-11  9:02   ` Dave Chinner
2024-01-11 15:54     ` Uladzislau Rezki
2024-01-11 20:37       ` Dave Chinner
2024-01-12 12:18         ` Uladzislau Rezki
2024-01-16 22:12           ` Dave Chinner
2024-01-18 18:15             ` Uladzislau Rezki
2024-02-08  0:25   ` Baoquan He
2024-02-08 13:57     ` Uladzislau Rezki
2024-02-28  9:48   ` Baoquan He
2024-02-28 10:39     ` Uladzislau Rezki
2024-02-28 12:26       ` Baoquan He
2024-03-22 18:21   ` Guenter Roeck
2024-03-22 19:03     ` Uladzislau Rezki
2024-03-22 20:53       ` Guenter Roeck
2024-01-02 18:46 ` [PATCH v3 08/11] mm: vmalloc: Support multiple nodes in vread_iter Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 09/11] mm: vmalloc: Support multiple nodes in vmallocinfo Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a system Uladzislau Rezki (Sony)
2024-01-11  9:25   ` Dave Chinner
2024-01-15 19:09     ` Uladzislau Rezki
2024-01-16 22:06       ` Dave Chinner
2024-01-18 18:23         ` Uladzislau Rezki
2024-01-18 21:28           ` Dave Chinner
2024-01-19 10:32             ` Uladzislau Rezki
2024-01-02 18:46 ` [PATCH v3 11/11] mm: vmalloc: Add a shrinker to drain vmap pools Uladzislau Rezki (Sony)
2024-02-22  8:35 ` [PATCH v3 00/11] Mitigate a vmap lock contention v3 Uladzislau Rezki
2024-02-22 23:15   ` Pedro Falcato
2024-02-23  9:34     ` Uladzislau Rezki
2024-02-23 10:26       ` Baoquan He
2024-02-23 11:06         ` Uladzislau Rezki
2024-02-23 15:57           ` Baoquan He
2024-02-23 18:55             ` Uladzislau Rezki
2024-02-28  9:27               ` Baoquan He
2024-02-29 10:38                 ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Za6pgt2j1n4PTcGI@pc636 \
    --to=urezki@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=oleksiy.avramchenko@sony.com \
    --cc=paulmck@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.