Re: [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for vmap_area_lock and vmap_block->lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Uladzislau Rezki <urezki@gmail.com>
To: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Baoquan He <bhe@redhat.com>, Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Liu Shixin <liushixin2@huawei.com>, Jiri Olsa <jolsa@kernel.org>
Subject: Re: [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for vmap_area_lock and vmap_block->lock
Date: Mon, 20 Mar 2023 12:20:02 +0100	[thread overview]
Message-ID: <ZBhBRCIyc5Scx1Kf@pc636> (raw)
In-Reply-To: <cd91527d-e6e0-4900-a368-dfc9812546da@lucifer.local>

On Mon, Mar 20, 2023 at 08:35:11AM +0000, Lorenzo Stoakes wrote:
> On Mon, Mar 20, 2023 at 09:32:06AM +0100, Uladzislau Rezki wrote:
> > On Mon, Mar 20, 2023 at 08:25:32AM +0000, Lorenzo Stoakes wrote:
> > > On Mon, Mar 20, 2023 at 08:54:33AM +0100, Uladzislau Rezki wrote:
> > > > > vmalloc() is, by design, not permitted to be used in atomic context and
> > > > > already contains components which may sleep, so avoiding spin locks is not
> > > > > a problem from the perspective of atomic context.
> > > > >
> > > > > The global vmap_area_lock is held when the red/black tree rooted in
> > > > > vmap_are_root is accessed and thus is rather long-held and under
> > > > > potentially high contention. It is likely to be under contention for reads
> > > > > rather than write, so replace it with a rwsem.
> > > > >
> > > > > Each individual vmap_block->lock is likely to be held for less time but
> > > > > under low contention, so a mutex is not an outrageous choice here.
> > > > >
> > > > > A subset of test_vmalloc.sh performance results:-
> > > > >
> > > > > fix_size_alloc_test             0.40%
> > > > > full_fit_alloc_test		2.08%
> > > > > long_busy_list_alloc_test	0.34%
> > > > > random_size_alloc_test		-0.25%
> > > > > random_size_align_alloc_test	0.06%
> > > > > ...
> > > > > all tests cycles                0.2%
> > > > >
> > > > > This represents a tiny reduction in performance that sits barely above
> > > > > noise.
> > > > >
> > > > How important to have many simultaneous users of vread()? I do not see a
> > > > big reason to switch into mutexes due to performance impact and making it
> > > > less atomic.
> > >
> > > It's less about simultaneous users of vread() and more about being able to write
> > > direct to user memory rather than via a bounce buffer and not hold a spinlock
> > > over possible page faults.
> > >
> > > The performance impact is barely above noise (I got fairly widely varying
> > > results), so I don't think it's really much of a cost at all. I can't imagine
> > > there are many users critically dependent on a sub-single digit % reduction in
> > > speed in vmalloc() allocation.
> > >
> > > As I was saying to Willy, the code is already not atomic, or rather needs rework
> > > to become atomic-safe (there are a smattering of might_sleep()'s throughout)
> > >
> > > However, given your objection alongside Willy's, let me examine Willy's
> > > suggestion that we instead of doing this, prefault the user memory in advance of
> > > the vread call.
> > >
> > Just a quick perf tests shows regression around 6%. 10 workers test_mask is 31:
> >
> > # default
> > [  140.349731] All test took worker0=485061693537 cycles
> > [  140.386065] All test took worker1=486504572954 cycles
> > [  140.418452] All test took worker2=467204082542 cycles
> > [  140.435895] All test took worker3=512591010219 cycles
> > [  140.458316] All test took worker4=448583324125 cycles
> > [  140.494244] All test took worker5=501018129647 cycles
> > [  140.518144] All test took worker6=516224787767 cycles
> > [  140.535472] All test took worker7=442025617137 cycles
> > [  140.558249] All test took worker8=503337286539 cycles
> > [  140.590571] All test took worker9=494369561574 cycles
> >
> > # patch
> > [  144.464916] All test took worker0=530373399067 cycles
> > [  144.492904] All test took worker1=522641540924 cycles
> > [  144.528999] All test took worker2=529711158267 cycles
> > [  144.552963] All test took worker3=527389011775 cycles
> > [  144.592951] All test took worker4=529583252449 cycles
> > [  144.610286] All test took worker5=523605706016 cycles
> > [  144.627690] All test took worker6=531494777011 cycles
> > [  144.653046] All test took worker7=527150114726 cycles
> > [  144.669818] All test took worker8=526599712235 cycles
> > [  144.693428] All test took worker9=526057490851 cycles
> >
> 
> OK ouch, that's worse than I observed! Let me try this prefault approach and
> then we can revert back to spinlocks.
> 
> > > >
> > > > So, how important for you to have this change?
> > > >
> > >
> > > Personally, always very important :)
> > >
> > This is good. Personal opinion always wins :)
> >
> 
> The heart always wins ;) well, an adaption here can make everybody's hearts
> happy I think.
> 
Totally agree :)

--
Uladzislau Rezki

next prev parent reply	other threads:[~2023-03-20 11:20 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-19  7:09 [PATCH v2 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
2023-03-19  7:09 ` [PATCH v2 1/4] fs/proc/kcore: Avoid bounce buffer for ktext data Lorenzo Stoakes
2023-03-20  9:58   ` David Hildenbrand
2023-03-19  7:09 ` [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for vmap_area_lock and vmap_block->lock Lorenzo Stoakes
2023-03-19  8:29   ` [lkp] [+328 bytes kernel size regression] [i386-tinyconfig] [1b7c843021] " kernel test robot
2023-03-19 20:10   ` [PATCH v2 2/4] " Andrew Morton
2023-03-19 20:29     ` Lorenzo Stoakes
2023-03-19 20:47       ` Matthew Wilcox
2023-03-19 21:16         ` Lorenzo Stoakes
2023-03-20  8:40           ` Lorenzo Stoakes
2023-03-20  7:54   ` Uladzislau Rezki
2023-03-20  8:25     ` Lorenzo Stoakes
2023-03-20  8:32       ` Uladzislau Rezki
2023-03-20  8:35         ` Lorenzo Stoakes
2023-03-20 11:20           ` Uladzislau Rezki [this message]
2023-03-21  1:09   ` Dave Chinner
2023-03-21  5:23     ` Uladzislau Rezki
2023-03-21  7:45       ` Lorenzo Stoakes
2023-03-21  8:54         ` Uladzislau Rezki
2023-03-21 10:05         ` Dave Chinner
2023-03-21 10:24           ` Uladzislau Rezki
2023-03-22 13:18     ` Uladzislau Rezki
2023-03-22 17:47       ` Matthew Wilcox
2023-03-22 18:01         ` Uladzislau Rezki
2023-03-22 19:15           ` Uladzislau Rezki
2023-03-23 12:47             ` Uladzislau Rezki
2023-03-24  5:25       ` Dave Chinner
2023-03-24  5:31         ` Matthew Wilcox
2023-03-27  0:38           ` Dave Chinner
2023-03-27 17:22         ` Uladzislau Rezki
2023-03-28  2:53           ` Dave Chinner
2023-03-28 12:40             ` Uladzislau Rezki
2023-03-19  7:09 ` [PATCH v2 3/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter() Lorenzo Stoakes
2023-03-19  7:09 ` [PATCH v2 4/4] mm: vmalloc: convert vread() to vread_iter() Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZBhBRCIyc5Scx1Kf@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=david@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liushixin2@huawei.com \
    --cc=lstoakes@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.