From: Uladzislau Rezki <urezki@gmail.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Uladzislau Rezki <urezki@gmail.com>, zuoze <zuoze1@huawei.com>,
	Matthew Wilcox <willy@infradead.org>,
	gustavoars@kernel.org, akpm@linux-foundation.org,
	linux-hardening@vger.kernel.org, linux-mm@kvack.org,
	keescook@chromium.org
Subject: Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check.
Date: Tue, 3 Dec 2024 14:51:47 +0100
Message-ID: <Z08M88NdLfugqhu9@pc636>
In-Reply-To: <76995749-1c2e-4f78-9aac-a4bff4b8097f@huawei.com>

On Tue, Dec 03, 2024 at 09:45:09PM +0800, Kefeng Wang wrote:
> 
> 
> On 2024/12/3 21:39, Uladzislau Rezki wrote:
> > On Tue, Dec 03, 2024 at 09:30:09PM +0800, Kefeng Wang wrote:
> > > 
> > > 
> > > On 2024/12/3 21:10, zuoze wrote:
> > > > 
> > > > 
> > > > On 2024/12/3 20:39, Uladzislau Rezki wrote:
> > > > > On Tue, Dec 03, 2024 at 07:23:44PM +0800, zuoze wrote:
> > > > > > > We have implemented host-guest communication based on the TUN device
> > > > > > > using XSK[1]. The hardware is a Kunpeng 920 machine (ARM architecture),
> > > > > > > and the operating system is based on the 6.6 LTS kernel. The call
> > > > > > > stack from hotspot profiling is as follows:
> > > > > > 
> > > > > > -  100.00%     0.00%  vhost-12384  [unknown]      [k] 0000000000000000
> > > > > >      - ret_from_fork
> > > > > >         - 99.99% vhost_task_fn
> > > > > >            - 99.98% 0xffffdc59f619876c
> > > > > >               - 98.99% handle_rx_kick
> > > > > >                  - 98.94% handle_rx
> > > > > >                     - 94.92% tun_recvmsg
> > > > > >                        - 94.76% tun_do_read
> > > > > >                           - 94.62% tun_put_user_xdp_zc
> > > > > >                              - 63.53% __check_object_size
> > > > > >                                 - 63.49% __check_object_size.part.0
> > > > > >                                      find_vmap_area
> > > > > >                              - 30.02% _copy_to_iter
> > > > > >                                   __arch_copy_to_user
> > > > > >                     - 2.27% get_rx_bufs
> > > > > >                        - 2.12% vhost_get_vq_desc
> > > > > >                             1.49% __arch_copy_from_user
> > > > > >                     - 0.89% peek_head_len
> > > > > >                          0.54% xsk_tx_peek_desc
> > > > > >                     - 0.68% vhost_add_used_and_signal_n
> > > > > >                        - 0.53% eventfd_signal
> > > > > >                             eventfd_signal_mask
> > > > > >               - 0.94% handle_tx_kick
> > > > > >                  - 0.94% handle_tx
> > > > > >                     - handle_tx_copy
> > > > > >                        - 0.59% vhost_tx_batch.constprop.0
> > > > > >                             0.52% tun_sendmsg
> > > > > > 
> > > > > > It can be observed that most of the overhead is concentrated in the
> > > > > > find_vmap_area function.
> > > > > > 
> > > > > > I see. Yes, it is heavily contended, since you run the v6.6 kernel. There
> > > > > > was work done to mitigate the vmap lock contention.
> > > > > > See it here: https://lwn.net/Articles/956590/
> > > > > > 
> > > > > > The work was merged in the v6.9 kernel:
> > > > > 
> > > > > <snip>
> > > > > commit 38f6b9af04c4b79f81b3c2a0f76d1de94b78d7bc
> > > > > Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > Date:   Tue Jan 2 19:46:23 2024 +0100
> > > > > 
> > > > >       mm: vmalloc: add va_alloc() helper
> > > > > 
> > > > >       Patch series "Mitigate a vmap lock contention", v3.
> > > > > 
> > > > >       1. Motivation
> > > > > ...
> > > > > <snip>
> > > > > 
> > > > > Could you please try the v6.9 kernel on your setup?
> > > > > 
> > > > > > As for how to solve it: the series can probably be back-ported to the v6.6 kernel.
> > > > 
> > > > All the vmalloc-related optimizations have already been merged into 6.6,
> > > > including the set of optimization patches you suggested. Thank you very
> > > > much for your input.
> > > > 
> > > 
> > > To clarify: we had already backported the vmalloc optimizations into our 6.6
> > > kernel, so the stack above was collected with those patches applied. Even
> > > with those optimizations, find_vmap_area() is still the hotspot.
> > > 
> > > 
> > Could you please check that all below patches are in your v6.6 kernel?
> 
> Yes,
> 
> $ git lg v6.6..HEAD  --oneline mm/vmalloc.c
> * 86fee542f145 mm: vmalloc: ensure vmap_block is initialised before adding to queue
> * f459a0b59f7c mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
> * 0be7a82c2555 mm: vmalloc: fix lockdep warning
> * 58b99a00d0a0 mm/vmalloc: eliminated the lock contention from twice to once
> * 2c549aa32fa0 mm: vmalloc: check if a hash-index is in cpu_possible_mask
> * 0bc6d608b445 mm: fix incorrect vbq reference in purge_fragmented_block
> * 450f8c5270df mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
> * 2ea2bf4a18c3 mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
> * bde74a3e8a71 mm/vmalloc: fix return value of vb_alloc if size is 0
> * 8c620d05b7c3 mm: vmalloc: refactor vmalloc_dump_obj() function
> * b0c8281703b8 mm: vmalloc: improve description of vmap node layer
> * ecc3f0bf5c5a mm: vmalloc: add a shrinker to drain vmap pools
> * dd89a137f483 mm: vmalloc: set nr_nodes based on CPUs in a system
> * 8e63c98d86f6 mm: vmalloc: support multiple nodes in vmallocinfo
> * cc32683cef48 mm: vmalloc: support multiple nodes in vread_iter
> * 54d5ce65633d mm: vmalloc: add a scan area of VA only once
> * ee9c199fb859 mm: vmalloc: offload free_vmap_area_lock lock
> * c2c272d78b5a mm: vmalloc: remove global purge_vmap_area_root rb-tree
> * c9b39e3ffa86 mm/vmalloc: remove vmap_area_list
> * 091d2493d15f mm: vmalloc: remove global vmap_area_root rb-tree
> * 53f06cc34bac mm: vmalloc: move vmap_init_free_space() down in vmalloc.c
> * bf24196d9ab9 mm: vmalloc: rename adjust_va_to_fit_type() function
> * 6e9c94401e34 mm: vmalloc: add va_alloc() helper
> * ae528eb14e9a mm: Introduce vmap_page_range() to map pages in PCI address space
> * e1dbcfaa1854 mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
> * d3a24e7a01c4 mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.
> * fc9813220585 mm/vmalloc: fix the unchecked dereference warning in vread_iter()
> * a52e0157837e ascend: export interfaces required by ascend drivers
> * 9b1283f2bec2 mm/vmalloc: Extend vmalloc usage about hugepage
>
Thank you. Then your test case issues a huge number of
copy_to_iter/copy_from_iter calls. For each of them an area has to be
looked up, which might be really heavy.
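
To make the cost concrete, below is a toy user-space model in Python, not
kernel code. It assumes that every checked copy from a vmalloc address does
one area lookup followed by a bounds check. VmapArea, VmapIndex and
checked_copy() are made-up names for this illustration; only find_vmap_area
is named after the function in the stack quoted above. The real function
also takes a lock around the lookup, which this sketch does not model.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class VmapArea:
    start: int  # first address of the area
    end: int    # one past the last address of the area


class VmapIndex:
    """Hypothetical stand-in for the kernel's vmap area lookup structure."""

    def __init__(self, areas):
        self.areas = sorted(areas, key=lambda a: a.start)
        self.starts = [a.start for a in self.areas]
        self.lookups = 0  # counts how often a lookup was needed

    def find_vmap_area(self, addr):
        """Return the area containing addr, or None if there is none."""
        self.lookups += 1
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.areas[i].end:
            return self.areas[i]
        return None


def checked_copy(index, addr, length):
    """Model one hardened copy: look up the area, then bounds-check it."""
    area = index.find_vmap_area(addr)
    if area is None:
        return False  # the sketch just rejects the copy
    return addr + length <= area.end


if __name__ == "__main__":
    index = VmapIndex([VmapArea(0x1000, 0x5000), VmapArea(0x8000, 0x9000)])
    # 100 small copies out of the same area, e.g. one per packet.
    ok = all(checked_copy(index, 0x1000 + 64 * i, 64) for i in range(100))
    print(ok, index.lookups)  # prints: True 100
```

With 100 small copies out of the same area the model performs 100 lookups,
so the lookup cost scales with the number of copies, not with the number of
areas.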

How many CPUs does your system have?

--
Uladzislau Rezki
