From: Uladzislau Rezki <urezki@gmail.com>
To: Baoquan He <bhe@redhat.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Pedro Falcato <pedro.falcato@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	kirill.shutemov@linux.intel.com,
	Vishal Moola <vishal.moola@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Dave Chinner <david@fromorbit.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v3 00/11] Mitigate a vmap lock contention v3
Date: Thu, 29 Feb 2024 11:38:03 +0100
Message-ID: <ZeBeiy5QkSo7AJA7@pc636>
In-Reply-To: <Zd78aiZ8uiM6ZP16@MiWiFi-R3L-srv>

> 
> I finally finished the testing w/o and with your above improvement
> patch. Testing was done on a system with 128 CPUs. The system with 288
> CPUs is not available because of a console connection problem. The log
> is attached here. In some runs after rebooting, the test could take more
> than 30 minutes; I am not sure if that was caused by my messy code
> changes. I finally cleaned all of them up, took a clean linux-next to
> test, and then applied your above draft code.

> [root@dell-per6515-03 linux]# nproc 
> 128
> [root@dell-per6515-03 linux]# free -h
>                total        used        free      shared  buff/cache   available
> Mem:           124Gi       2.6Gi       122Gi        21Mi       402Mi       122Gi
> Swap:          4.0Gi          0B       4.0Gi
> 
> 1)linux-next kernel w/o improving code from Uladzislau
> -------------------------------------------------------
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=64
> Run the test with following parameters: run_test_mask=127 nr_threads=64
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	4m28.018s
> user	0m0.015s
> sys	0m4.712s
> [root@dell-per6515-03 ~]# sort -h /proc/allocinfo | tail -10
>     21405696     5226 mm/memory.c:1122 func:folio_prealloc 
>     26199936     7980 kernel/fork.c:309 func:alloc_thread_stack_node 
>     29822976     7281 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>    107638784     6320 mm/readahead.c:468 func:ra_alloc_folio 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    134742016    32896 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    266797056    65136 include/linux/mm.h:2848 func:pagetable_alloc 
>    507617280    32796 mm/slub.c:2305 func:alloc_slab_page 
> [root@dell-per6515-03 ~]# 
> [root@dell-per6515-03 ~]# 
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=128
> Run the test with following parameters: run_test_mask=127 nr_threads=128
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	6m19.328s
> user	0m0.005s
> sys	0m9.476s
> [root@dell-per6515-03 ~]# sort -h /proc/allocinfo | tail -10
>     21405696     5226 mm/memory.c:1122 func:folio_prealloc 
>     26889408     8190 kernel/fork.c:309 func:alloc_thread_stack_node 
>     29822976     7281 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>    107638784     6320 mm/readahead.c:468 func:ra_alloc_folio 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    134742016    32896 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    550068224    34086 mm/slub.c:2305 func:alloc_slab_page 
>    664535040   162240 include/linux/mm.h:2848 func:pagetable_alloc 
> [root@dell-per6515-03 ~]# 
> [root@dell-per6515-03 ~]# 
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=256
> Run the test with following parameters: run_test_mask=127 nr_threads=256
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	19m10.657s
> user	0m0.015s
> sys	0m20.959s
> [root@dell-per6515-03 ~]# sort -h /proc/allocinfo | tail -10
>     22441984     5479 mm/shmem.c:1634 func:shmem_alloc_folio 
>     26758080     8150 kernel/fork.c:309 func:alloc_thread_stack_node 
>     35880960     8760 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    122355712     7852 mm/readahead.c:468 func:ra_alloc_folio 
>    134742016    32896 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    708231168    50309 mm/slub.c:2305 func:alloc_slab_page 
>   1107296256   270336 include/linux/mm.h:2848 func:pagetable_alloc 
> [root@dell-per6515-03 ~]# 
> 
> 2)linux-next kernel with improving code from Uladzislau
> -----------------------------------------------------
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=64
> Run the test with following parameters: run_test_mask=127 nr_threads=64
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	4m27.226s
> user	0m0.006s
> sys	0m4.709s
> [root@dell-per6515-03 linux]# sort -h /proc/allocinfo | tail -10
>     38023168     9283 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     72228864    17634 fs/xfs/xfs_buf.c:390 [xfs] func:xfs_buf_alloc_pages 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>     99863552    97523 fs/xfs/xfs_icache.c:81 [xfs] func:xfs_inode_alloc 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    136314880    33280 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    184176640    10684 mm/readahead.c:468 func:ra_alloc_folio 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    284700672    69507 include/linux/mm.h:2848 func:pagetable_alloc 
>    601427968    36377 mm/slub.c:2305 func:alloc_slab_page 
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=128
> Run the test with following parameters: run_test_mask=127 nr_threads=128
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	6m16.960s
> user	0m0.007s
> sys	0m9.465s
> [root@dell-per6515-03 linux]# sort -h /proc/allocinfo | tail -10
>     38158336     9316 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     72220672    17632 fs/xfs/xfs_buf.c:390 [xfs] func:xfs_buf_alloc_pages 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>     99863552    97523 fs/xfs/xfs_icache.c:81 [xfs] func:xfs_inode_alloc 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    136314880    33280 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    184504320    10710 mm/readahead.c:468 func:ra_alloc_folio 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    427884544   104464 include/linux/mm.h:2848 func:pagetable_alloc 
>    697311232    45159 mm/slub.c:2305 func:alloc_slab_page
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=256
> Run the test with following parameters: run_test_mask=127 nr_threads=256
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	21m15.673s
> user	0m0.008s
> sys	0m20.259s
> [root@dell-per6515-03 linux]# sort -h /proc/allocinfo | tail -10
>     38158336     9316 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     72224768    17633 fs/xfs/xfs_buf.c:390 [xfs] func:xfs_buf_alloc_pages 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>     99863552    97523 fs/xfs/xfs_icache.c:81 [xfs] func:xfs_inode_alloc 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    136314880    33280 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    184504320    10710 mm/readahead.c:468 func:ra_alloc_folio 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    506974208   123773 include/linux/mm.h:2848 func:pagetable_alloc 
>    809504768    53621 mm/slub.c:2305 func:alloc_slab_page
> [root@dell-per6515-03 linux]# time tools/testing/selftests/mm/test_vmalloc.sh run_test_mask=127 nr_threads=256
> Run the test with following parameters: run_test_mask=127 nr_threads=256
> Done.
> Check the kernel ring buffer to see the summary.
> 
> real	21m36.580s
> user	0m0.012s
> sys	0m19.912s
> [root@dell-per6515-03 linux]# sort -h /proc/allocinfo | tail -10
>     38977536     9516 mm/readahead.c:247 func:page_cache_ra_unbounded 
>     72273920    17645 fs/xfs/xfs_buf.c:390 [xfs] func:xfs_buf_alloc_pages 
>     99090432    96768 drivers/iommu/iova.c:604 func:iova_magazine_alloc 
>     99895296    97554 fs/xfs/xfs_icache.c:81 [xfs] func:xfs_inode_alloc 
>    120560528    29439 mm/mm_init.c:2521 func:alloc_large_system_hash 
>    141033472    34432 mm/percpu-vm.c:95 func:pcpu_alloc_pages 
>    186064896    10841 mm/readahead.c:468 func:ra_alloc_folio 
>    263192576    64256 mm/page_ext.c:270 func:alloc_page_ext 
>    541237248   132138 include/linux/mm.h:2848 func:pagetable_alloc 
>    694718464    41216 mm/slub.c:2305 func:alloc_slab_page
> 
> 
Thank you for testing this. So ~132mb with the patch. I think it looks
good, but I might change the draft and send out a new version.
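
For anyone comparing the pagetable_alloc lines in the quoted logs, here is
a minimal sketch that converts them to MiB. It assumes the first two
columns of /proc/allocinfo are the size in bytes and the number of calls.
The three sample lines are copied from the nr_threads=256 runs above; the
labels and helper names are hypothetical and only serve this illustration.

```python
import re

# Lines copied from the quoted `sort -h /proc/allocinfo` output
# (nr_threads=256 runs). The dictionary keys are labels made up for this sketch.
SAMPLES = {
    "w/o patch": "1107296256   270336 include/linux/mm.h:2848 func:pagetable_alloc",
    "with patch, run 1": "506974208   123773 include/linux/mm.h:2848 func:pagetable_alloc",
    "with patch, run 2": "541237248   132138 include/linux/mm.h:2848 func:pagetable_alloc",
}

# Assumed layout: <bytes> <calls> <file:line> [optional [module]] func:<name>
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+(?:\[\S+\]\s+)?func:(\S+)\s*$")


def parse_allocinfo_line(line):
    """Split one allocinfo line into its numeric and symbolic fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized allocinfo line: {line!r}")
    size, calls, location, func = m.groups()
    return {"bytes": int(size), "calls": int(calls),
            "location": location, "func": func}


def to_mib(nbytes):
    """Convert a byte count to MiB (2**20 bytes)."""
    return nbytes / (1024 * 1024)


def summarize(samples):
    """Map each label to (calls, size in MiB)."""
    out = {}
    for label, line in samples.items():
        rec = parse_allocinfo_line(line)
        out[label] = (rec["calls"], to_mib(rec["bytes"]))
    return out


if __name__ == "__main__":
    for label, (calls, mib) in summarize(SAMPLES).items():
        print(f"{label:18s} calls={calls:7d} size={mib:8.1f} MiB")
```

In these samples the byte count divided by 4096 equals the call count, so
each pagetable_alloc call appears to account for one 4 KiB page.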

Thank you again!

--
Uladzislau Rezki

