From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Yu Zhao <yuzhao@google.com>, Wei Xu <weixugc@google.com>,
Chris Li <chrisl@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: [PATCH v3 1/3] mm, lru_gen: try to prefetch next page when scanning LRU
Date: Wed, 24 Jan 2024 02:45:50 +0800 [thread overview]
Message-ID: <20240123184552.59758-2-ryncsn@gmail.com> (raw)
In-Reply-To: <20240123184552.59758-1-ryncsn@gmail.com>
From: Kairui Song <kasong@tencent.com>
Prefetch for inactive/active LRU have been long exiting, apply the same
optimization for MGLRU.
Test 1: Ramdisk fio ro test in a 4G memcg on a EPYC 7K62:
fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
--buffered=1 --ioengine=io_uring --iodepth=128 \
--iodepth_batch_submit=32 --iodepth_batch_complete=32 \
--rw=randread --random_distribution=zipf:0.5 --norandommap \
--time_based --ramp_time=1m --runtime=6m --group_reporting
Before this patch:
bw ( MiB/s): min= 7758, max= 9239, per=100.00%, avg=8747.59, stdev=16.51, samples=11488
iops : min=1986251, max=2365323, avg=2239380.87, stdev=4225.93, samples=11488
After this patch (+7.2%):
bw ( MiB/s): min= 8360, max= 9771, per=100.00%, avg=9381.31, stdev=15.67, samples=11488
iops : min=2140296, max=2501385, avg=2401613.91, stdev=4010.41, samples=11488
Test 2: Ramdisk fio hybrid test for 30m in a 4G memcg on a EPYC 7K62 (3 times):
fio --buffered=1 --numjobs=8 --size=960m --directory=/mnt \
--time_based --ramp_time=1m --runtime=30m \
--ioengine=io_uring --iodepth=128 --iodepth_batch_submit=32 \
--iodepth_batch_complete=32 --norandommap \
--name=mglru-ro --rw=randread --random_distribution=zipf:0.7 \
--name=mglru-rw --rw=randrw --random_distribution=zipf:0.7
Before this patch:
READ: 6622.0 MiB/s. Stdev: 22.090722
WRITE: 1256.3 MiB/s. Stdev: 5.249339
After this patch (+4.6%, +3.3%):
READ: 6926.6 MiB/s, Stdev: 37.950260
WRITE: 1297.3 MiB/s, Stdev: 7.408704
Test 3: 30m of MySQL test in 6G memcg (12 times):
echo 'set GLOBAL innodb_buffer_pool_size=16106127360;' | \
mysql -u USER -h localhost --password=PASS
sysbench /usr/share/sysbench/oltp_read_only.lua \
--mysql-user=USER --mysql-password=PASS --mysql-db=DB \
--tables=48 --table-size=2000000 --threads=16 --time=1800 run
Before this patch
Avg: 134743.714545 qps. Stdev: 582.242189
After this patch (+0.2%):
Avg: 135005.779091 qps. Stdev: 295.299027
Test 4: Build linux kernel in 2G memcg with make -j48 with SSD swap
(for memory stress, 18 times):
Before this patch:
Avg: 1456.768899 s. Stdev: 20.106973
After this patch (+0.0%):
Avg: 1455.659254 s. Stdev: 15.274481
Test 5: Memtier test in a 4G cgroup using brd as swap (18 times):
memcached -u nobody -m 16384 -s /tmp/memcached.socket \
-a 0766 -t 16 -B binary &
memtier_benchmark -S /tmp/memcached.socket \
-P memcache_binary -n allkeys \
--key-minimum=1 --key-maximum=16000000 -d 1024 \
--ratio=1:0 --key-pattern=P:P -c 1 -t 16 --pipeline 8 -x 3
Before this patch:
Avg: 50317.984000 Ops/sec. Stdev: 2568.965458
After this patch (-5.7%):
Avg: 47691.343500 Ops/sec. Stdev: 3925.772473
It seems prefetch is helpful in most cases, but the memtier test is
either hitting a case where prefetch causes higher cache miss or it's
just too noisy (high stdev).
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/vmscan.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..03631cedb3ab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3681,15 +3681,26 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
/* prevent cold/hot inversion if force_scan is true */
for (zone = 0; zone < MAX_NR_ZONES; zone++) {
struct list_head *head = &lrugen->folios[old_gen][type][zone];
+ struct folio *prev = NULL;
- while (!list_empty(head)) {
- struct folio *folio = lru_to_folio(head);
+ if (!list_empty(head))
+ prev = lru_to_folio(head);
+
+ while (prev) {
+ struct folio *folio = prev;
VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
+ if (unlikely(list_is_first(&folio->lru, head))) {
+ prev = NULL;
+ } else {
+ prev = lru_to_folio(&folio->lru);
+ prefetchw(&prev->flags);
+ }
+
new_gen = folio_inc_gen(lruvec, folio, false);
list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
@@ -4341,11 +4352,15 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
for (i = MAX_NR_ZONES; i > 0; i--) {
LIST_HEAD(moved);
int skipped_zone = 0;
+ struct folio *prev = NULL;
int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
struct list_head *head = &lrugen->folios[gen][type][zone];
- while (!list_empty(head)) {
- struct folio *folio = lru_to_folio(head);
+ if (!list_empty(head))
+ prev = lru_to_folio(head);
+
+ while (prev) {
+ struct folio *folio = prev;
int delta = folio_nr_pages(folio);
VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
@@ -4355,6 +4370,13 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
scanned += delta;
+ if (unlikely(list_is_first(&folio->lru, head))) {
+ prev = NULL;
+ } else {
+ prev = lru_to_folio(&folio->lru);
+ prefetchw(&prev->flags);
+ }
+
if (sort_folio(lruvec, folio, sc, tier))
sorted += delta;
else if (isolate_folio(lruvec, folio, sc)) {
--
2.43.0
next prev parent reply other threads:[~2024-01-23 18:46 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-23 18:45 [PATCH v3 0/3] mm, lru_gen: batch update pages when aging Kairui Song
2024-01-23 18:45 ` Kairui Song [this message]
2024-01-25 7:32 ` [PATCH v3 1/3] mm, lru_gen: try to prefetch next page when scanning LRU Chris Li
2024-01-25 17:51 ` Kairui Song
2024-01-26 0:56 ` Chris Li
2024-01-26 10:31 ` Kairui Song
2024-01-26 21:19 ` Chris Li
2024-01-23 18:45 ` [PATCH v3 2/3] mm, lru_gen: batch update counters on aging Kairui Song
2024-01-23 18:45 ` [PATCH v3 3/3] mm, lru_gen: move pages in bulk when aging Kairui Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240123184552.59758-2-ryncsn@gmail.com \
--to=ryncsn@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=chrisl@kernel.org \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.