From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: NeilBrown, Theodore Ts'o, Andreas Dilger, "Darrick J. Wong",
	Matthew Wilcox, Michal Hocko, Dave Chinner, Rik van Riel,
	Vlastimil Babka, Johannes Weiner, Jonathan Corbet,
	Linux-MM, Linux-fsdevel, LKML, Mel Gorman
Subject: [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too many pages are isolated
Date: Tue, 19 Oct 2021 10:01:02 +0100
Message-Id: <20211019090108.25501-3-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20211019090108.25501-1-mgorman@techsingularity.net>
References: <20211019090108.25501-1-mgorman@techsingularity.net>
MIME-Version: 1.0

Page reclaim throttles on congestion if too many parallel reclaim instances
have isolated too many pages. This makes no sense: excessive parallelisation
has nothing to do with writeback or congestion. This patch creates an
additional wait queue to sleep on when too many pages are isolated. The
throttled tasks are woken when the number of isolated pages is reduced or a
timeout occurs. There may be some false positive wakeups for GFP_NOIO/GFP_NOFS
callers but the tasks will throttle again if necessary.

[shy828301@gmail.com: Wake up from compaction context]
[vbabka@suse.cz: Account number of throttled tasks only for writeback]

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
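Note for readers unfamiliar with the reclaim wait queues: below is a small
userspace sketch of the throttle/wake pattern this patch introduces. It is an
illustration only, not part of the patch; a pthread condition variable stands
in for pgdat->reclaim_wait[VMSCAN_THROTTLE_ISOLATED], and throttle_isolated(),
putback_isolated() and ISOLATED_LIMIT are hypothetical stand-ins for
reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10),
wake_throttle_isolated() and the too_many_isolated() threshold.

/*
 * Simplified userspace analogue of the throttling added by this patch
 * (illustration only, not kernel code). The names and the ISOLATED_LIMIT
 * threshold are made up for the example.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ISOLATED_LIMIT	128	/* stand-in for the too_many_isolated() check */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t isolated_wait = PTHREAD_COND_INITIALIZER;
static long nr_isolated;

/*
 * Analogue of reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10):
 * sleep once, bounded by a timeout; a wakeup (or a spurious one) simply ends
 * the stall early and the caller retries, as the changelog notes.
 */
static void throttle_isolated(long timeout_ms)
{
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&lock);
	if (nr_isolated > ISOLATED_LIMIT)
		pthread_cond_timedwait(&isolated_wait, &lock, &deadline);
	pthread_mutex_unlock(&lock);
}

/*
 * Analogue of wake_throttle_isolated(): once enough pages are put back,
 * wake all throttled tasks instead of letting them sleep out the timeout.
 */
static void putback_isolated(long nr)
{
	pthread_mutex_lock(&lock);
	nr_isolated -= nr;
	if (nr_isolated <= ISOLATED_LIMIT)
		pthread_cond_broadcast(&isolated_wait);
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	nr_isolated = 2 * ISOLATED_LIMIT;	/* pretend reclaimers isolated too much */
	putback_isolated(ISOLATED_LIMIT);	/* drops below the limit, wakes waiters */
	throttle_isolated(100);			/* returns quickly: nothing to wait for */
	printf("nr_isolated=%ld\n", nr_isolated);
	return 0;
}

Build with "cc -pthread". In the patch itself the bounded sleep is the
prepare_to_wait()/schedule_timeout()/finish_wait() sequence on the per-reason
wait queue and the wakeup is wake_up_all(); the single bounded sleep is what
tolerates the occasional false-positive wakeup mentioned in the changelog.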
 include/linux/mmzone.h        |  6 ++++--
 include/trace/events/vmscan.h |  4 +++-
 mm/compaction.c               | 10 ++++++++--
 mm/internal.h                 | 13 ++++++++++++-
 mm/page_alloc.c               |  6 +++++-
 mm/vmscan.c                   | 28 +++++++++++++++++++---------
 6 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ef0a63ebd21d..58a25d42c31c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -275,6 +275,8 @@ enum lru_list {
 
 enum vmscan_throttle_state {
 	VMSCAN_THROTTLE_WRITEBACK,
+	VMSCAN_THROTTLE_ISOLATED,
+	NR_VMSCAN_THROTTLE,
 };
 
 #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
@@ -846,8 +848,8 @@ typedef struct pglist_data {
 	int node_id;
 	wait_queue_head_t kswapd_wait;
 	wait_queue_head_t pfmemalloc_wait;
-	wait_queue_head_t reclaim_wait; /* wq for throttling reclaim */
-	atomic_t nr_reclaim_throttled;	/* nr of throtted tasks */
+	wait_queue_head_t reclaim_wait[NR_VMSCAN_THROTTLE];
+	atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */
 	unsigned long nr_reclaim_start;	/* nr pages written while throttled
 					 * when throttling started. */
 	struct task_struct *kswapd;	/* Protected by
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c317f9fe0d17..d4905bd9e9c4 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -28,10 +28,12 @@
 		) : "RECLAIM_WB_NONE"
 
 #define _VMSCAN_THROTTLE_WRITEBACK	(1 << VMSCAN_THROTTLE_WRITEBACK)
+#define _VMSCAN_THROTTLE_ISOLATED	(1 << VMSCAN_THROTTLE_ISOLATED)
 
 #define show_throttle_flags(flags)					\
 	(flags) ? __print_flags(flags, "|",				\
-		{_VMSCAN_THROTTLE_WRITEBACK,	"VMSCAN_THROTTLE_WRITEBACK"}	\
+		{_VMSCAN_THROTTLE_WRITEBACK,	"VMSCAN_THROTTLE_WRITEBACK"},	\
+		{_VMSCAN_THROTTLE_ISOLATED,	"VMSCAN_THROTTLE_ISOLATED"}	\
 		) : "VMSCAN_THROTTLE_NONE"
 
 
diff --git a/mm/compaction.c b/mm/compaction.c
index bfc93da1c2c7..7359093d8ac0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -761,6 +761,8 @@ isolate_freepages_range(struct compact_control *cc,
 /* Similar to reclaim, but different enough that they don't share logic */
 static bool too_many_isolated(pg_data_t *pgdat)
 {
+	bool too_many;
+
 	unsigned long active, inactive, isolated;
 
 	inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
@@ -770,7 +772,11 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 			node_page_state(pgdat, NR_ISOLATED_ANON);
 
-	return isolated > (inactive + active) / 2;
+	too_many = isolated > (inactive + active) / 2;
+	if (!too_many)
+		wake_throttle_isolated(pgdat);
+
+	return too_many;
 }
 
 /**
@@ -822,7 +828,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (cc->mode == MIGRATE_ASYNC)
 			return -EAGAIN;
 
-		congestion_wait(BLK_RW_ASYNC, HZ/10);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
 
 		if (fatal_signal_pending(current))
 			return -EINTR;
diff --git a/mm/internal.h b/mm/internal.h
index 90764d646e02..3461a1055975 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,12 +39,21 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
 static inline void acct_reclaim_writeback(struct page *page)
 {
 	pg_data_t *pgdat = page_pgdat(page);
-	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
+	int nr_throttled = atomic_read(&pgdat->nr_writeback_throttled);
 
 	if (nr_throttled)
 		__acct_reclaim_writeback(pgdat, page, nr_throttled);
 }
 
+static inline void wake_throttle_isolated(pg_data_t *pgdat)
+{
+	wait_queue_head_t *wqh;
+
+	wqh = &pgdat->reclaim_wait[VMSCAN_THROTTLE_ISOLATED];
+	if (waitqueue_active(wqh))
+		wake_up_all(wqh);
+}
+
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
@@ -120,6 +129,8 @@ extern unsigned long highest_memmap_pfn;
  */
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
+extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
+							long timeout);
 
 /*
  * in mm/rmap.c:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d849ddfc1e51..78e538067651 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7389,6 +7389,8 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
 
 static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 {
+	int i;
+
 	pgdat_resize_init(pgdat);
 
 	pgdat_init_split_queue(pgdat);
@@ -7396,7 +7398,9 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 
 	init_waitqueue_head(&pgdat->kswapd_wait);
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
-	init_waitqueue_head(&pgdat->reclaim_wait);
+
+	for (i = 0; i < NR_VMSCAN_THROTTLE; i++)
+		init_waitqueue_head(&pgdat->reclaim_wait[i]);
 
 	pgdat_page_ext_init(pgdat);
 	lruvec_init(&pgdat->__lruvec);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 735b1f2b5d9e..29434d4fc1c7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1006,12 +1006,12 @@ static void handle_write_error(struct address_space *mapping,
 	unlock_page(page);
 }
 
-static void
-reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
+void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
 							long timeout)
 {
-	wait_queue_head_t *wqh = &pgdat->reclaim_wait;
+	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
 	long ret;
+	bool acct_writeback = (reason == VMSCAN_THROTTLE_WRITEBACK);
 	DEFINE_WAIT(wait);
 
 	/*
@@ -1023,7 +1023,8 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
 	    current->flags & (PF_IO_WORKER|PF_KTHREAD))
 		return;
 
-	if (atomic_inc_return(&pgdat->nr_reclaim_throttled) == 1) {
+	if (acct_writeback &&
+	    atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) {
 		WRITE_ONCE(pgdat->nr_reclaim_start,
 			node_page_state(pgdat, NR_THROTTLED_WRITTEN));
 	}
@@ -1031,7 +1032,9 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
 	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
 	ret = schedule_timeout(timeout);
 	finish_wait(wqh, &wait);
-	atomic_dec(&pgdat->nr_reclaim_throttled);
+
+	if (acct_writeback)
+		atomic_dec(&pgdat->nr_writeback_throttled);
 
 	trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout),
 				jiffies_to_usecs(timeout - ret),
@@ -1061,7 +1064,7 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
 			READ_ONCE(pgdat->nr_reclaim_start);
 
 	if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
-		wake_up_all(&pgdat->reclaim_wait);
+		wake_up_all(&pgdat->reclaim_wait[VMSCAN_THROTTLE_WRITEBACK]);
 }
 
 /* possible outcome of pageout() */
@@ -2176,6 +2179,7 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
 		struct scan_control *sc)
 {
 	unsigned long inactive, isolated;
+	bool too_many;
 
 	if (current_is_kswapd())
 		return 0;
@@ -2199,7 +2203,13 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
 	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
 		inactive >>= 3;
 
-	return isolated > inactive;
+	too_many = isolated > inactive;
+
+	/* Wake up tasks throttled due to too_many_isolated. */
+	if (!too_many)
+		wake_throttle_isolated(pgdat);
+
+	return too_many;
 }
 
 /*
@@ -2308,8 +2318,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 			return 0;
 
 		/* wait a bit for the reclaimer. */
-		msleep(100);
 		stalled = true;
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
 
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
@@ -4343,7 +4353,7 @@ static int kswapd(void *p)
 
 	WRITE_ONCE(pgdat->kswapd_order, 0);
 	WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
-	atomic_set(&pgdat->nr_reclaim_throttled, 0);
+	atomic_set(&pgdat->nr_writeback_throttled, 0);
 	for ( ; ; ) {
 		bool ret;
 
-- 
2.31.1