From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Seiji Nishikawa <snishika@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 5.4 93/93] mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()
Date: Mon, 6 Jan 2025 16:18:09 +0100
Message-ID: <20250106151132.213551251@linuxfoundation.org>
In-Reply-To: <20250106151128.686130933@linuxfoundation.org>
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Seiji Nishikawa <snishika@redhat.com>
commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 upstream.
The task sometimes continues looping in throttle_direct_reclaim() because
allow_direct_reclaim(pgdat) keeps returning false.
#0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
#1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
#2 [ffff80002cb6f990] schedule at ffff800008abc50c
#3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
#4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
#5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
#6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
#7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
#8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
#9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4
At this point, the pgdat contains the following two zones:
NODE: 4 ZONE: 0 ADDR: ffff00817fffe540 NAME: "DMA32"
SIZE: 20480 MIN/LOW/HIGH: 11/28/45
VM_STAT:
NR_FREE_PAGES: 359
NR_ZONE_INACTIVE_ANON: 18813
NR_ZONE_ACTIVE_ANON: 0
NR_ZONE_INACTIVE_FILE: 50
NR_ZONE_ACTIVE_FILE: 0
NR_ZONE_UNEVICTABLE: 0
NR_ZONE_WRITE_PENDING: 0
NR_MLOCK: 0
NR_BOUNCE: 0
NR_ZSPAGES: 0
NR_FREE_CMA_PAGES: 0
NODE: 4 ZONE: 1 ADDR: ffff00817fffec00 NAME: "Normal"
SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264
VM_STAT:
NR_FREE_PAGES: 146
NR_ZONE_INACTIVE_ANON: 94668
NR_ZONE_ACTIVE_ANON: 3
NR_ZONE_INACTIVE_FILE: 735
NR_ZONE_ACTIVE_FILE: 78
NR_ZONE_UNEVICTABLE: 0
NR_ZONE_WRITE_PENDING: 0
NR_MLOCK: 0
NR_BOUNCE: 0
NR_ZSPAGES: 0
NR_FREE_CMA_PAGES: 0
In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
inactive/active file-backed pages calculated in zone_reclaimable_pages()
based on the result of zone_page_state_snapshot() is zero.
Additionally, since this system lacks swap, the calculation of inactive/
active anonymous pages is skipped.
crash> p nr_swap_pages
nr_swap_pages = $1937 = {
counter = 0
}
As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to
the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having
free pages significantly exceeding the high watermark.
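
The calculation described above can be sketched as a small user-space model. This is only an illustration, not the kernel code: struct zone_model and reclaimable_pages_old() are hypothetical names, and the counters stand in for the values that zone_page_state_snapshot() would return.

```c
/*
 * Hypothetical user-space stand-in for the per-zone counters read via
 * zone_page_state_snapshot(). Not a kernel structure.
 */
struct zone_model {
	unsigned long inactive_file;
	unsigned long active_file;
	unsigned long inactive_anon;
	unsigned long active_anon;
	unsigned long free_pages;
};

/*
 * Models zone_reclaimable_pages() before this patch: file-backed pages
 * always count, anonymous pages count only when swap is available.
 */
static unsigned long reclaimable_pages_old(const struct zone_model *z,
					   long nr_swap_pages)
{
	unsigned long nr = z->inactive_file + z->active_file;

	if (nr_swap_pages > 0)
		nr += z->inactive_anon + z->active_anon;
	return nr;
}
```

With a file-backed snapshot of zero, as reported above for ZONE_DMA32, and nr_swap_pages equal to 0, the model returns 0 no matter how many anonymous or free pages the zone holds.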
The problem is that the pgdat->kswapd_failures hasn't been incremented.
crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
$1935 = 0x0
This is because the node is deemed balanced. The node balancing logic in
balance_pgdat() evaluates all zones collectively. If one or more zones
(e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
entire node is deemed balanced. This causes balance_pgdat() to exit early
before incrementing the kswapd_failures, as it considers the overall
memory state acceptable, even though some zones (like ZONE_NORMAL) remain
under significant pressure.
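
A simplified model of that "any zone is enough" check is sketched below. It assumes a plain comparison of free pages against the high watermark and deliberately ignores allocation order and lowmem reserves, which the real watermark check takes into account. struct zone_wmark and node_balanced() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone view: free pages and the high watermark. */
struct zone_wmark {
	unsigned long free_pages;
	unsigned long high_wmark;
};

/*
 * Simplified model: the node counts as balanced as soon as a single
 * zone sits above its high watermark, even if other zones do not.
 */
static bool node_balanced(const struct zone_wmark *zones, size_t nr_zones)
{
	for (size_t i = 0; i < nr_zones; i++) {
		if (zones[i].free_pages > zones[i].high_wmark)
			return true;
	}
	return false;
}
```

Fed with the values from the dump above (DMA32: 359 free, high 45; Normal: 146 free, high 264), the model reports the node as balanced because of DMA32 alone, while Normal on its own would not qualify.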
The patch ensures that zone_reclaimable_pages() includes free pages
(NR_FREE_PAGES) in its calculation when no other reclaimable pages are
available (e.g., file-backed or anonymous pages). This change prevents
zones like ZONE_DMA32, which have sufficient free pages, from being
mistakenly deemed unreclaimable. By doing so, the patch ensures proper
node balancing, avoids masking pressure on other zones like ZONE_NORMAL,
and prevents infinite loops in throttle_direct_reclaim() caused by
allow_direct_reclaim(pgdat) repeatedly returning false.
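
The effect on the throttling decision can be sketched with a user-space model of allow_direct_reclaim(). This is an illustration under stated assumptions, not the kernel implementation: the struct, the function names and the "patched" flag are hypothetical, and the model only sums min watermarks and free pages over zones that report reclaimable pages, then compares free pages against half of the reserve.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's MAX_RECLAIM_RETRIES (16). */
#define MODEL_MAX_RECLAIM_RETRIES 16

/* Hypothetical per-zone inputs for the model. */
struct throttle_zone {
	unsigned long reclaimable;	/* file (+ anon if swap) pages */
	unsigned long free_pages;
	unsigned long min_wmark;
};

/* With "patched" set, fall back to free pages when nothing else counts. */
static unsigned long reclaimable_pages_model(const struct throttle_zone *z,
					     bool patched)
{
	unsigned long nr = z->reclaimable;

	if (patched && nr == 0)
		nr = z->free_pages;
	return nr;
}

/* Returns true when the caller would not be throttled. */
static bool allow_direct_reclaim_model(const struct throttle_zone *zones,
				       size_t nr_zones, int kswapd_failures,
				       bool patched)
{
	unsigned long reserve = 0, free_pages = 0;

	if (kswapd_failures >= MODEL_MAX_RECLAIM_RETRIES)
		return true;

	for (size_t i = 0; i < nr_zones; i++) {
		/* Zones without reclaimable pages are skipped entirely. */
		if (!reclaimable_pages_model(&zones[i], patched))
			continue;
		reserve += zones[i].min_wmark;
		free_pages += zones[i].free_pages;
	}

	/* No reserves at all: do not throttle. */
	if (!reserve)
		return true;

	return free_pages > reserve / 2;
}
```

As a usage example with partly hypothetical numbers: take DMA32 with 0 reclaimable pages, 359 free pages and a min watermark of 11 (as in the dump), and a Normal zone with a min watermark of 68 but an assumed free count of 20 (chosen here only to make throttling trigger). Without the fallback, DMA32 is skipped and the model keeps returning false while kswapd_failures stays at 0. With the fallback, DMA32's free pages are counted and the model returns true.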
The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused
by a node being incorrectly deemed balanced despite pressure in certain
zones, such as ZONE_NORMAL. This issue arises from
zone_reclaimable_pages() returning 0 for zones without reclaimable file-
backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
free pages to be skipped.
The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
during reclaim, masking pressure in other zones. Consequently,
pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
mechanisms in allow_direct_reclaim() from being triggered, leading to an
infinite loop in throttle_direct_reclaim().
This patch modifies zone_reclaimable_pages() to account for free pages
(NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones
with sufficient free pages are not skipped, enabling proper balancing and
reclaim behavior.
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com
Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com
Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations")
Signed-off-by: Seiji Nishikawa <snishika@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -342,7 +342,14 @@ unsigned long zone_reclaimable_pages(str
 	if (get_nr_swap_pages() > 0)
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
-
+	/*
+	 * If there are no reclaimable file-backed or anonymous pages,
+	 * ensure zones with sufficient free pages are not skipped.
+	 * This prevents zones like DMA32 from being ignored in reclaim
+	 * scenarios where they can still help alleviate memory pressure.
+	 */
+	if (nr == 0)
+		nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
 	return nr;
 }