public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Seiji Nishikawa <snishika@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 6.1 78/81] mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()
Date: Mon,  6 Jan 2025 16:16:50 +0100	[thread overview]
Message-ID: <20250106151132.371873989@linuxfoundation.org> (raw)
In-Reply-To: <20250106151129.433047073@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Seiji Nishikawa <snishika@redhat.com>

commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 upstream.

The task sometimes continues looping in throttle_direct_reclaim() because
allow_direct_reclaim(pgdat) keeps returning false.

 #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
 #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
 #2 [ffff80002cb6f990] schedule at ffff800008abc50c
 #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4

At this point, the pgdat contains the following two zones:

        NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
          SIZE: 20480  MIN/LOW/HIGH: 11/28/45
          VM_STAT:
                NR_FREE_PAGES: 359
        NR_ZONE_INACTIVE_ANON: 18813
          NR_ZONE_ACTIVE_ANON: 0
        NR_ZONE_INACTIVE_FILE: 50
          NR_ZONE_ACTIVE_FILE: 0
          NR_ZONE_UNEVICTABLE: 0
        NR_ZONE_WRITE_PENDING: 0
                     NR_MLOCK: 0
                    NR_BOUNCE: 0
                   NR_ZSPAGES: 0
            NR_FREE_CMA_PAGES: 0

        NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
          SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
          VM_STAT:
                NR_FREE_PAGES: 146
        NR_ZONE_INACTIVE_ANON: 94668
          NR_ZONE_ACTIVE_ANON: 3
        NR_ZONE_INACTIVE_FILE: 735
          NR_ZONE_ACTIVE_FILE: 78
          NR_ZONE_UNEVICTABLE: 0
        NR_ZONE_WRITE_PENDING: 0
                     NR_MLOCK: 0
                    NR_BOUNCE: 0
                   NR_ZSPAGES: 0
            NR_FREE_CMA_PAGES: 0

In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
inactive/active file-backed pages calculated in zone_reclaimable_pages()
based on the result of zone_page_state_snapshot() is zero.

Additionally, since this system lacks swap, the calculation of inactive/
active anonymous pages is skipped.

        crash> p nr_swap_pages
        nr_swap_pages = $1937 = {
          counter = 0
        }

As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to
the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having
free pages significantly exceeding the high watermark.

The problem is that the pgdat->kswapd_failures hasn't been incremented.

        crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
        $1935 = 0x0

This is because the node deemed balanced.  The node balancing logic in
balance_pgdat() evaluates all zones collectively.  If one or more zones
(e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
entire node is deemed balanced.  This causes balance_pgdat() to exit early
before incrementing the kswapd_failures, as it considers the overall
memory state acceptable, even though some zones (like ZONE_NORMAL) remain
under significant pressure.


The patch ensures that zone_reclaimable_pages() includes free pages
(NR_FREE_PAGES) in its calculation when no other reclaimable pages are
available (e.g., file-backed or anonymous pages).  This change prevents
zones like ZONE_DMA32, which have sufficient free pages, from being
mistakenly deemed unreclaimable.  By doing so, the patch ensures proper
node balancing, avoids masking pressure on other zones like ZONE_NORMAL,
and prevents infinite loops in throttle_direct_reclaim() caused by
allow_direct_reclaim(pgdat) repeatedly returning false.


The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused
by a node being incorrectly deemed balanced despite pressure in certain
zones, such as ZONE_NORMAL.  This issue arises from
zone_reclaimable_pages() returning 0 for zones without reclaimable file-
backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
free pages to be skipped.

The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
during reclaim, masking pressure in other zones.  Consequently,
pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
mechanisms in allow_direct_reclaim() from being triggered, leading to an
infinite loop in throttle_direct_reclaim().

This patch modifies zone_reclaimable_pages() to account for free pages
(NR_FREE_PAGES) when no other reclaimable pages exist.  This ensures zones
with sufficient free pages are not skipped, enabling proper balancing and
reclaim behavior.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com
Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com
Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations")
Signed-off-by: Seiji Nishikawa <snishika@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/vmscan.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -588,7 +588,14 @@ unsigned long zone_reclaimable_pages(str
 	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
-
+	/*
+	 * If there are no reclaimable file-backed or anonymous pages,
+	 * ensure zones with sufficient free pages are not skipped.
+	 * This prevents zones like DMA32 from being ignored in reclaim
+	 * scenarios where they can still help alleviate memory pressure.
+	 */
+	if (nr == 0)
+		nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
 	return nr;
 }
 



  parent reply	other threads:[~2025-01-06 15:22 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-06 15:15 [PATCH 6.1 00/81] 6.1.124-rc1 review Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 01/81] x86/hyperv: Fix hv tsc page based sched_clock for hibernation Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 02/81] selinux: ignore unknown extended permissions Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 03/81] btrfs: fix use-after-free in btrfs_encoded_read_endio() Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 04/81] tracing: Have process_string() also allow arrays Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 05/81] thunderbolt: Add support for Intel Lunar Lake Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 06/81] thunderbolt: Add support for Intel Panther Lake-M/P Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 07/81] thunderbolt: Dont display nvm_version unless upgrade supported Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 08/81] xhci: retry Stop Endpoint on buggy NEC controllers Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 09/81] usb: xhci: Limit Stop Endpoint retries Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 10/81] xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 11/81] net: mctp: handle skb cleanup on sock_queue failures Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 12/81] RDMA/mlx5: Enforce same type port association for multiport RoCE Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 13/81] RDMA/bnxt_re: Add check for path mtu in modify_qp Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 14/81] RDMA/bnxt_re: Fix reporting hw_ver in query_device Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 15/81] RDMA/bnxt_re: Fix max_qp_wrs reported Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 16/81] RDMA/bnxt_re: Fix the locking while accessing the QP table Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 17/81] drm/bridge: adv7511_audio: Update Audio InfoFrame properly Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 18/81] net: dsa: microchip: Fix KSZ9477 set_ageing_time function Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 19/81] net: dsa: microchip: add ksz_rmw8() function Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 20/81] net: dsa: microchip: Fix LAN937X set_ageing_time function Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 21/81] RDMA/hns: Refactor mtr find Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 22/81] RDMA/hns: Remove unused parameters and variables Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 23/81] RDMA/hns: Fix mapping error of zero-hop WQE buffer Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 24/81] RDMA/hns: Fix warning storm caused by invalid input in IO path Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 25/81] RDMA/hns: Fix missing flush CQE for DWQE Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 26/81] net: stmmac: platform: provide devm_stmmac_probe_config_dt() Greg Kroah-Hartman
2025-01-06 15:15 ` [PATCH 6.1 27/81] net: stmmac: dont create a MDIO bus if unnecessary Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 28/81] net: stmmac: restructure the error path of stmmac_probe_config_dt() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 29/81] net: fix memory leak in tcp_conn_request() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 30/81] ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 31/81] ip_tunnel: annotate data-races around t->parms.link Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 32/81] ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 33/81] ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 34/81] ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 35/81] net: Fix netns for ip_tunnel_init_flow() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 36/81] netrom: check buffer length before accessing it Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 37/81] drm/i915/dg1: Fix power gate sequence Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 38/81] netfilter: nft_set_hash: unaligned atomic read on struct nft_set_ext Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 39/81] net: llc: reset skb->transport_header Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 40/81] ALSA: usb-audio: US16x08: Initialize array before use Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 41/81] eth: bcmsysport: fix call balance of priv->clk handling routines Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 42/81] net: mv643xx_eth: fix an OF node reference leak Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 43/81] net: wwan: t7xx: Fix FSM command timeout issue Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 44/81] RDMA/rtrs: Ensure ib_sge list is accessible Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 45/81] net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 46/81] net: restrict SO_REUSEPORT to inet sockets Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 47/81] net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 48/81] af_packet: fix vlan_get_tci() vs MSG_PEEK Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 49/81] af_packet: fix vlan_get_protocol_dgram() " Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 50/81] ila: serialize calls to nf_register_net_hooks() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 51/81] btrfs: rename and export __btrfs_cow_block() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 52/81] btrfs: fix use-after-free when COWing tree bock and tracing is enabled Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 53/81] wifi: mac80211: wake the queues in case of failure in resume Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 54/81] drm/amdkfd: Correct the migration DMA map direction Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 55/81] btrfs: flush delalloc workers queue before stopping cleaner kthread during unmount Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 56/81] ALSA: hda/realtek: Add new alc2xx-fixup-headset-mic model Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 57/81] sound: usb: enable DSD output for ddHiFi TC44C Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 58/81] sound: usb: format: dont warn that raw DSD is unsupported Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 59/81] bpf: fix potential error return Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 60/81] ksmbd: retry iterate_dir in smb2_query_dir Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 61/81] net: usb: qmi_wwan: add Telit FE910C04 compositions Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 62/81] Bluetooth: hci_core: Fix sleeping function called from invalid context Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 63/81] irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 64/81] ARC: build: Try to guess GCC variant of cross compiler Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 65/81] usb: xhci: Avoid queuing redundant Stop Endpoint commands Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 66/81] modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 67/81] modpost: fix the missed iteration for the max bit in do_input() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 68/81] ALSA hda/realtek: Add quirk for Framework F111:000C Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 69/81] ALSA: seq: oss: Fix races at processing SysEx messages Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 70/81] kcov: mark in_softirq_really() as __always_inline Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 71/81] RDMA/uverbs: Prevent integer overflow issue Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 72/81] pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 73/81] sky2: Add device ID 11ab:4373 for Marvell 88E8075 Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 74/81] net/sctp: Prevent autoclose integer overflow in sctp_association_init() Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 75/81] drm: adv7511: Drop dsi single lane support Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 76/81] dt-bindings: display: adi,adv7533: Drop " Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 77/81] mm/readahead: fix large folio support in async readahead Greg Kroah-Hartman
2025-01-06 15:16 ` Greg Kroah-Hartman [this message]
2025-01-06 15:16 ` [PATCH 6.1 79/81] mptcp: fix TCP options overflow Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 80/81] mptcp: fix recvbuffer adjust on sleeping rcvmsg Greg Kroah-Hartman
2025-01-06 15:16 ` [PATCH 6.1 81/81] mptcp: dont always assume copied data in mptcp_cleanup_rbuf() Greg Kroah-Hartman
2025-01-06 18:22 ` [PATCH 6.1 00/81] 6.1.124-rc1 review Pavel Machek
2025-01-06 19:29 ` Florian Fainelli
2025-01-06 22:26 ` Peter Schneider
2025-01-07  0:22 ` SeongJae Park
2025-01-07  7:10 ` Ron Economos
2025-01-07 12:33 ` Mark Brown
2025-01-07 12:36 ` Naresh Kamboju
2025-01-07 12:44 ` Jon Hunter
2025-01-07 20:59 ` [PATCH 6.1] " Hardik Garg
2025-01-07 23:16 ` [PATCH 6.1 00/81] " Shuah Khan
2025-01-08 12:54 ` Muhammad Usama Anjum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250106151132.371873989@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=mgorman@techsingularity.net \
    --cc=patches@lists.linux.dev \
    --cc=snishika@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox