From mboxrd@z Thu Jan 1 00:00:00 1970
From: wang lian
To: willy@infradead.org
Cc: 21cnbao@gmail.com, corbet@lwn.net, davem@davemloft.net,
    edumazet@google.com, hannes@cmpxchg.org, horms@kernel.org,
    jackmanb@google.com, kuba@kernel.org, kuniyu@google.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linyunsheng@huawei.com, mhocko@suse.com,
    netdev@vger.kernel.org, pabeni@redhat.com, surenb@google.com,
    v-songbaohua@oppo.com, vbabka@suse.cz, willemb@google.com,
    zhouhuacai@oppo.com, ziy@nvidia.com
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
Date: Fri, 17 Apr 2026 16:11:34 +0800
Message-ID: <20260417081138.23426-1-lianux.mm@gmail.com>

Hi Matthew, Barry,

> So, we try to do an order-3 allocation. kswapd runs and ...
> succeeds in creating order-3 pages? Or fails to?

From our reproducer runs, both happen. We observe intermittent order-3
successes, but also frequent high-order failures followed by order-0
fallback.

> If it fails, that's something we need to sort out.

Agreed. In this workload, the bottleneck appears to be contiguity, not a
raw shortage of reclaimable memory. Order-0 memory remains available
while suitable order-3 blocks are often unavailable.

> If it succeeds, now we have several order-3 pages, great.
> But where do they all go that we need to run kswapd again?

In our runs, order-3 pockets do show up, but they do not last long: they
are consumed quickly by ongoing skb demand, and the pressure returns.

To investigate this, we built a reproducer that keeps creating memory
fragments while the network stack continuously requests order-3
allocations [1][2].

Raw sample output (trimmed):

---------------------------------------------------------------------------------------------------
    TIME | BUDDYINFO (Normal Zone) | MEMINFO                  | KSWAPD CPU & VMSTAT
---------------------------------------------------------------------------------------------------
11:08:11 | ord0:11622   ord3:0    | Free:96MB   Avail:1309MB | CPU: 10.0% scan:83107932
[*] PHASE 3: Triggering Order-3 Pressure (UDP Storm).
11:08:15 | ord0:52079   ord3:0    | Free:273MB  Avail:1300MB | CPU: 90.9% scan:85328881
11:08:16 | ord0:102895  ord3:0    | Free:477MB  Avail:1309MB | CPU: 60.0% scan:85873777
11:08:17 | ord0:115459  ord3:5    | Free:517MB  Avail:1284MB | CPU: 54.5% scan:86584389
11:08:18 | ord0:115164  ord3:0    | Free:509MB  Avail:1107MB | CPU: 36.4% scan:87083561
---------------------------------------------------------------------------------------------------

What we observe is this: free memory is plentiful, order-0 pages are
abundant, and the network allocation has already fallen back to the
order-0 path. Everything looks normal on the surface, yet kswapd remains
trapped in a futile loop.

kswapd appears to be stuck in the following path:
wakeup_kswapd() -> pgdat_balanced() -> __zone_watermark_ok().
Specifically, in __zone_watermark_ok():

	/* For a high-order request, check at least one suitable page is free */
	for (o = order; o < NR_PAGE_ORDERS; o++) {
		struct free_area *area = &z->free_area[o];
		int mt;

		if (!area->nr_free)
			continue;

		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}
	}

Because our reproducer keeps creating fragmentation while the network
stack requests order-3, this loop keeps returning false for the
high-order requirement, even though the system is functionally fine on
order-0.

To be clear, we are not creating "artificial" fragments just for the
sake of it. We designed this reproducer to stress-test and expose an
existing feedback gap in the reclaim/compaction logic: it pinpoints why
kswapd keeps burning CPU cycles to satisfy a watermark that the
allocator has already abandoned in favor of the order-0 fallback.

A related discussion in [3] helps reduce vmpressure noise in this area.
That is useful, but it does not close the contiguity gap by itself:
high-order wakeup/reclaim can still repeat when contiguous blocks cannot
be formed.

This suggests we should take a much closer look at how kswapd behaves in
these scenarios. After reviewing everyone's input, we believe it is time
to do some targeted work on these high-order cases. We already have some
rough ideas and plan to run further experiments, and we would welcome a
broader discussion of this gap.

Links:
[1] https://github.com/hack-kernel-just-for-fun/kswap/blob/main/kswapd_spin_repro.c
[2] https://github.com/hack-kernel-just-for-fun/kswap/blob/main/kswapd.sh
[3] https://lore.kernel.org/all/20260406195014.112521-1-jp.kobryn@linux.dev/#r

This was reproduced and cross-checked independently by our team
(Wang Lian and Kunwu Chan).

--
Best Regards,
wang lian