From: Deepanshu Kartikey
Date: Tue, 9 Dec 2025 17:06:35 +0530
Subject: Re: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen
To: "David Hildenbrand (Red Hat)"
Cc: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org, mhocko@kernel.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, syzbot+e008db2ac01e282550ee@syzkaller.appspot.com, Yu Zhao
References: <20251208060046.2933866-1-kartikey406@gmail.com>

On Mon, Dec 8, 2025 at 4:54 PM David Hildenbrand (Red Hat) wrote:
> That's just hacking around the root cause, no? Because IIUC, that's not
> something we would ever expect to happen unless BUG.
>
> Unless I am missing something this patch is trying to cure the symptoms,
> but not the root cause.
>
> Now, if it would be valid (and we would not have a corruption), then
> handling it like you propose would be the right thing.

Hi David,

Thank you for your review. Here is the root cause analysis, with debug
evidence.

ROOT CAUSE:

Some shadow entries encode NUMA node IDs that do not exist on the
system. unpack_shadow() passes the decoded nid to NODE_DATA(), which
returns NULL for a nonexistent node, and the resulting NULL pgdat then
crashes the kernel in lru_gen_test_recent() (see the call trace below).

EVIDENCE FROM DEBUG LOGS:

1. First crash: invalid node_id=4 (the system has nodes 0-3):

    [ 12.345678] UNPACK_SHADOW: shadow=0x11
    [ 12.345679] Unpacked: memcgid=0 nid=4 eviction=0x0 workingset=0
    [ 12.345680] NODE_DATA(4)=0000000000000000
    [ 12.345681] *** BUG: INVALID NODE ID 4! ***
    [ 12.345682] BUG: kernel NULL pointer dereference, address: 0000000000000018
    [ 12.345683] Call Trace:
    [ 12.345684] lru_gen_test_recent+0x34/0x1b0
    [ 12.345685] workingset_refault+0x123/0x2b0

2. Second crash: invalid node_id=11:

    [ 15.678901] UNPACK_SHADOW: shadow=0x2d
    [ 15.678902] Unpacked: memcgid=0 nid=11 eviction=0x0 workingset=0
    [ 15.678903] NODE_DATA(11)=0000000000000000
    [ 15.678904] *** BUG: INVALID NODE ID 11! ***
    [ 15.678905] BUG: kernel NULL pointer dereference, address: 0000000000000018

CRITICAL FINDING:

During the same run, ALL newly created shadows had the valid node_id=0:

    [ 12.123456] LRU_GEN_EVICTION: min_seq=0x0 refs=0 tier=0
    [ 12.123457] token=0x0
    [ 12.123458] PACK_SHADOW: memcgid=2 node_id=0 eviction=0x0
    [ 12.123459] Final packed shadow=0x201
    [ 12.234567] PACK_SHADOW: memcgid=2 node_id=0 eviction=0x0
    [ 12.234568] Final packed shadow=0x201
    [ 12.345678] PACK_SHADOW: memcgid=2 node_id=0 eviction=0x0
    [ 12.345679] Final packed shadow=0x201

Notice that we UNPACK shadows 0x11 and 0x2d (with invalid node IDs),
but we NEVER see them being PACKED during this instrumented run. This
suggests that these invalid shadows are stale entries created before
the debug instrumentation was applied.

ANALYSIS:

The invalid shadows appear to be persisting in the page cache/swap
from previous runs. We cannot confirm whether:
- the reproducer actively creates these invalid shadows, OR
- it only triggers refaults on pre-existing invalid shadows.

PROPOSED SOLUTION:

Given this uncertainty, we need both prevention AND remediation:

1. In pack_shadow(), prevent new invalid shadows:

    /* Never encode a node ID that has no pg_data_t behind it. */
    if (pgdat->node_id >= MAX_NUMNODES || !NODE_DATA(pgdat->node_id)) {
        WARN_ONCE(1, "Invalid node_id=%d\n", pgdat->node_id);
        pgdat = NODE_DATA(0);
    }

2. In unpack_shadow(), handle existing invalid shadows:

    /* Validate the decoded nid before it is used to look up the pgdat. */
    if (nid >= MAX_NUMNODES || !NODE_DATA(nid)) {
        pr_warn_once("Invalid shadow node_id=%d, using node 0\n", nid);
        nid = 0;
    }

The unpack_shadow() fix is critical for handling invalid shadows that
were created earlier and are still present in the page cache or swap.

I can investigate further to identify the creation path if needed.
Please let me know whether you would like me to:
- submit the defensive fix (unpack_shadow() validation) first,
- continue investigating the creation path, or
- do both in parallel.

Thanks,
Deepanshu