From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Muchun Song, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Shakeel Butt, Michal Hocko, Chengming Zhou, Qi Zheng, Yu Zhao, Sasha Levin, linux-kernel@vger.kernel.org, Kairui Song, syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
Subject: [PATCH] mm, madvise: fix potential workingset node list_lru leaks
Date: Sun, 22 Dec 2024 20:29:36 +0800
Message-ID: <20241222122936.67501-1-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song

Since commit 5abc1e37afa0 ("mm: list_lru: allocate list_lru_one only
when needed"), all list_lru users need to allocate the items using the
new infrastructure that provides list_lru info for slab allocation,
ensuring that the corresponding memcg list_lru is allocated before use.

For workingset shadow nodes (which are xa_node), users were converted
to the new infrastructure by commit 9bbdc0f32409 ("xarray: use
kmem_cache_alloc_lru to allocate xa_node"). The xas->xa_lru will be
set correctly for filemap users. However, there is a missing case:
xa_node allocations caused by madvise(..., MADV_COLLAPSE).

madvise(..., MADV_COLLAPSE) will also read in the absent parts of the
file map, and xa_nodes will be allocated for the caller's memcg
(assuming it's not the rootcg). However, these allocations won't
trigger memcg list_lru allocation, because the proper xas info was
not set.
If nothing else has allocated other xa_nodes for that memcg to trigger
list_lru creation, and memory pressure starts to evict file pages,
workingset_update_node will try to add these xa_nodes to the
corresponding memcg list_lru, which does not exist (NULL), so they
will be added to the rootcg's list_lru instead.

This shouldn't be a significant issue in practice, but it is indeed
unexpected behavior: these xa_nodes will not be reclaimed effectively,
and it may lead to incorrect counting of the list_lru->nr_items
counter.

This problem wasn't exposed until recent commit 28e98022b31ef
("mm/list_lru: simplify reparenting and initial allocation") added a
sanity check: only a dying memcg may have a NULL list_lru when
list_lru_{add,del} is called. This problem triggered that WARNING.

So make madvise(..., MADV_COLLAPSE) also call xas_set_lru() to pass
the list_lru we may want to insert the xa_node into later. Also move
mapping_set_update to mm/internal.h and turn it into a macro, to
avoid including extra headers in mm/internal.h.
Fixes: 9bbdc0f32409 ("xarray: use kmem_cache_alloc_lru to allocate xa_node")
Reported-by: syzbot+38a0cbd267eff2d286ff@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/675d01e9.050a0220.37aaf.00be.GAE@google.com/
Signed-off-by: Kairui Song
---
 mm/filemap.c    | 9 ---------
 mm/internal.h   | 6 ++++++
 mm/khugepaged.c | 3 +++
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index f61cf51c2238..33b60d448fca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -124,15 +124,6 @@
  *    ->private_lock		(zap_pte_range->block_dirty_folio)
  */
 
-static void mapping_set_update(struct xa_state *xas,
-			       struct address_space *mapping)
-{
-	if (dax_mapping(mapping) || shmem_mapping(mapping))
-		return;
-	xas_set_update(xas, workingset_update_node);
-	xas_set_lru(xas, &shadow_nodes);
-}
-
 static void page_cache_delete(struct address_space *mapping,
 			      struct folio *folio, void *shadow)
 {
diff --git a/mm/internal.h b/mm/internal.h
index cb8d8e8e3ffa..4e7a3a93d0a2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1510,6 +1510,12 @@ static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
 /* Only track the nodes of mappings with shadow entries */
 void workingset_update_node(struct xa_node *node);
 extern struct list_lru shadow_nodes;
+#define mapping_set_update(xas, mapping) do {			\
+	if (!dax_mapping(mapping) && !shmem_mapping(mapping)) {	\
+		xas_set_update(xas, workingset_update_node);	\
+		xas_set_lru(xas, &shadow_nodes);		\
+	}							\
+} while (0)
 
 /* mremap.c */
 unsigned long move_page_tables(struct vm_area_struct *vma,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6f8d46d107b4..653dbb1ff05c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
@@ -1837,6 +1838,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	if (result != SCAN_SUCCEED)
 		goto out;
 
+	mapping_set_update(&xas, mapping);
+
 	__folio_set_locked(new_folio);
 	if (is_shmem)
 		__folio_set_swapbacked(new_folio);
-- 
2.47.1