From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <0aa3ce20-438f-49fb-8f04-4fc1dbf49728@linux.dev>
Date: Wed, 28 Feb 2024 17:51:49 +0800
MIME-Version: 1.0
Subject: Re: [PATCH] slub: avoid scanning all partial slabs in get_slabinfo()
Content-Language: en-US
To: "Christoph Lameter (Ampere)"
Cc: Vlastimil Babka, David Rientjes, Jianfeng Wang, penberg@kernel.org, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou
References: <20240215211457.32172-1-jianfeng.w.wang@oracle.com> <6b58d81f-8e8f-3732-a5d4-40eece75013b@google.com> <55ccc92a-79fa-42d2-97d8-b514cf00823b@linux.dev> <6daf88a2-84c2-5ba4-853c-c38cca4a03cb@linux.com> <347b870e-a7d5-45df-84ba-4eee37b74ff6@linux.dev> <1a952209-fa22-4439-af27-bf102c7d742b@suse.cz> <2744dd57-e76e-4d80-851a-02898f87f9be@suse.cz> <036f2bb4-b086-2988-e46d-86d399405687@linux.com> <1eeb84d4-42b1-d204-ece1-b76bfbc548bf@linux.com>
From: Chengming Zhou
In-Reply-To: <1eeb84d4-42b1-d204-ece1-b76bfbc548bf@linux.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 2024/2/28 06:55, Christoph Lameter (Ampere) wrote:
> On Tue, 27 Feb 2024, Chengming Zhou wrote:
>
>>> We could mark the state change (list ownership) in the slab metadata and then abort the scan if the state mismatches the list.
>>
>> It seems feasible, maybe something like below?
>>
>> But this way needs all kmem_caches have SLAB_TYPESAFE_BY_RCU, right?
>
> No.
>
> If a slab is freed to the page allocator and the fields are reused in a different way then we would have to wait till the end of the RCU period. This could be done with a deferred free. Otherwise we have the type checking to ensure that nothing untoward happens in the RCU period.
>
> The usually shuffle of the pages between freelists/cpulists/cpuslab and fully used slabs would not require that.

IIUC, your method doesn't need the slab struct (page) to be freed with an RCU delay.

So that struct page may be reused for anything by the buddy allocator, or may even be freed, right?

I'm not sure the RCU read lock is enough protection here. Do we need to hold
another lock, like the memory hotplug lock?

>
>> Not sure if this is acceptable? Which may cause random delay of memory free.
>>
>> ```
>> retry:
>>     rcu_read_lock();
>>
>>     h = rcu_dereference(list_next_rcu(&n->partial));
>>
>>     while (h != &n->partial) {
>
> Hmm... a linked list that forms a circle? Linked lists usually terminate in a NULL pointer.

I think the node partial list should be a doubly-linked list, since we need to
add slabs to its head or tail.
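
To illustrate why the quoted loop stops at `h != &n->partial` rather than at NULL, here is a minimal userspace sketch of a circular doubly-linked list with a sentinel head, in the style of the kernel's `struct list_head`. It is a simplified imitation, not the kernel implementation: `struct slab_stub`, `init_list_head()`, `insert_between()`, `list_add_head()` and `count_free_stub()` are hypothetical names, and there is no RCU or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace imitation of a circular doubly-linked list node (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for a slab sitting on a node's partial list. */
struct slab_stub {
    struct list_head slab_list;
    int free_objects;
};

/* An empty list is a head that points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry between two neighbours that are already adjacent. */
static void insert_between(struct list_head *entry,
                           struct list_head *prev, struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Add right after the head: O(1) thanks to head->next. */
static void list_add_head(struct list_head *entry, struct list_head *head)
{
    insert_between(entry, head, head->next);
}

/* Add right before the head, i.e. at the tail: O(1) thanks to head->prev. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    insert_between(entry, head->prev, head);
}

/* Walk the circle; the walk ends when the cursor is back at the head. */
static int count_free_stub(struct list_head *head)
{
    int total = 0;

    assert(head != NULL);
    for (struct list_head *h = head->next; h != head; h = h->next) {
        struct slab_stub *s = (struct slab_stub *)
            ((char *)h - offsetof(struct slab_stub, slab_list));
        total += s->free_objects;
    }
    return total;
}
```

Because the head is itself a node in the circle, both head and tail insertion need no special case for the empty list, and there is no NULL terminator to test for.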
>
> So this would be
>
>
> redo:
>
>
>     rcu_read_lock();
>     h = ;
>
>     while (h && h->type == ) {
>
>
>           /* Maybe check h->type again */
>           if (h->type != )
>             break;

Another problem with this lockless recheck is that we may get a badly wrong value:
say a slab is removed from the node list, then added to our list at another position,
so it passes our recheck conditions here. That may make our count badly wrong?

Thanks!

>
>           h = ;
>     }
>
>     rcu_read_unlock();
>
>
>     if (!h) /* Type of list changed under us */
>         goto redo;
>
>
> The check for type == is racy. Maybe we can ignore that or we could do something additional.
>
> Using RCU does not make sense if you add locking in the inner loop. Then it gets too complicated and causes delay. This must be a simple fast lockless loop in order to do what we need.
>
> Presumably the type and list pointers are in the same cacheline and thus could made to be updated in a coherent way if properly sequenced with fences etc.
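
As a rough illustration of the scan sketched in the quoted text, below is a userspace approximation using C11 atomics. Everything in it is an assumption made for illustration: `struct tagged_slab`, the `TAG_*` values, `tagged_slab_init()`, `scan_once()`, `scan_with_retry()` and `SCAN_RETRY` are hypothetical names, the list is NULL-terminated as in the quoted sketch, and rcu_read_lock()/rcu_read_unlock() are omitted because they have no counterpart in plain C. A cursor that is still non-NULL after the loop is taken to mean that a tag mismatch was seen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list ownership states stored in the slab metadata. */
enum list_tag { TAG_PARTIAL, TAG_FULL };

/* Hypothetical slab stub: tag and link are both read locklessly. */
struct tagged_slab {
    _Atomic(struct tagged_slab *) next; /* NULL terminates the list */
    _Atomic int tag;                    /* which list owns this slab */
    _Atomic int free_objects;
};

#define SCAN_RETRY (-1)

static void tagged_slab_init(struct tagged_slab *s, struct tagged_slab *next,
                             int tag, int free_objects)
{
    atomic_init(&s->next, next);
    atomic_init(&s->tag, tag);
    atomic_init(&s->free_objects, free_objects);
}

/* One lockless pass: returns the total, or SCAN_RETRY on a tag mismatch. */
static int scan_once(struct tagged_slab *first, int expected_tag)
{
    int total = 0;
    struct tagged_slab *h = first;

    while (h && atomic_load_explicit(&h->tag, memory_order_acquire) == expected_tag) {
        int free_objects = atomic_load_explicit(&h->free_objects,
                                                memory_order_relaxed);
        struct tagged_slab *next = atomic_load_explicit(&h->next,
                                                        memory_order_acquire);

        /* Recheck the tag after reading the fields; drop the pass if it changed. */
        if (atomic_load_explicit(&h->tag, memory_order_acquire) != expected_tag)
            break;

        total += free_objects;
        h = next;
    }

    /* h is non-NULL only if the loop stopped on a mismatching tag. */
    return h ? SCAN_RETRY : total;
}

/* Restart from the head with zeroed counters, a bounded number of times. */
static int scan_with_retry(struct tagged_slab *first, int expected_tag,
                           int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        int total = scan_once(first, expected_tag);

        if (total != SCAN_RETRY)
            return total;
    }
    return SCAN_RETRY;
}
```

This only detects a slab whose tag differs at the moment it is checked. A slab that leaves the list and is re-added at another position with the same tag, the case raised above, still passes both checks, so the total can be wrong without any retry being triggered.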