From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 22 Jul 2023 05:05:40 +0100
From: Matthew Wilcox
To: Mike Kravetz
Cc: Sidhartha Kumar, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, songmuchun@bytedance.com, david@redhat.com
Subject: Re: [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages
References: <20230710230450.110064-1-sidhartha.kumar@oracle.com>
	<20230720000011.GD3240@monkey>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20230720000011.GD3240@monkey>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On Wed, Jul 19, 2023 at 05:00:11PM -0700, Mike Kravetz wrote:
> On 07/10/23 16:04, Sidhartha Kumar wrote:
> > ========================== OVERVIEW ========================================
> > This patchset attempts to implement a listed filemap TODO which is
> > changing hugetlb folios to have ->index in PAGE_SIZE. This simplifies many
> > functions within filemap.c as they have to special case hugetlb pages.
> > From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
> > a huge page sized index as it is used for the reservation map and the hash
> > function for the hugetlb mutex table.
> >
> > This patchset adds new wrappers for hugetlb code to interact with the
> > page cache. These wrappers calculate a linear page index as this is now
> > what the page cache expects for hugetlb pages.
> >
> > From the discussion on HGM for hugetlb[3], there is a want to remove hugetlb
> > special casing throughout the core mm code. This series accomplishes
> > a part of this by shifting complexity from filemap.c to hugetlb.c. There
> > are still checks for hugetlb within the filemap code as cgroup accounting
> > and hugetlb accounting are special cased as well.
> >
> > =========================== PERFORMANCE =====================================
>
> Hi Sid,
>
> Sorry for being dense but can you tell me what the below performance
> information means.  My concern with such a change would be any noticeable
> difference in populating a large (up to TB) hugetlb file.  My guess is
> that it is going to take longer unless xarray is optimized for this.
>
> We do have users that create and pre-populate hugetlb files this big.
> Just want to make sure there are no surprises for them.

It's Going To Depend.  Annoyingly.

Let's say you're using 1GB pages on a 4kB PAGE_SIZE machine.  That's an
order-18 folio, so we end up skipping three layers of the tree, and if
you're going up to 1TB, it's structured:

root -> node (shift 30) -> node (shift 24) -> entry
                                           -> entry
                                           (...)
                        -> node (shift 24) -> entry
                                           (...)
                        (...)

This is essentially no different from before where each 1GB page would
occupy a single entry.  It's just that it now occupies 2^18 entries,
and everything in the tree has a different label.

Where you will (may?) see a difference is with the 2MB entries.
An order-9 page doesn't quite fit with the order-6 nodes in the tree,
so it looks like this:

root -> node (s30) -> node (s24) -> node (s18) -> node (s12) -> entry 0
                                                             -> sibling
                                                             -> sibling
                                                             (...)
                                                             -> entry 8
                                                             -> sibling
                                                             -> sibling
                                                             (...)

so all of a sudden the tree is 8x as big as it used to be.  The upside is
that we lose all the calculations from filemap.c/pagemap.h.  It's a lot
better than it was perhaps five years ago when each 2MB page would occupy
512 entries, but 8 entries is still worse than 1.

Could we do better?  Undoubtedly.  We could have variable shifts & node
sizes in the tree so that we perhaps had an s18 node that was 8x as
large (4160 bytes), and then each order-9 entry in the tree would occupy
one entry in that special large node.  I've been reluctant to introduce
such a beast without strong evidence it would help.  Or we could introduce
a small s12 node which could only store 8 entries (again an order-9 entry
would occupy one entry in such a special node).

These are things which would only benefit hugetlbfs, so there's a bit of
a chicken-and-egg problem; no demand for the feature until the work is
done, and the work maybe performs badly until the feature exists.

And then some architectures have other orders for their huge pages.
Order 11 is probably the worst possibility to exist (or in general 6n - 1),
but I haven't done a detailed survey to figure out if anyone supports
such a thing.
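
The slot arithmetic above can be checked with a rough sketch.  It assumes
64-slot nodes (a chunk shift of 6, matching the "order-6 nodes" mentioned
above) and that an entry of a given order sits in the lowest node whose
slots are no larger than the entry.  The helper names are made up for
illustration and are not kernel API.

```python
# Assumption: each tree node has 2**6 = 64 slots ("order-6 nodes").
CHUNK_SHIFT = 6


def slots_per_entry(order: int, chunk_shift: int = CHUNK_SHIFT) -> int:
    """Slots one entry of the given order takes up in the node holding it
    (one canonical entry plus its siblings)."""
    return 1 << (order % chunk_shift)


def node_span_shift(order: int, chunk_shift: int = CHUNK_SHIFT) -> int:
    """log2 of the number of page indices spanned by the node holding the
    entry; this is the 'sNN' label used in the diagrams."""
    return (order // chunk_shift + 1) * chunk_shift


if __name__ == "__main__":
    # order 9 = 2MB, order 11 = a 6n - 1 case, order 18 = 1GB (4kB PAGE_SIZE)
    for order in (9, 11, 18):
        print(f"order {order}: {slots_per_entry(order)} slot(s) "
              f"in an s{node_span_shift(order)} node")
```

Under those assumptions an order-18 entry takes one slot, an order-9 entry
takes 8, and any order of the form 6n - 1 takes 32, which is why order 11
looks like the worst case.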