Date: Wed, 29 May 2024 04:57:33 +0100
From: Matthew Wilcox
To: Chris Li
Cc: Karim Manaouil, Jan Kara, Chuanhua Han, linux-mm,
    lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com,
    21cnbao@gmail.com, david@redhat.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
References: <039190fb-81da-c9b3-3f33-70069cdb27b0@oppo.com>
    <20240307140344.4wlumk6zxustylh6@quack3>

On Tue, May 21, 2024 at 01:40:56PM -0700, Chris Li wrote:
> > Filesystems already implemented a lot of solutions for fragmentation
> > avoidance that are more appropriate for slow storage media.
>
> Swap and file systems have very different requirements and usage
> patterns and IO patterns.

Should they, though?  Filesystems noticed that handling pages in LRU
order was inefficient and so they stopped doing that (see the removal
of aops->writepage in favour of ->writepages, along with where each is
called from).  Maybe it's time for swap to start doing writes in the
order of virtual addresses within a VMA, instead of LRU order.

Indeed, if we're open to radical ideas, the LRU sucks.  A physical scan
is 40x faster:
https://lore.kernel.org/linux-mm/ZTc7SHQ4RbPkD3eZ@casper.infradead.org/

> One challenging aspect is that the current swap back end has a very
> low per swap entry memory overhead. It is about 1 byte (swap_map), 2
> bytes (swap cgroup), 8 bytes (swap cache pointer). The inode struct is
> more than 64 bytes per file. That is a big jump if you map a swap
> entry to a file. If you map more than one swap entry to a file, then
> you need to track the mapping of file offset to swap entry, and the
> reverse lookup of swap entry to a file with offset. Whichever way you
> cut it, it will significantly increase the per swap entry memory
> overhead.

Not necessarily, no.  If your workload uses a lot of order-2, order-4
and order-9 folios, then the current scheme is using 11 bytes per page,
so 44 bytes per order-2 folio, 176 per order-4 folio and 5632 per
order-9 folio.  That's a lot of bytes we can use for an extent-based
scheme.

Also, why would you compare the size of an inode to the size of a swap
entry?  inode is ~equivalent to an anon_vma, not to a swap entry.
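
For anyone who wants to check the per-folio numbers, here is a minimal
sketch in Python.  The byte sizes are the ones quoted in the thread
(1 + 2 + 8 per page); the constant and function names are made up for
illustration and are not kernel identifiers.

```python
# Per-page swap metadata sizes as quoted in the thread (bytes).
# Names are illustrative only, not kernel identifiers.
SWAP_MAP_BYTES = 1        # swap_map
SWAP_CGROUP_BYTES = 2     # swap cgroup
SWAP_CACHE_PTR_BYTES = 8  # swap cache pointer

PER_PAGE_BYTES = SWAP_MAP_BYTES + SWAP_CGROUP_BYTES + SWAP_CACHE_PTR_BYTES


def per_folio_overhead(order: int) -> int:
    """Bytes of per-page metadata spent on one folio of 2**order pages."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return PER_PAGE_BYTES << order


def budget_table(orders=(0, 2, 4, 9)):
    """Return (order, pages, bytes) rows for the given folio orders."""
    return [(o, 1 << o, per_folio_overhead(o)) for o in orders]


if __name__ == "__main__":
    for order, pages, nbytes in budget_table():
        print(f"order-{order}: {pages:4d} pages, {nbytes:5d} bytes of metadata")
```

The byte count per folio is the budget an extent-based record for that
folio could use before it costs more than the current per-page scheme.
What such a record would actually contain is not specified here.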