Date: Tue, 15 Jul 2025 21:40:42 +0100
From: Matthew Wilcox
To: Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org,
    Nicolas Pitre, Gao Xiang, Chao Yu, linux-erofs@lists.ozlabs.org,
    Jaegeuk Kim, linux-f2fs-devel@lists.sourceforge.net, Jan Kara,
    linux-fsdevel@vger.kernel.org, David Woodhouse, Richard Weinberger,
    linux-mtd@lists.infradead.org, David Howells, netfs@lists.linux.dev,
    Paulo Alcantara, Konstantin Komarov, ntfs3@lists.linux.dev,
    Steve French, linux-cifs@vger.kernel.org, Phillip Lougher
Subject: Compressed files & the page cache

I've started looking at how the page cache can help filesystems handle compressed data better. Feedback would be appreciated! I'll probably say a few things which are obvious to anyone who knows how compressed files work, but I'm trying to be explicit about my assumptions.

First, I believe that all filesystems work by compressing fixed-size plaintext into variable-sized compressed blocks. This would be a good point to stop reading and tell me about counterexamples.

From what I've been reading in all your filesystems, you want to allocate extra pages in the page cache in order to store the excess data retrieved along with the page that you're actually trying to read. That's because compressing in larger chunks leads to better compression.

There's some discrepancy between filesystems about whether you need scratch space for decompression. Some filesystems read the compressed data into the page cache and decompress in place, while other filesystems read the compressed data into scratch pages and decompress into the page cache.

There also seems to be some discrepancy between filesystems about whether the decompression involves vmap() of all the memory allocated or whether the decompression routines can handle doing kmap_local() on individual pages.

So, my proposal is that filesystems tell the page cache that their minimum folio size is the compression block size. That seems to be around 64k, so not an unreasonable minimum allocation size. That removes all the extra code in filesystems to allocate extra memory in the page cache. It means we don't attempt to track dirtiness at a sub-folio granularity (there's no point, we have to write back the entire compressed block at once). We also get a single virtually contiguous block ... if you're willing to ditch HIGHMEM support. Or there's a proposal to introduce a vmap_file() which would give us a virtually contiguous chunk of memory (and could be trivially turned into a noop for the case of trying to vmap a single large folio).
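
To make the arithmetic behind the proposal concrete, here is a minimal userspace sketch (not kernel code) of how a compression block size might translate into a minimum folio order, and of which naturally aligned folio a given file offset would land in. The helper names min_folio_order() and folio_start() are hypothetical and exist only for this illustration; in the kernel, the resulting order is presumably what a filesystem would hand to an interface such as mapping_set_folio_min_order(), but that wiring is an assumption and is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= block_size.
 * page_size is assumed to be a non-zero power of two.
 */
unsigned int min_folio_order(size_t block_size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	while ((page_size << order) < block_size)
		order++;
	return order;
}

/*
 * Hypothetical helper: byte offset at which the folio containing
 * file position pos starts, assuming folios of the given order are
 * naturally aligned in the file.
 */
uint64_t folio_start(uint64_t pos, size_t page_size, unsigned int order)
{
	uint64_t folio_size = (uint64_t)page_size << order;

	return pos & ~(folio_size - 1);
}
```

With 4k pages and a 64k compression block, min_folio_order(65536, 4096) gives order 4, i.e. 16 pages per folio, and a read at byte offset 70000 falls in the folio that starts at offset 65536. Under these assumptions every read, and every writeback, covers exactly one compression block.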