From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D130E3D5C05; Tue, 12 May 2026 17:02:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778605325; cv=none; b=Wq9Q4kDZmK5kwZQqxCAaLYa3G4FxKUfTQSHsKm4hROgv12VDfwegJHQb7WykVhnOUyMnMrgrbBb99Fr8PnD6Gbc/Vhc7lFxFk3rqpk17cJBXY12NZOkTzhTT/qO3Zb0O8w0aly0IgyIU779BLts2Av1skhpQuunGeNI07n7Zp4s= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778605325; c=relaxed/simple; bh=3w1awZ3eeZyi0jyriVIT/nzrxciCz6a3fg6BE8QtIlg=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=G5uslSr+tSmuo0d6F51f0tPRdKogqVtXMZCSxCTsHwtfQSsgHC6M+pEYzL1LWEUos7mAmBEaEySZK6tmXHFvLq89RMDAQTqX1J8GUpudprL9qVCJhdtKNfQvQa+2I6EMqs0VjoxOyS3d7fb3HZqE8Q3hRZeE0ACmwl059ef00rU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TjMYk2vR; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TjMYk2vR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6F11FC2BCC7; Tue, 12 May 2026 17:02:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778605325; bh=3w1awZ3eeZyi0jyriVIT/nzrxciCz6a3fg6BE8QtIlg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=TjMYk2vRoty2a4xDF+mPNkwg088Su/Dx1Sz86vsQ2QZG6kbz61WB0pPzkcCZAxjYq kJY3nMshWUg7KGRx6BsutKaT+vu8mwemfJijbJapF//T7rUIbic7QUSCSMCINj9Vnm utITe5nMhhEVSeINgpaCBTozVPfIZ/os3th5Ms0aJgOfThlG42fWDsASqq13Ot1T5b Wg1VYUEw8U21QanLlWxGkEH80h/yjschij/vN+NnHNHX+S7Sb+2OzYIZUb7Q97/fHl QCDzO30QikF4V6NHzRaXJaxzSG3Ei3fZ/Pa/6QN0mk9zIzYYmmkf0fXA4YRksP43ne w2Gji8z2LDbZA== Date: Tue, 12 May 2026 10:02:04 -0700 From: "Darrick J. Wong" To: Christoph Hellwig Cc: Andrew Morton , Chris Li , Kairui Song , Christian Brauner , Jens Axboe , David Sterba , Theodore Ts'o , Jaegeuk Kim , Chao Yu , Trond Myklebust , Anna Schumaker , Namjae Jeon , Hyunchul Lee , Steve French , Paulo Alcantara , Carlos Maiolino , Damien Le Moal , Naohiro Aota , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org Subject: Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter Message-ID: <20260512170204.GI9555@frogsfrogsfrogs> References: <20260512053625.2950900-1-hch@lst.de> <20260512053625.2950900-9-hch@lst.de> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20260512053625.2950900-9-hch@lst.de> On Tue, May 12, 2026 at 07:35:24AM +0200, Christoph Hellwig wrote: > add_swap_extent already coalesces multiple extents, no need to duplicate > that in the caller. > > Signed-off-by: Christoph Hellwig /me wishes he'd either noticed that add_swap_extent already had the coalescing code or had documented why he implemented his own. OH. Now I remember why -- it's to handle contiguous mixed mappings better. Let's say that you have a 1k fsblock filesystem and 4k base pages. You fallocate an 8G swap file and then mkswap it. The first mapping is a 1k written mapping at offset 0 for the swap header, followed by an 8388607k unwritten mapping at offset 3k. The PAGE_SIZE rounding code in iomap_swapfile_add_extent will round the end of that first mapping down to zero and ignore it. The second mapping will be treated as if it were a 8388604k mapping starting at offset 4096. Now the page counts are wrong and the swapon fails. A more generic solution to this would be to change add_swap_extent to take sector_t addr and length values and use them to construct a bitmap representing contiguous physical space on the bdev, accounting of course for PAGE_SIZE alignment. Except for the swap header page, every other contiguously set page-aligned region in the bitmap gets added to the swap extent map. You could then maximize the number of pages participating in swap even for files with layouts that are truly egregiously bad. But I elected not to go there because the common case is fallocate getting contiguous space. But still, I'm not sure we want to drop the iomap accumulator in fs/iomap/swapfile.c. --D > --- > fs/iomap/swapfile.c | 104 +++++++++++++------------------------------- > 1 file changed, 31 insertions(+), 73 deletions(-) > > diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c > index cf354fdfb7c3..a4e0ca462cc4 100644 > --- a/fs/iomap/swapfile.c > +++ b/fs/iomap/swapfile.c > @@ -6,57 +6,32 @@ > #include > #include > > -/* Swapfile activation */ > - > -struct iomap_swapfile_info { > - struct iomap iomap; /* accumulated iomap */ > - struct swap_info_struct *sis; > - unsigned long nr_pages; /* number of pages collected */ > - struct file *file; > -}; > - > -/* > - * Collect physical extents for this swap file. Physical extents reported to > - * the swap code must be trimmed to align to a page boundary. The logical > - * offset within the file is irrelevant since the swapfile code maps logical > - * page numbers of the swap device to the physical page-aligned extents. > - */ > -static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi) > -{ > - struct iomap *iomap = &isi->iomap; > - uint64_t first_ppage; > - uint64_t next_ppage; > - > - /* > - * Round the start up and the end down so that the physical > - * extent aligns to a page boundary. > - */ > - first_ppage = ALIGN(iomap->addr, PAGE_SIZE) >> PAGE_SHIFT; > - next_ppage = ALIGN_DOWN(iomap->addr + iomap->length, PAGE_SIZE) >> > - PAGE_SHIFT; > - return add_swap_extent(isi->sis, next_ppage - first_ppage, first_ppage); > -} > - > -static int iomap_swapfile_fail(struct iomap_swapfile_info *isi, const char *str) > +static int iomap_swapfile_fail(struct file *file, const char *str) > { > char *buf, *p = ERR_PTR(-ENOMEM); > > buf = kmalloc(PATH_MAX, GFP_KERNEL); > if (buf) > - p = file_path(isi->file, buf, PATH_MAX); > + p = file_path(file, buf, PATH_MAX); > pr_err("swapon: file %s %s\n", IS_ERR(p) ? "" : p, str); > kfree(buf); > return -EINVAL; > } > > /* > - * Accumulate iomaps for this swap file. We have to accumulate iomaps because > - * swap only cares about contiguous page-aligned physical extents and makes no > - * distinction between written and unwritten extents. > + * Report physical extents for this swap file. Physical extents reported to the > + * swap code must be trimmed to align to a page boundary. The logical offset > + * within the file is irrelevant since the swapfile code maps logical page > + * numbers of the swap device to the physical page-aligned extents. > */ > -static int iomap_swapfile_iter(struct iomap_iter *iter, > - struct iomap *iomap, struct iomap_swapfile_info *isi) > +static int iomap_swapfile_iter(struct iomap_iter *iter, struct file *file, > + struct swap_info_struct *sis) > { > + struct iomap *iomap = &iter->iomap; > + uint64_t first_ppage; > + uint64_t next_ppage; > + int error; > + > switch (iomap->type) { > case IOMAP_MAPPED: > case IOMAP_UNWRITTEN: > @@ -64,35 +39,31 @@ static int iomap_swapfile_iter(struct iomap_iter *iter, > break; > case IOMAP_INLINE: > /* No inline data. */ > - return iomap_swapfile_fail(isi, "is inline"); > + return iomap_swapfile_fail(file, "is inline"); > default: > - return iomap_swapfile_fail(isi, "has unallocated extents"); > + return iomap_swapfile_fail(file, "has unallocated extents"); > } > > /* No uncommitted metadata or shared blocks. */ > if (iomap->flags & IOMAP_F_DIRTY) > - return iomap_swapfile_fail(isi, "is not committed"); > + return iomap_swapfile_fail(file, "is not committed"); > if (iomap->flags & IOMAP_F_SHARED) > - return iomap_swapfile_fail(isi, "has shared extents"); > + return iomap_swapfile_fail(file, "has shared extents"); > > /* Only one bdev per swap file. */ > - if (iomap->bdev != isi->sis->bdev) > - return iomap_swapfile_fail(isi, "outside the main device"); > - > - if (isi->iomap.length == 0) { > - /* No accumulated extent, so just store it. */ > - memcpy(&isi->iomap, iomap, sizeof(isi->iomap)); > - } else if (isi->iomap.addr + isi->iomap.length == iomap->addr) { > - /* Append this to the accumulated extent. */ > - isi->iomap.length += iomap->length; > - } else { > - /* Otherwise, add the retained iomap and store this one. */ > - int error = iomap_swapfile_add_extent(isi); > - if (error) > - return error; > - memcpy(&isi->iomap, iomap, sizeof(isi->iomap)); > - } > + if (iomap->bdev != sis->bdev) > + return iomap_swapfile_fail(file, "outside the main device"); > > + /* > + * Round the start up and the end down so that the physical extent > + * aligns to a page boundary. > + */ > + first_ppage = ALIGN(iomap->addr, PAGE_SIZE) >> PAGE_SHIFT; > + next_ppage = ALIGN_DOWN(iomap->addr + iomap->length, PAGE_SIZE) >> > + PAGE_SHIFT; > + error = add_swap_extent(sis, next_ppage - first_ppage, first_ppage); > + if (error) > + return error; > return iomap_iter_advance_full(iter); > } > > @@ -110,10 +81,6 @@ int iomap_swap_activate(struct file *file, struct swap_info_struct *sis, > .len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE), > .flags = IOMAP_REPORT, > }; > - struct iomap_swapfile_info isi = { > - .sis = sis, > - .file = file, > - }; > int ret; > > /* > @@ -125,16 +92,7 @@ int iomap_swap_activate(struct file *file, struct swap_info_struct *sis, > return ret; > > while ((ret = iomap_iter(&iter, ops)) > 0) > - iter.status = iomap_swapfile_iter(&iter, &iter.iomap, &isi); > - if (ret < 0) > - return ret; > - > - if (isi.iomap.length) { > - ret = iomap_swapfile_add_extent(&isi); > - if (ret) > - return ret; > - } > - > - return 0; > + iter.status = iomap_swapfile_iter(&iter, file, sis); > + return ret; > } > EXPORT_SYMBOL_GPL(iomap_swap_activate); > -- > 2.53.0 > >