From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Christian Brauner <brauner@kernel.org>,
Jens Axboe <axboe@kernel.dk>, David Sterba <dsterba@suse.com>,
Theodore Ts'o <tytso@mit.edu>, Jaegeuk Kim <jaegeuk@kernel.org>,
Chao Yu <chao@kernel.org>, Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
Namjae Jeon <linkinjeon@kernel.org>,
Hyunchul Lee <hyc.lee@gmail.com>,
Steve French <sfrench@samba.org>,
Paulo Alcantara <pc@manguebit.org>,
Carlos Maiolino <cem@kernel.org>,
Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-mm@kvack.org,
linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net,
linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org
Subject: Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter
Date: Tue, 12 May 2026 10:02:04 -0700 [thread overview]
Message-ID: <20260512170204.GI9555@frogsfrogsfrogs> (raw)
In-Reply-To: <20260512053625.2950900-9-hch@lst.de>
On Tue, May 12, 2026 at 07:35:24AM +0200, Christoph Hellwig wrote:
> add_swap_extent already coalesces multiple extents, no need to duplicate
> that in the caller.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
/me wishes he'd either noticed that add_swap_extent already had the
coalescing code or had documented why he implemented his own.
OH. Now I remember why -- it's to handle contiguous mixed mappings
better.
Let's say that you have a 1k fsblock filesystem and 4k base pages. You
fallocate an 8G swap file and then mkswap it. The first mapping is a 1k
written mapping at offset 0 for the swap header, followed by an 8388607k
unwritten mapping at offset 3k.
The PAGE_SIZE rounding code in iomap_swapfile_add_extent will round the
end of that first mapping down to zero and ignore it. The second
mapping will be treated as if it were a 8388604k mapping starting at
offset 4096. Now the page counts are wrong and the swapon fails.
A more generic solution to this would be to change add_swap_extent to
take sector_t addr and length values and use them to construct a bitmap
representing contiguous physical space on the bdev, accounting of course
for PAGE_SIZE alignment. Except for the swap header page, every other
contiguously set page-aligned region in the bitmap gets added to the
swap extent map.
You could then maximize the number of pages participating in swap even
for files with layouts that are truly egregiously bad. But I elected
not to go there because the common case is fallocate getting contiguous
space.
But still, I'm not sure we want to drop the iomap accumulator in
fs/iomap/swapfile.c.
--D
> ---
> fs/iomap/swapfile.c | 104 +++++++++++++-------------------------------
> 1 file changed, 31 insertions(+), 73 deletions(-)
>
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index cf354fdfb7c3..a4e0ca462cc4 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -6,57 +6,32 @@
> #include <linux/iomap.h>
> #include <linux/swap.h>
>
> -/* Swapfile activation */
> -
> -struct iomap_swapfile_info {
> - struct iomap iomap; /* accumulated iomap */
> - struct swap_info_struct *sis;
> - unsigned long nr_pages; /* number of pages collected */
> - struct file *file;
> -};
> -
> -/*
> - * Collect physical extents for this swap file. Physical extents reported to
> - * the swap code must be trimmed to align to a page boundary. The logical
> - * offset within the file is irrelevant since the swapfile code maps logical
> - * page numbers of the swap device to the physical page-aligned extents.
> - */
> -static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> -{
> - struct iomap *iomap = &isi->iomap;
> - uint64_t first_ppage;
> - uint64_t next_ppage;
> -
> - /*
> - * Round the start up and the end down so that the physical
> - * extent aligns to a page boundary.
> - */
> - first_ppage = ALIGN(iomap->addr, PAGE_SIZE) >> PAGE_SHIFT;
> - next_ppage = ALIGN_DOWN(iomap->addr + iomap->length, PAGE_SIZE) >>
> - PAGE_SHIFT;
> - return add_swap_extent(isi->sis, next_ppage - first_ppage, first_ppage);
> -}
> -
> -static int iomap_swapfile_fail(struct iomap_swapfile_info *isi, const char *str)
> +static int iomap_swapfile_fail(struct file *file, const char *str)
> {
> char *buf, *p = ERR_PTR(-ENOMEM);
>
> buf = kmalloc(PATH_MAX, GFP_KERNEL);
> if (buf)
> - p = file_path(isi->file, buf, PATH_MAX);
> + p = file_path(file, buf, PATH_MAX);
> pr_err("swapon: file %s %s\n", IS_ERR(p) ? "<unknown>" : p, str);
> kfree(buf);
> return -EINVAL;
> }
>
> /*
> - * Accumulate iomaps for this swap file. We have to accumulate iomaps because
> - * swap only cares about contiguous page-aligned physical extents and makes no
> - * distinction between written and unwritten extents.
> + * Report physical extents for this swap file. Physical extents reported to the
> + * swap code must be trimmed to align to a page boundary. The logical offset
> + * within the file is irrelevant since the swapfile code maps logical page
> + * numbers of the swap device to the physical page-aligned extents.
> */
> -static int iomap_swapfile_iter(struct iomap_iter *iter,
> - struct iomap *iomap, struct iomap_swapfile_info *isi)
> +static int iomap_swapfile_iter(struct iomap_iter *iter, struct file *file,
> + struct swap_info_struct *sis)
> {
> + struct iomap *iomap = &iter->iomap;
> + uint64_t first_ppage;
> + uint64_t next_ppage;
> + int error;
> +
> switch (iomap->type) {
> case IOMAP_MAPPED:
> case IOMAP_UNWRITTEN:
> @@ -64,35 +39,31 @@ static int iomap_swapfile_iter(struct iomap_iter *iter,
> break;
> case IOMAP_INLINE:
> /* No inline data. */
> - return iomap_swapfile_fail(isi, "is inline");
> + return iomap_swapfile_fail(file, "is inline");
> default:
> - return iomap_swapfile_fail(isi, "has unallocated extents");
> + return iomap_swapfile_fail(file, "has unallocated extents");
> }
>
> /* No uncommitted metadata or shared blocks. */
> if (iomap->flags & IOMAP_F_DIRTY)
> - return iomap_swapfile_fail(isi, "is not committed");
> + return iomap_swapfile_fail(file, "is not committed");
> if (iomap->flags & IOMAP_F_SHARED)
> - return iomap_swapfile_fail(isi, "has shared extents");
> + return iomap_swapfile_fail(file, "has shared extents");
>
> /* Only one bdev per swap file. */
> - if (iomap->bdev != isi->sis->bdev)
> - return iomap_swapfile_fail(isi, "outside the main device");
> -
> - if (isi->iomap.length == 0) {
> - /* No accumulated extent, so just store it. */
> - memcpy(&isi->iomap, iomap, sizeof(isi->iomap));
> - } else if (isi->iomap.addr + isi->iomap.length == iomap->addr) {
> - /* Append this to the accumulated extent. */
> - isi->iomap.length += iomap->length;
> - } else {
> - /* Otherwise, add the retained iomap and store this one. */
> - int error = iomap_swapfile_add_extent(isi);
> - if (error)
> - return error;
> - memcpy(&isi->iomap, iomap, sizeof(isi->iomap));
> - }
> + if (iomap->bdev != sis->bdev)
> + return iomap_swapfile_fail(file, "outside the main device");
>
> + /*
> + * Round the start up and the end down so that the physical extent
> + * aligns to a page boundary.
> + */
> + first_ppage = ALIGN(iomap->addr, PAGE_SIZE) >> PAGE_SHIFT;
> + next_ppage = ALIGN_DOWN(iomap->addr + iomap->length, PAGE_SIZE) >>
> + PAGE_SHIFT;
> + error = add_swap_extent(sis, next_ppage - first_ppage, first_ppage);
> + if (error)
> + return error;
> return iomap_iter_advance_full(iter);
> }
>
> @@ -110,10 +81,6 @@ int iomap_swap_activate(struct file *file, struct swap_info_struct *sis,
> .len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE),
> .flags = IOMAP_REPORT,
> };
> - struct iomap_swapfile_info isi = {
> - .sis = sis,
> - .file = file,
> - };
> int ret;
>
> /*
> @@ -125,16 +92,7 @@ int iomap_swap_activate(struct file *file, struct swap_info_struct *sis,
> return ret;
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.status = iomap_swapfile_iter(&iter, &iter.iomap, &isi);
> - if (ret < 0)
> - return ret;
> -
> - if (isi.iomap.length) {
> - ret = iomap_swapfile_add_extent(&isi);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> + iter.status = iomap_swapfile_iter(&iter, file, sis);
> + return ret;
> }
> EXPORT_SYMBOL_GPL(iomap_swap_activate);
> --
> 2.53.0
>
>
next prev parent reply other threads:[~2026-05-12 17:02 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-12 5:35 improve the swap_activate interface Christoph Hellwig
2026-05-12 5:35 ` [PATCH 01/12] swap: remove the maxpages variable in sys_swapon Christoph Hellwig
2026-05-12 7:08 ` Damien Le Moal
2026-05-12 7:20 ` Christoph Hellwig
2026-05-12 14:19 ` Hannes Reinecke
2026-05-12 16:14 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 02/12] swap: move boilerplate code into the core swap code Christoph Hellwig
2026-05-12 7:11 ` Damien Le Moal
2026-05-12 16:33 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 03/12] swap,fs: move swapfile operations to struct file_operations Christoph Hellwig
2026-05-12 7:16 ` Damien Le Moal
2026-05-12 16:41 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 04/12] swap: restrict to regular files or block devices Christoph Hellwig
2026-05-12 7:17 ` Damien Le Moal
2026-05-12 16:42 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 05/12] swap: cleanup setup_swap_extents Christoph Hellwig
2026-05-12 7:18 ` Damien Le Moal
2026-05-12 16:43 ` Darrick J. Wong
2026-05-13 5:56 ` Christoph Hellwig
2026-05-12 5:35 ` [PATCH 06/12] swap,block: move the block device swapon code into block/fops.c Christoph Hellwig
2026-05-12 7:20 ` Damien Le Moal
2026-05-12 16:44 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 07/12] swap,block: limit swap file size to device size Christoph Hellwig
2026-05-12 7:21 ` Damien Le Moal
2026-05-12 7:23 ` Christoph Hellwig
2026-05-12 16:45 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter Christoph Hellwig
2026-05-12 7:31 ` Damien Le Moal
2026-05-12 17:02 ` Darrick J. Wong [this message]
2026-05-13 6:56 ` Christoph Hellwig
2026-05-12 5:35 ` [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate Christoph Hellwig
2026-05-12 7:34 ` Damien Le Moal
2026-05-12 17:08 ` Darrick J. Wong
2026-05-13 5:58 ` Christoph Hellwig
2026-05-13 7:44 ` Damien Le Moal
2026-05-13 7:46 ` Christoph Hellwig
2026-05-13 7:58 ` Damien Le Moal
2026-05-12 5:35 ` [PATCH 10/12] swap: add a swap_activate_fs_ops helper Christoph Hellwig
2026-05-12 7:36 ` Damien Le Moal
2026-05-12 17:09 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 11/12] swap: move struct swap_extent to swapfile.c Christoph Hellwig
2026-05-12 7:36 ` Damien Le Moal
2026-05-12 17:09 ` Darrick J. Wong
2026-05-12 5:35 ` [PATCH 12/12] swap: move swap_info_struct to mm/swap.h Christoph Hellwig
2026-05-12 7:41 ` Damien Le Moal
2026-05-12 17:10 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512170204.GI9555@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=anna@kernel.org \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=chao@kernel.org \
--cc=chrisl@kernel.org \
--cc=dlemoal@kernel.org \
--cc=dsterba@suse.com \
--cc=hch@lst.de \
--cc=hyc.lee@gmail.com \
--cc=jaegeuk@kernel.org \
--cc=kasong@tencent.com \
--cc=linkinjeon@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=naohiro.aota@wdc.com \
--cc=pc@manguebit.org \
--cc=sfrench@samba.org \
--cc=trondmy@kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox