From: David Hildenbrand <david@redhat.com>
To: Barry Song <21cnbao@gmail.com>, Chris Li <chrisl@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm <linux-mm@kvack.org>,
ryan.roberts@arm.com, Chuanhua Han <hanchuanhua@oppo.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Fri, 8 Mar 2024 09:55:51 +0100
Message-ID: <0530d101-b636-402e-991f-bc186b45490c@redhat.com>
In-Reply-To: <CAGsJ_4zV7S0SX_gaeO-V=bo232TNOZYpHoBzN7js_dvpimq1KA@mail.gmail.com>
On 06.03.24 07:05, Barry Song wrote:
> On Wed, Mar 6, 2024 at 4:00 PM Chris Li <chrisl@kernel.org> wrote:
>>
>> On Tue, Mar 5, 2024 at 5:15 PM Barry Song <21cnbao@gmail.com> wrote:
>>>> Another limitation I would like to address is that swap_writepage can
>>>> only write out IO in one contiguous chunk; it cannot perform
>>>> discontiguous IO. When the swapfile is close to full, the unused
>>>> entries are likely to be spread across different locations. It would be
>>>> nice to be able to read and write a large folio using discontiguous disk
>>>> IO locations.
>>>
>>> I don't think it would be too difficult for swap_writepage to write
>>> out a large folio that has discontiguous swap offsets. Taking
>>> zRAM as an example, as long as the bio can be organized correctly,
>>> zram should be able to write out all subpages of a large folio
>>> one by one.
>>
>> Yes.
>>
>>>
>>> static void zram_bio_write(struct zram *zram, struct bio *bio)
>>> {
>>> 	unsigned long start_time = bio_start_io_acct(bio);
>>> 	struct bvec_iter iter = bio->bi_iter;
>>>
>>> 	do {
>>> 		u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
>>> 		u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
>>> 				SECTOR_SHIFT;
>>> 		struct bio_vec bv = bio_iter_iovec(bio, iter);
>>>
>>> 		bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);
>>>
>>> 		if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
>>> 			atomic64_inc(&zram->stats.failed_writes);
>>> 			bio->bi_status = BLK_STS_IOERR;
>>> 			break;
>>> 		}
>>>
>>> 		zram_slot_lock(zram, index);
>>> 		zram_accessed(zram, index);
>>> 		zram_slot_unlock(zram, index);
>>>
>>> 		bio_advance_iter_single(bio, &iter, bv.bv_len);
>>> 	} while (iter.bi_size);
>>>
>>> 	bio_end_io_acct(bio, start_time);
>>> 	bio_endio(bio);
>>> }
>>>
>>> Right now, add_to_swap() lacks a way to record a discontiguous
>>> offset for each subpage; instead, we only have folio->swap.
>>>
>>> I wonder if we can somehow make it page granularity, so that each
>>> subpage has its own offset, something like page->swap. Then, in
>>> swap_writepage(), we could build a bio with multiple discontiguous
>>> I/O indexes, and allow add_to_swap() to get nr_pages different
>>> swap offsets and fill one into each subpage.
>>
>> The key is where to store the subpage offset. It can't be stored on
>> the tail page's page->swap because some tail page's page struct are
>> just mapping of the head page's page struct. I am afraid this mapping
>> relationship has to be stored in the swap backend. That is the idea:
>> have the swap backend keep track of an array of the subpages' swap
>> locations. This array is looked up by the head swap offset.
>
> I assume "some tail page's page struct are just mapping of the head
> page's page struct" is, for the moment, only true of hugeTLB larger
> than PMD-mapped hugeTLB (for example 2MB)? More widely, mTHP
> smaller than the PMD-mapped size will still have all tail page structs?
We just successfully stopped using subpages to store swap offsets, and
even accidentally fixed a bug that had been lurking for years. I am
confident that we don't want to go back. The current direction is to move
as much information as we can out of the subpages, so if we can find ways
to avoid messing with subpages, that would be great.
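
To make the alternative from the quoted text concrete (the swap backend,
not the tail pages, remembers where each subpage went, looked up by the
head swap offset), here is a rough user-space sketch. It is only an
illustration under stated assumptions: swp_off_t, struct swap_extent_map,
struct swap_backend and the backend_*() helpers are hypothetical names,
not existing kernel APIs, and a real implementation would need locking
and a more suitable lookup structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
typedef uint64_t swp_off_t;

struct swap_extent_map {
	swp_off_t head;               /* head swap offset, the lookup key */
	unsigned int nr;              /* number of subpages in the folio */
	swp_off_t *slots;             /* slots[i]: swap offset of subpage i */
	struct swap_extent_map *next; /* hash chain */
};

#define MAP_BUCKETS 64

struct swap_backend {
	struct swap_extent_map *buckets[MAP_BUCKETS];
};

static unsigned int bucket_of(swp_off_t head)
{
	return (unsigned int)(head % MAP_BUCKETS);
}

/*
 * Remember the (possibly discontiguous) offsets of all subpages under the
 * head offset. Assumes the same head is not recorded twice.
 * Returns 0 on success, -1 on allocation failure.
 */
int backend_record(struct swap_backend *b, swp_off_t head,
		   const swp_off_t *slots, unsigned int nr)
{
	struct swap_extent_map *m;
	unsigned int i;

	assert(b && slots && nr > 0);

	m = malloc(sizeof(*m));
	if (!m)
		return -1;
	m->slots = malloc(nr * sizeof(*m->slots));
	if (!m->slots) {
		free(m);
		return -1;
	}
	for (i = 0; i < nr; i++)
		m->slots[i] = slots[i];
	m->head = head;
	m->nr = nr;
	m->next = b->buckets[bucket_of(head)];
	b->buckets[bucket_of(head)] = m;
	return 0;
}

/* Find the offset of subpage idx. Returns 0 and fills *out, or -1. */
int backend_lookup(const struct swap_backend *b, swp_off_t head,
		   unsigned int idx, swp_off_t *out)
{
	const struct swap_extent_map *m;

	for (m = b->buckets[bucket_of(head)]; m; m = m->next) {
		if (m->head == head) {
			if (idx >= m->nr)
				return -1;
			*out = m->slots[idx];
			return 0;
		}
	}
	return -1;
}

/* Drop the mapping for a head offset, e.g. once the folio is swapped in. */
void backend_forget(struct swap_backend *b, swp_off_t head)
{
	struct swap_extent_map **pp = &b->buckets[bucket_of(head)];

	while (*pp) {
		struct swap_extent_map *m = *pp;

		if (m->head == head) {
			*pp = m->next;
			free(m->slots);
			free(m);
			return;
		}
		pp = &m->next;
	}
}
```

With something along these lines, the folio would only ever carry the
head offset, and the per-subpage locations never touch the tail pages.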
--
Cheers,
David / dhildenb