From: David Hildenbrand <david@redhat.com>
To: Barry Song <21cnbao@gmail.com>, Chris Li <chrisl@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm <linux-mm@kvack.org>,
	ryan.roberts@arm.com, Chuanhua Han <hanchuanhua@oppo.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Fri, 8 Mar 2024 09:55:51 +0100
Message-ID: <0530d101-b636-402e-991f-bc186b45490c@redhat.com>
In-Reply-To: <CAGsJ_4zV7S0SX_gaeO-V=bo232TNOZYpHoBzN7js_dvpimq1KA@mail.gmail.com>

On 06.03.24 07:05, Barry Song wrote:
> On Wed, Mar 6, 2024 at 4:00 PM Chris Li <chrisl@kernel.org> wrote:
>>
>> On Tue, Mar 5, 2024 at 5:15 PM Barry Song <21cnbao@gmail.com> wrote:
>>>> Another limitation I would like to address is that swap_writepage can
>>>> only write out IO in one contiguous chunk, not able to perform
>>>> non-continuous IO. When the swapfile is close to full, it is likely
>>>> the unused entry will spread across different locations. It would be
>>>> nice to be able to read and write large folio using discontiguous disk
>>>> IO locations.
>>>
>>> I don't find it will be too difficult for swap_writepage to only write
>>> out a large folio which has discontiguous swap offsets. taking
>>> zRAM as an example, as long as bio can be organized correctly,
>>> zram should be able to write a large folio one by one for its all
>>> subpages.
>>
>> Yes.
>>
>>>
>>> static void zram_bio_write(struct zram *zram, struct bio *bio)
>>> {
>>>          unsigned long start_time = bio_start_io_acct(bio);
>>>          struct bvec_iter iter = bio->bi_iter;
>>>
>>>          do {
>>>                  u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
>>>                  u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) <<
>>>                                  SECTOR_SHIFT;
>>>                  struct bio_vec bv = bio_iter_iovec(bio, iter);
>>>
>>>                  bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);
>>>
>>>                  if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) {
>>>                          atomic64_inc(&zram->stats.failed_writes);
>>>                          bio->bi_status = BLK_STS_IOERR;
>>>                          break;
>>>                  }
>>>
>>>                  zram_slot_lock(zram, index);
>>>                  zram_accessed(zram, index);
>>>                  zram_slot_unlock(zram, index);
>>>
>>>                  bio_advance_iter_single(bio, &iter, bv.bv_len);
>>>          } while (iter.bi_size);
>>>
>>>          bio_end_io_acct(bio, start_time);
>>>          bio_endio(bio);
>>> }
>>>
>>> right now , add_to_swap() is lacking a way to record discontiguous
>>> offset for each subpage, alternatively, we have a folio->swap.
>>>
>>> I wonder if we can somehow make it page granularity, for each
>>> subpage, it can have its own offset somehow like page->swap,
>>> then in swap_writepage(), we can make a bio with multiple
>>> discontiguous I/O index. then we allow add_to_swap() to get
>>> nr_pages different swap offsets, and fill into each subpage.
>>
>> The key is where to store the subpage offset. It can't be stored on
>> the tail page's page->swap because some tail page's page struct are
>> just mapping of the head page's page struct. I am afraid this mapping
>> relationship has to be stored on the swap back end. That is the idea,
>> have swap backend keep track of an array of subpage's swap location.
>> This array is looked up by the head swap offset.
> 
> I assume "some tail page's page struct are just mapping of the head
> page's page struct" is only true of hugeTLB larger than PMD-mapped
> hugeTLB (for example 2MB) for this moment? more widely mTHP
> less than PMD-mapped size will still have all tail page struct?

We just successfully stopped using subpages to store swap offsets, and 
even accidentally fixed a bug that was lurking for years. I am confident 
that we don't want to go back. The current direction is to move as much
information as we can out of the subpages, so if we can find ways to avoid
messing with subpages, that would be great.
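
To make the alternative concrete: the approach suggested earlier in the
thread keeps the per-subpage swap locations in a table owned by the swap
backend and keyed by the head swap offset, so nothing has to live in the
tail pages. What follows is only a minimal userspace sketch of that idea,
not kernel code. Every name in it (sem_map, sem_entry, sem_store,
sem_lookup) is hypothetical, and it assumes a plain chained hash table
with no locking and no handling of duplicate keys.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SEM_BUCKETS 64u

/* Hypothetical record: one per large folio that has been swapped out. */
struct sem_entry {
	uint64_t head_off;      /* key: swap offset assigned to the head */
	unsigned int nr;        /* number of subpages in the folio */
	uint64_t *sub_off;      /* sub_off[i] is the swap offset of subpage i */
	struct sem_entry *next; /* hash chain */
};

/* Hypothetical backend-side table; zero-initialise before use. */
struct sem_map {
	struct sem_entry *bucket[SEM_BUCKETS];
};

static unsigned int sem_hash(uint64_t head_off)
{
	return (unsigned int)(head_off % SEM_BUCKETS);
}

/* Record nr possibly discontiguous offsets under head_off. 0 on success. */
int sem_store(struct sem_map *m, uint64_t head_off,
	      const uint64_t *offs, unsigned int nr)
{
	unsigned int h = sem_hash(head_off);
	struct sem_entry *e;

	assert(nr > 0);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->sub_off = malloc(nr * sizeof(*offs));
	if (!e->sub_off) {
		free(e);
		return -1;
	}
	memcpy(e->sub_off, offs, nr * sizeof(*offs));
	e->head_off = head_off;
	e->nr = nr;
	e->next = m->bucket[h];
	m->bucket[h] = e;
	return 0;
}

/* Find the swap offset of subpage idx. 0 on success, -1 if not found. */
int sem_lookup(const struct sem_map *m, uint64_t head_off,
	       unsigned int idx, uint64_t *out)
{
	const struct sem_entry *e;

	for (e = m->bucket[sem_hash(head_off)]; e; e = e->next) {
		if (e->head_off != head_off)
			continue;
		if (idx >= e->nr)
			return -1;
		*out = e->sub_off[idx];
		return 0;
	}
	return -1;
}
```

The point of the sketch is only that the lookup needs the head swap
offset and a subpage index, and never touches a tail page's struct page.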

-- 
Cheers,

David / dhildenb


