From: Chris Li <chrisl@kernel.org>
To: Jared Hulbert <jaredeh@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>,
Matthew Wilcox <willy@infradead.org>,
Nhat Pham <nphamcs@gmail.com>,
lsf-pc@lists.linux-foundation.org, linux-mm <linux-mm@kvack.org>,
ryan.roberts@arm.com, David Hildenbrand <david@redhat.com>,
Barry Song <21cnbao@gmail.com>,
Chuanhua Han <hanchuanhua@oppo.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Tue, 5 Mar 2024 13:58:00 -0800
Message-ID: <CAF8kJuM1CMMZc=Hz8cWy1LEuRSFQ+NG3e67UtJeOt+NTbB7=HA@mail.gmail.com>
In-Reply-To: <CA+ZsKJ5K8M1f-8358p3aKOT3NBVz=3w1V8w8SP+kzuyu75qTOA@mail.gmail.com>

On Tue, Mar 5, 2024 at 1:38 PM Jared Hulbert <jaredeh@gmail.com> wrote:
>
> On Mon, Mar 4, 2024 at 11:49 PM Chris Li <chrisl@kernel.org> wrote:
> >
> > I have considered that as well; that goes further than writing from
> > one swap device to another. The current swap device can't accept
> > writes at non-page-aligned offsets. If we allow byte-aligned
> > write-out sizes, the whole swap entry offset scheme needs some heavy
> > changes.
> >
> > If we write out 4K pages and the compression ratio is worse than 50%
> > (a compressed page takes more than half a page), two compressed
> > pages can't fit into one page, which means some pages read back will
> > need to overflow into another page. We kind of need a small file
> > system to keep track of how the compressed data is stored, because
> > it is no longer page-aligned in size.
> >
> > We can write out zsmalloc blocks of data as-is; however, there is no
> > guarantee that the data in zsmalloc blocks follows the same LRU order.
> >
> > It makes more sense when writing higher-order (> 0) swap pages, e.g.
> > writing 64K pages in one buffer. Then we can write out the
> > compressed data page-boundary aligned and in page-sized chunks,
> > accepting the waste when the last compressed page does not fill up a
> > whole page.
>
> A swap device is not just a device; until recently, it was a really
> bad filesystem with no abstraction between the block device and the
> filesystem. Zswap and zram are, in some respects, attempts to build
> specialized filesystems without any of the advantages of the VFS
> tooling.
>
> What stops us from using an existing compressing filesystem?
The issue is that swap has a lot of usage patterns that differ from a
typical file system. Please take a look at the current usage cases of
swap and their related data structures at the beginning of this email
thread. If you want to use an existing file system, you still need to
bridge the gap between the swap system and file systems. For example,
cgroup information is associated with each swap entry.

You can think of swap as a special file system that reads and writes 4K
objects by key. You could use file system extended attributes to track
the additional information associated with each swap entry. At the end
of the day, with an existing file system the per-swap-entry metadata
overhead would likely be much higher than in the current swap back end.

I understand the current swap back end organizes its data around the
swap offset, which spreads swap data across many different places; that
is one reason people might not like it. However, it does have pretty
minimal per-swap-entry memory overhead. A file system can store its
metadata on disk, reducing the in-memory overhead, but that comes at a
price: when you swap in a page, you might need to go through a few file
system metadata reads before you can read the actual swapped data.
>
> Crazy talk here. What if we handled swap pages like they were mmap'd
> to a special swap "file(s)"?
That is already the case in the kernel: the swap cache is handled the
same way as the file cache, indexed by an offset. Some paths even share
the same underlying functions, for example filemap_get_folio().
Chris
Thread overview: 59+ messages
2024-03-01 9:24 [LSF/MM/BPF TOPIC] Swap Abstraction "the pony" Chris Li
2024-03-01 9:53 ` Nhat Pham
2024-03-01 18:57 ` Chris Li
2024-03-04 22:58 ` Matthew Wilcox
2024-03-05 3:23 ` Chengming Zhou
2024-03-05 7:44 ` Chris Li
2024-03-05 8:15 ` Chengming Zhou
2024-03-05 18:24 ` Chris Li
2024-03-05 9:32 ` Nhat Pham
2024-03-05 9:52 ` Chengming Zhou
2024-03-05 10:55 ` Nhat Pham
2024-03-05 19:20 ` Chris Li
2024-03-05 20:56 ` Jared Hulbert
2024-03-05 21:38 ` Jared Hulbert
2024-03-05 21:58 ` Chris Li [this message]
2024-03-06 4:16 ` Jared Hulbert
2024-03-06 5:50 ` Chris Li
[not found] ` <CA+ZsKJ7JE56NS6hu4L_uyywxZO7ixgftvfKjdND9e5SOyn+72Q@mail.gmail.com>
2024-03-06 18:16 ` Chris Li
2024-03-06 22:44 ` Jared Hulbert
2024-03-07 0:46 ` Chris Li
2024-03-07 8:57 ` Jared Hulbert
2024-03-06 1:33 ` Barry Song
2024-03-04 18:43 ` Kairui Song
2024-03-04 22:03 ` Jared Hulbert
2024-03-04 22:47 ` Chris Li
2024-03-04 22:36 ` Chris Li
2024-03-06 1:15 ` Barry Song
2024-03-06 2:59 ` Chris Li
2024-03-06 6:05 ` Barry Song
2024-03-06 17:56 ` Chris Li
2024-03-06 21:29 ` Barry Song
2024-03-08 8:55 ` David Hildenbrand
2024-03-07 7:56 ` Chuanhua Han
2024-03-07 14:03 ` [Lsf-pc] " Jan Kara
2024-03-07 21:06 ` Jared Hulbert
2024-03-07 21:17 ` Barry Song
2024-03-08 0:14 ` Jared Hulbert
2024-03-08 0:53 ` Barry Song
2024-03-14 9:03 ` Jan Kara
2024-05-16 15:04 ` Zi Yan
2024-05-17 3:48 ` Chris Li
2024-03-14 8:52 ` Jan Kara
2024-03-08 2:02 ` Chuanhua Han
2024-03-14 8:26 ` Jan Kara
2024-03-14 11:19 ` Chuanhua Han
2024-05-15 23:07 ` Chris Li
2024-05-16 7:16 ` Chuanhua Han
2024-05-17 12:12 ` Karim Manaouil
2024-05-21 20:40 ` Chris Li
2024-05-28 7:08 ` Jared Hulbert
2024-05-29 3:36 ` Chris Li
2024-05-29 3:57 ` Matthew Wilcox
2024-05-29 6:50 ` Chris Li
2024-05-29 12:33 ` Matthew Wilcox
2024-05-30 22:53 ` Chris Li
2024-05-31 3:12 ` Matthew Wilcox
2024-06-01 0:43 ` Chris Li
2024-05-31 1:56 ` Yuanchu Xie
2024-05-31 16:51 ` Chris Li