Re: [PATCH] mm: Add Kcompressd for accelerated memory compression

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Barry Song <21cnbao@gmail.com>
To: nphamcs@gmail.com
Cc: 21cnbao@gmail.com, akpm@linux-foundation.org,
	andrew.yang@mediatek.com,
	angelogioacchino.delregno@collabora.com, casper.li@mediatek.com,
	chinwen.chang@mediatek.com, hannes@cmpxchg.org,
	james.hsu@mediatek.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org,
	linux-mm@kvack.org, matthias.bgg@gmail.com, minchan@kernel.org,
	qun-wei.lin@mediatek.com, rppt@kernel.org,
	senozhatsky@chromium.org, sj@kernel.org
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
Date: Mon, 23 Jun 2025 17:16:42 +1200	[thread overview]
Message-ID: <20250623051642.3645-1-21cnbao@gmail.com> (raw)
In-Reply-To: <CAKEwX=ObLVcbR9q7ZvR3WE2hhmxLpk1bSuvcbWZwo5o5vPCDRA@mail.gmail.com>

Hi Nhat,

On Wed, Jun 18, 2025 at 2:21 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Sun, Jun 15, 2025 at 8:41 PM Barry Song <21cnbao@gmail.com> wrote:
> > >>
> > >> That seems unnecessary. There is an existing method for asynchronous
> > >> writeback, and pageout() is naturally fully set up to handle this.
> > >>
> > >> IMO the better way to do this is to make zswap_store() (and
> > >> zram_bio_write()?) asynchronous. Make those functions queue the work
> > >> and wake the compression daemon, and then have the daemon call
> > >> folio_end_writeback() / bio_endio() when it's done with it.
> >
> > > +1.
> >
> >
> > But,
> > How could this be possible for zswap? zswap_store() is only a frontend —
> > we still need its return value to determine whether __swap_writepage()
> > is required. Waiting for the result of zswap_store() is inherently a
> > synchronous step.
>
> Hmm, I might be misunderstanding either of you, but it sounds like
> what you're describing here does not contradict what Johannes is
> proposing?

It seems contradictory: Johannes proposes that zswap could behave like zRAM
by invoking `folio_end_writeback()` or `bio_endio()`, but this doesn’t align
with actual behavior since zswap_store might not end `swap_writeout()`—it may
still proceed to `__swap_writeback()` to complete the final steps.

Meanwhile, Qun-wei’s RFC has already explored using `folio_end_writeback()` and
`bio_endio()` at the end of `__swap_writepage()` for zRAM, though that approach
also has its own issues.

>
> >
> > My point is that folio_end_writeback() and bio_endio() can only be
> > called after the entire zswap_store() → __swap_writepage() sequence is
> > completed. That’s why both are placed in the new kcompressed.
>
> Hmm, how about:
>
> 1. Inside zswap_store(), we first obtain the obj_cgroup reference,
> check cgroup and pool limit, and grab a zswap pool reference (in
> effect, determining the slot allocator and compressor).
>
> 2. Next, we try to queue the work to kcompressd, saving the folio and
> the zswap pool (and whatever else we need for the continuation). If
> this fails, we can proceed with the old synchronous path.
>
> 3. In kcompressed daemon, we perform the continuation of
> zswap_store(): compression, slot allocation, storing, zswap's LRU
> modification, etc. If this fails, we check if the mem_cgroup enables
> writeback. If it's enabled, we can call __swap_writepage(). Ideally,
> if writeback is disabled, we should activate the page, but it might
> not be possible since shrink_folio_list() might already re-add the
> page to the inactive lru. Maybe some modification of pageout() and
> shrink_folio_list() can make this work, but I haven't thought too
> deeply about it :) If it's impossible, we can perform async
> compression only for cgroups that enable writeback for now. Once we
> fix zswap's handling of incompressible pages, we can revisit this
> decision (+ SJ).
>
> TLDR: move the work-queueing step forward a bit, into the middle of
> zswap_store().
>
> One benefit of this is we skip pages of cgroups that disable zswap, or
> when zswap pool is full.

I assume you meant something like the following:

bool try_to_sched_async_zswap_store()
{
	get_obj_cgroup_from_folio()
	if (err) goto xxx;
	zswap_check_limits();
	if (err) goto xxx;
	zswap_pool_current_get()
	if (err) goto xxx;
	   
	queue_folio_to_kcompressd(folio);
	return true;

xxx:
	error handler things;
	return false;
}

If this function returns true, it suggests that compression requests  
have been queued to kcompressd. Following that, in kcompressd():

int __zswap_store(folio)
{
	for(i=0;i<nr_pages;i++) {
		zswap_store_page();
		if (err) return err;
	}
	return 0;
}

kcompressd()
{
	while(folio_queue_is_not_empty) {
		folio = dequeue_folio();
		if (folio_queued_by_zswap(folio)) {
			if(!__zswap_store(folio))
				continue;
		}
		if ((zswap_store_page_fails && mem_cgroup_zswap_writeback_enabled()) ||
		    folio_queued_by_zram) {
			__swap_writepage();
		}
	}
}

In kswapd, we will need to do
int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
{
	...
	if (try_to_sched_async_zswap_store(folio))
		return;
	if (is_sync_comp_blkdev(swap)) {
		queue_folio_to_kcompressd(folio);
		return;
	}
	__swap_writepage();
}

To be honest, I'm not sure if there's a flag that indicates whether the  
folio was queued by zswap or zram. If not, we may need to add a member  
associated with folio pointers in the queue between kswapd and kcompressd,  
since we need to identify zswap cases. Maybe we can reuse bit 0 of the  
folio pointer?

What I mean is: while queuing, if the folio is queued by zswap, we do  
`pointer |= BIT(0)`. Then in kcompressd, we restore the original folio  
with `folio = pointer & ~BIT(0)`. It's a bit ugly, but I’m not sure  
there’s a better approach.

Thanks
Barry

next prev parent reply	other threads:[~2025-06-23  5:19 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-30  8:26 [PATCH] mm: Add Kcompressd for accelerated memory compression Qun-Wei Lin
2025-04-30 17:05 ` Nhat Pham
2025-04-30 17:22 ` Nhat Pham
2025-04-30 21:51 ` Andrew Morton
2025-04-30 22:49   ` Barry Song
2025-05-07 15:11     ` Nhat Pham
2025-05-01 14:02 ` Johannes Weiner
2025-05-01 15:12   ` Nhat Pham
2025-06-16  3:41     ` Barry Song
2025-06-17 14:21       ` Nhat Pham
2025-06-23  5:16         ` Barry Song [this message]
2025-06-27 23:21           ` Nhat Pham
2025-07-09  3:25             ` Qun-wei Lin (林群崴)
2025-05-02  9:16   ` Qun-wei Lin (林群崴)
2025-05-01 15:50 ` Nhat Pham
2025-05-07  1:12 ` Harry Yoo
2025-05-07  1:50   ` Zi Yan
2025-05-07  2:04     ` Barry Song
2025-05-07 15:00       ` Nhat Pham
2025-05-07 15:12         ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250623051642.3645-1-21cnbao@gmail.com \
    --to=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.yang@mediatek.com \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=casper.li@mediatek.com \
    --cc=chinwen.chang@mediatek.com \
    --cc=hannes@cmpxchg.org \
    --cc=james.hsu@mediatek.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=matthias.bgg@gmail.com \
    --cc=minchan@kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=qun-wei.lin@mediatek.com \
    --cc=rppt@kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=sj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.