From: "tiantao (H)" <tiantao6@huawei.com>
To: Vitaly Wool <vitaly.wool@konsulko.com>,
Shakeel Butt <shakeelb@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
"tiantao (H)" <tiantao6@hisilicon.com>,
Seth Jennings <sjenning@redhat.com>,
Dan Streetman <ddstreet@ieee.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
Linux-MM <linux-mm@kvack.org>,
"Sergey Senozhatsky" <sergey.senozhatsky.work@gmail.com>
Subject: Re: [RFC mm/zswap 1/2] mm/zswap: add the flag can_sleep_mapped
Date: Tue, 19 Jan 2021 09:28:45 +0800
Message-ID: <a43e8637-c960-d9a6-f87c-2e72e7c1bd25@huawei.com>
In-Reply-To: <CAM4kBBKAy35RFeCwHGSwM-viTtz5bXapgGQe50vjhEJwVGPseQ@mail.gmail.com>
On 2021/1/15 6:41, Vitaly Wool wrote:
> On Thu, Jan 14, 2021 at 10:09 PM Shakeel Butt <shakeelb@google.com> wrote:
>> On Thu, Jan 14, 2021 at 11:53 AM Vitaly Wool <vitaly.wool@konsulko.com> wrote:
>>> On Thu, Jan 14, 2021 at 8:21 PM Shakeel Butt <shakeelb@google.com> wrote:
>>>> On Thu, Jan 14, 2021 at 11:05 AM Vitaly Wool <vitaly.wool@konsulko.com> wrote:
>>>>> On Thu, Jan 14, 2021 at 7:56 PM Minchan Kim <minchan@kernel.org> wrote:
>>>>>> On Thu, Jan 14, 2021 at 07:40:50PM +0100, Vitaly Wool wrote:
>>>>>>> On Thu, Jan 14, 2021 at 7:29 PM Minchan Kim <minchan@kernel.org> wrote:
>>>>>>>> On Fri, Dec 25, 2020 at 07:02:50PM +0800, Tian Tao wrote:
>>>>>>>>> add a flag to zpool, named "can_sleep_mapped", and have it set true
>>>>>>>>> for zbud/z3fold, set false for zsmalloc. Then zswap could go the current
>>>>>>>>> path if the flag is true; and if it's false, copy data from src to a
>>>>>>>>> temporary buffer, then unmap the handle, take the mutex, process the
>>>>>>>>> buffer instead of src to avoid sleeping function called from atomic
>>>>>>>>> context.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
>>>>>>>>> ---
>>>>>>>>> include/linux/zpool.h | 3 +++
>>>>>>>>> mm/zpool.c | 13 +++++++++++++
>>>>>>>>> mm/zswap.c | 50 +++++++++++++++++++++++++++++++++++++++++++++-----
>>>>>>>>> 3 files changed, 61 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/zpool.h b/include/linux/zpool.h
>>>>>>>>> index 51bf430..e899701 100644
>>>>>>>>> --- a/include/linux/zpool.h
>>>>>>>>> +++ b/include/linux/zpool.h
>>>>>>>>> @@ -73,6 +73,7 @@ u64 zpool_get_total_size(struct zpool *pool);
>>>>>>>>> * @malloc: allocate mem from a pool.
>>>>>>>>> * @free: free mem from a pool.
>>>>>>>>> * @shrink: shrink the pool.
>>>>>>>>> + * @sleep_mapped: whether zpool driver can sleep during map.
>>>>>>>> I don't think it's a good idea. It just breaks the zpool abstraction
>>>>>>>> in that it exposes an internal implementation detail to the user to
>>>>>>>> avoid an issue zswap recently introduced. It also conflicts with
>>>>>>>> zpool_map_handle's semantics.
>>>>>>>>
>>>>>>>> Rather than introducing another break in zpool due to the recently
>>>>>>>> introduced zswap feature, zswap could introduce
>>>>>>>> CONFIG_ZSWAP_HW_COMPRESSOR. Once it's configured, zsmalloc could
>>>>>>>> be disabled. And with disabling CONFIG_ZSWAP_HW_COMPRESSOR, zswap
>>>>>>>> doesn't need to make any bounce buffer copy so that no existing
>>>>>>>> zsmalloc user will see performance regression.
>>>>>>> I believe it won't help that much -- the new compressor API presumes
>>>>>>> that the caller may sleep during compression and that will be an
>>>>>>> accident waiting to happen as long as we use it and keep the handle
>>>>>>> mapped in zsmalloc case.
>>>>>>>
>>>>>>> Or maybe I interpreted you wrong and you are suggesting re-introducing
>>>>>>> calls to the old API under this #ifdef, is that the case?
>>>>>> Yup. zswap could abstract that part under #ifdef to keep the old behavior.
>>>>> We can reconsider this option when zsmalloc implements reclaim
>>>>> callback. So far it's obviously too much a mess for a reason so weak.
>>>>>
>>>> Sorry I don't understand the link between zsmalloc implementing shrink
>>>> callback and this patch.
>>> There is none. There is a link between taking all the burden to revive
>>> zsmalloc for zswap at the cost of extra zswap complexity and zsmalloc
>>> not being fully zswap compatible.
>>>
>>> The ultimate zswap goal is to cache hottest swapped-out pages in a
>>> compressed format. zsmalloc doesn't implement reclaim callback, and
>>> therefore zswap can *not* fulfill its goal since old pages are there
>>> to stay, and it can't accept new hotter ones. So, let me make it
>>> clear: zswap/zsmalloc combo is a legacy corner case.
>>>
>> This is the first time I am hearing that zswap/zsmalloc combo is a
>> legacy configuration. We use zswap in production and intentionally
>> size the zswap pool to never have to go to the backing device. So
>> absence of reclaim callback is not an issue for us. Please let me know
>> if the zswap/zsmalloc combo is officially on its way to deprecation.
> No, the zswap/zsmalloc combo is not on the way to deprecation. I generally
> would not advise on using it but your particular case does make sense
> (although using frontswap/zswap without a backing device *is* a corner
> case).
>
>>>> This patch is adding an overhead for all
>>>> zswap+zsmalloc users irrespective of availability of hardware. If we
>>>> want to add support for new hardware, please add without impacting the
>>>> current users.
>>> No, it's not like that. zswap+zsmalloc combination is currently
>>> already broken
>> By broken do you mean the missing reclaim callback?
> No. I mean deadlocks in -rt kernel / scheduling while atomic bugs in
> the mainline. Missing reclaim callback goes into "not fully
> compatible" section in my world.
>
>>> and this patch implements a way to work that around.
>>> The users are already impacted and that is of course a problem.
>> Are you talking about rt users or was there some other report?
> I don't think the issue is specific to -rt series. There may (and
> will) still be "scheduling while atomic" bugs occurring, just not as
> often.
>
>>> The workaround is not perfect but looks good enough for me.
>> I would like to see a page fault perf experiment for the non-hardware case.
> I second that. @tiantao (H), would it be possible to provide one?
No problem, but could you tell me what page fault data you would like to
see and how I should collect it?
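
As a starting point only, one way to gather page fault numbers from
userspace is to touch one byte per page of an anonymous mapping and
record the elapsed time together with the minor/major fault counters
from getrusage(2). Everything in the sketch below (struct fault_sample,
measure_touch(), run_once()) is a hypothetical helper and not something
specified in this thread. To exercise the zswap load path, the buffer
would also have to be pushed out to swap between two measure_touch()
calls, for example by a memory cgroup limit; that step is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record for one measurement pass. */
struct fault_sample {
    long minor_faults;   /* faults served without I/O */
    long major_faults;   /* faults that required I/O (e.g. swap-in) */
    int64_t elapsed_ns;  /* wall-clock time spent touching the pages */
};

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Touch one byte per page of buf and record fault counts and time.
 * Returns 0 on success, -1 on error. */
int measure_touch(volatile unsigned char *buf, size_t len,
                  struct fault_sample *out)
{
    struct rusage before, after;
    long page = sysconf(_SC_PAGESIZE);
    int64_t t0;

    assert(buf != NULL && out != NULL);
    if (page <= 0 || getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    t0 = now_ns();
    for (size_t off = 0; off < len; off += (size_t)page)
        buf[off] += 1;  /* a write forces the page to be present */
    out->elapsed_ns = now_ns() - t0;

    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;
    out->minor_faults = after.ru_minflt - before.ru_minflt;
    out->major_faults = after.ru_majflt - before.ru_majflt;
    return 0;
}

/* Map len bytes of anonymous memory and measure the first touch only. */
int run_once(size_t len, struct fault_sample *out)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int rc;

    if (mem == MAP_FAILED)
        return -1;
    rc = measure_touch(mem, len, out);
    munmap(mem, len);
    return rc;
}
```

Running the same binary on kernels with and without the patch, with
zsmalloc as the zpool backend and a software compressor, and comparing
elapsed_ns per major fault would be one way to report the result.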
>
> Also, correct me if I'm wrong but from what I recall, the acomp API
> does one redundant copy less than the old comp API so zsmalloc should
> be back to square one even with the buffer implemented in this patch.
> The other backends should do a little better though but if so, it's
> the upside of not taking too many spinlocks.
>
> Best regards,
> Vitaly
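
For readers following the thread, the control flow described in the
quoted commit message (use the mapped data directly when the backend can
sleep while mapped; otherwise copy to a temporary buffer, unmap, and only
then do the work that may sleep) can be sketched in plain userspace C.
This is an illustration under stated assumptions, not the patch itself:
struct demo_pool, demo_map(), demo_unmap(), demo_process() and
demo_load() are hypothetical stand-ins, and memcpy() stands in for
decompression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a zpool backend; NOT the kernel's struct zpool. */
struct demo_pool {
    bool can_sleep_mapped;      /* true: zbud/z3fold, false: zsmalloc */
    const unsigned char *data;  /* object stored in the pool */
    size_t len;                 /* object length in bytes */
    int map_depth;              /* > 0 while a handle is mapped */
};

static const unsigned char *demo_map(struct demo_pool *p)
{
    p->map_depth++;
    return p->data;
}

static void demo_unmap(struct demo_pool *p)
{
    p->map_depth--;
}

/* Stand-in for "take the mutex, process the data". It records whether the
 * handle was still mapped at the point where real code might sleep. */
static void demo_process(struct demo_pool *p, unsigned char *dst,
                         const unsigned char *src, bool *mapped_at_sleep)
{
    *mapped_at_sleep = p->map_depth > 0;
    memcpy(dst, src, p->len);
}

/* Returns 0 on success, -1 on error. */
int demo_load(struct demo_pool *p, unsigned char *dst, size_t dst_len,
              bool *mapped_at_sleep)
{
    unsigned char *tmp = NULL;
    const unsigned char *src;

    assert(p != NULL && dst != NULL && mapped_at_sleep != NULL);
    if (p->len == 0 || dst_len < p->len)
        return -1;

    /* Allocate the bounce buffer before mapping, while sleeping is fine. */
    if (!p->can_sleep_mapped) {
        tmp = malloc(p->len);
        if (tmp == NULL)
            return -1;
    }

    src = demo_map(p);
    if (!p->can_sleep_mapped) {
        /* Copy out and unmap first, so nothing sleeps while mapped. */
        memcpy(tmp, src, p->len);
        demo_unmap(p);
        src = tmp;
    }

    demo_process(p, dst, src, mapped_at_sleep);

    if (p->can_sleep_mapped)
        demo_unmap(p);
    free(tmp);
    return 0;
}
```

With can_sleep_mapped set to false, the extra memcpy() into tmp is the
per-load overhead that the page fault experiment requested above would
quantify for zsmalloc users.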