linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "tiantao (H)" <tiantao6@huawei.com>
To: Vitaly Wool <vitaly.wool@konsulko.com>,
	Shakeel Butt <shakeelb@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
	"tiantao (H)" <tiantao6@hisilicon.com>,
	Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	Linux-MM <linux-mm@kvack.org>,
	"Sergey Senozhatsky" <sergey.senozhatsky.work@gmail.com>
Subject: Re: [RFC mm/zswap 1/2] mm/zswap: add the flag can_sleep_mapped
Date: Tue, 19 Jan 2021 09:28:45 +0800	[thread overview]
Message-ID: <a43e8637-c960-d9a6-f87c-2e72e7c1bd25@huawei.com> (raw)
In-Reply-To: <CAM4kBBKAy35RFeCwHGSwM-viTtz5bXapgGQe50vjhEJwVGPseQ@mail.gmail.com>


在 2021/1/15 6:41, Vitaly Wool 写道:
> On Thu, Jan 14, 2021 at 10:09 PM Shakeel Butt <shakeelb@google.com> wrote:
>> On Thu, Jan 14, 2021 at 11:53 AM Vitaly Wool <vitaly.wool@konsulko.com> wrote:
>>> On Thu, Jan 14, 2021 at 8:21 PM Shakeel Butt <shakeelb@google.com> wrote:
>>>> On Thu, Jan 14, 2021 at 11:05 AM Vitaly Wool <vitaly.wool@konsulko.com> wrote:
>>>>> On Thu, Jan 14, 2021 at 7:56 PM Minchan Kim <minchan@kernel.org> wrote:
>>>>>> On Thu, Jan 14, 2021 at 07:40:50PM +0100, Vitaly Wool wrote:
>>>>>>> On Thu, Jan 14, 2021 at 7:29 PM Minchan Kim <minchan@kernel.org> wrote:
>>>>>>>> On Fri, Dec 25, 2020 at 07:02:50PM +0800, Tian Tao wrote:
>>>>>>>>> add a flag to zpool, named is "can_sleep_mapped", and have it set true
>>>>>>>>> for zbud/z3fold, set false for zsmalloc. Then zswap could go the current
>>>>>>>>> path if the flag is true; and if it's false, copy data from src to a
>>>>>>>>> temporary buffer, then unmap the handle, take the mutex, process the
>>>>>>>>> buffer instead of src to avoid sleeping function called from atomic
>>>>>>>>> context.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
>>>>>>>>> ---
>>>>>>>>>   include/linux/zpool.h |  3 +++
>>>>>>>>>   mm/zpool.c            | 13 +++++++++++++
>>>>>>>>>   mm/zswap.c            | 50 +++++++++++++++++++++++++++++++++++++++++++++-----
>>>>>>>>>   3 files changed, 61 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/zpool.h b/include/linux/zpool.h
>>>>>>>>> index 51bf430..e899701 100644
>>>>>>>>> --- a/include/linux/zpool.h
>>>>>>>>> +++ b/include/linux/zpool.h
>>>>>>>>> @@ -73,6 +73,7 @@ u64 zpool_get_total_size(struct zpool *pool);
>>>>>>>>>    * @malloc:  allocate mem from a pool.
>>>>>>>>>    * @free:    free mem from a pool.
>>>>>>>>>    * @shrink:  shrink the pool.
>>>>>>>>> + * @sleep_mapped: whether zpool driver can sleep during map.
>>>>>>>> I don't think it's a good idea. It just breaks zpool abstraction
>>>>>>>> in that it exposes internal implementation to user to avoid issue
>>>>>>>> zswap recently introduced. It also conflicts zpool_map_handle's
>>>>>>>> semantic.
>>>>>>>>
>>>>>>>> Rather than introducing another break in zpool due to the new
>>>>>>>> zswap feature recenlty introduced, zswap could introduce
>>>>>>>> CONFIG_ZSWAP_HW_COMPRESSOR. Once it's configured, zsmalloc could
>>>>>>>> be disabled. And with disabling CONFIG_ZSWAP_HW_COMPRESSOR, zswap
>>>>>>>> doesn't need to make any bounce buffer copy so that no existing
>>>>>>>> zsmalloc user will see performance regression.
>>>>>>> I believe it won't help that much -- the new compressor API presumes
>>>>>>> that the caller may sleep during compression and that will be an
>>>>>>> accident waiting to happen as long as we use it and keep the handle
>>>>>>> mapped in zsmalloc case.
>>>>>>>
>>>>>>> Or maybe I interpreted you wrong and you are suggesting re-introducing
>>>>>>> calls to the old API under this #ifdef, is that the case?
>>>>>> Yub. zswap could abstract that part under #ifdef to keep old behavior.
>>>>> We can reconsider this option when zsmalloc implements reclaim
>>>>> callback. So far it's obviously too much a mess for a reason so weak.
>>>>>
>>>> Sorry I don't understand the link between zsmalloc implementing shrink
>>>> callback and this patch.
>>> There is none. There is a link between taking all the burden to revive
>>> zsmalloc for zswap at the cost of extra zswap complexity and zsmalloc
>>> not being fully zswap compatible.
>>>
>>> The ultimate zswap goal is to cache hottest swapped-out pages in a
>>> compressed format. zsmalloc doesn't implement reclaim callback, and
>>> therefore zswap can *not* fulfill its goal since old pages are there
>>> to stay, and it can't accept new hotter ones. So, let me make it
>>> clear: zswap/zsmalloc combo is a legacy corner case.
>>>
>> This is the first time I am hearing that zswap/zsmalloc combo is a
>> legacy configuration. We use zswap in production and intentionally
>> size the zswap pool to never have to go to the backing device. So
>> absence of reclaim callback is not an issue for us. Please let me know
>> if the zswap/zsmalloc combo is officially on its way to deprecation.
> No, zswap/zsmalloc combo not on the way to deprecation. I generally
> would not advise on using it but your particular case does make sense
> (although using frontswap/zswap without a backing device *is* a corner
> case).
>
>>>> This patch is adding an overhead for all
>>>> zswap+zsmalloc users irrespective of availability of hardware. If we
>>>> want to add support for new hardware, please add without impacting the
>>>> current users.
>>> No, it's not like that. zswap+zsmalloc combination is currently
>>> already broken
>> By broken do you mean the missing reclaim callback?
> No. I mean deadlocks in -rt kernel / scheduling while atomic bugs in
> the mainline. Missing reclaim callback goes into "not fully
> compatible" section in my world.
>
>>> and this patch implements a way to work that around.
>>> The users are already impacted and that is of course a problem.
>> Are you talking about rt users or was there some other report?
> I don't think the issue is specific to -rt series. There may (and
> will) still be "scheduling while atomic" bugs occurring, just not as
> often.
>
>>> The workaround is not perfect but looks good enough for me.
>> I would like a see page fault perf experiment for the non-hardware case.
> I second that. @tiantao (H), would it be possible to provide one?
No problem, but can you tell how to provide the data you want for page 
fault?
>
> Also, correct me if I'm wrong but from what I recall, the acomp API
> does one redundant copy less than the old comp API so zsmalloc should
> be back to square one even with the buffer implemented in this patch.
> The other backends should do a little better though but if so, it's
> the upside of not taking too many spinlocks.
>
> Best regards,
>     Vitaly



  reply	other threads:[~2021-01-19  1:29 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-25 11:02 [RFC mm/zswap 0/2] Fix the compatibility of zsmalloc and zswap Tian Tao
2020-12-25 11:02 ` [RFC mm/zswap 1/2] mm/zswap: add the flag can_sleep_mapped Tian Tao
2021-01-14 18:28   ` Minchan Kim
2021-01-14 18:40     ` Vitaly Wool
2021-01-14 18:56       ` Minchan Kim
2021-01-14 19:05         ` Vitaly Wool
2021-01-14 19:21           ` Shakeel Butt
2021-01-14 19:23             ` Minchan Kim
2021-01-14 19:53             ` Vitaly Wool
2021-01-14 21:09               ` Shakeel Butt
2021-01-14 22:41                 ` Vitaly Wool
2021-01-19  1:28                   ` tiantao (H) [this message]
2021-01-14 18:43     ` Shakeel Butt
2021-01-14 18:53       ` Vitaly Wool
2021-01-21  9:17   ` Vitaly Wool
2021-01-21 23:15     ` Song Bao Hua (Barry Song)
2020-12-25 11:02 ` [RFC mm/zswap 2/2] mm: set the sleep_mapped to true for zbud and z3fold Tian Tao
2021-01-14 18:46 ` [RFC mm/zswap 0/2] Fix the compatibility of zsmalloc and zswap Vitaly Wool
2021-01-15  1:17   ` tiantao (H)
2021-01-19  1:31     ` tiantao (H)
2021-01-19  2:39       ` Mike Galbraith
2021-01-18 13:44   ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a43e8637-c960-d9a6-f87c-2e72e7c1bd25@huawei.com \
    --to=tiantao6@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=ddstreet@ieee.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=shakeelb@google.com \
    --cc=sjenning@redhat.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=tiantao6@hisilicon.com \
    --cc=vitaly.wool@konsulko.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).