From: "Huang\, Ying" <ying.huang@intel.com>
To: Dan Williams <dan.j.williams@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shaohua Li <shli@kernel.org>,
hughd@google.com, Minchan Kim <minchan@kernel.org>,
Rik van Riel <riel@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
n-horiguchi@ah.jp.nec.com, zi.yan@cs.rutgers.edu,
daniel.m.jordan@oracle.com
Subject: Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()
Date: Mon, 09 Jul 2018 15:38:39 +0800 [thread overview]
Message-ID: <8736wsluyo.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <CAA9_cmc2YteXBhrLOFN0rAZ4UFDRPcXaE1OPNv06P+Fu9e+zeA@mail.gmail.com> (Dan Williams's message of "Sat, 7 Jul 2018 16:22:54 -0700")
Dan Williams <dan.j.williams@gmail.com> writes:
> On Thu, Jun 21, 2018 at 8:55 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> From: Huang Ying <ying.huang@intel.com>
>>
>> To support to swapin the THP as a whole, we need to create PMD swap
>> mapping during swapout, and maintain PMD swap mapping count. This
>> patch implements the support to increase the PMD swap mapping
>> count (for swapout, fork, etc.) and set SWAP_HAS_CACHE flag (for
>> swapin, etc.) for a huge swap cluster in swap_duplicate() function
>> family. Although it only implements a part of the design of the swap
>> reference count with PMD swap mapping, the whole design is described
>> as follow to make it easy to understand the patch and the whole
>> picture.
>>
>> A huge swap cluster is used to hold the contents of a swapouted THP.
>> After swapout, a PMD page mapping to the THP will become a PMD
>> swap mapping to the huge swap cluster via a swap entry in PMD. While
>> a PTE page mapping to a subpage of the THP will become the PTE swap
>> mapping to a swap slot in the huge swap cluster via a swap entry in
>> PTE.
>>
>> If there is no PMD swap mapping and the corresponding THP is removed
>> from the page cache (reclaimed), the huge swap cluster will be split
>> and become a normal swap cluster.
>>
>> The count (cluster_count()) of the huge swap cluster is
>> SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count. Because
>> all swap slots in the huge swap cluster are mapped by PTE or PMD, or
>> has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
>> HPAGE_PMD_NR. And the PMD swap mapping count is recorded too to make
>> it easy to determine whether there are remaining PMD swap mappings.
>>
>> The count in swap_map[offset] is the sum of PTE and PMD swap mapping
>> count. This means when we increase the PMD swap mapping count, we
>> need to increase swap_map[offset] for all swap slots inside the swap
>> cluster. An alternative choice is to make swap_map[offset] to record
>> PTE swap map count only, given we have recorded PMD swap mapping count
>> in the count of the huge swap cluster. But this need to increase
>> swap_map[offset] when splitting the PMD swap mapping, that may fail
>> because of memory allocation for swap count continuation. That is
>> hard to dealt with. So we choose current solution.
>>
>> The PMD swap mapping to a huge swap cluster may be split when unmap a
>> part of PMD mapping etc. That is easy because only the count of the
>> huge swap cluster need to be changed. When the last PMD swap mapping
>> is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
>> cluster (clear the huge flag). This makes it easy to reason the
>> cluster state.
>>
>> A huge swap cluster will be split when splitting the THP in swap
>> cache, or failing to allocate THP during swapin, etc. But when
>> splitting the huge swap cluster, we will not try to split all PMD swap
>> mappings, because we haven't enough information available for that
>> sometimes. Later, when the PMD swap mapping is duplicated or swapin,
>> etc, the PMD swap mapping will be split and fallback to the PTE
>> operation.
>>
>> When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
>> set in the swap_map[offset] of all swap slots inside the huge swap
>> cluster backing the THP. This huge swap cluster will not be split
>> unless the THP is split even if its PMD swap mapping count dropped to
>> 0. Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE
>> flag will be cleared in the swap_map[offset] of all swap slots inside
>> the huge swap cluster. And this huge swap cluster will be split if
>> its PMD swap mapping count is 0.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Shaohua Li <shli@kernel.org>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Zi Yan <zi.yan@cs.rutgers.edu>
>> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
>> ---
>> include/linux/huge_mm.h | 5 +
>> include/linux/swap.h | 9 +-
>> mm/memory.c | 2 +-
>> mm/rmap.c | 2 +-
>> mm/swap_state.c | 2 +-
>> mm/swapfile.c | 287 +++++++++++++++++++++++++++++++++---------------
>> 6 files changed, 214 insertions(+), 93 deletions(-)
>
> I'm probably missing some background, but I find the patch hard to
> read. Can you disseminate some of this patch changelog into kernel-doc
> commentary so it's easier to follow which helpers do what relative to
> THP swap.
Yes. This is a good idea. Thanks for pointing it out. I will add more
kernel-doc commentary to make the code easier to be understood.
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index d3bbf6bea9e9..213d32e57c39 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>> #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
>> #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
>>
>> +static inline bool thp_swap_supported(void)
>> +{
>> + return IS_ENABLED(CONFIG_THP_SWAP);
>> +}
>> +
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> #define HPAGE_PMD_SHIFT PMD_SHIFT
>> #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index f73eafcaf4e9..57aa655ab27d 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int);
>> extern int get_swap_pages(int n, bool cluster, swp_entry_t swp_entries[]);
>> extern int add_swap_count_continuation(swp_entry_t, gfp_t);
>> extern void swap_shmem_alloc(swp_entry_t);
>> -extern int swap_duplicate(swp_entry_t);
>> -extern int swapcache_prepare(swp_entry_t);
>> +extern int swap_duplicate(swp_entry_t *entry, bool cluster);
>
> This patch introduces a new flag to swap_duplicate(), but then all all
> usages still pass 'false' so why does this patch change the argument.
> Seems this change belongs to another patch?
This patch just introduce the capability to deal with huge swap entry in
swap_duplicate() family functions. The first user of the huge swap
entry is in
[PATCH -mm -v4 08/21] mm, THP, swap: Support to read a huge swap cluster for swapin a THP
via swapcache_prepare().
Yes, it is generally better to put the implementation and the user into
one patch. But I found in that way, I have to put most code of this
patchset into single huge patch, that is not good for review too. So I
made some compromise to separate the implementation and the users into
different patches to make the size of single patch not too huge. Does
this make sense to you?
>> +extern int swapcache_prepare(swp_entry_t entry, bool cluster);
>
> Rather than add a cluster flag to these helpers can the swp_entry_t
> carry the cluster flag directly?
Matthew Wilcox suggested to replace the "cluster" flag with the number
of entries to make the interface more flexible. And he suggest to use a
very smart way to encode the nr_entries into swap_entry_t with something
like,
https://plus.google.com/117536210417097546339/posts/hvctn17WUZu
But I think we need to
- encode swap type, swap offset, nr_entries into a new swp_entry_t
- call a function
- decode the new swp_entry_t in the function
So it appears that it doesn't bring real value except reduce one
parameter. Or you suggest something else?
Best Regards,
Huang, Ying
next prev parent reply other threads:[~2018-07-09 7:38 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-22 3:51 [PATCH -mm -v4 00/21] mm, THP, swap: Swapout/swapin THP in one piece Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 01/21] mm, THP, swap: Enable PMD swap operations for CONFIG_THP_SWAP Huang, Ying
2018-07-07 21:11 ` Dan Williams
2018-07-09 5:40 ` Huang, Ying
2018-07-09 6:08 ` Dan Williams
2018-07-09 6:34 ` Huang, Ying
2018-07-09 15:59 ` Dave Hansen
2018-07-10 1:08 ` Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 02/21] mm, THP, swap: Make CONFIG_THP_SWAP depends on CONFIG_SWAP Huang, Ying
2018-07-07 21:12 ` Dan Williams
2018-07-09 6:34 ` Huang, Ying
2018-07-09 16:00 ` Dave Hansen
2018-07-10 1:19 ` Huang, Ying
2018-07-10 1:59 ` Dave Hansen
2018-07-10 5:26 ` Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate() Huang, Ying
2018-06-29 6:04 ` Matthew Wilcox
2018-07-02 5:19 ` Huang, Ying
2018-07-07 23:22 ` Dan Williams
2018-07-09 7:38 ` Huang, Ying [this message]
2018-07-09 16:51 ` Dave Hansen
2018-07-10 6:44 ` Huang, Ying
2018-07-10 13:50 ` Dave Hansen
2018-07-11 0:59 ` Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 04/21] mm, THP, swap: Support PMD swap mapping in swapcache_free_cluster() Huang, Ying
2018-07-09 17:11 ` Dave Hansen
2018-07-10 6:53 ` Huang, Ying
2018-07-10 13:54 ` Dave Hansen
2018-07-11 1:08 ` Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 05/21] mm, THP, swap: Support PMD swap mapping in free_swap_and_cache()/swap_free() Huang, Ying
2018-07-05 18:33 ` Daniel Jordan
2018-07-06 12:49 ` Huang, Ying
2018-07-09 17:19 ` Dave Hansen
2018-07-10 7:13 ` Huang, Ying
2018-07-10 14:07 ` Dave Hansen
2018-07-11 1:28 ` Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 06/21] mm, THP, swap: Support PMD swap mapping when splitting huge PMD Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 07/21] mm, THP, swap: Support PMD swap mapping in split_swap_cluster() Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 08/21] mm, THP, swap: Support to read a huge swap cluster for swapin a THP Huang, Ying
2018-06-29 6:21 ` Matthew Wilcox
2018-07-02 6:02 ` Huang, Ying
2018-07-04 0:12 ` Daniel Jordan
2018-07-04 2:24 ` Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 09/21] mm, THP, swap: Swapin a THP as a whole Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 10/21] mm, THP, swap: Support to count THP swapin and its fallback Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 11/21] mm, THP, swap: Add sysfs interface to configure THP swapin Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 12/21] mm, THP, swap: Support PMD swap mapping in swapoff Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 13/21] mm, THP, swap: Support PMD swap mapping in madvise_free() Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 14/21] mm, cgroup, THP, swap: Support to move swap account for PMD swap mapping Huang, Ying
2018-07-09 17:20 ` Daniel Jordan
2018-07-10 7:49 ` Huang, Ying
2018-07-10 22:49 ` Daniel Jordan
2018-06-22 3:51 ` [PATCH -mm -v4 15/21] mm, THP, swap: Support to copy PMD swap mapping when fork() Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 16/21] mm, THP, swap: Free PMD swap mapping when zap_huge_pmd() Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 17/21] mm, THP, swap: Support PMD swap mapping for MADV_WILLNEED Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 18/21] mm, THP, swap: Support PMD swap mapping in mincore() Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 19/21] mm, THP, swap: Support PMD swap mapping in common path Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 20/21] mm, THP, swap: create PMD swap mapping when unmap the THP Huang, Ying
2018-06-22 3:51 ` [PATCH -mm -v4 21/21] mm, THP: Avoid to split THP when reclaim MADV_FREE THP Huang, Ying
2018-06-28 4:51 ` [PATCH -mm -v4 00/21] mm, THP, swap: Swapout/swapin THP in one piece Andrew Morton
2018-06-28 5:29 ` Huang, Ying
2018-06-28 5:31 ` Andrew Morton
2018-06-28 5:35 ` Huang, Ying
2018-06-28 6:18 ` Andrew Morton
2018-06-28 9:03 ` Matthew Wilcox
2018-06-29 1:17 ` Huang, Ying
2018-06-29 5:57 ` Matthew Wilcox
2018-07-02 5:19 ` Huang, Ying
2018-07-04 2:11 ` Sergey Senozhatsky
2018-07-04 2:20 ` Huang, Ying
2018-07-04 2:27 ` Sergey Senozhatsky
2018-07-04 2:59 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8736wsluyo.fsf@yhuang-dev.intel.com \
--to=ying.huang@intel.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@gmail.com \
--cc=daniel.m.jordan@oracle.com \
--cc=dave.hansen@linux.intel.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=riel@redhat.com \
--cc=shli@kernel.org \
--cc=zi.yan@cs.rutgers.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).