From: "Huang\, Ying" <ying.huang@intel.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Shaohua Li <shli@kernel.org>, Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: [PATCH -V6 00/21] swap: Swapout/swapin THP in one piece
Date: Wed, 24 Oct 2018 11:31:42 +0800
Message-ID: <87sh0wuijl.fsf@yhuang-dev.intel.com>
In-Reply-To: <20181023122738.a5j2vk554tsx4f6i@ca-dmjordan1.us.oracle.com> (Daniel Jordan's message of "Tue, 23 Oct 2018 05:27:38 -0700")

Hi, Daniel,

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Wed, Oct 10, 2018 at 03:19:03PM +0800, Huang Ying wrote:
>> And for all, Any comment is welcome!
>>
>> This patchset is based on the 2018-10-3 head of mmotm/master.
>
> There seems to be some infrequent memory corruption with THPs that have been
> swapped out: page contents differ after swapin.

Thanks a lot for testing this! I know there was a big effort behind this,
and it will definitely improve the quality of the patchset greatly!

> Reproducer at the bottom. Part of some tests I'm writing, had to separate it a
> little hack-ily. Basically it writes the word offset _at_ each word offset in
> a memory blob, tries to push it to swap, and verifies the offset is the same
> after swapin.
>
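
The fill/verify pattern described above can be sketched roughly as follows.
This is not Daniel's reproducer (swap-verify), only an illustration: the
function names are hypothetical, 8-byte words are assumed because the offsets
in the log advance by 8, and the step that pushes the blob out to swap is
omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write each 8-byte word's own byte offset into that word. */
static void fill_offsets(uint64_t *blob, size_t nbytes)
{
	assert(nbytes % sizeof(uint64_t) == 0);
	for (size_t off = 0; off < nbytes; off += sizeof(uint64_t))
		blob[off / sizeof(uint64_t)] = (uint64_t)off;
}

/* Count words whose content no longer matches their byte offset. */
static size_t count_corrupt_words(const uint64_t *blob, size_t nbytes)
{
	size_t bad = 0;

	assert(nbytes % sizeof(uint64_t) == 0);
	for (size_t off = 0; off < nbytes; off += sizeof(uint64_t))
		if (blob[off / sizeof(uint64_t)] != (uint64_t)off)
			bad++;
	return bad;
}
```

A caller would run fill_offsets() once, force the memory out to swap by
whatever means the test uses, touch it again, and then expect
count_corrupt_words() to return 0.
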
> I ran with THP enabled=always. THP swapin_enabled could be always or never, it
> happened with both. Every time swapping occurred, a single THP-sized chunk in
> the middle of the blob had different offsets. Example:
>
> ** > word corruption gap
> ** corruption detected 14929920 bytes in (got 15179776, expected 14929920) **
> ** corruption detected 14929928 bytes in (got 15179784, expected 14929928) **
> ** corruption detected 14929936 bytes in (got 15179792, expected 14929936) **
> ...pattern continues...
> ** corruption detected 17027048 bytes in (got 15179752, expected 17027048) **
> ** corruption detected 17027056 bytes in (got 15179760, expected 17027056) **
> ** corruption detected 17027064 bytes in (got 15179768, expected 17027064) **

15179776 < 15179xxx <= 17027064

15179776 % 4096 = 0

And 15179776 = 15179768 + 8

So I guess we have an alignment bug. Could you try the attached patch?
It deals with some alignment issues.

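
The arithmetic above can be checked mechanically. The sketch below is
illustrative only: it assumes 8-byte words, 4 KiB base pages and a 2 MiB THP
(x86-64 defaults, not stated in this thread), and the helper name
rotated_offset() is hypothetical. It models the corrupted chunk as the THP's
own contents rotated by a fixed byte shift, which agrees with the first and
last log lines quoted here; the elided middle of the log is not verified.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_SIZE	8ULL
#define PAGE_SIZE_4K	4096ULL		/* assumed base page size */
#define THP_SIZE	(2ULL << 20)	/* assumed THP size: 2 MiB */

/*
 * Value expected to be read at blob offset "expected" if the THP starting
 * at "chunk_start" had its contents rotated forward by "shift" bytes.
 */
static uint64_t rotated_offset(uint64_t expected, uint64_t chunk_start,
			       uint64_t shift)
{
	assert(expected >= chunk_start);
	assert(expected - chunk_start < THP_SIZE);
	return chunk_start + (expected - chunk_start + shift) % THP_SIZE;
}
```

With the numbers from the log, the corrupt range 14929920..17027064 is
exactly 2 MiB long, and the first "got" value is 249856 bytes (61 base
pages) past the first expected value, so a page-granular shift inside one
THP fits the alignment-bug guess.
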
> 100.0% of memory was swapped out at mincore time
> 0.00305% of pages were corrupted (first corrupt word 14929920, last corrupt word 17027064)
>
> The problem goes away with THP enabled=never, and I don't see it on 2018-10-3
> mmotm/master with THP enabled=always.
>
> The server had an NVMe swap device and ~760G memory over two nodes, and the
> program was always run like this: swap-verify -s $((64 * 2**30))
>
> The kernels had one extra patch, Alexander Duyck's
> "dma-direct: Fix return value of dma_direct_supported", which was required to
> get them to build.
>

Thanks again!

Best Regards,
Huang, Ying

---------------------------------->8-----------------------------
From e1c3e4f565deeb8245bdc4ee53a1f1e4188b6d4a Mon Sep 17 00:00:00 2001
From: Huang Ying <ying.huang@intel.com>
Date: Wed, 24 Oct 2018 11:24:15 +0800
Subject: [PATCH] Fix alignment bug
---
 include/linux/huge_mm.h | 6 ++----
 mm/huge_memory.c        | 9 ++++-----
 mm/swap_state.c         | 2 +-
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 96baae08f47c..e7b3527bc493 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -379,8 +379,7 @@ struct page_vma_mapped_walk;
 
 #ifdef CONFIG_THP_SWAP
 extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
-				  unsigned long haddr,
-				  pmd_t *pmd);
+				  unsigned long addr, pmd_t *pmd);
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
@@ -411,8 +410,7 @@ static inline bool transparent_hugepage_swapin_enabled(
 }
 #else /* CONFIG_THP_SWAP */
 static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
-					 unsigned long haddr,
-					 pmd_t *pmd)
+					 unsigned long addr, pmd_t *pmd)
 {
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ed64266b63dc..b2af3bff7624 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1731,10 +1731,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 #ifdef CONFIG_THP_SWAP
 /* Convert a PMD swap mapping to a set of PTE swap mappings */
 void __split_huge_swap_pmd(struct vm_area_struct *vma,
-			   unsigned long haddr,
+			   unsigned long addr,
 			   pmd_t *pmd)
 {
 	struct mm_struct *mm = vma->vm_mm;
+	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	pgtable_t pgtable;
 	pmd_t _pmd;
 	swp_entry_t entry;
@@ -1772,7 +1773,7 @@ int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 
 	ptl = pmd_lock(mm, pmd);
 	if (pmd_same(*pmd, orig_pmd))
-		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+		__split_huge_swap_pmd(vma, address, pmd);
 	else
 		ret = -ENOENT;
 	spin_unlock(ptl);
@@ -2013,9 +2014,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			 * swap mapping and operate on the PTEs
 			 */
 			if (next - addr != HPAGE_PMD_SIZE) {
-				unsigned long haddr = addr & HPAGE_PMD_MASK;
-
-				__split_huge_swap_pmd(vma, haddr, pmd);
+				__split_huge_swap_pmd(vma, addr, pmd);
 				goto out;
 			}
 			free_swap_and_cache(entry, HPAGE_PMD_NR);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 784ad6388da0..fd143ef82351 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -451,7 +451,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/* May fail (-ENOMEM) if XArray node allocation failed. */
 		__SetPageLocked(new_page);
 		__SetPageSwapBacked(new_page);
-		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
+		err = add_to_swap_cache(new_page, hentry, gfp_mask & GFP_KERNEL);
 		if (likely(!err)) {
 			/* Initiate read into locked page */
 			SetPageWorkingset(new_page);
--
2.18.1
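
The rounding that the patch moves into __split_huge_swap_pmd() can be
illustrated in user space. The sketch below is not kernel code: the macro
values assume x86-64 (4 KiB base pages, 2 MiB PMD-mapped huge pages), and both
helper names are hypothetical. In particular, how hentry is derived in
mm/swap_state.c is outside the hunk shown in the patch, so treating it as the
swap offset rounded down to a huge-cluster boundary is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel macros, assuming x86-64 values. */
#define PAGE_SHIFT	12
#define HPAGE_PMD_SHIFT	21
#define HPAGE_PMD_SIZE	(1ULL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
#define HPAGE_PMD_NR	(1ULL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))

static_assert(HPAGE_PMD_NR == 512, "2 MiB huge page = 512 base pages");

/* Round any address inside a huge page down to the huge page start. */
static uint64_t huge_page_start(uint64_t addr)
{
	return addr & HPAGE_PMD_MASK;
}

/*
 * Hypothetical: round a swap offset down to the first slot of its
 * huge swap cluster (HPAGE_PMD_NR consecutive slots).
 */
static uint64_t huge_swap_offset(uint64_t offset)
{
	return offset & ~(HPAGE_PMD_NR - 1);
}
```

Doing the mask inside the callee means a caller can pass any address within
the huge page and still get the aligned start, which is the failure mode the
patch guards against when a caller forgets to align first.
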