From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Alex Thorlton <athorlton@sgi.com>, Mel Gorman <mgorman@suse.de>,
Andi Kleen <andi@firstfloor.org>, Michal Hocko <mhocko@suse.cz>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org
Subject: RE: [PATCH v4] hugetlbfs: support split page table lock
Date: Mon, 16 Sep 2013 13:42:05 +0300 (EEST) [thread overview]
Message-ID: <20130916104205.5605CE0090@blue.fi.intel.com> (raw)
In-Reply-To: <1379117362-gwv3vrog-mutt-n-horiguchi@ah.jp.nec.com>
Naoya Horiguchi wrote:
> Hi,
>
> Kirill posted split_ptl patchset for thp today, so in this version
> I post only hugetlbfs part. I added Kconfig variables in following
> Kirill's patches (although without CONFIG_SPLIT_*_PTLOCK_CPUS.)
>
> This patch changes many lines, but all are in hugetlbfs specific code,
> so I think we can apply this independent of thp patches.
> -----
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Fri, 13 Sep 2013 18:12:30 -0400
> Subject: [PATCH v4] hugetlbfs: support split page table lock
>
> Currently all of page table handling by hugetlbfs code are done under
> mm->page_table_lock. So when a process have many threads and they heavily
> access to the memory, lock contention happens and impacts the performance.
>
> This patch makes hugepage support split page table lock so that we use
> page->ptl of the leaf node of page table tree which is pte for normal pages
> but can be pmd and/or pud for hugepages of some architectures.
>
> ChangeLog v4:
> - introduce arch dependent macro ARCH_ENABLE_SPLIT_HUGETLB_PTLOCK
> (only defined for x86 for now)
> - rename USE_SPLIT_PTLOCKS_HUGETLB to USE_SPLIT_HUGETLB_PTLOCKS
>
> ChangeLog v3:
> - disable split ptl for ppc with USE_SPLIT_PTLOCKS_HUGETLB.
> - remove replacement in some architecture dependent code. This is justified
> because an allocation of pgd/pud/pmd/pte entry can race with other
> allocation, not with read/write access, so we can use different locks.
> http://thread.gmane.org/gmane.linux.kernel.mm/106292/focus=106458
>
> ChangeLog v2:
> - add split ptl on other archs missed in v1
> - drop changes on arch/{powerpc,tile}/mm/hugetlbpage.c
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
> arch/x86/Kconfig | 4 +++
> include/linux/hugetlb.h | 20 +++++++++++
> include/linux/mm_types.h | 2 ++
> mm/Kconfig | 3 ++
> mm/hugetlb.c | 92 +++++++++++++++++++++++++++++-------------------
> mm/mempolicy.c | 5 +--
> mm/migrate.c | 4 +--
> mm/rmap.c | 2 +-
> 8 files changed, 91 insertions(+), 41 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6a5cf6a..5b83d14 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1884,6 +1884,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> def_bool y
> depends on X86_64 || X86_PAE
>
> +config ARCH_ENABLE_SPLIT_HUGETLB_PTLOCK
> + def_bool y
> + depends on X86_64 || X86_PAE
> +
> menu "Power management and ACPI options"
>
> config ARCH_HIBERNATION_HEADER
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0393270..2cdac68 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -80,6 +80,24 @@ extern const unsigned long hugetlb_zero, hugetlb_infinity;
> extern int sysctl_hugetlb_shm_group;
> extern struct list_head huge_boot_pages;
>
> +#if USE_SPLIT_HUGETLB_PTLOCKS
> +#define huge_pte_lockptr(mm, ptep) ({__pte_lockptr(virt_to_page(ptep)); })
> +#else /* !USE_SPLIT_HUGETLB_PTLOCKS */
> +#define huge_pte_lockptr(mm, ptep) ({&(mm)->page_table_lock; })
> +#endif /* USE_SPLIT_HUGETLB_PTLOCKS */
> +
> +#define huge_pte_offset_lock(mm, address, ptlp) \
> +({ \
> + pte_t *__pte = huge_pte_offset(mm, address); \
> + spinlock_t *__ptl = NULL; \
> + if (__pte) { \
> + __ptl = huge_pte_lockptr(mm, __pte); \
> + *(ptlp) = __ptl; \
> + spin_lock(__ptl); \
> + } \
> + __pte; \
> +})
> +
[ Disclaimer: I don't know much about hugetlb. ]
I don't think it's correct. Few points:
- Hugetlb supports multiple page sizes: on x86_64 2M (PMD) and 1G (PUD).
My patchset only implements it for PMD. We don't even initialize
spinlock in struct page for PUD.
- If we enable split PMD lock we should use it *globally*. With you patch
we can end up with different locks used by hugetlb and rest of kernel
to protect the same PMD table if USE_SPLIT_HUGETLB_PTLOCKS !=
USE_SPLIT_PMD_PTLOCKS. It's just broken.
What we should really do is take split pmd lock (using pmd_lock*) if we
try to protect PMD level and fallback to mm->page_table_lock if we want to
protect upper levels.
> /* arch callbacks */
>
> pte_t *huge_pte_alloc(struct mm_struct *mm,
> @@ -164,6 +182,8 @@ static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
> BUG();
> }
>
> +#define huge_pte_lockptr(mm, ptep) 0
> +
NULL?
> #endif /* !CONFIG_HUGETLB_PAGE */
>
> #define HUGETLB_ANON_FILE "anon_hugepage"
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Alex Thorlton <athorlton@sgi.com>, Mel Gorman <mgorman@suse.de>,
Andi Kleen <andi@firstfloor.org>, Michal Hocko <mhocko@suse.cz>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org
Subject: RE: [PATCH v4] hugetlbfs: support split page table lock
Date: Mon, 16 Sep 2013 13:42:05 +0300 (EEST) [thread overview]
Message-ID: <20130916104205.5605CE0090@blue.fi.intel.com> (raw)
In-Reply-To: <1379117362-gwv3vrog-mutt-n-horiguchi@ah.jp.nec.com>
Naoya Horiguchi wrote:
> Hi,
>
> Kirill posted split_ptl patchset for thp today, so in this version
> I post only hugetlbfs part. I added Kconfig variables in following
> Kirill's patches (although without CONFIG_SPLIT_*_PTLOCK_CPUS.)
>
> This patch changes many lines, but all are in hugetlbfs specific code,
> so I think we can apply this independent of thp patches.
> -----
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Fri, 13 Sep 2013 18:12:30 -0400
> Subject: [PATCH v4] hugetlbfs: support split page table lock
>
> Currently all of page table handling by hugetlbfs code are done under
> mm->page_table_lock. So when a process have many threads and they heavily
> access to the memory, lock contention happens and impacts the performance.
>
> This patch makes hugepage support split page table lock so that we use
> page->ptl of the leaf node of page table tree which is pte for normal pages
> but can be pmd and/or pud for hugepages of some architectures.
>
> ChangeLog v4:
> - introduce arch dependent macro ARCH_ENABLE_SPLIT_HUGETLB_PTLOCK
> (only defined for x86 for now)
> - rename USE_SPLIT_PTLOCKS_HUGETLB to USE_SPLIT_HUGETLB_PTLOCKS
>
> ChangeLog v3:
> - disable split ptl for ppc with USE_SPLIT_PTLOCKS_HUGETLB.
> - remove replacement in some architecture dependent code. This is justified
> because an allocation of pgd/pud/pmd/pte entry can race with other
> allocation, not with read/write access, so we can use different locks.
> http://thread.gmane.org/gmane.linux.kernel.mm/106292/focus=106458
>
> ChangeLog v2:
> - add split ptl on other archs missed in v1
> - drop changes on arch/{powerpc,tile}/mm/hugetlbpage.c
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
> arch/x86/Kconfig | 4 +++
> include/linux/hugetlb.h | 20 +++++++++++
> include/linux/mm_types.h | 2 ++
> mm/Kconfig | 3 ++
> mm/hugetlb.c | 92 +++++++++++++++++++++++++++++-------------------
> mm/mempolicy.c | 5 +--
> mm/migrate.c | 4 +--
> mm/rmap.c | 2 +-
> 8 files changed, 91 insertions(+), 41 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6a5cf6a..5b83d14 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1884,6 +1884,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> def_bool y
> depends on X86_64 || X86_PAE
>
> +config ARCH_ENABLE_SPLIT_HUGETLB_PTLOCK
> + def_bool y
> + depends on X86_64 || X86_PAE
> +
> menu "Power management and ACPI options"
>
> config ARCH_HIBERNATION_HEADER
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0393270..2cdac68 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -80,6 +80,24 @@ extern const unsigned long hugetlb_zero, hugetlb_infinity;
> extern int sysctl_hugetlb_shm_group;
> extern struct list_head huge_boot_pages;
>
> +#if USE_SPLIT_HUGETLB_PTLOCKS
> +#define huge_pte_lockptr(mm, ptep) ({__pte_lockptr(virt_to_page(ptep)); })
> +#else /* !USE_SPLIT_HUGETLB_PTLOCKS */
> +#define huge_pte_lockptr(mm, ptep) ({&(mm)->page_table_lock; })
> +#endif /* USE_SPLIT_HUGETLB_PTLOCKS */
> +
> +#define huge_pte_offset_lock(mm, address, ptlp) \
> +({ \
> + pte_t *__pte = huge_pte_offset(mm, address); \
> + spinlock_t *__ptl = NULL; \
> + if (__pte) { \
> + __ptl = huge_pte_lockptr(mm, __pte); \
> + *(ptlp) = __ptl; \
> + spin_lock(__ptl); \
> + } \
> + __pte; \
> +})
> +
[ Disclaimer: I don't know much about hugetlb. ]
I don't think it's correct. Few points:
- Hugetlb supports multiple page sizes: on x86_64 2M (PMD) and 1G (PUD).
My patchset only implements it for PMD. We don't even initialize
spinlock in struct page for PUD.
- If we enable split PMD lock we should use it *globally*. With you patch
we can end up with different locks used by hugetlb and rest of kernel
to protect the same PMD table if USE_SPLIT_HUGETLB_PTLOCKS !=
USE_SPLIT_PMD_PTLOCKS. It's just broken.
What we should really do is take split pmd lock (using pmd_lock*) if we
try to protect PMD level and fallback to mm->page_table_lock if we want to
protect upper levels.
> /* arch callbacks */
>
> pte_t *huge_pte_alloc(struct mm_struct *mm,
> @@ -164,6 +182,8 @@ static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
> BUG();
> }
>
> +#define huge_pte_lockptr(mm, ptep) 0
> +
NULL?
> #endif /* !CONFIG_HUGETLB_PAGE */
>
> #define HUGETLB_ANON_FILE "anon_hugepage"
--
Kirill A. Shutemov
next prev parent reply other threads:[~2013-09-16 10:42 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-14 0:09 [PATCH v4] hugetlbfs: support split page table lock Naoya Horiguchi
2013-09-14 0:09 ` Naoya Horiguchi
2013-09-16 10:42 ` Kirill A. Shutemov [this message]
2013-09-16 10:42 ` Kirill A. Shutemov
2013-09-16 21:22 ` Naoya Horiguchi
2013-09-16 21:22 ` Naoya Horiguchi
2013-09-17 15:18 ` Kirill A. Shutemov
2013-09-17 15:18 ` Kirill A. Shutemov
2013-09-17 18:57 ` Naoya Horiguchi
2013-09-17 18:57 ` Naoya Horiguchi
2013-09-16 14:36 ` Aneesh Kumar K.V
2013-09-16 14:36 ` Aneesh Kumar K.V
2013-09-16 21:37 ` Naoya Horiguchi
2013-09-16 21:37 ` Naoya Horiguchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130916104205.5605CE0090@blue.fi.intel.com \
--to=kirill.shutemov@linux.intel.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=athorlton@sgi.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.