From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Khalid Aziz <khalid.aziz@oracle.com>,
Pravin B Shelar <pshelar@nicira.com>,
Christoph Lameter <cl@linux.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>, Mel Gorman <mel@csn.ul.ie>,
Rik van Riel <riel@redhat.com>, Minchan Kim <minchan@kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.4 24/26] mm: fix aio performance regression for database caused by THP
Date: Fri, 8 Nov 2013 22:51:53 -0800 [thread overview]
Message-ID: <20131109065051.882756124@linuxfoundation.org> (raw)
In-Reply-To: <20131109065050.089866597@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Khalid Aziz <khalid.aziz@oracle.com>
commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
I am working with a tool that simulates oracle database I/O workload.
This tool (orion to be specific -
<http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
does aio into these pages from flash disks using various common block
sizes used by database. I am looking at performance with two of the most
common block sizes - 1M and 64K. aio performance with these two block
sizes plunged after Transparent HugePages was introduced in the kernel.
Here are performance numbers:
pre-THP 2.6.39 3.11-rc5
1M read 8384 MB/s 5629 MB/s 6501 MB/s
64K read 7867 MB/s 4576 MB/s 4251 MB/s
I have narrowed the performance impact down to the overheads introduced by
THP in __get_page_tail() and put_compound_page() routines. perf top shows
>40% of cycles being spent in these two routines. Every time direct I/O
to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
the pages and calls put_page() when I/O completes to put the reference
away. THP introduced significant amount of locking overhead to get_page()
and put_page() when dealing with compound pages because hugepages can be
split underneath get_page() and put_page(). It added this overhead
irrespective of whether it is dealing with hugetlbfs pages or transparent
hugepages. This resulted in 20%-45% drop in aio performance when using
hugetlbfs pages.
Since hugetlbfs pages can not be split, there is no reason to go through
all the locking overhead for these pages from what I can see. I added
code to __get_page_tail() and put_compound_page() to bypass all the
locking code when working with hugetlbfs pages. This improved performance
significantly. Performance numbers with this patch:
pre-THP 3.11-rc5 3.11-rc5 + Patch
1M read 8384 MB/s 6501 MB/s 8371 MB/s
64K read 7867 MB/s 4251 MB/s 6510 MB/s
Performance with 64K read is still lower than what it was before THP, but
still a 53% improvement. It does mean there is more work to be done but I
will take a 53% improvement for now.
Please take a look at the following patch and let me know if it looks
reasonable.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/swap.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -30,6 +30,7 @@
#include <linux/backing-dev.h>
#include <linux/memcontrol.h>
#include <linux/gfp.h>
+#include <linux/hugetlb.h>
#include "internal.h"
@@ -68,13 +69,26 @@ static void __put_compound_page(struct p
{
compound_page_dtor *dtor;
- __page_cache_release(page);
+ if (!PageHuge(page))
+ __page_cache_release(page);
dtor = get_compound_page_dtor(page);
(*dtor)(page);
}
static void put_compound_page(struct page *page)
{
+ /*
+ * hugetlbfs pages cannot be split from under us. If this is a
+ * hugetlbfs page, check refcount on head page and release the page if
+ * the refcount becomes zero.
+ */
+ if (PageHuge(page)) {
+ page = compound_head(page);
+ if (put_page_testzero(page))
+ __put_compound_page(page);
+ return;
+ }
+
if (unlikely(PageTail(page))) {
/* __split_huge_page_refcount can run under us */
struct page *page_head = compound_trans_head(page);
@@ -159,8 +173,20 @@ bool __get_page_tail(struct page *page)
*/
unsigned long flags;
bool got = false;
- struct page *page_head = compound_trans_head(page);
+ struct page *page_head;
+
+ /*
+ * If this is a hugetlbfs page it cannot be split under us. Simply
+ * increment refcount for the head page.
+ */
+ if (PageHuge(page)) {
+ page_head = compound_head(page);
+ atomic_inc(&page_head->_count);
+ got = true;
+ goto out;
+ }
+ page_head = compound_trans_head(page);
if (likely(page != page_head && get_page_unless_zero(page_head))) {
/*
* page_head wasn't a dangling pointer but it
@@ -178,6 +204,7 @@ bool __get_page_tail(struct page *page)
if (unlikely(!got))
put_page(page_head);
}
+out:
return got;
}
EXPORT_SYMBOL(__get_page_tail);
next prev parent reply other threads:[~2013-11-09 7:44 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-09 6:51 [PATCH 3.4 00/26] 3.4.69-stable review Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 01/26] USB: support new huawei devices in option.c Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 02/26] USB: quirks.c: add one device that cannot deal with suspension Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 03/26] USB: quirks: add touchscreen that is dazzeled by remote wakeup Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 04/26] USB: serial: ftdi_sio: add id for Z3X Box device Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 05/26] mac80211: correctly close cancelled scans Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 06/26] mac80211: update sta->last_rx on acked tx frames Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 07/26] rtlwifi: rtl8192cu: Fix error in pointer arithmetic Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 08/26] jfs: fix error path in ialloc Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 09/26] can: flexcan: flexcan_chip_start: fix regression, mark one MB for TX and abort pending TX Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 10/26] libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failures Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 11/26] md: Fix skipping recovery for read-only arrays Greg Kroah-Hartman
2013-11-17 4:11 ` Ben Hutchings
2013-11-17 7:20 ` NeilBrown
2013-11-09 6:51 ` [PATCH 3.4 12/26] clockevents: Sanitize ticks to nsec conversion Greg Kroah-Hartman
2013-11-09 6:51 ` Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 13/26] parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 14/26] ALSA: hda - Add a fixup for ASUS N76VZ Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 15/26] ALSA: fix oops in snd_pcm_info() caused by ASoC DPCM Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 16/26] ASoC: wm_hubs: Add missing break in hp_supply_event() Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 17/26] ASoC: dapm: Fix source list debugfs outputs Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 18/26] staging: ozwpan: prevent overflow in oz_cdev_write() Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 19/26] Staging: bcm: info leak in ioctl Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 20/26] uml: check length in exitcode_proc_write() Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 21/26] xtensa: dont use alternate signal stack on threads Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 22/26] lib/scatterlist.c: dont flush_kernel_dcache_page on slab page Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 23/26] aacraid: missing capable() check in compat ioctl Greg Kroah-Hartman
2013-11-09 6:51 ` Greg Kroah-Hartman [this message]
2013-11-09 6:51 ` [PATCH 3.4 25/26] drm: Prevent overwriting from userspace underallocating core ioctl structs Greg Kroah-Hartman
2013-11-09 6:51 ` [PATCH 3.4 26/26] drm/radeon/atom: workaround vbios bug in transmitter table on rs780 Greg Kroah-Hartman
2013-11-09 14:24 ` [PATCH 3.4 00/26] 3.4.69-stable review Satoru Takeuchi
2013-11-09 16:19 ` Greg Kroah-Hartman
2013-11-09 16:58 ` Guenter Roeck
2013-11-09 17:12 ` Greg Kroah-Hartman
2013-11-11 17:58 ` Shuah Khan
2013-11-11 22:51 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131109065051.882756124@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=cl@linux.com \
--cc=hannes@cmpxchg.org \
--cc=khalid.aziz@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=minchan@kernel.org \
--cc=pshelar@nicira.com \
--cc=riel@redhat.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.