From: Khalid Aziz <khalid.aziz@oracle.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jack Wang <jinpu.wang@profitbricks.com>,
Luis Henriques <luis.henriques@canonical.com>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
kernel-team@lists.ubuntu.com,
Pravin B Shelar <pshelar@nicira.com>,
Christoph Lameter <cl@linux.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>, Mel Gorman <mel@csn.ul.ie>,
Rik van Riel <riel@redhat.com>, Minchan Kim <minchan@kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 092/104] mm: fix aio performance regression for database caused by THP
Date: Mon, 30 Sep 2013 07:31:35 -0600
Message-ID: <52497D37.9020706@oracle.com>
In-Reply-To: <20130930132642.GA7510@kroah.com>

On 09/30/2013 07:26 AM, Greg Kroah-Hartman wrote:
> On Mon, Sep 30, 2013 at 03:14:52PM +0200, Jack Wang wrote:
>> On 09/30/2013 12:11 PM, Luis Henriques wrote:
>>> 3.5.7.22 -stable review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Khalid Aziz <khalid.aziz@oracle.com>
>>>
>>> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
>>>
>>> I am working with a tool that simulates Oracle database I/O workload.
>>> This tool (orion to be specific -
>>> <http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
>>> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
>>> does aio into these pages from flash disks using various common block
>>> sizes used by databases. I am looking at performance with two of the most
>>> common block sizes - 1M and 64K. aio performance with these two block
>>> sizes plunged after Transparent HugePages was introduced in the kernel.
>>> Here are performance numbers:
>>>
>>>            pre-THP     2.6.39      3.11-rc5
>>> 1M read    8384 MB/s   5629 MB/s   6501 MB/s
>>> 64K read   7867 MB/s   4576 MB/s   4251 MB/s
>>>
>>> I have narrowed the performance impact down to the overheads introduced by
>>> THP in __get_page_tail() and put_compound_page() routines. perf top shows
>>> more than 40% of cycles being spent in these two routines. Every time direct I/O
>>> to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
>>> the pages and calls put_page() when I/O completes to put the reference
>>> away. THP introduced a significant amount of locking overhead to get_page()
>>> and put_page() when dealing with compound pages because hugepages can be
>>> split underneath get_page() and put_page(). It added this overhead
>>> irrespective of whether it is dealing with hugetlbfs pages or transparent
>>> hugepages. This resulted in a 20%-45% drop in aio performance when using
>>> hugetlbfs pages.
>>>
>>> Since hugetlbfs pages cannot be split, there is no reason to go through
>>> all the locking overhead for these pages from what I can see. I added
>>> code to __get_page_tail() and put_compound_page() to bypass all the
>>> locking code when working with hugetlbfs pages. This improved performance
>>> significantly. Performance numbers with this patch:
>>>
>>>            pre-THP     3.11-rc5    3.11-rc5 + Patch
>>> 1M read    8384 MB/s   6501 MB/s   8371 MB/s
>>> 64K read   7867 MB/s   4251 MB/s   6510 MB/s
>>>
>>> Performance with 64K read is still lower than what it was before THP, but
>>> still a 53% improvement. It does mean there is more work to be done but I
>>> will take a 53% improvement for now.
>>>
>>> Please take a look at the following patch and let me know if it looks
>>> reasonable.
>>>
>>> [akpm@linux-foundation.org: tweak comments]
>>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>>> Cc: Pravin B Shelar <pshelar@nicira.com>
>>> Cc: Christoph Lameter <cl@linux.com>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Cc: Mel Gorman <mel@csn.ul.ie>
>>> Cc: Rik van Riel <riel@redhat.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Andi Kleen <andi@firstfloor.org>
>>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> [ luis: backported to 3.5: adjusted context ]
>>> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
>> Hi Greg,
>>
>> I suppose this patch is also needed for 3.4, right?
>
> As it didn't originally apply there, I didn't apply it.
>
> If people think it should be applicable for 3.4, I'll take it.
>
> thanks,
>
> greg k-h
>
Hi Greg,
I did send you a backported version of this patch to apply to 3.0, 3.2
and 3.4 last Monday and cc'd stable@vger.kernel.org. That patch should
apply cleanly to those three kernels.
--
Khalid
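
For readers following the quoted commit message, the idea of skipping the
split-protection lock for pages that can never be split can be sketched in
user space roughly as below. This is only an illustrative model and not the
kernel code from the commit: struct model_page, can_split, lock_acquisitions
and the model_*() helpers are all hypothetical names, and a simple spin flag
stands in for whatever locking the kernel really uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the head of a huge page. */
struct model_page {
	atomic_int refcount;     /* references currently held */
	atomic_bool locked;      /* spin flag guarding against a "split" */
	bool can_split;          /* true: THP-like, false: hugetlbfs-like */
	int lock_acquisitions;   /* instrumentation: slow-path entries */
};

static void model_page_init(struct model_page *p, bool can_split)
{
	atomic_init(&p->refcount, 1);
	atomic_init(&p->locked, false);
	p->can_split = can_split;
	p->lock_acquisitions = 0;
}

/* Spin until the flag is ours; count every acquisition. */
static void model_lock(struct model_page *p)
{
	while (atomic_exchange_explicit(&p->locked, true, memory_order_acquire))
		; /* busy-wait */
	p->lock_acquisitions++; /* updated only while the lock is held */
}

static void model_unlock(struct model_page *p)
{
	atomic_store_explicit(&p->locked, false, memory_order_release);
}

/* Take a reference; pages that cannot split skip the lock entirely. */
static void model_get_page(struct model_page *p)
{
	if (!p->can_split) {
		atomic_fetch_add(&p->refcount, 1); /* fast path */
		return;
	}
	model_lock(p); /* slow path: hold off a concurrent split */
	atomic_fetch_add(&p->refcount, 1);
	model_unlock(p);
}

/* Drop a reference, mirroring the fast/slow choice in model_get_page(). */
static void model_put_page(struct model_page *p)
{
	int old;

	if (!p->can_split) {
		old = atomic_fetch_sub(&p->refcount, 1); /* fast path */
		assert(old > 0);
		return;
	}
	model_lock(p);
	old = atomic_fetch_sub(&p->refcount, 1);
	model_unlock(p);
	assert(old > 0);
}
```

Calling model_get_page()/model_put_page() in a loop on a page initialised
with can_split = false never touches the lock, while the same loop on a
can_split = true page takes it on every call. That difference is the
per-I/O overhead the commit message is describing.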