From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Pankaj Gupta <pagupta@redhat.com>
Cc: Tariq Toukan <ttoukan.linux@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Tariq Toukan <tariqt@mellanox.com>,
	netdev@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm <linux-mm@kvack.org>,
	Saeed Mahameed <saeedm@mellanox.com>,
	brouer@redhat.com
Subject: Re: Page allocator order-0 optimizations merged
Date: Mon, 27 Mar 2017 14:39:47 +0200
Message-ID: <20170327143947.4c237e54@redhat.com>
In-Reply-To: <20170327105514.1ed5b1ba@redhat.com>

On Mon, 27 Mar 2017 10:55:14 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> A possible solution would be to use local_bh_{disable,enable} instead
> of the {preempt_disable,enable} calls.  But it is slower, using numbers
> from [1] (19 vs 11 cycles), thus the expected cycles saving is 38-19=19.
> 
> The problematic part of using local_bh_enable is that this adds a
> softirq/bottom-halves rescheduling point (as it checks for pending
> BHs).  Thus, this might affect real workloads.

I implemented this solution in the patch below and tested it on mlx5 at
50G with driver page-recycling manually disabled.  It works for me.

Mel, which do you prefer: a partial revert or something like this?
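
For illustration only, here is a minimal userspace sketch (not kernel
code) of what the context check in the patch below changes.  The
HARDIRQ_MASK and NMI_MASK values are the ones quoted in the asm comment
of the patch; SOFTIRQ_MASK and all model_* / *_may_use_pcp names are
assumptions made up for this sketch.  It models the old in_interrupt()
gate against the proposed in_irq_or_nmi() gate, and the single-mask form
that the patch hopes the compiler will emit.

```c
/* Userspace model only; nothing here is kernel code. */
#include <assert.h>
#include <stdbool.h>

#define SOFTIRQ_MASK 0x0000ff00u /* assumption: usual preempt_count layout */
#define HARDIRQ_MASK 0x000f0000u /* value quoted in the patch's asm comment */
#define NMI_MASK     0x00100000u /* value quoted in the patch's asm comment */

/* Old gate: softirq, hardirq and NMI context all count as "interrupt". */
static inline bool model_in_interrupt(unsigned int pc)
{
	return (pc & (SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) != 0;
}

/* New gate, written like the patch: two separate tests. */
static inline bool model_in_irq_or_nmi(unsigned int pc)
{
	return (pc & HARDIRQ_MASK) || (pc & NMI_MASK);
}

/* Single-mask form that the XXX comment in the patch is hoping for. */
static inline bool model_in_irq_or_nmi_mask(unsigned int pc)
{
	return (pc & (HARDIRQ_MASK | NMI_MASK)) != 0;
}

/* True when an order-0 request would be allowed onto the per-cpu lists. */
static inline bool old_may_use_pcp(unsigned int pc)
{
	return !model_in_interrupt(pc);
}

static inline bool new_may_use_pcp(unsigned int pc)
{
	return !model_in_irq_or_nmi(pc);
}
```

With a count of 0x100 (one softirq level, assuming the usual layout) the
old gate refuses the pcplists while the new gate allows them; 0x10000
(hardirq) and 0x100000 (NMI) are refused by both.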


[PATCH] mm, page_alloc: re-enable softirq use of per-cpu page allocator

From: Jesper Dangaard Brouer <brouer@redhat.com>

IRQ context was excluded from using the Per-Cpu-Pages (PCP) lists
caching of order-0 pages in commit 374ad05ab64d ("mm, page_alloc: only
use per-cpu allocator for irq-safe requests").

This unfortunately also excluded SoftIRQ.  This hurt the
performance for the use-case of refilling DMA RX rings in softirq
context.

This patch re-allows softirq context, which should be safe by disabling
BH/softirq while accessing the list.  It also makes sure to avoid
PCP-list access from both hard-IRQ and NMI context.

One concern with this change is that it adds a BH (enable) scheduling
point at both PCP alloc and free.

Fixes: 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for irq-safe requests")
---
 include/trace/events/kmem.h |    2 ++
 mm/page_alloc.c             |   41 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 6b2e154fd23a..ad412ad1b092 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -244,6 +244,8 @@ DECLARE_EVENT_CLASS(mm_page,
 		__entry->order,
 		__entry->migratetype,
 		__entry->order == 0)
+// WARNING: percpu_refill check not 100% correct after commit
+// 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for irq-safe requests")
 );
 
 DEFINE_EVENT(mm_page, mm_page_alloc_zone_locked,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6cbde310abed..db9ffc8ac538 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2470,6 +2470,25 @@ void mark_free_pages(struct zone *zone)
 }
 #endif /* CONFIG_PM */
 
+static __always_inline int in_irq_or_nmi(void)
+{
+	return in_irq() || in_nmi();
+// XXX: hoping compiler will optimize this (todo verify) into:
+// #define in_irq_or_nmi()	(preempt_count() & (HARDIRQ_MASK | NMI_MASK))
+
+	/* compiler was smart enough to only read __preempt_count once
+	 * but added two branches
+asm code:
+ │       mov    __preempt_count,%eax
+ │       test   $0xf0000,%eax    // HARDIRQ_MASK: 0x000f0000
+ │    ┌──jne    2a
+ │    │  test   $0x100000,%eax   // NMI_MASK:     0x00100000
+ │    │↓ je     3f
+ │ 2a:└─→mov    %rbx,%rdi
+
+	 */
+}
+
 /*
  * Free a 0-order page
  * cold == true ? free a cold page : free a hot page
@@ -2481,7 +2500,11 @@ void free_hot_cold_page(struct page *page, bool cold)
 	unsigned long pfn = page_to_pfn(page);
 	int migratetype;
 
-	if (in_interrupt()) {
+	/*
+	 * Exclude (hard) IRQ and NMI context from using the pcplists.
+	 * But allow softirq context, via disabling BH.
+	 */
+	if (in_irq_or_nmi()) {
 		__free_pages_ok(page, 0);
 		return;
 	}
@@ -2491,7 +2514,7 @@ void free_hot_cold_page(struct page *page, bool cold)
 
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	set_pcppage_migratetype(page, migratetype);
-	preempt_disable();
+	local_bh_disable();
 
 	/*
 	 * We only track unmovable, reclaimable and movable on pcp lists.
@@ -2522,7 +2545,7 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 out:
-	preempt_enable();
+	local_bh_enable();
 }
 
 /*
@@ -2647,7 +2670,7 @@ static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
 {
 	struct page *page;
 
-	VM_BUG_ON(in_interrupt());
+	VM_BUG_ON(in_irq());
 
 	do {
 		if (list_empty(list)) {
@@ -2680,7 +2703,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	bool cold = ((gfp_flags & __GFP_COLD) != 0);
 	struct page *page;
 
-	preempt_disable();
+	local_bh_disable();
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list = &pcp->lists[migratetype];
 	page = __rmqueue_pcplist(zone,  migratetype, cold, pcp, list);
@@ -2688,7 +2711,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
 		zone_statistics(preferred_zone, zone);
 	}
-	preempt_enable();
+	local_bh_enable();
 	return page;
 }
 
@@ -2704,7 +2727,11 @@ struct page *rmqueue(struct zone *preferred_zone,
 	unsigned long flags;
 	struct page *page;
 
-	if (likely(order == 0) && !in_interrupt()) {
+	/*
+	 * Exclude (hard) IRQ and NMI context from using the pcplists.
+	 * But allow softirq context, via disabling BH.
+	 */
+	if (likely(order == 0) && !in_irq_or_nmi() ) {
 		page = rmqueue_pcplist(preferred_zone, zone, order,
 				gfp_flags, migratetype);
 		goto out;


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

