Re: [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	brouer@redhat.com
Subject: Re: [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests
Date: Wed, 11 Jan 2017 13:44:20 +0100	[thread overview]
Message-ID: <20170111134420.368efb9e@redhat.com> (raw)
In-Reply-To: <20170109163518.6001-4-mgorman@techsingularity.net>


On Mon,  9 Jan 2017 16:35:17 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
 
> The following is results from a page allocator micro-benchmark. Only
> order-0 is interesting as higher orders do not use the per-cpu allocator

Micro-benchmarked with [1] page_bench02:
 modprobe page_bench02 page_order=0 run_flags=$((2#010)) loops=$((10**8)); \
  rmmod page_bench02 ; dmesg --notime | tail -n 4

Compared to baseline: 213 cycles(tsc) 53.417 ns
 - against this     : 184 cycles(tsc) 46.056 ns
 - Saving           : -29 cycles
 - Very close to expected 27 cycles saving [see below [2]]


> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
-
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[2] Expected saving comes from Mel removing a local_irq_{save,restore}
and adding a preempt_{disable,enable} instead.

Micro benchmarking via time_bench_sample[3], we get the cost of these
operations:

 time_bench: Type:for_loop                 Per elem: 0 cycles(tsc) 0.232 ns (step:0)
 time_bench: Type:spin_lock_unlock         Per elem: 33 cycles(tsc) 8.334 ns (step:0)
 time_bench: Type:spin_lock_unlock_irqsave Per elem: 62 cycles(tsc) 15.607 ns (step:0)
 time_bench: Type:irqsave_before_lock      Per elem: 57 cycles(tsc) 14.344 ns (step:0)
 time_bench: Type:spin_lock_unlock_irq     Per elem: 34 cycles(tsc) 8.560 ns (step:0)
 time_bench: Type:simple_irq_disable_before_lock Per elem: 37 cycles(tsc) 9.289 ns (step:0)
 time_bench: Type:local_BH_disable_enable  Per elem: 19 cycles(tsc) 4.920 ns (step:0)
 time_bench: Type:local_IRQ_disable_enable Per elem: 7 cycles(tsc) 1.864 ns (step:0)
 time_bench: Type:local_irq_save_restore   Per elem: 38 cycles(tsc) 9.665 ns (step:0)
 [Mel's patch removes a ^^^^^^^^^^^^^^^^]            ^^^^^^^^^ expected saving - preempt cost
 time_bench: Type:preempt_disable_enable   Per elem: 11 cycles(tsc) 2.794 ns (step:0)
 [adds a preempt  ^^^^^^^^^^^^^^^^^^^^^^]            ^^^^^^^^^ adds this cost
 time_bench: Type:funcion_call_cost        Per elem: 6 cycles(tsc) 1.689 ns (step:0)
 time_bench: Type:func_ptr_call_cost       Per elem: 11 cycles(tsc) 2.767 ns (step:0)
 time_bench: Type:page_alloc_put           Per elem: 211 cycles(tsc) 52.803 ns (step:0)

Thus, expected improvement is: 38-11 = 27 cycles.

[3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

CPU used: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

Config options of interest:
 CONFIG_NUMA=y
 CONFIG_DEBUG_LIST=n
 CONFIG_VM_EVENT_COUNTERS=y

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	brouer@redhat.com
Subject: Re: [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests
Date: Wed, 11 Jan 2017 13:44:20 +0100	[thread overview]
Message-ID: <20170111134420.368efb9e@redhat.com> (raw)
In-Reply-To: <20170109163518.6001-4-mgorman@techsingularity.net>


On Mon,  9 Jan 2017 16:35:17 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
 
> The following is results from a page allocator micro-benchmark. Only
> order-0 is interesting as higher orders do not use the per-cpu allocator

Micro-benchmarked with [1] page_bench02:
 modprobe page_bench02 page_order=0 run_flags=$((2#010)) loops=$((10**8)); \
  rmmod page_bench02 ; dmesg --notime | tail -n 4

Compared to baseline: 213 cycles(tsc) 53.417 ns
 - against this     : 184 cycles(tsc) 46.056 ns
 - Saving           : -29 cycles
 - Very close to expected 27 cycles saving [see below [2]]


> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
-
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[2] Expected saving comes from Mel removing a local_irq_{save,restore}
and adding a preempt_{disable,enable} instead.

Micro benchmarking via time_bench_sample[3], we get the cost of these
operations:

 time_bench: Type:for_loop                 Per elem: 0 cycles(tsc) 0.232 ns (step:0)
 time_bench: Type:spin_lock_unlock         Per elem: 33 cycles(tsc) 8.334 ns (step:0)
 time_bench: Type:spin_lock_unlock_irqsave Per elem: 62 cycles(tsc) 15.607 ns (step:0)
 time_bench: Type:irqsave_before_lock      Per elem: 57 cycles(tsc) 14.344 ns (step:0)
 time_bench: Type:spin_lock_unlock_irq     Per elem: 34 cycles(tsc) 8.560 ns (step:0)
 time_bench: Type:simple_irq_disable_before_lock Per elem: 37 cycles(tsc) 9.289 ns (step:0)
 time_bench: Type:local_BH_disable_enable  Per elem: 19 cycles(tsc) 4.920 ns (step:0)
 time_bench: Type:local_IRQ_disable_enable Per elem: 7 cycles(tsc) 1.864 ns (step:0)
 time_bench: Type:local_irq_save_restore   Per elem: 38 cycles(tsc) 9.665 ns (step:0)
 [Mel's patch removes a ^^^^^^^^^^^^^^^^]            ^^^^^^^^^ expected saving - preempt cost
 time_bench: Type:preempt_disable_enable   Per elem: 11 cycles(tsc) 2.794 ns (step:0)
 [adds a preempt  ^^^^^^^^^^^^^^^^^^^^^^]            ^^^^^^^^^ adds this cost
 time_bench: Type:funcion_call_cost        Per elem: 6 cycles(tsc) 1.689 ns (step:0)
 time_bench: Type:func_ptr_call_cost       Per elem: 11 cycles(tsc) 2.767 ns (step:0)
 time_bench: Type:page_alloc_put           Per elem: 211 cycles(tsc) 52.803 ns (step:0)

Thus, expected improvement is: 38-11 = 27 cycles.

[3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

CPU used: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

Config options of interest:
 CONFIG_NUMA=y
 CONFIG_DEBUG_LIST=n
 CONFIG_VM_EVENT_COUNTERS=y

next prev parent reply	other threads:[~2017-01-11 12:44 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-09 16:35 [RFC PATCH 0/4] Fast noirq bulk page allocator v2r7 Mel Gorman
2017-01-09 16:35 ` Mel Gorman
2017-01-09 16:35 ` [PATCH 1/4] mm, page_alloc: Split buffered_rmqueue Mel Gorman
2017-01-09 16:35   ` Mel Gorman
2017-01-11 12:31   ` Jesper Dangaard Brouer
2017-01-11 12:31     ` Jesper Dangaard Brouer
2017-01-12  3:09   ` Hillf Danton
2017-01-12  3:09     ` Hillf Danton
2017-01-09 16:35 ` [PATCH 2/4] mm, page_alloc: Split alloc_pages_nodemask Mel Gorman
2017-01-09 16:35   ` Mel Gorman
2017-01-11 12:32   ` Jesper Dangaard Brouer
2017-01-11 12:32     ` Jesper Dangaard Brouer
2017-01-12  3:11   ` Hillf Danton
2017-01-12  3:11     ` Hillf Danton
2017-01-09 16:35 ` [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests Mel Gorman
2017-01-09 16:35   ` Mel Gorman
2017-01-11 12:44   ` Jesper Dangaard Brouer [this message]
2017-01-11 12:44     ` Jesper Dangaard Brouer
2017-01-11 13:27     ` Jesper Dangaard Brouer
2017-01-11 13:27       ` Jesper Dangaard Brouer
2017-01-12 10:47       ` Mel Gorman
2017-01-12 10:47         ` Mel Gorman
2017-01-09 16:35 ` [PATCH 4/4] mm, page_alloc: Add a bulk page allocator Mel Gorman
2017-01-09 16:35   ` Mel Gorman
2017-01-10  4:00   ` Hillf Danton
2017-01-10  4:00     ` Hillf Danton
2017-01-10  8:34     ` Mel Gorman
2017-01-10  8:34       ` Mel Gorman
2017-01-16 14:25   ` Jesper Dangaard Brouer
2017-01-16 14:25     ` Jesper Dangaard Brouer
2017-01-16 15:01     ` Mel Gorman
2017-01-16 15:01       ` Mel Gorman
2017-01-29  4:00 ` [RFC PATCH 0/4] Fast noirq bulk page allocator v2r7 Andy Lutomirski
2017-01-29  4:00   ` Andy Lutomirski
  -- strict thread matches above, loose matches on Subject: below --
2017-01-04 11:10 [RFC PATCH 0/4] Fast noirq bulk page allocator Mel Gorman
2017-01-04 11:10 ` [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests Mel Gorman
2017-01-04 11:10   ` Mel Gorman
2017-01-04 14:20   ` Jesper Dangaard Brouer
2017-01-04 14:20     ` Jesper Dangaard Brouer
2017-01-06  3:26   ` Hillf Danton
2017-01-06  3:26     ` Hillf Danton
2017-01-06 10:15     ` Mel Gorman
2017-01-06 10:15       ` Mel Gorman
2017-01-09  3:14       ` Hillf Danton
2017-01-09  3:14         ` Hillf Danton
2017-01-09  9:48         ` Mel Gorman
2017-01-09  9:48           ` Mel Gorman
2017-01-09  9:55           ` Hillf Danton
2017-01-09  9:55             ` Hillf Danton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170111134420.368efb9e@redhat.com \
    --to=brouer@redhat.com \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.