From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
brouer@redhat.com
Subject: Re: [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests
Date: Wed, 11 Jan 2017 13:44:20 +0100 [thread overview]
Message-ID: <20170111134420.368efb9e@redhat.com> (raw)
In-Reply-To: <20170109163518.6001-4-mgorman@techsingularity.net>
On Mon, 9 Jan 2017 16:35:17 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> The following is results from a page allocator micro-benchmark. Only
> order-0 is interesting as higher orders do not use the per-cpu allocator
Micro-benchmarked with [1] page_bench02:
modprobe page_bench02 page_order=0 run_flags=$((2#010)) loops=$((10**8)); \
rmmod page_bench02 ; dmesg --notime | tail -n 4
Compared to baseline: 213 cycles(tsc) 53.417 ns
- against this : 184 cycles(tsc) 46.056 ns
- Saving : -29 cycles
- Very close to expected 27 cycles saving [see below [2]]
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
-
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
[2] Expected saving comes from Mel removing a local_irq_{save,restore}
and adding a preempt_{disable,enable} instead.
Micro benchmarking via time_bench_sample[3], we get the cost of these
operations:
time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.232 ns (step:0)
time_bench: Type:spin_lock_unlock Per elem: 33 cycles(tsc) 8.334 ns (step:0)
time_bench: Type:spin_lock_unlock_irqsave Per elem: 62 cycles(tsc) 15.607 ns (step:0)
time_bench: Type:irqsave_before_lock Per elem: 57 cycles(tsc) 14.344 ns (step:0)
time_bench: Type:spin_lock_unlock_irq Per elem: 34 cycles(tsc) 8.560 ns (step:0)
time_bench: Type:simple_irq_disable_before_lock Per elem: 37 cycles(tsc) 9.289 ns (step:0)
time_bench: Type:local_BH_disable_enable Per elem: 19 cycles(tsc) 4.920 ns (step:0)
time_bench: Type:local_IRQ_disable_enable Per elem: 7 cycles(tsc) 1.864 ns (step:0)
time_bench: Type:local_irq_save_restore Per elem: 38 cycles(tsc) 9.665 ns (step:0)
[Mel's patch removes a ^^^^^^^^^^^^^^^^] ^^^^^^^^^ expected saving - preempt cost
time_bench: Type:preempt_disable_enable Per elem: 11 cycles(tsc) 2.794 ns (step:0)
[adds a preempt ^^^^^^^^^^^^^^^^^^^^^^] ^^^^^^^^^ adds this cost
time_bench: Type:funcion_call_cost Per elem: 6 cycles(tsc) 1.689 ns (step:0)
time_bench: Type:func_ptr_call_cost Per elem: 11 cycles(tsc) 2.767 ns (step:0)
time_bench: Type:page_alloc_put Per elem: 211 cycles(tsc) 52.803 ns (step:0)
Thus, expected improvement is: 38-11 = 27 cycles.
[3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
CPU used: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Config options of interest:
CONFIG_NUMA=y
CONFIG_DEBUG_LIST=n
CONFIG_VM_EVENT_COUNTERS=y
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
brouer@redhat.com
Subject: Re: [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests
Date: Wed, 11 Jan 2017 13:44:20 +0100 [thread overview]
Message-ID: <20170111134420.368efb9e@redhat.com> (raw)
In-Reply-To: <20170109163518.6001-4-mgorman@techsingularity.net>
On Mon, 9 Jan 2017 16:35:17 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> The following is results from a page allocator micro-benchmark. Only
> order-0 is interesting as higher orders do not use the per-cpu allocator
Micro-benchmarked with [1] page_bench02:
modprobe page_bench02 page_order=0 run_flags=$((2#010)) loops=$((10**8)); \
rmmod page_bench02 ; dmesg --notime | tail -n 4
Compared to baseline: 213 cycles(tsc) 53.417 ns
- against this : 184 cycles(tsc) 46.056 ns
- Saving : -29 cycles
- Very close to expected 27 cycles saving [see below [2]]
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
-
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
[2] Expected saving comes from Mel removing a local_irq_{save,restore}
and adding a preempt_{disable,enable} instead.
Micro benchmarking via time_bench_sample[3], we get the cost of these
operations:
time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.232 ns (step:0)
time_bench: Type:spin_lock_unlock Per elem: 33 cycles(tsc) 8.334 ns (step:0)
time_bench: Type:spin_lock_unlock_irqsave Per elem: 62 cycles(tsc) 15.607 ns (step:0)
time_bench: Type:irqsave_before_lock Per elem: 57 cycles(tsc) 14.344 ns (step:0)
time_bench: Type:spin_lock_unlock_irq Per elem: 34 cycles(tsc) 8.560 ns (step:0)
time_bench: Type:simple_irq_disable_before_lock Per elem: 37 cycles(tsc) 9.289 ns (step:0)
time_bench: Type:local_BH_disable_enable Per elem: 19 cycles(tsc) 4.920 ns (step:0)
time_bench: Type:local_IRQ_disable_enable Per elem: 7 cycles(tsc) 1.864 ns (step:0)
time_bench: Type:local_irq_save_restore Per elem: 38 cycles(tsc) 9.665 ns (step:0)
[Mel's patch removes a ^^^^^^^^^^^^^^^^] ^^^^^^^^^ expected saving - preempt cost
time_bench: Type:preempt_disable_enable Per elem: 11 cycles(tsc) 2.794 ns (step:0)
[adds a preempt ^^^^^^^^^^^^^^^^^^^^^^] ^^^^^^^^^ adds this cost
time_bench: Type:funcion_call_cost Per elem: 6 cycles(tsc) 1.689 ns (step:0)
time_bench: Type:func_ptr_call_cost Per elem: 11 cycles(tsc) 2.767 ns (step:0)
time_bench: Type:page_alloc_put Per elem: 211 cycles(tsc) 52.803 ns (step:0)
Thus, expected improvement is: 38-11 = 27 cycles.
[3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
CPU used: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Config options of interest:
CONFIG_NUMA=y
CONFIG_DEBUG_LIST=n
CONFIG_VM_EVENT_COUNTERS=y
next prev parent reply other threads:[~2017-01-11 12:44 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-09 16:35 [RFC PATCH 0/4] Fast noirq bulk page allocator v2r7 Mel Gorman
2017-01-09 16:35 ` Mel Gorman
2017-01-09 16:35 ` [PATCH 1/4] mm, page_alloc: Split buffered_rmqueue Mel Gorman
2017-01-09 16:35 ` Mel Gorman
2017-01-11 12:31 ` Jesper Dangaard Brouer
2017-01-11 12:31 ` Jesper Dangaard Brouer
2017-01-12 3:09 ` Hillf Danton
2017-01-12 3:09 ` Hillf Danton
2017-01-09 16:35 ` [PATCH 2/4] mm, page_alloc: Split alloc_pages_nodemask Mel Gorman
2017-01-09 16:35 ` Mel Gorman
2017-01-11 12:32 ` Jesper Dangaard Brouer
2017-01-11 12:32 ` Jesper Dangaard Brouer
2017-01-12 3:11 ` Hillf Danton
2017-01-12 3:11 ` Hillf Danton
2017-01-09 16:35 ` [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests Mel Gorman
2017-01-09 16:35 ` Mel Gorman
2017-01-11 12:44 ` Jesper Dangaard Brouer [this message]
2017-01-11 12:44 ` Jesper Dangaard Brouer
2017-01-11 13:27 ` Jesper Dangaard Brouer
2017-01-11 13:27 ` Jesper Dangaard Brouer
2017-01-12 10:47 ` Mel Gorman
2017-01-12 10:47 ` Mel Gorman
2017-01-09 16:35 ` [PATCH 4/4] mm, page_alloc: Add a bulk page allocator Mel Gorman
2017-01-09 16:35 ` Mel Gorman
2017-01-10 4:00 ` Hillf Danton
2017-01-10 4:00 ` Hillf Danton
2017-01-10 8:34 ` Mel Gorman
2017-01-10 8:34 ` Mel Gorman
2017-01-16 14:25 ` Jesper Dangaard Brouer
2017-01-16 14:25 ` Jesper Dangaard Brouer
2017-01-16 15:01 ` Mel Gorman
2017-01-16 15:01 ` Mel Gorman
2017-01-29 4:00 ` [RFC PATCH 0/4] Fast noirq bulk page allocator v2r7 Andy Lutomirski
2017-01-29 4:00 ` Andy Lutomirski
-- strict thread matches above, loose matches on Subject: below --
2017-01-04 11:10 [RFC PATCH 0/4] Fast noirq bulk page allocator Mel Gorman
2017-01-04 11:10 ` [PATCH 3/4] mm, page_allocator: Only use per-cpu allocator for irq-safe requests Mel Gorman
2017-01-04 11:10 ` Mel Gorman
2017-01-04 14:20 ` Jesper Dangaard Brouer
2017-01-04 14:20 ` Jesper Dangaard Brouer
2017-01-06 3:26 ` Hillf Danton
2017-01-06 3:26 ` Hillf Danton
2017-01-06 10:15 ` Mel Gorman
2017-01-06 10:15 ` Mel Gorman
2017-01-09 3:14 ` Hillf Danton
2017-01-09 3:14 ` Hillf Danton
2017-01-09 9:48 ` Mel Gorman
2017-01-09 9:48 ` Mel Gorman
2017-01-09 9:55 ` Hillf Danton
2017-01-09 9:55 ` Hillf Danton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170111134420.368efb9e@redhat.com \
--to=brouer@redhat.com \
--cc=hillf.zj@alibaba-inc.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.