From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Vaibhav Nagarnaik <vnagarnaik@google.com>,
Ingo Molnar <mingo@redhat.com>, Michael Rubin <mrubin@google.com>,
David Sharp <dhsharp@google.com>
Subject: [PATCH 02/15] tracing: Use NUMA allocation for per-cpu ring buffer pages
Date: Thu, 09 Jun 2011 13:27:46 -0400
Message-ID: <20110609172910.665010533@goodmis.org>
In-Reply-To: 20110609172744.333794089@goodmis.org
From: Vaibhav Nagarnaik <vnagarnaik@google.com>
The tracing ring buffer is a group of per-cpu ring buffers where
allocation and logging are done on a per-cpu basis. The events that are
generated on a particular CPU are logged in the corresponding buffer.
This is to provide wait-free writes between CPUs and good NUMA node
locality while accessing the ring buffer.

However, the allocation routines consider NUMA locality only for buffer
page metadata and not for the actual buffer page. This causes the pages
to be allocated on the NUMA node local to the CPU where the allocation
routine is running at the time.

This patch fixes the problem by using a NUMA-node-specific allocation
routine (alloc_pages_node() in place of __get_free_page()) so that the
pages are allocated from a NUMA node local to the logging CPU.
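
As an illustration only (this is not part of the patch), the placement
difference can be sketched as a small userspace model. The four-CPU,
two-node topology and every model_* name below are hypothetical; the
table lookup merely stands in for the kernel's cpu_to_node().

```c
#include <stddef.h>

/* Hypothetical topology for this model: 4 CPUs spread over 2 NUMA nodes. */
#define MODEL_NR_CPUS 4
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for the kernel's cpu_to_node(); a plain table lookup here. */
static int model_cpu_to_node(int cpu)
{
	return model_cpu_node[cpu];
}

/*
 * Before the patch: the data page lands on the node of whichever CPU
 * happens to run the allocation routine.
 */
static int model_page_node_before(int running_cpu, int buffer_cpu)
{
	(void)buffer_cpu;	/* the buffer's own CPU is not consulted */
	return model_cpu_to_node(running_cpu);
}

/*
 * After the patch: the data page is requested from the node of the CPU
 * that owns the per-cpu buffer, wherever the allocator runs.
 */
static int model_page_node_after(int running_cpu, int buffer_cpu)
{
	(void)running_cpu;	/* the running CPU no longer matters */
	return model_cpu_to_node(buffer_cpu);
}
```

In this model, a buffer for CPU 3 that is set up from CPU 0 gets its
page on node 0 before the change and on node 1 after it.
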
I tested with the getuid_microbench from autotest. It is a simple binary
that calls getuid() in a loop and measures the average time for the
syscall to complete. The following command was used to test:

  $ getuid_microbench 1000000

I compared the numbers from kernels with and without this patch and
found that logging latency decreases by 30-50 ns/call.

  tracing with non-NUMA allocation - 569 ns/call
  tracing with NUMA allocation     - 512 ns/call
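
For readers without autotest at hand, the measurement described above
can be approximated with the sketch below. It is an assumption about
what such a benchmark looks like, not the getuid_microbench source; the
function name avg_getuid_ns() is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>
#include <unistd.h>

/*
 * Call getuid() `iterations` times and return the average wall-clock
 * cost of one call in nanoseconds. Returns 0.0 for a non-positive count.
 */
static double avg_getuid_ns(long iterations)
{
	struct timespec start, end;
	long i;

	if (iterations <= 0)
		return 0.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		(void)getuid();
	clock_gettime(CLOCK_MONOTONIC, &end);

	return ((end.tv_sec - start.tv_sec) * 1e9 +
		(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Calling avg_getuid_ns(1000000) with syscall tracing enabled, once on
each kernel, would give figures comparable in kind to the ones quoted
above; absolute values depend on the machine.
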
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ring_buffer.h          |    2 +-
 kernel/trace/ring_buffer.c           |   36 +++++++++++++++++----------------
 kernel/trace/ring_buffer_benchmark.c |    2 +-
 kernel/trace/trace.c                 |    6 +++-
 4 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index ab38ac8..b891de9 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -169,7 +169,7 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 size_t ring_buffer_page_len(void *page);
 
 
-void *ring_buffer_alloc_read_page(struct ring_buffer *buffer);
+void *ring_buffer_alloc_read_page(struct ring_buffer *buffer, int cpu);
 void ring_buffer_free_read_page(struct ring_buffer *buffer, void *data);
 int ring_buffer_read_page(struct ring_buffer *buffer, void **data_page,
 			  size_t len, int cpu, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b0c7aa4..2780e60 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -997,13 +997,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 			     unsigned nr_pages)
 {
 	struct buffer_page *bpage, *tmp;
-	unsigned long addr;
 	LIST_HEAD(pages);
 	unsigned i;
 
 	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
+		struct page *page;
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL, cpu_to_node(cpu_buffer->cpu));
 		if (!bpage)
@@ -1013,10 +1013,11 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 		list_add(&bpage->list, &pages);
 
-		addr = __get_free_page(GFP_KERNEL);
-		if (!addr)
+		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+					GFP_KERNEL, 0);
+		if (!page)
 			goto free_pages;
-		bpage->page = (void *)addr;
+		bpage->page = page_address(page);
 		rb_init_page(bpage->page);
 	}
 
@@ -1045,7 +1046,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
-	unsigned long addr;
+	struct page *page;
 	int ret;
 
 	cpu_buffer = kzalloc_node(ALIGN(sizeof(*cpu_buffer), cache_line_size()),
@@ -1067,10 +1068,10 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 	rb_check_bpage(cpu_buffer, bpage);
 
 	cpu_buffer->reader_page = bpage;
-	addr = __get_free_page(GFP_KERNEL);
-	if (!addr)
+	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
+	if (!page)
 		goto fail_free_reader;
-	bpage->page = (void *)addr;
+	bpage->page = page_address(page);
 	rb_init_page(bpage->page);
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
@@ -1314,7 +1315,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 	unsigned nr_pages, rm_pages, new_pages;
 	struct buffer_page *bpage, *tmp;
 	unsigned long buffer_size;
-	unsigned long addr;
 	LIST_HEAD(pages);
 	int i, cpu;
 
@@ -1375,16 +1375,18 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	for_each_buffer_cpu(buffer, cpu) {
 		for (i = 0; i < new_pages; i++) {
+			struct page *page;
 			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
 						  cache_line_size()),
 					    GFP_KERNEL, cpu_to_node(cpu));
 			if (!bpage)
 				goto free_pages;
 			list_add(&bpage->list, &pages);
-			addr = __get_free_page(GFP_KERNEL);
-			if (!addr)
+			page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL,
+						0);
+			if (!page)
 				goto free_pages;
-			bpage->page = (void *)addr;
+			bpage->page = page_address(page);
 			rb_init_page(bpage->page);
 		}
 	}
@@ -3730,16 +3732,16 @@ EXPORT_SYMBOL_GPL(ring_buffer_swap_cpu);
  * Returns:
  * The page allocated, or NULL on error.
  */
-void *ring_buffer_alloc_read_page(struct ring_buffer *buffer)
+void *ring_buffer_alloc_read_page(struct ring_buffer *buffer, int cpu)
 {
 	struct buffer_data_page *bpage;
-	unsigned long addr;
+	struct page *page;
 
-	addr = __get_free_page(GFP_KERNEL);
-	if (!addr)
+	page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
+	if (!page)
 		return NULL;
 
-	bpage = (void *)addr;
+	bpage = page_address(page);
 
 	rb_init_page(bpage);
 
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 302f8a6..a5457d5 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -106,7 +106,7 @@ static enum event_status read_page(int cpu)
 	int inc;
 	int i;
 
-	bpage = ring_buffer_alloc_read_page(buffer);
+	bpage = ring_buffer_alloc_read_page(buffer, cpu);
 	if (!bpage)
 		return EVENT_DROPPED;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2af132e..6368eeb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3697,7 +3697,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
 		return 0;
 
 	if (!info->spare)
-		info->spare = ring_buffer_alloc_read_page(info->tr->buffer);
+		info->spare = ring_buffer_alloc_read_page(info->tr->buffer,
+							  info->cpu);
 	if (!info->spare)
 		return -ENOMEM;
 
@@ -3854,7 +3855,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
 
 		ref->ref = 1;
 		ref->buffer = info->tr->buffer;
-		ref->page = ring_buffer_alloc_read_page(ref->buffer);
+		ref->page = ring_buffer_alloc_read_page(ref->buffer,
+							info->cpu);
 		if (!ref->page) {
 			kfree(ref);
 			break;
--
1.7.4.4