From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Frederic Weisbecker <fweisbec@gmail.com>
Subject: [PATCH 20/20] tracing: Do not allocate buffer for trace_marker
Date: Mon, 10 Oct 2011 09:39:12 -0400 [thread overview]
Message-ID: <20111010134146.557023500@goodmis.org> (raw)
In-Reply-To: 20111010133852.829771373@goodmis.org
[-- Attachment #1: Type: text/plain, Size: 4873 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
When doing intense tracing, the kmalloc inside trace_marker can
introduce side effects to what is being traced.
As trace_marker() is used by userspace to inject data into the
kernel ring buffer, it needs to do so with the least amount
of intrusion to the operations of the kernel or the user space
application.
As the ring buffer is designed to write directly into the buffer
without the need to make a temporary buffer, and userspace already
went through the hassle of knowing how big the write will be,
we can simply pin the userspace pages and write the data directly
into the buffer. This improves the impact of tracing via trace_marker
tremendously!
Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the
use of get_user_pages_fast() and kmap_atomic().
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 111 +++++++++++++++++++++++++++++++++++++-------------
1 files changed, 83 insertions(+), 28 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 13f2b84..f86efe9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3628,22 +3628,24 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
return 0;
}
-static int mark_printk(const char *fmt, ...)
-{
- int ret;
- va_list args;
- va_start(args, fmt);
- ret = trace_vprintk(0, fmt, args);
- va_end(args);
- return ret;
-}
-
static ssize_t
tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
{
- char *buf;
- size_t written;
+ unsigned long addr = (unsigned long)ubuf;
+ struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
+ struct print_entry *entry;
+ unsigned long irq_flags;
+ struct page *pages[2];
+ int nr_pages = 1;
+ ssize_t written;
+ void *page1;
+ void *page2;
+ int offset;
+ int size;
+ int len;
+ int ret;
if (tracing_disabled)
return -EINVAL;
@@ -3651,28 +3653,81 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if (cnt > TRACE_BUF_SIZE)
cnt = TRACE_BUF_SIZE;
- buf = kmalloc(cnt + 2, GFP_KERNEL);
- if (buf == NULL)
- return -ENOMEM;
+ /*
+ * Userspace is injecting traces into the kernel trace buffer.
+ * We want to be as non intrusive as possible.
+ * To do so, we do not want to allocate any special buffers
+ * or take any locks, but instead write the userspace data
+ * straight into the ring buffer.
+ *
+ * First we need to pin the userspace buffer into memory,
+ * which, most likely it is, because it just referenced it.
+ * But there's no guarantee that it is. By using get_user_pages_fast()
+ * and kmap_atomic/kunmap_atomic() we can get access to the
+ * pages directly. We then write the data directly into the
+ * ring buffer.
+ */
+ BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);
- if (copy_from_user(buf, ubuf, cnt)) {
- kfree(buf);
- return -EFAULT;
+ /* check if we cross pages */
+ if ((addr & PAGE_MASK) != ((addr + cnt) & PAGE_MASK))
+ nr_pages = 2;
+
+ offset = addr & (PAGE_SIZE - 1);
+ addr &= PAGE_MASK;
+
+ ret = get_user_pages_fast(addr, nr_pages, 0, pages);
+ if (ret < nr_pages) {
+ while (--ret >= 0)
+ put_page(pages[ret]);
+ written = -EFAULT;
+ goto out;
}
- if (buf[cnt-1] != '\n') {
- buf[cnt] = '\n';
- buf[cnt+1] = '\0';
+
+ page1 = kmap_atomic(pages[0]);
+ if (nr_pages == 2)
+ page2 = kmap_atomic(pages[1]);
+
+ local_save_flags(irq_flags);
+ size = sizeof(*entry) + cnt + 2; /* possible \n added */
+ buffer = global_trace.buffer;
+ event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
+ irq_flags, preempt_count());
+ if (!event) {
+ /* Ring buffer disabled, return as if not open for write */
+ written = -EBADF;
+ goto out_unlock;
+ }
+
+ entry = ring_buffer_event_data(event);
+ entry->ip = _THIS_IP_;
+
+ if (nr_pages == 2) {
+ len = PAGE_SIZE - offset;
+ memcpy(&entry->buf, page1 + offset, len);
+ memcpy(&entry->buf[len], page2, cnt - len);
} else
- buf[cnt] = '\0';
+ memcpy(&entry->buf, page1 + offset, cnt);
- written = mark_printk("%s", buf);
- kfree(buf);
- *fpos += written;
+ if (entry->buf[cnt - 1] != '\n') {
+ entry->buf[cnt] = '\n';
+ entry->buf[cnt + 1] = '\0';
+ } else
+ entry->buf[cnt] = '\0';
+
+ ring_buffer_unlock_commit(buffer, event);
- /* don't tell userspace we wrote more - it might confuse them */
- if (written > cnt)
- written = cnt;
+ written = cnt;
+ *fpos += written;
+
+ out_unlock:
+ if (nr_pages == 2)
+ kunmap_atomic(page2);
+ kunmap_atomic(page1);
+ while (nr_pages > 0)
+ put_page(pages[--nr_pages]);
+ out:
return written;
}
--
1.7.6.3
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
next prev parent reply other threads:[~2011-10-10 13:42 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-10 13:38 [PATCH 00/20] [GIT PULL][v3.2] tracing: queued updates Steven Rostedt
2011-10-10 13:38 ` [PATCH 01/20] tracing: Clean up tb_fmt to not give faulty compile warning Steven Rostedt
2011-10-10 13:38 ` [PATCH 02/20] Tracepoint: Dissociate from module mutex Steven Rostedt
2011-10-10 13:38 ` [PATCH 03/20] x86: jump_label: arch_jump_label_text_poke_early: add missing __init Steven Rostedt
2011-10-10 13:38 ` [PATCH 04/20] tracing/filter: Use static allocation for filter predicates Steven Rostedt
2011-10-10 13:38 ` [PATCH 05/20] tracing/filter: Separate predicate init and filter addition Steven Rostedt
2011-10-10 13:38 ` [PATCH 06/20] tracing/filter: Remove field_name from filter_pred struct Steven Rostedt
2011-10-10 13:38 ` [PATCH 07/20] tracing/filter: Simplify tracepoint event lookup Steven Rostedt
2011-10-10 13:39 ` [PATCH 08/20] tracing/filter: Unify predicate tree walking, change check_pred_tree Steven Rostedt
2011-10-10 13:39 ` [PATCH 09/20] tracing/filter: Change count_leafs function to use walk_pred_tree Steven Rostedt
2011-10-10 13:39 ` [PATCH 10/20] tracing/filter: Change fold_pred_tree " Steven Rostedt
2011-10-10 13:39 ` [PATCH 11/20] tracing/filter: Change fold_pred " Steven Rostedt
2011-10-10 13:39 ` [PATCH 12/20] tracing/filter: Change filter_match_preds function to use Steven Rostedt
2011-10-10 13:39 ` [PATCH 13/20] tracing/filter: Add startup tests for events filter Steven Rostedt
2011-10-10 13:39 ` [PATCH 14/20] tracing: Add preempt disable for filter self test Steven Rostedt
2011-10-10 13:39 ` [PATCH 15/20] trace: Add a new readonly entry to report total buffer size Steven Rostedt
2011-10-10 13:39 ` [PATCH 16/20] trace: Add ring buffer stats to measure rate of events Steven Rostedt
2011-10-10 13:39 ` [PATCH 17/20] tracing: Add a counter clock for those that do not trust clocks Steven Rostedt
2011-10-10 13:39 ` [PATCH 18/20] tracing: Fix preemptirqsoff tracer to not stop at preempt off Steven Rostedt
2011-10-10 13:39 ` [PATCH 19/20] tracing: Warn on output if the function tracer was found corrupted Steven Rostedt
2011-10-10 13:39 ` Steven Rostedt [this message]
2011-10-10 13:52 ` [PATCH 00/20] [GIT PULL][v3.2] tracing: queued updates Steven Rostedt
2011-10-11 5:51 ` Ingo Molnar
2011-10-11 8:15 ` Ingo Molnar
2011-10-11 11:14 ` Steven Rostedt
2011-10-11 5:50 ` Ingo Molnar
2011-10-11 21:08 ` Valdis.Kletnieks
2011-10-12 8:07 ` Ingo Molnar
2011-10-12 14:19 ` Jeff King
2011-10-12 16:29 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111010134146.557023500@goodmis.org \
--to=rostedt@goodmis.org \
--cc=akpm@linux-foundation.org \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox