From: Jiri Olsa <jolsa@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Luis Claudio R. Goncalves" <lclaudio@uudg.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
ldv@altlinux.org, esyr@redhat.com,
Frederic Weisbecker <fweisbec@gmail.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Namhyung Kim <namhyung@kernel.org>
Subject: Re: Rough idea of implementing blocking perf calls for system call tracepoints
Date: Fri, 30 Nov 2018 12:48:09 +0100
Message-ID: <20181130114809.GC3617@krava>
In-Reply-To: <20181130104044.GB3617@krava>
On Fri, Nov 30, 2018 at 11:40:44AM +0100, Jiri Olsa wrote:
> On Wed, Nov 28, 2018 at 02:18:08PM -0500, Steven Rostedt wrote:
> >
> > Adding Masami and Namhyung to this as well.
> >
> > -- Steve
> >
> >
> > On Wed, 28 Nov 2018 13:47:00 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > [
> > > Sorry for the late reply on this; when I got back from Plumbers, my
> > > work had really piled up, and then Turkey day came and just added more
> > > to the chaos.
> > > ]
> > >
> > > This follows up on our discussion at the Linux Plumbers strace talk
> > > about implementing strace with perf. As strace needs to be lossless, it
> > > currently cannot be implemented with perf, because there's always a
> > > chance of losing events. The idea here is to have a way to record
> > > system calls with perf that also blocks when the perf ring buffer is
> > > full.
> > >
> > > Below is a patch I wrote that gives an idea of what needs to be done.
> > > It is by no means a real patch (it won't even compile). And I left out the
> > > wake-up part, as I'm not familiar enough with how perf works to
> > > implement it. But hopefully someone on this list can :-)
> > >
> > > The idea here is that we set the tracepoints sys_enter and sys_exit
> > > with a new flag called TRACE_EVENT_FL_BLOCK. When the perf code records
> > > the event, if the buffer is full, it will set a "perf_block" field in
> > > the current task structure to point to the tp_event, if the tp_event
> > > has the BLOCK flag set.
> > >
> > > Then on the exit of the syscall tracepoints, the perf_block field is
> > > checked, and if it is set, the task knows that the event was dropped and
> > > will add itself to a wait queue. When the reader reads the perf buffer
> > > and hits a watermark, it can wake whatever is on the queue (not sure
> > > where to put this queue, but someone can figure it out).
> > >
> > > Once woken, it will try to write to the perf system call tracepoint
> > > again (notice that it only tries perf and doesn't call the generic
> > > tracepoint code, as only perf requires a repeat).
> > >
> > > This is just a basic idea patch, to hopefully give someone else an idea
> > > of what I envision. I think it can work, and if it does, I can imagine
> > > that it would greatly improve the performance of strace!
> > >
> > > Thoughts?
> > >
> > > -- Steve
> > >
> > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > > index 3b2490b81918..57fe95950a24 100644
> > > --- a/arch/x86/entry/common.c
> > > +++ b/arch/x86/entry/common.c
> > > @@ -123,8 +123,22 @@ static long syscall_trace_enter(struct pt_regs *regs)
> > > }
> > > #endif
> > >
> > > - if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> > > + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) {
> > > + current->perf_block = NULL;
> > > trace_sys_enter(regs, regs->orig_ax);
> > > + while (current->perf_block) {
> > > + DECLARE_WAITQUEUE(wait, current);
> > > + struct trace_event_call *tp_event = current->perf_block;
> > > +
> > > + current->perf_block = NULL;
> > > +
> > > + set_current_state(TASK_INTERRUPTIBLE);
> > > + add_wait_queue(&tp_event->block_queue, &wait);
> > > + perf_trace_sys_enter(tp_event, regs, regs->orig_ax);
> > > + if (current->perf_block)
> > > + schedule();
>
> the space gets freed up by user space moving the tail pointer,
> so I wonder whether we actually need to do some polling here
>
> also, how about making this a ring buffer feature so it's not specific
> just to sys_enter/sys_exit? I'll check
or perhaps just for tracepoints.. it does not seem to make much
sense for hw events
jirka
Thread overview: 6+ messages
2018-11-28 18:47 Rough idea of implementing blocking perf calls for system call tracepoints Steven Rostedt
2018-11-28 19:18 ` Steven Rostedt
2018-11-30 10:40 ` Jiri Olsa
2018-11-30 11:48 ` Jiri Olsa [this message]
2018-11-30 14:01 ` Jiri Olsa
2018-11-30 14:45 ` Steven Rostedt