From: Namhyung Kim
Subject: Re: [PATCH tip 0/9] tracing: attach eBPF programs to tracepoints/syscalls/kprobe
Date: Thu, 22 Jan 2015 10:03:16 +0900
Message-ID: <20150122010316.GA15871@sejong>
To: Alexei Starovoitov
Cc: Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
    "David S. Miller", Daniel Borkmann, Hannes Frederic Sowa, Brendan Gregg,
    Linux API, Network Development, LKML

Hi Alexei,

On Fri, Jan 16, 2015 at 10:57:15AM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 16, 2015 at 7:02 AM, Steven Rostedt wrote:
> > One last thing. If the ebpf is used for anything but filtering, it
> > should go into the trigger file. The filtering is only a way to say if
> > the event should be recorded or not. But the trigger could do something
> > else (a printk, a stacktrace, etc).
>
> It does way more than just filtering, but invoking the program as a
> trigger is too slow. When the program is called as soon as the
> tracepoint fires, it can fetch other fields, evaluate them, printk
> some of them, optionally dump the stack, and aggregate into maps.
> We can let it call triggers too, so that the user program will be
> able to enable/disable other events.
> I'm not against invoking programs as a trigger, but I don't see a
> use case for it. It's just too slow for production analytics that
> needs to act on a huge number of events per second.

AFAIK a trigger can be fired before allocating a ring buffer entry if it
doesn't use the event record (i.e. doesn't have a filter) and doesn't have
the ->post_trigger bit set (stacktrace).  Please see
ftrace_trigger_soft_disabled() -- a rough sketch of its logic is at the
bottom of this mail.  This also allows keeping events in the soft-disabled
state.

Thanks,
Namhyung

> We must minimize the overhead between the tracepoint firing and the
> program executing, so that programs can be used on events like packet
> receive, which fire millions of times per second. Every nsec counts.
> For example:
> - raw dd if=/dev/zero of=/dev/null
>   does 760 MB/s (on my debug kernel)
> - echo 1 > events/syscalls/sys_enter_write/enable
>   drops it to 400 MB/s
> - echo "count == 123" > events/syscalls/sys_enter_write/filter
>   drops it even further, down to 388 MB/s.
>   This slowdown is too high for this to be used on a live system.
> - tracex4, which computes a histogram of sys_write sizes and stores
>   log2(count) into a map, does 580 MB/s.
>   This is still not great, but this slowdown is now usable and we can
>   work further on minimizing the overhead.
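
For reference, the check I had in mind has roughly this shape (written
from memory, so the struct and flag names may not exactly match the tree
this series is based on -- please look at the real
ftrace_trigger_soft_disabled() for the authoritative version):

	/*
	 * If the trigger has no condition attached (no filter and nothing
	 * that needs to look at the recorded entry), it can be called right
	 * here, before a ring buffer entry is reserved at all.  And if the
	 * event is soft-disabled, return true so nothing gets recorded.
	 */
	static inline bool
	ftrace_trigger_soft_disabled(struct ftrace_event_file *file)
	{
		unsigned long eflags = file->flags;

		if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
			if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
				event_triggers_call(file, NULL);
			if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
				return true;
		}
		return false;
	}

So an unconditional trigger doesn't have to pay the cost of reserving and
then discarding a ring buffer entry.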
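
And just to make the comparison concrete, the kind of program Alexei
describes for tracex4 would look roughly like this (illustrative only --
the map name, section name and helper macros below are my guess at the
samples/bpf style, not the actual sample code):

	#include <uapi/linux/bpf.h>
	#include "bpf_helpers.h"

	/* log2 histogram of write(2) sizes, aggregated purely in the
	 * kernel: no per-event ring buffer entry is ever reserved.
	 */
	struct bpf_map_def SEC("maps") write_hist = {
		.type = BPF_MAP_TYPE_ARRAY,
		.key_size = sizeof(u32),
		.value_size = sizeof(long),
		.max_entries = 64,
	};

	static inline unsigned int log2l(unsigned long v)
	{
		unsigned int r = 0;

		while (v >>= 1)
			r++;
		return r;
	}

	SEC("kprobe/sys_write")
	int bpf_prog(struct pt_regs *ctx)
	{
		/* third argument of write() is the byte count */
		u32 key = log2l(PT_REGS_PARM3(ctx));
		long *val;

		val = bpf_map_lookup_elem(&write_hist, &key);
		if (val)
			__sync_fetch_and_add(val, 1);
		return 0;
	}

Userspace then reads the 64 counters out of the map once in a while,
instead of consuming millions of individual trace records.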