From: "Wangnan (F)"
To: Alexei Starovoitov
Cc: xiakaixu 00238161
Date: Thu, 28 May 2015 15:14:44 +0800
Subject: Re: [RFC PATCH v4 10/29] bpf tools: Collect map definitions from 'maps' section
Message-ID: <5566C064.6020205@huawei.com>
In-Reply-To: <20150528060957.GA21013@Alexeis-MBP.westell.com>

On 2015/5/28 14:09, Alexei Starovoitov wrote:
> On Thu, May 28, 2015 at 11:09:50AM +0800, Wangnan (F) wrote:
>> However this breaks a law in current design that opening phase doesn't
>> talk to kernel with sys_bpf() at all. All related staff is done in loading
>> phase. This principle ensures that in every systems, no matter it support
>> sys_bpf() or not, can read eBPF object without failure.
> I see, so you want 'parse elf' and 'create maps + load programs'
> to be separate phases?
> Fair enough.
> Then please add a call to release the information
> collected from elf after program loading is done.
> relocations and other things are not needed at that point.

What about adding a flag to bpf_object__load() that tells it whether to
clean up the resources it has taken? For example:

    int bpf_object__load(struct bpf_object *obj, bool clean);

We can then wrap it in a macro:

    #define bpf_object__load_clean(o) bpf_object__load(o, true)

If 'clean' is true, the resources are freed after loading, and the same
object cannot be loaded again once it has been unloaded. By doing this we
can avoid adding a new function.

>> Moreover, we are planning to introduce hardware PMU to eBPF in the way like
>> maps,
>> to give eBPF programs the ability to access hardware PMU counter. I haven't
> that's very interesting. Please share more info when you can :)
> If I understood it right, you want in-kernel bpf to do aggregation
> and filtering of pmu counters ?
> And computing a number of cache misses between two kprobe events?
> I can see how I can use that to measure not only time
> taken by syscall, but number of cache misses occurred due
> to syscall. Sounds very useful!

I'm glad to see you are also interested in it.

Of course, filtering and aggregation based on PMU counters will be useful,
but this is only our first goal. x86 and ARM64 provide many useful PMU
counters. Many people ask me whether there is a way to record absolute PMU
counter values when sampling, so they can measure changes in IPC, cache
miss rate, page faults and so on. Currently 'perf stat' is able to read
PMU counters, but the cost is relatively high.

For me, enabling eBPF programs to read PMU counters is the first thing
that needs to be done. The other is enabling eBPF programs to attach some
information to perf samples.

Here is an example to show my idea.
I have a program which does:

    int main()
    {
        while (1) {
            read(...);
            /* do A */
            write(...);
            /* do B */
        }
    }

Then, using the following script:

    SEC("enter=sys_write $outdata:u64")
    int enter_sys_write(...) {
        u64 cycles_cnt = bpf_read_pmu(&cycles_pmu);
        bpf_store_value(cycles_cnt);
        return 1;
    }

    SEC("enter=sys_read $outdata:u64")
    int enter_sys_read(...) {
        u64 cycles_cnt = bpf_read_pmu(&cycles_pmu);
        bpf_store_value(cycles_cnt);
        return 1;
    }

with 'perf script' we can check the cycle counter at each point, and then
compute the number of cycles between any two sampling points. This way we
can compute how many cycles are taken by A and B. If the instruction
counter is also recorded, we will know the IPC of A and B.

The above is still a casual idea. Currently I am focusing on bringing eBPF
to perf, which should be the base for all the other interesting stuff.
However, I'm glad to see people discussing it.

Thank you.
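P.S. The post-processing described above could be sketched roughly as
follows. This is only an illustration, not part of the patch set: struct
sample, enum probe, accumulate() and ipc() are all hypothetical names, and
the sketch assumes the samples have already been decoded from the
'perf script' output in time order, with an instruction counter recorded
alongside the cycle counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded sample: one record per probe hit. */
enum probe { ENTER_SYS_READ, ENTER_SYS_WRITE };

struct sample {
    enum probe probe;       /* which probe produced the sample */
    uint64_t cycles;        /* absolute cycle counter at the probe */
    uint64_t instructions;  /* absolute instruction counter at the probe */
};

/* Counter totals attributed to one segment of the loop. */
struct segment_stats {
    uint64_t cycles;
    uint64_t instructions;
};

/*
 * Walk consecutive samples and accumulate counter deltas.
 * enter_sys_read  -> enter_sys_write covers read() itself plus "A";
 * enter_sys_write -> enter_sys_read  covers write() itself plus "B".
 * Any other pairing (for example two reads in a row) is ignored.
 */
static void accumulate(const struct sample *s, size_t n,
                       struct segment_stats *a, struct segment_stats *b)
{
    for (size_t i = 1; i < n; i++) {
        uint64_t dc = s[i].cycles - s[i - 1].cycles;
        uint64_t di = s[i].instructions - s[i - 1].instructions;

        if (s[i - 1].probe == ENTER_SYS_READ &&
            s[i].probe == ENTER_SYS_WRITE) {
            a->cycles += dc;
            a->instructions += di;
        } else if (s[i - 1].probe == ENTER_SYS_WRITE &&
                   s[i].probe == ENTER_SYS_READ) {
            b->cycles += dc;
            b->instructions += di;
        }
    }
}

/* Instructions per cycle for a segment; 0.0 if no cycles were recorded. */
static double ipc(const struct segment_stats *st)
{
    return st->cycles ? (double)st->instructions / (double)st->cycles : 0.0;
}
```

Note that each interval measured this way also includes the syscall that
precedes the segment, so separating A from read() would need an extra
sampling point on syscall exit.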