From mboxrd@z Thu Jan  1 00:00:00 1970
From: Arnaldo Carvalho de Melo
Subject: Re: Is it possible to trace events and its call stack?
Date: Thu, 12 Jan 2017 17:41:53 -0300
Message-ID: <20170112204153.GD20003@kernel.org>
References: <20170112101658.GA3470@naverao1-tp.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from mail.kernel.org ([198.145.29.136]:52320 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750877AbdALUmZ
	(ORCPT ); Thu, 12 Jan 2017 15:42:25 -0500
Content-Disposition: inline
In-Reply-To: <20170112101658.GA3470@naverao1-tp.localdomain>
Sender: linux-perf-users-owner@vger.kernel.org
List-ID:
To: Qu Wenruo
Cc: "Naveen N. Rao" , linux-perf-users@vger.kernel.org

Em Thu, Jan 12, 2017 at 03:46:58PM +0530, Naveen N. Rao escreveu:
> On 2017/01/12 03:49PM, Qu Wenruo wrote:
> > Hi,
> >
> > Is it possible to use perf/ftrace to trace events and their call stacks?
> >
> > [Background]
> > It's one structure in btrfs, btrfs_bio, that I'm tracing.
> > That structure is allocated and freed somewhat frequently, and its size
> > is not fixed, so no SLAB/SLUB cache is used.
> >
> > I added trace events (or tracepoints, anyway, just in
> > include/trace/events/btrfs.h) to trace the allocation and freeing.
> > They output the pointer address of that structure, so I can pair the
> > events, along with other info.
> >
> > Things went well until I found some structures are allocated but not
> > freed (no corresponding tracepoint is triggered for a given address).
> >
> > It's possible that btrfs just forgot to free it, or that btrfs is
> > holding it for some purpose, so the kernel memleak detector won't
> > catch the latter case.
> >
> > That is to say, along with the tracepoint data, I still need the call
> > stack of each call, to determine the code that leaks or holds the
> > pointer.
> >
> > Is it possible to do this using perf or ftrace?
>
> Yes, use the -g option with 'perf record'.
> In fact, I don't think you even need to add a new tracepoint - you
> should be able to use kprobes (perf probe) at structure allocation/free
> points.

Yes, with 'perf record -g', as suggested above, or directly with 'perf
trace', if the volume is not big or if you're ok about using a strace
like workflow, for example:

[root@jouet ~]# perf probe -m btrfs -F btrfs_bio*
btrfs_bio_alloc
btrfs_bio_clone
btrfs_bio_counter_inc_blocked
btrfs_bio_counter_inc_noblocked
btrfs_bio_counter_sub
btrfs_bio_wq_end_io
[root@jouet ~]# perf probe -m btrfs btrfs_bio_alloc
Added new event:
  probe:btrfs_bio_alloc (on btrfs_bio_alloc in btrfs)

You can now use it in all perf tools, such as:

	perf record -e probe:btrfs_bio_alloc -aR sleep 1

[root@jouet ~]# #perf trace -e write,read,probe:btrfs*
[root@jouet ~]# mount | grep btrfs
/var/lib/machines.raw on /var/lib/machines type btrfs (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
[root@jouet ~]# perf trace --no-syscalls -e probe:btrfs*/max-stack=4/
     0.000 probe:btrfs_bio_alloc:(ffffffffc0aae110))
                                       btrfs_bio_alloc ([btrfs])
                                       write_one_eb ([btrfs])
                                       btree_write_cache_pages ([btrfs])
                                       btree_writepages ([btrfs])
    13.112 probe:btrfs_bio_alloc:(ffffffffc0aae110))
                                       btrfs_bio_alloc ([btrfs])
                                       __extent_writepage_io ([btrfs])
                                       __extent_writepage ([btrfs])
                                       extent_write_cache_pages.isra.43.constprop.60 ([btrfs])
    13.285 probe:btrfs_bio_alloc:(ffffffffc0aae110))
                                       btrfs_bio_alloc ([btrfs])
                                       __extent_writepage_io ([btrfs])
                                       __extent_writepage ([btrfs])
                                       extent_write_cache_pages.isra.43.constprop.60 ([btrfs])
    13.434 probe:btrfs_bio_alloc:(ffffffffc0aae110))
                                       btrfs_bio_alloc ([btrfs])
                                       write_one_eb ([btrfs])
                                       btree_write_cache_pages ([btrfs])
                                       btree_writepages ([btrfs])
    13.454 probe:btrfs_bio_alloc:(ffffffffc0aae110))
                                       btrfs_bio_alloc ([btrfs])
                                       write_one_eb ([btrfs])
                                       btree_write_cache_pages ([btrfs])
                                       btree_writepages ([btrfs])
^C[root@jouet ~]#
[root@jouet ~]# perf probe -l
  probe:btrfs_bio_alloc (on __start_delalloc_inodes+624@git/linux/fs/btrfs/inode.c in btrfs)
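Once you have traces like the above (or 'perf script' output from a
'perf record -g' session on the two tracepoints), pairing allocations
with frees to spot the leaked pointers is a small post-processing job.
A minimal sketch in Python - the event names (btrfs_bio_alloc /
btrfs_bio_free) and the 'bio=0x...' field are assumptions about what
the custom tracepoints print, not something perf emits by itself:

```python
import re
from collections import OrderedDict

# Hypothetical 'perf script' line shape, assuming the custom tracepoints
# print the structure pointer as 'bio=0x...', e.g.:
#   kworker/u8:1  213 [000]  1.000: btrfs:btrfs_bio_alloc: bio=0xffff88003641f000
LINE_RE = re.compile(r'(btrfs_bio_alloc|btrfs_bio_free):.*?bio=(0x[0-9a-f]+)')

def find_leaks(lines):
    """Pair alloc/free events by pointer value; return the pointers that
    were allocated but never freed, mapped to the line that allocated
    them (so the recorded -g call chain can be looked up)."""
    live = OrderedDict()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        event, ptr = m.groups()
        if event == 'btrfs_bio_alloc':
            live[ptr] = line.strip()
        else:
            live.pop(ptr, None)  # tolerate frees with no matching alloc
    return live
```

Feed it 'perf script' output and whatever survives in the returned dict
was allocated but never freed; the call chains recorded with -g for
those samples then point at the leaking (or holding) call sites.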
[root@jouet ~]#

This was a system-wide record; you could do it just for a set of
threads, or for work taking place on a specific CPU, etc.

I.e. you could try to isolate a set of CPUs, make sure that the work
you want to trace takes place there, and then trace just those CPUs,
etc.

Use 'perf trace -h topic' to see options related to a topic, e.g.:

[root@jouet ~]# perf trace -h cpu

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -a, --all-cpus        system-wide collection from all CPUs
    -C, --cpu <cpu>       list of cpus to monitor

[root@jouet ~]#

Remove that --no-syscalls to see strace-like output for the syscalls
(enter + exit, the time each one takes, only syscalls taking more than
N milliseconds.microseconds, etc).

> A more efficient way would probably be to use an eBPF program with
> stackmaps to track the stack traces.

If wanting to do aggregation inside the kernel, yes.

- Arnaldo

> - Naveen
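Short of eBPF, the aggregation can also be done in userspace by
post-processing the /max-stack=N/ output shown earlier: group the
indented call-chain lines under each event line and count identical
chains. A rough sketch in Python - the line shapes assumed here match
that sample output, but they are not a stable perf output format:

```python
from collections import Counter

def count_stacks(lines, event='probe:btrfs_bio_alloc'):
    """Aggregate 'perf trace --no-syscalls -e probe:.../max-stack=N/'
    output: each event line is followed by indented call-chain frames;
    count how often each distinct chain shows up."""
    counts = Counter()
    chain = None                  # None until the first event line is seen
    for line in lines:
        if event in line:         # an event header starts a new sample
            if chain:
                counts[tuple(chain)] += 1
            chain = []
        elif chain is not None and line.strip() and line.startswith((' ', '\t')):
            chain.append(line.strip())
    if chain:
        counts[tuple(chain)] += 1
    return counts
```

Sorting the resulting Counter then shows which allocation call chains
dominate, without any in-kernel state beyond the probe itself.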