From: Ingo Molnar
To: Peter Zijlstra
Cc: rusty@rustcorp.com.au, mathieu.desnoyers@efficios.com, oleg@redhat.com,
    paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org,
    linux-kernel@vger.kernel.org, andi@firstfloor.org, rostedt@goodmis.org,
    tglx@linutronix.de, laijs@cn.fujitsu.com, linux@horizon.com
Date: Mon, 13 Apr 2015 18:49:49 +0200
Subject: Re: [PATCH v5 07/10] module: Optimize __module_address() using a latched RB-tree
Message-ID: <20150413164949.GF6040@gmail.com>
In-Reply-To: <20150413141213.614514026@infradead.org>

* Peter Zijlstra wrote:

> Currently __module_address() is using a linear search through all
> modules in order to find the module corresponding to the provided
> address. With a lot of modules this can take a lot of time.
>
> One of the users of this is kernel_text_address() which is employed
> in many stack unwinders; which in turn are used by perf-callchain
> and ftrace (possibly from NMI context).
>
> So by optimizing __module_address() we optimize many stack unwinders
> which are used by both perf and tracing in performance sensitive
> code.

So my (rather typical) workstation has 116 modules loaded currently - but
setups using in excess of 150 modules are not uncommon either.
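To put rough numbers on the cost of a linear search at 116 to 150 modules, here is a userspace sketch, not the patch's code. It uses a hypothetical `struct mod_range` and swaps the latched RB-tree for a binary search over a sorted array, which gives the same O(log n) number of comparisons a balanced tree does. Both lookups count their loop iterations in `*steps`.

```c
#include <stddef.h>

/* Hypothetical stand-in for one module's address range: [start, start + size). */
struct mod_range {
	unsigned long start;
	unsigned long size;
};

/* O(n): test every range in turn, like walking the module list. */
const struct mod_range *
linear_lookup(const struct mod_range *r, size_t n, unsigned long addr,
	      unsigned int *steps)
{
	for (size_t i = 0; i < n; i++) {
		(*steps)++;
		if (addr >= r[i].start && addr < r[i].start + r[i].size)
			return &r[i];
	}
	return NULL;
}

/*
 * O(log n): binary search standing in for the tree descent (the patch
 * uses a latched RB-tree, not an array). Assumes r[] is sorted by start
 * and that the ranges do not overlap.
 */
const struct mod_range *
bsearch_lookup(const struct mod_range *r, size_t n, unsigned long addr,
	       unsigned int *steps)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*steps)++;
		if (addr < r[mid].start)
			hi = mid;		/* addr lies below this range */
		else if (addr >= r[mid].start + r[mid].size)
			lo = mid + 1;		/* addr lies above this range */
		else
			return &r[mid];		/* addr is inside this range */
	}
	return NULL;
}
```

With 150 back-to-back ranges and an address inside the last one (the worst case for the walk), linear_lookup() needs 150 steps while bsearch_lookup() needs 7.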
A linear list walk of 100-150 entries for every single call chain entry
that hits some module, in 'perf record -g', can cause some overhead!

> +	/*
> +	 * If this is non-NULL, vfree after init() returns.

s/vfree/vfree()/

> +	/*
> +	 * We want mtn_core::{mod,node[0]} to be in the same cacheline as the
> +	 * above entries such that a regular lookup will only touch the one
> +	 * cacheline.

s/touch the one cacheline/touch one cacheline/ ?

> +static __always_inline int
> +mod_tree_comp(void *key, struct latch_tree_node *n)
> +{
> +	unsigned long val = (unsigned long)key;
> +	unsigned long start, end;
> +
> +	end = start = __mod_tree_val(n);
> +	end += __mod_tree_size(n);
> +
> +	if (val < start)
> +		return -1;
> +
> +	if (val >= end)
> +		return 1;
> +
> +	return 0;

So since we are counting nanoseconds, I suspect this could be written more
efficiently as:

	static __always_inline int
	mod_tree_comp(void *key, struct latch_tree_node *n)
	{
		unsigned long val = (unsigned long)key;
		unsigned long start, end;

		start = __mod_tree_val(n);
		if (val < start)
			return -1;

		/* Only look up the size once we know val >= start: */
		end = start + __mod_tree_size(n);
		if (val >= end)
			return 1;

		return 0;
	}

right?

Thanks,

	Ingo
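The reordering should not change any result, only skip work on the `val < start` path. Below is a userspace sketch to check that. It uses a hypothetical `struct range_node` holding the start and size that the patch reads through `__mod_tree_val()` and `__mod_tree_size()`. The hypothetical `range_find()` descends a sorted array on the comparator's sign (negative: go left, positive: go right, zero: found), the same convention the quoted mod_tree_comp() follows. Whether the early return saves anything measurable depends on the compiler, which may already sink the addition on its own.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tree node; the patch gets these values
 * from __mod_tree_val(n) and __mod_tree_size(n). */
struct range_node {
	unsigned long start;
	unsigned long size;
};

/* Ordering as in the patch: end is computed before the first test. */
static inline int comp_orig(unsigned long val, const struct range_node *n)
{
	unsigned long start, end;

	end = start = n->start;
	end += n->size;

	if (val < start)
		return -1;
	if (val >= end)
		return 1;
	return 0;
}

/* Suggested ordering: the size load and the addition are skipped
 * whenever val < start. */
static inline int comp_early(unsigned long val, const struct range_node *n)
{
	unsigned long start = n->start;

	if (val < start)
		return -1;
	if (val >= start + n->size)
		return 1;
	return 0;
}

/* Descend sorted, disjoint ranges the way a search tree is descended:
 * <0 goes left, >0 goes right, 0 means the range contains val. */
const struct range_node *
range_find(const struct range_node *nodes, size_t n, unsigned long val,
	   int (*comp)(unsigned long, const struct range_node *))
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = comp(val, &nodes[mid]);

		if (c < 0)
			hi = mid;
		else if (c > 0)
			lo = mid + 1;
		else
			return &nodes[mid];
	}
	return NULL;
}
```

Both comparators treat the range as half-open, [start, start + size), so an address equal to the end belongs to the next range or to none.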