From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932539AbbDMQuB (ORCPT <rfc822;w@1wt.eu>);
	Mon, 13 Apr 2015 12:50:01 -0400
Received: from mail-wi0-f169.google.com ([209.85.212.169]:35290 "EHLO
	mail-wi0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932150AbbDMQty (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 13 Apr 2015 12:49:54 -0400
Date: Mon, 13 Apr 2015 18:49:49 +0200
From: Ingo Molnar <mingo@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: rusty@rustcorp.com.au, mathieu.desnoyers@efficios.com, oleg@redhat.com,
        paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org,
        linux-kernel@vger.kernel.org, andi@firstfloor.org, rostedt@goodmis.org,
        tglx@linutronix.de, laijs@cn.fujitsu.com, linux@horizon.com
Subject: Re: [PATCH v5 07/10] module: Optimize __module_address() using a
 latched RB-tree
Message-ID: <20150413164949.GF6040@gmail.com>
References: <20150413141126.756350256@infradead.org>
 <20150413141213.614514026@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150413141213.614514026@infradead.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


* Peter Zijlstra <peterz@infradead.org> wrote:

> Currently __module_address() is using a linear search through all
> modules in order to find the module corresponding to the provided
> address. With a lot of modules this can take a lot of time.
>
> One of the users of this is kernel_text_address() which is employed 
> in many stack unwinders; which in turn are used by perf-callchain 
> and ftrace (possibly from NMI context).
> 
> So by optimizing __module_address() we optimize many stack unwinders 
> which are used by both perf and tracing in performance sensitive 
> code.

My (rather typical) workstation currently has 116 modules loaded, and
setups with more than 150 modules are not uncommon either.

A linear list walk of 100-150 entries, done in 'perf record -g' for 
every single call chain entry that hits some module, can cause some 
overhead!
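
To make the cost concrete, here is a minimal userspace sketch of such a
linear walk. It is not the kernel code: 'struct mod_range' and
linear_lookup() are made-up names for illustration, and a flat array
stands in for the module list. It only shows that the number of entries
examined grows with the position of the matching module:

```c
#include <stddef.h>

/* Hypothetical stand-in for one loaded module's address range. */
struct mod_range {
	unsigned long base;
	unsigned long size;
};

/*
 * Linear lookup: returns the index of the range containing addr,
 * or -1 if no range contains it. If steps is non-NULL it receives
 * the number of entries that were examined.
 */
static int linear_lookup(const struct mod_range *mods, size_t nr,
			 unsigned long addr, size_t *steps)
{
	size_t i, seen = 0;
	int found = -1;

	for (i = 0; i < nr; i++) {
		seen++;
		if (addr >= mods[i].base &&
		    addr < mods[i].base + mods[i].size) {
			found = (int)i;
			break;
		}
	}

	if (steps)
		*steps = seen;
	return found;
}
```

With 116 entries, an address in the last range (or in no range at all)
examines all 116 of them, and that is paid per call chain entry.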

> +	/*
> +	 * If this is non-NULL, vfree after init() returns.

s/vfree/vfree()/

> +	/*
> +	 * We want mtn_core::{mod,node[0]} to be in the same cacheline as the
> +	 * above entries such that a regular lookup will only touch the one
> +	 * cacheline.

s/touch the one cacheline/touch one cacheline/

?

> +static __always_inline int
> +mod_tree_comp(void *key, struct latch_tree_node *n)
> +{
> +	unsigned long val = (unsigned long)key;
> +	unsigned long start, end;
> +
> +	end = start = __mod_tree_val(n);
> +	end += __mod_tree_size(n);
> +
> +	if (val < start)
> +		return -1;
> +
> +	if (val >= end)
> +		return 1;
> +
> +	return 0;

Since we are counting nanoseconds, I suspect this could be written 
slightly more efficiently, by computing 'end' only after the 
'val < start' check has failed:

{
	unsigned long val = (unsigned long)key;
	unsigned long start, end;

	start = __mod_tree_val(n);
	if (val < start)
		return -1;

	end = start + __mod_tree_size(n);
	if (val >= end)
		return 1;

	return 0;
}

right?
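
For reference, a self-contained userspace sketch of that comparator
shape. The names here ('struct range_node', range_comp(), range_find())
are hypothetical; plain struct fields stand in for __mod_tree_val() and
__mod_tree_size(), and a binary search over a sorted array stands in
for the tree descent. Whether the compiler already defers the size load
in the original version is something I have not checked:

```c
#include <stddef.h>

/* Hypothetical node: start address and length of one range. */
struct range_node {
	unsigned long val;
	unsigned long size;
};

/*
 * Returns -1 if key lies below the range, 1 if it lies at or above
 * the end of the range, 0 if the range contains it. The size is
 * only read once the 'key < start' test has failed.
 */
static inline int range_comp(unsigned long key, const struct range_node *n)
{
	unsigned long start, end;

	start = n->val;
	if (key < start)
		return -1;

	end = start + n->size;
	if (key >= end)
		return 1;

	return 0;
}

/* Binary search over nodes sorted by val, non-overlapping ranges. */
static const struct range_node *
range_find(const struct range_node *nodes, size_t nr, unsigned long key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = range_comp(key, &nodes[mid]);

		if (c < 0)
			hi = mid;
		else if (c > 0)
			lo = mid + 1;
		else
			return &nodes[mid];
	}

	return NULL;
}
```

Every step that goes left returns after the first compare without
touching the size at all, which is where the saving would come from.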

Thanks,

	Ingo