From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756068AbcK1XKs (ORCPT );
	Mon, 28 Nov 2016 18:10:48 -0500
Received: from one.firstfloor.org ([193.170.194.197]:56970 "EHLO
	one.firstfloor.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756217AbcK1XKh (ORCPT );
	Mon, 28 Nov 2016 18:10:37 -0500
Date: Mon, 28 Nov 2016 15:10:34 -0800
From: Andi Kleen
To: Steven Rostedt
Cc: Andi Kleen, Peter Zijlstra, Jiri Olsa, "Paul E. McKenney",
	linux-kernel@vger.kernel.org, Ingo Molnar, Josh Triplett,
	Jan Stancek
Subject: Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!
Message-ID: <20161128231034.GK26852@two.firstfloor.org>
References: <20161121170612.GT26852@two.firstfloor.org>
	<20161121171853.GK3092@twins.programming.kicks-ass.net>
	<20161121174504.GU26852@two.firstfloor.org>
	<20161121130115.2f0f38e7@gandalf.local.home>
	<20161121180654.GV26852@two.firstfloor.org>
	<20161121132220.17fb0515@gandalf.local.home>
	<20161121183700.GW26852@two.firstfloor.org>
	<20161123210618.5103f25d@gandalf.local.home>
	<20161128214825.GJ26852@two.firstfloor.org>
	<20161128174601.533d1f09@gandalf.local.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161128174601.533d1f09@gandalf.local.home>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

> Well, actually it's been optimized a lot from the first instances.
> Note, tracepoints need to be quite generic, so where it can be optimized
> is not obvious.

There is quite a bit of stuff that doesn't change from trace point to
trace point. So you could just cache all these decisions into a single
per cpu variable, and invalidate it if anything changes the conditions.

> > > > Functions with # of instructions:
> > > >    25 trace_event_raw_event_sched_switch
>
> This is from the TRACE_EVENT macro.
> > 	if (trace_trigger_soft_disabled(trace_file))		\
> > 		return;						\

Cache?

> 	if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
> 	    trace_event_ignore_this_pid(trace_file))
> 		return NULL;
>
> 	*current_rb = trace_file->tr->trace_buffer.buffer;

Indirection could be cached.

>
> 	if ((trace_file->flags &
> 	     (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
> 	    (entry = this_cpu_read(trace_buffered_event))) {

Multiple tests of trace_file->flags in different functions; they could
all be folded into a single test.

>
>
> Note: Some of these if statements can be encapsulated with jump labels, as they
> are false pretty much all of the time.
>
>
> 		/* Try to use the per cpu buffer first */
> 		val = this_cpu_inc_return(trace_buffered_event_cnt);
> 		if (val == 1) {
> 			trace_event_setup(entry, type, flags, pc);
> 			entry->array[0] = len;
> 			return entry;
> 		}
> 		this_cpu_dec(trace_buffered_event_cnt);

This can be pre-cached in a fast path: if true, just point the per cpu
variable directly at the buffer, and invalidate it if the buffer changes
or is full.

> 	preempt_disable_notrace();
>
> 	if (unlikely(atomic_read(&buffer->record_disabled)))
> 		goto out;

Cache.

>
> 	cpu = raw_smp_processor_id();
>
> 	if (unlikely(!cpumask_test_cpu(cpu, buffer->cpumask)))
> 		goto out;

Not needed for per cpu buffers in a fast path.

>
> 	cpu_buffer = buffer->buffers[cpu];
>
> 	if (unlikely(atomic_read(&cpu_buffer->record_disabled)))
> 		goto out;

Cache.

>
> 	if (unlikely(length > BUF_MAX_DATA_SIZE))
> 		goto out;
>
> 	if (unlikely(trace_recursive_lock(cpu_buffer)))
> 		goto out;

Locking should invalidate the cached state on this cpu, so this check
shouldn't be needed.

> 	u64 diff;
>
> 	rb_start_commit(cpu_buffer);
>
> #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
> 	/*
> 	 * Due to the ability to swap a cpu buffer from a buffer
> 	 * it is possible it was swapped before we committed.
> 	 * (committing stops a swap). We check for it here and
> 	 * if it happened, we have to fail the write.
> 	 */
> 	barrier();
> 	if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> 		local_dec(&cpu_buffer->committing);
> 		local_dec(&cpu_buffer->commits);
> 		return NULL;
> 	}

Swapping should invalidate the cached state centrally (and then run the
slow path, like the current path).

> 	/*
> 	 * We allow for interrupts to reenter here and do a trace.
> 	 * If one does, it will cause this original code to loop
> 	 * back here. Even with heavy interrupts happening, this
> 	 * should only happen a few times in a row. If this happens
> 	 * 1000 times in a row, there must be either an interrupt
> 	 * storm or we have something buggy.
> 	 * Bail!
> 	 */
> 	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
> 		goto out_fail;

Didn't you disable interrupts earlier?

> 	/* Did the write stamp get updated already? */
> 	if (likely(info.ts >= cpu_buffer->write_stamp)) {
> 		info.delta = diff;
> 		if (unlikely(test_time_stamp(info.delta)))
> 			rb_handle_timestamp(cpu_buffer, &info);
> 	}

Not sure why a write stamp is needed. Isn't that the last few entries?

...

-Andi