Subject: Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu_sched timeouts
 from multithreading
To: Thomas Gleixner <tglx@linutronix.de>,
        Matt Redfearn <matt.redfearn@mips.com>
Cc: linux-mips@linux-mips.org,
        Matt Redfearn <matt.redfearn@imgtec.com>,
        "# v3 . 19 +" <stable@vger.kernel.org>,
        linux-kernel@vger.kernel.org
References: <1507730474-8577-1-git-send-email-matt.redfearn@mips.com>
 <alpine.DEB.2.20.1710182226080.2477@nanos>
From: Daniel Lezcano <daniel.lezcano@linaro.org>
Message-ID: <32cc3d3e-88df-405f-5278-7fe00d066a93@linaro.org>
Date: Thu, 19 Oct 2017 11:15:30 +0200
In-Reply-To: <alpine.DEB.2.20.1710182226080.2477@nanos>

On 18/10/2017 22:34, Thomas Gleixner wrote:
> On Wed, 11 Oct 2017, Matt Redfearn wrote:
> 
>> When the MIPS GIC clockevent code was written, it appears to have
>> inherited the 0x300-cycle min delta from the MIPS CPU timer driver. This
>> is suboptimal for two reasons.
>>
>> Firstly, the CPU timer counts once every other cycle (i.e. half the
>> clock rate). The GIC counts once per clock. Assuming that the GIC and
>> CPU share the same clock this means the GIC is counting twice as fast,
>> and so the min delta should be (at least) doubled. Fix this by doubling
>> the min delta to 0x600.
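
The arithmetic behind the doubling can be sketched as follows. This is a minimal sketch that assumes, as the patch does, that the GIC and the CPU are driven by the same clock. The helper name and constants are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical constants: core clock cycles per counter tick. The CPU
 * timer advances once every other cycle and the GIC counter once per
 * cycle (assuming both are driven by the same clock).
 */
#define CPU_TIMER_CYCLES_PER_TICK 2u
#define GIC_CYCLES_PER_TICK       1u

/*
 * Hypothetical helper: express a minimum delta given in CPU timer ticks
 * as the same wall-clock interval in GIC counter ticks.
 */
static uint32_t cpu_ticks_to_gic_ticks(uint32_t cpu_ticks)
{
	/* Guard against overflow of the intermediate product. */
	assert(cpu_ticks <= UINT32_MAX / CPU_TIMER_CYCLES_PER_TICK);
	return cpu_ticks * CPU_TIMER_CYCLES_PER_TICK / GIC_CYCLES_PER_TICK;
}
```

Under those assumptions, cpu_ticks_to_gic_ticks(0x300) gives 0x600, the value the patch picks.
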
>>
>> Secondly, the fixed min delta ignores the fact that with MIPS
>> multithreading active, execution resources within a core are shared
>> between the hardware threads within that core. An inconveniently timed
>> switch of executing thread within gic_next_event, between the read of
>> the count and the write of the updated compare value, can result in the
>> CPU writing an event in the past, and subsequently not receiving a tick
>> interrupt until the counter wraps. As far as the RCU scheduler is
>> concerned, the CPU has then stalled. Other CPUs detect this and print
>> rcu_sched timeout messages in the kernel log. It can also lead to other
>> issues if the CPU is holding locks or other resources at the point at
>> which it stalls. Fix this by scaling the min delta for the timer by the
>> number of threads in the core (smp_num_siblings). This accounts for the
>> longer average time a CPU within a multithreaded core takes to execute
>> the same code.
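
The proposed scaling can be sketched as follows. This is a minimal sketch that assumes the 0x600 base from the first change and that smp_num_siblings reports the number of hardware threads per core. The function and macro names are hypothetical, and this is not the code from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the doubled GIC minimum delta from the first change. */
#define GIC_BASE_MIN_DELTA 0x600u

/*
 * Hypothetical helper: scale the minimum delta by the number of hardware
 * threads sharing the core (what the kernel exposes as smp_num_siblings).
 * The patch reasons that with N threads sharing one core's execution
 * resources, the window between reading the count and writing the
 * compare value can stretch, so the safety margin grows with N.
 */
static uint32_t gic_min_delta(unsigned int siblings)
{
	/* Treat a zero or unset thread count as a single-threaded core. */
	if (siblings == 0)
		siblings = 1;
	/* Guard against overflow of the scaled value. */
	assert(siblings <= UINT32_MAX / GIC_BASE_MIN_DELTA);
	return GIC_BASE_MIN_DELTA * siblings;
}
```

With two threads per core this yields 0xc00, and with four it yields 0x1800.
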
> 
> I don't understand why this is not caught by the check at the end of the
> next_event() function:
> 
>         res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
> 
> Btw, the local_irq_save() in this function is pointless as this function is
> always called with interrupts disabled from the core code.
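
The quoted check treats the event as already in the past when the counter read back after programming is at or beyond the programmed compare value. It uses a signed 32-bit difference so that the comparison stays correct across counter wraparound. Below is a minimal userspace sketch of the same logic. The function names are hypothetical, and the stall model (a fixed number of ticks lost between the initial counter read and the read-back) is an assumption for illustration. It only models what the check is meant to detect and does not reproduce the multithreading race itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring
 *     res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
 * "now" is the counter value read back after programming, "cnt" the
 * value written to the compare register. The truncated 32-bit
 * difference is ">= 0" as a signed value exactly when its top bit is
 * clear. Testing the bit avoids the implementation-defined conversion
 * of an out-of-range value to int.
 */
static bool gic_event_in_past(uint64_t now, uint64_t cnt)
{
	uint32_t diff = (uint32_t)(now - cnt);

	return diff < UINT32_C(0x80000000);
}

/*
 * Hypothetical model of one next_event() call: the counter is read at
 * "start", start + delta is programmed, and "stall" ticks elapse (for
 * example, while another hardware thread runs) before the read-back.
 */
static int simulate_next_event(uint64_t start, uint32_t delta, uint32_t stall)
{
	uint64_t cnt = start + delta;	/* value written to the compare register */
	uint64_t now = start + stall;	/* counter value at the read-back */

	/* The signed 32-bit comparison is only meaningful for gaps below 2^31. */
	assert(delta < UINT32_C(0x80000000) && stall < UINT32_C(0x80000000));
	return gic_event_in_past(now, cnt) ? -ETIME : 0;
}
```

In this model, simulate_next_event(1000, 0x600, 0x10) returns 0. Once the stall reaches the programmed delta (0x600 ticks or more), it returns -ETIME, which is the case the check is there to catch.
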

Would it be worth adding a comment to the structure definition in
include/linux/clockchips.h stating which of the callbacks are called with
IRQs disabled?
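
To illustrate the suggestion, here is a minimal sketch of what such a per-callback note could look like. The structure is a hypothetical, trimmed-down stand-in, not the real struct clock_event_device. Only set_next_event is annotated because that is the calling context stated above. The demo callback is a placeholder, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, trimmed-down stand-in for struct clock_event_device,
 * showing a calling-context note next to a callback member.
 */
struct clock_event_device_excerpt {
	const char *name;

	/*
	 * set_next_event: program an event @evt counter ticks from now.
	 * Context: called by the clockevents core with interrupts
	 * disabled, so local_irq_save() is not needed here.
	 * Returns 0 on success; drivers such as the GIC one return -ETIME
	 * when the event is already in the past.
	 */
	int (*set_next_event)(unsigned long evt,
			      struct clock_event_device_excerpt *ced);
};

/* Hypothetical driver callback that relies on the documented context. */
static int demo_set_next_event(unsigned long evt,
			       struct clock_event_device_excerpt *ced)
{
	assert(ced != NULL);
	/*
	 * Placeholder: a real driver would program its compare register
	 * here. A zero delta is treated as already due.
	 */
	return evt == 0 ? -ETIME : 0;
}
```
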


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs