From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f66.google.com ([74.125.82.66]:49244 "EHLO
	mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751797AbdJSJ1c (ORCPT );
	Thu, 19 Oct 2017 05:27:32 -0400
Received: by mail-wm0-f66.google.com with SMTP id b189so14574087wmd.4
	for ; Thu, 19 Oct 2017 02:27:31 -0700 (PDT)
Subject: Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu_sched timeouts from multithreading
To: Thomas Gleixner
Cc: Matt Redfearn , linux-mips@linux-mips.org, Matt Redfearn ,
	"# v3 . 19 +" , linux-kernel@vger.kernel.org
References: <1507730474-8577-1-git-send-email-matt.redfearn@mips.com>
	<32cc3d3e-88df-405f-5278-7fe00d066a93@linaro.org>
From: Daniel Lezcano
Message-ID:
Date: Thu, 19 Oct 2017 11:27:29 +0200
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: stable-owner@vger.kernel.org
List-ID:

On 19/10/2017 11:18, Thomas Gleixner wrote:
> On Thu, 19 Oct 2017, Daniel Lezcano wrote:
>> On 18/10/2017 22:34, Thomas Gleixner wrote:
>>> On Wed, 11 Oct 2017, Matt Redfearn wrote:
>>>
>>>> When the MIPS GIC clockevent code was written, it appears to have
>>>> inherited the 0x300 cycle min delta from the MIPS CPU timer driver. This
>>>> is suboptimal for two reasons.
>>>>
>>>> Firstly, the CPU timer counts once every other cycle (i.e. half the
>>>> clock rate). The GIC counts once per clock. Assuming that the GIC and
>>>> CPU share the same clock this means the GIC is counting twice as fast,
>>>> and so the min delta should be (at least) doubled. Fix this by doubling
>>>> the min delta to 0x600.
>>>>
>>>> Secondly, the fixed min delta ignores the fact that with MIPS
>>>> multithreading active, execution resource within a core is shared
>>>> between the hardware threads within that core. An inconveniently timed
>>>> switch of executing thread within gic_next_event, between the read and
>>>> write of updated count, can result in the CPU writing an event in the
>>>> past, and subsequently not receiving a tick interrupt until the counter
>>>> wraps. This stalls the CPU from the RCU scheduler. Other CPUs detect
>>>> this and print rcu_sched timeout messages in the kernel log. It can
>>>> lead to other issues as well if the CPU is holding locks or other
>>>> resources at the point at which it stalls. Fix this by scaling the min
>>>> delta for the timer based on the number of threads in the core
>>>> (smp_num_siblings). This accounts for the greater average runtime of
>>>> CPUs within a multithreading core.
>>>
>>> I don't understand why this is not caught by the check at the end of the
>>> next_event() function:
>>>
>>>     res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
>>>
>>> Btw, the local_irq_save() in this function is pointless as this function is
>>> always called with interrupts disabled from the core code.
>>
>> Would it be worth adding some comments in include/linux/clockchips.h, in
>> the structure definition for the different callbacks, to tell which ones
>> are called with irqs disabled?
>
> Yes. IIRC all callbacks are invoked with interrupts disabled. Care to check
> that and whip up a patch?

Sure, no problem.

-- 
Linaro.org │ Open source software for ARM SoCs

Follow Linaro: Facebook | Twitter | Blog
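
For readers following the thread, the check quoted above from next_event() can be illustrated in isolation. This is only a minimal user-space sketch, not the driver code: the helper name event_expired() and the 32-bit counter width are assumptions made for illustration, and the real function reads the hardware counter through gic_read_count(). It shows why a signed difference reports an already-passed target as -ETIME even when the counter has wrapped.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on the quoted check:
 *
 *     res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
 *
 * 'now' stands in for the value read back from the counter and 'cnt' is
 * the compare value that was just programmed. A 32-bit counter is assumed
 * here to keep the arithmetic easy to follow.
 */
int event_expired(uint32_t now, uint32_t cnt)
{
	/*
	 * The unsigned subtraction wraps modulo 2^32. Reinterpreting the
	 * result as signed gives the distance from the target: negative
	 * while the target is still ahead, zero or positive once the
	 * counter has reached or passed it. (The conversion relies on the
	 * usual two's-complement behaviour of common compilers.)
	 */
	return ((int32_t)(now - cnt) >= 0) ? -ETIME : 0;
}

/* A few illustrative cases, including ones that straddle the wrap. */
void event_expired_self_check(void)
{
	assert(event_expired(100u, 200u) == 0);               /* target ahead  */
	assert(event_expired(200u, 200u) == -ETIME);          /* target hit    */
	assert(event_expired(300u, 200u) == -ETIME);          /* target passed */
	assert(event_expired(0xFFFFFFF0u, 0x10u) == 0);       /* ahead, across wrap  */
	assert(event_expired(0x20u, 0xFFFFFFF0u) == -ETIME);  /* passed, across wrap */
}
```

In this sketch a caller that gets -ETIME back would be expected to retry with a later target, which is why the check is relevant to the question of how an event written "in the past" could go unnoticed.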