From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/3] clocksource/mips-gic-timer: Fix rcu_sched timeouts from multithreading
To: Thomas Gleixner, Matt Redfearn
Cc: linux-mips@linux-mips.org, Matt Redfearn, "# v3 . 19 +", linux-kernel@vger.kernel.org
References: <1507730474-8577-1-git-send-email-matt.redfearn@mips.com>
From: Daniel Lezcano
Message-ID: <32cc3d3e-88df-405f-5278-7fe00d066a93@linaro.org>
Date: Thu, 19 Oct 2017 11:15:30 +0200

On 18/10/2017 22:34, Thomas Gleixner wrote:
> On Wed, 11 Oct 2017, Matt Redfearn wrote:
>
>> When the MIPS GIC clockevent code was written, it appears to have
>> inherited the 0x300 cycle min delta from the MIPS CPU timer driver. This
>> is suboptimal for two reasons.
>>
>> Firstly, the CPU timer counts once every other cycle (i.e. half the
>> clock rate). The GIC counts once per clock. Assuming that the GIC and
>> CPU share the same clock this means the GIC is counting twice as fast,
>> and so the min delta should be (at least) doubled. Fix this by doubling
>> the min delta to 0x600.
>>
>> Secondly, the fixed min delta ignores the fact that with MIPS
>> multithreading active, execution resource within a core is shared
>> between the hardware threads within that core. An inconveniently timed
>> switch of executing thread within gic_next_event, between the read and
>> write of updated count, can result in the CPU writing an event in the
>> past, and subsequently not receiving a tick interrupt until the counter
>> wraps.
>> This stalls the CPU from the RCU scheduler. Other CPUs detect
>> this and print rcu_sched timeout messages in the kernel log. It can
>> lead to other issues as well if the CPU is holding locks or other
>> resources at the point at which it stalls. Fix this by scaling the min
>> delta for the timer based on the number of threads in the core
>> (smp_num_siblings). This accounts for the greater average runtime of
>> CPUs within a multithreading core.
>
> I don't understand why this is not caught by the check at the end of the
> next_event() function:
>
>	res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
>
> Btw, the local_irq_save() in this function is pointless as this function is
> always called with interrupts disabled from the core code.

Would it be worth adding a comment to the structure definition in
include/linux/clockchips.h stating which of the callbacks are called
with IRQs disabled?

-- 
Linaro.org │ Open source software for ARM SoCs
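
For anyone following the thread, here is a minimal user-space sketch of the signed-difference check quoted above. It is not the driver code: sim_counter, sim_read_count() and sim_next_event() are hypothetical names, the counter is simplified to 32 bits, and stall_cycles only models time passing between the first read and the final check (for example while a sibling hardware thread runs). It does not model anything that happens after the check, and it is not meant to settle the question raised above.

```c
#include <errno.h>
#include <stdint.h>

#ifndef ETIME
#define ETIME 62 /* fallback for platforms whose errno.h lacks ETIME */
#endif

/* Hypothetical stand-in for the free-running counter (32-bit here). */
static uint32_t sim_counter;

static uint32_t sim_read_count(void)
{
	return sim_counter;
}

/*
 * Hypothetical model of a next_event callback: compute "now + delta" as
 * the compare value, let stall_cycles elapse, then re-read the counter
 * to see whether the compare value is already in the past.
 */
static int sim_next_event(uint32_t delta, uint32_t stall_cycles)
{
	uint32_t cnt = sim_read_count() + delta; /* compare value to program */

	sim_counter += stall_cycles;             /* time passes before the check */

	/* The signed difference stays meaningful across counter wraparound. */
	return ((int32_t)(sim_read_count() - cnt) >= 0) ? -ETIME : 0;
}
```

With sim_counter set to 100, sim_next_event(0x300, 0x10) returns 0 because the target is still ahead, while sim_next_event(0x300, 0x300) returns -ETIME because the counter has already reached the target by the time of the check. The same holds when the compare value wraps past zero, which is why the difference is taken as a signed value rather than comparing the two counts directly.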