Date: Thu, 16 Jun 2016 09:02:15 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Thomas Gleixner
Cc: Arjan van de Ven, Eric Dumazet, Peter Zijlstra, Ingo Molnar, LKML,
    Frederic Weisbecker, Chris Mason, Arjan van de Ven, Linus Torvalds,
    George Spelvin
Subject: Re: [patch 13/20] timer: Switch to a non cascading wheel
References: <20160614101144.GA849@gmail.com>
    <20160614204225.GI30154@twins.programming.kicks-ass.net>
Message-Id: <20160616160215.GQ3923@linux.vnet.ibm.com>

On Thu, Jun 16, 2016 at 05:43:36PM +0200, Thomas Gleixner wrote:
> On Wed, 15 Jun 2016, Thomas Gleixner wrote:
> > On Wed, 15 Jun 2016, Arjan van de Ven wrote:
> > > what would 1 more timer wheel do?
> >
> > Waste storage space and make the collection of expired timers more expensive.
> >
> > The selection of the timer wheel properties is a combination of:
> >
> > 1) Granularity
> >
> > 2) Storage space
> >
> > 3) Number of levels to collect
>
> So I came up with a slightly different solution for this. The problem case is
> HZ=1000 and again looking at the data, there is no reason why we need actual
> 1ms granularity for timer wheel timers. That's independent of the desired ms
> based interfaces.
>
> We can simply run the wheel internally with 4ms base level resolution and
> degrade from there. That gives us 6 days+ and a simple cutoff at the capacity
> of the 7th level wheel.
>
> Level Offset  Granularity            Range
>     0      0         4 ms                0 ms -       255 ms
>     1     64        32 ms              256 ms -      2047 ms (256ms - ~2s)
>     2    128       256 ms             2048 ms -     16383 ms (~2s - ~16s)
>     3    192      2048 ms (~2s)      16384 ms -    131071 ms (~16s - ~2m)
>     4    256     16384 ms (~16s)    131072 ms -   1048575 ms (~2m - ~17m)
>     5    320    131072 ms (~2m)    1048576 ms -   8388607 ms (~17m - ~2h)
>     6    384   1048576 ms (~17m)   8388608 ms -  67108863 ms (~2h - ~18h)
>     7    448   8388608 ms (~2h)   67108864 ms - 536870911 ms (~18h - ~6d)
>
> That works really nicely and has the interesting side effect that we batch in
> the first level wheel which helps networking. I'll repost the series with the
> other review points addressed later tonight.
>
> Btw, I also thought a bit more about the millisecond interfaces. I think we
> shouldn't invent new interfaces. The correct solution IMHO is to disentangle
> the scheduler tick frequency and jiffies. If we have that completely separated
> then we can do the following:
>
> 1) Force HZ=1000. That means jiffies and timer wheel units are 1ms. If the
>    tick frequency is != 1000 we simply increment jiffies in the tick by the
>    proper amount (4 @250 ticks/sec, 10 @100 ticks/sec).
>
>    So all msecs_to_jiffies() invocations compile out into nothing magically
>    and we can remove them gradually over time.

Some of RCU's heuristics assume that if scheduling-clock ticks happen,
they happen once per jiffy. These would need to be adjusted, which would
not be a big deal, just a bit more use of HZ.

> 2) When we do that right, we can make the tick frequency a command-line
>    option and just have a compiled-in default.

As long as there is something that tells RCU what the tick frequency
actually is at runtime, this should not be a problem. For example,
in rcu_implicit_dynticks_qs(), the following:

	rdp->rsp->jiffies_resched += 5;

would instead need to be something like:

	rdp->rsp->jiffies_resched += 5 * jiffies_per_tick;

Changing the tick frequency at runtime would be a bit more tricky, as it
would be tough to avoid some oddball false positives during the
transition. But setting it at boot time would be fine. ;-)

							Thanx, Paul

> Thoughts?
>
> Thanks,
>
> 	tglx
>