From: Randy Dunlap
Subject: Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
Date: Tue, 19 Jun 2018 17:25:09 -0700
References: <1528851463-21140-1-git-send-email-ricardo.neri-calderon@linux.intel.com> <1528851463-21140-18-git-send-email-ricardo.neri-calderon@linux.intel.com> <20180615020314.GA11625@voyager> <20180616005103.GC6659@voyager> <20180620001535.GA27531@voyager>
In-Reply-To: <20180620001535.GA27531@voyager>
To: Ricardo Neri, Thomas Gleixner
Cc: "Rafael J. Wysocki", Peter Zijlstra, Alexei Starovoitov, Kai-Heng Feng, "H. Peter Anvin", sparclinux-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ingo Molnar, Christoffer Dall, Davidlohr Bueso, Ashok Raj, Michael Ellerman, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, David Rientjes, Andi Kleen, Waiman Long, Borislav Petkov, Masami Hiramatsu, Don Zickus, "Ravi V. Shankar", Konrad Rzeszutek Wilk, Marc Zyngier, Frederic Weisbecker, Nicholas Piggin
List-Id: iommu@lists.linux-foundation.org

On 06/19/2018 05:15 PM, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
>> On Fri, 15 Jun 2018, Ricardo Neri wrote:
>>> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
>>>> On Thu, 14 Jun 2018, Ricardo Neri wrote:
>>>>> Alternatively, there could be a counter that skips reading the HPET status
>>>>> register (and the detection of hardlockups) for every X NMIs. This would
>>>>> reduce the overall frequency of HPET register reads.
>>>>
>>>> Great plan. So if the watchdog is the only NMI (because perf is off) then
>>>> you delay the watchdog detection by that count.
>>>
>>> OK. This was a bad idea. Then, is it acceptable to have a read of an HPET
>>> register per NMI just to check in the status register whether the HPET
>>> timer caused the NMI?
>>
>> The status register is useless in case of MSI. MSI is edge-triggered ....
>>
>> The only register which gives you proper information is the counter
>> register itself. That adds a massive overhead to each NMI, because the
>> counter register access is synchronized to the HPET clock with hardware
>> magic. Plus on larger systems, the HPET access is cross-node and even
>> slower.
>
> It starts to sound like the HPET is too slow to drive the hardlockup detector.
>
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU. The actual hardlockup detector
> is implemented by this single CPU sending interprocessor interrupts to the
> rest of the CPUs.
>
> In this manner only one CPU has to deal with the slowness of the HPET; the
> rest of the CPUs don't have to read or write any HPET registers. A sysfs
> entry could be added to configure which CPU will have to deal with the HPET
> timer. However, profiling could not be done accurately on such a CPU.

Please forgive my simple question:

What happens when this one CPU is the one that locks up?

thnx,
-- 
~Randy