From: Rik van Riel
Date: Wed, 05 Mar 2014 16:54:14 -0500
To: Thomas Gleixner
CC: linux-kernel@vger.kernel.org, Mateusz Guzik, Benjamin Herrenschmidt, Ingo Molnar, Prarit Bhargava, Frederic Weisbecker, Clark Williams
Subject: Re: [RFC PATCH] hrtimer: remove deadlock due to waiting on IPI in softirq context
Message-ID: <53179D06.2050707@redhat.com>
References: <20140305162526.7d2ef1ab@cuia.bos.redhat.com>

On 03/05/2014 04:51 PM, Thomas Gleixner wrote:
> On Wed, 5 Mar 2014, Rik van Riel wrote:
>> There appears to be a deadlock in the hrtimer code. Specifically,
>> clock_was_set() calls an IPI with wait=1, from softirq context.
>
> This should not be called from softirq context.
>
>> Waiting for IPIs to complete in irq context can lead to a deadlock,
>> because the current code (that was interrupted) might be holding some
>> kind of lock that another CPU is waiting for with spin_lock_irq or
>> similar.
>>
>> In other words, the current CPU may need to release a resource before
>> the IPI can be handled by one of the destination CPUs.
>>
>> To my untrained eye, it does not look like this patch introduces a
>> new bug to the timer code, but that is hard to ascertain with the
>> timer code, so I am posting this as an RFC for the timer gods to hurt
>> their brains on :)
>>
>> This bug was introduced by 54cdfdb4 in early 2007 (the original
>> hrtimer code patch).
>
> Right and we had some issues with that until we moved the calls to
> clock_was_set() out of lock held regions.

Ahh indeed, the bug got fixed already :)

> The only call which happens from interrupt context is in
> update_wall_time(). And that one definitely holds no locks which are
> relevant.
>
> On which kernel are you observing the issue?

This was RHEL6, and I saw that the function in question was still the
same upstream. I forgot to check that clock_was_set() is now called in
a different way. My bad.