Date: Thu, 26 Apr 2007 14:59:18 +0200
From: Jarek Poplawski
To: Oleg Nesterov
Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org, David Howells
Subject: Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work
Message-ID: <20070426125918.GC3145@ff.dom.local>
References: <20070420092201.GC1695@ff.dom.local> <20070420170836.GB470@tv-sign.ru>
	<20070423090030.GC1684@ff.dom.local> <20070423163312.GA129@tv-sign.ru>
	<20070424115322.GA2423@ff.dom.local> <20070424185537.GA5029@tv-sign.ru>
	<20070425122038.GE1613@ff.dom.local> <20070425122814.GF1613@ff.dom.local>
	<20070425124714.GA94@tv-sign.ru> <20070425144759.GA201@tv-sign.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070425144759.GA201@tv-sign.ru>
User-Agent: Mutt/1.4.2.2i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 25, 2007 at 06:47:59PM +0400, Oleg Nesterov wrote:
> On 04/25, Oleg Nesterov wrote:
> >
> > On 04/25, Jarek Poplawski wrote:
> > >
> > > Probably this is also possible without timer i.e.
> > > with queue_work.
> >
> > Yes, thanks. While adding cpu-hotplug check I forgot to add ->current_work
> > check, which is needed to actually implement this
> >
> > > > Note that cancel_rearming_delayed_work() now can handle the works
> > > > which re-arm itself via queue_work(), not only queue_delayed_work().
> >
> > part. I'll resend after fix.
>
> Hm. But can't we do better?
> Looks like we don't need to check ->current_work,
>
> 	void cancel_rearming_delayed_work(struct delayed_work *dwork)
> 	{
> 		struct work_struct *work = &dwork->work;
> 		struct cpu_workqueue_struct *cwq = get_wq_data(work);
> 		int done;

I don't understand why you think cwq cannot be NULL here.

>
> 		do {
> 			done = 1;
> 			spin_lock_irq(&cwq->lock);
>
> 			if (!list_empty(&work->entry))
> 				list_del_init(&work->entry);

BTW, isn't needs_a_good_name needless after this, and also after
del_timer returns a positive value?

> 			else if (test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
> 				done = del_timer(&dwork->timer)

If this runs while the work function is being executed in
run_workqueue, it sets the _PENDING bit, but if the work then skips
rearming, we probably have an endless loop again.

>
> 			spin_unlock_irq(&cwq->lock);
> 		} while (!done);
>
> 		/*
> 		 * Nobody can clear WORK_STRUCT_PENDING. This means that the
> 		 * work can't be re-queued and the timer can't be re-started.
> 		 */
> 		needs_a_good_name(cwq->wq, work);
> 		work_clear_pending(work);
> 	}
>
> Jarek, I didn't think much about this, just a new idea. I am posting this code
> in a hope you can review it while I sleep on this... CPU-hotplug is ignored for
> now. Note that this version doesn't need the change in run_workqueue().

It's a very interesting proposal, but also hard to analyse: the locks
are taken and released, and it is hard to foresee when and where the
loop regains the lock. It is similar to the current way, with some
added measures: you try to shoot a work on the run, while it is queued
or timer_pending, with the _PENDING flag also set, so it seems there is
some risk of looping longer than planned.

I have to look at this more at home and, if I find something new, I'll
write tomorrow. So the good news is you should get enough sleep this
time!

Cheers,
Jarek P.
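
PS: To make the cases of the quoted loop easier to enumerate, here is a
minimal single-threaded userspace sketch of it. Everything in it is an
assumption made for illustration: struct model_work and the model_*()
helpers are hypothetical names, not kernel code, the three flags only
stand in for WORK_STRUCT_PENDING, !list_empty(&work->entry) and
timer_pending(), and locking is left out entirely. Because nothing else
runs in this model, it cannot show real races; it only shows which
starting state makes the loop depend on another context making progress.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a delayed work; not the kernel's types. */
struct model_work {
	bool pending;		/* stands in for WORK_STRUCT_PENDING */
	bool queued;		/* stands in for !list_empty(&work->entry) */
	bool timer_pending;	/* stands in for timer_pending(&dwork->timer) */
};

/* Models test_and_set_bit(): set the flag and return its previous value. */
static bool model_test_and_set(bool *flag)
{
	bool old = *flag;

	*flag = true;
	return old;
}

/* Models del_timer(): true only if a pending timer was deactivated. */
static bool model_del_timer(struct model_work *w)
{
	bool was_pending = w->timer_pending;

	w->timer_pending = false;
	return was_pending;
}

/* One pass of the quoted loop body (locking omitted); returns "done". */
static bool model_cancel_pass(struct model_work *w)
{
	bool done = true;

	if (w->queued)
		w->queued = false;	/* models list_del_init() */
	else if (model_test_and_set(&w->pending))
		done = model_del_timer(w);

	return done;
}

/*
 * Repeat passes up to max_passes times. Returns the number of passes
 * needed, or -1 if the loop was still not done when the limit was hit.
 */
static int model_cancel(struct model_work *w, int max_passes)
{
	for (int i = 1; i <= max_passes; i++) {
		if (model_cancel_pass(w))
			return i;
	}
	return -1;
}
```

In this model a work that is queued, has a pending timer, or has
_PENDING clear finishes in one pass. The only state that spins is
"_PENDING set, not queued, no timer pending": there every pass sees
del_timer fail, so the loop can only end once some other context
changes the state.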