From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3B87C4321D for ; Wed, 22 Aug 2018 09:42:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 85E8521480 for ; Wed, 22 Aug 2018 09:42:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 85E8521480 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=sipsolutions.net Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728686AbeHVNHA (ORCPT ); Wed, 22 Aug 2018 09:07:00 -0400 Received: from s3.sipsolutions.net ([144.76.43.62]:53664 "EHLO sipsolutions.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728315AbeHVNHA (ORCPT ); Wed, 22 Aug 2018 09:07:00 -0400 Received: by sipsolutions.net with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.91) (envelope-from ) id 1fsPf7-0007Ix-Av; Wed, 22 Aug 2018 11:42:49 +0200 Message-ID: <1534930963.25523.87.camel@sipsolutions.net> Subject: Re: [PATCH 1/2] workqueue: skip lockdep wq dependency in cancel_work_sync() From: Johannes Berg To: Byungchul Park Cc: Tejun Heo , Lai Jiangshan , linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org, kernel-team@lge.com Date: Wed, 22 Aug 2018 11:42:43 +0200 In-Reply-To: <20180822091547.GB29722@X58A-UD3R> References: <20180821172711.GR3978217@devbig004.ftw2.facebook.com> <1534872621.25523.39.camel@sipsolutions.net> <20180821175550.GS3978217@devbig004.ftw2.facebook.com> <1534879241.25523.44.camel@sipsolutions.net> <20180822024535.GA2414@X58A-UD3R> <1534910543.25523.50.camel@sipsolutions.net> <20180822054731.GB2414@X58A-UD3R> <1534921643.25523.56.camel@sipsolutions.net> <20180822075010.GA29722@X58A-UD3R> <1534924940.25523.70.camel@sipsolutions.net> <20180822091547.GB29722@X58A-UD3R> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.26.6 (3.26.6-1.fc27) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2018-08-22 at 18:15 +0900, Byungchul Park wrote: > > > thread 1 thread 2 (wq thread) > > > > common_case = false; > > queue_work(&my_wq, &work); > > mutex_lock(&mutex); > > > > flush_workqueue(&my_wq); > > work_function() > > -> mutex_lock(&mutex); > > -> schedule(), wait for mutex > > wait_for_completion() > > > > -> deadlock - we can't make any forward progress here. > > I see. Now, I understand what you are talking about. > > if (common_case) > schedule_and_wait_for_something_that_takes_a_long_time() > > should be: > > if (common_case) > schedule_and_wait_long_time_so_that_the_work_to_finish() Fair point. > Ok. I didn't know what you are talking about but now I got it. great. > You are talking about who knows whether common_case is true or not, > so we must aggresively consider the case the wait_for_completion() > so far hasn't been called but may be called in the future. Yes. > I think it's a problem of how aggressively we need to check dependencies. > If we choose the aggressive option, then we could get reported > aggressively but could not avoid aggresive false positives either. > I don't want to strongly argue that because it's a problem of policy. I don't think you could consider a report from "aggressive reporting" to be a false positive. It's clearly possible that this happens, and once you go to a workqueue you basically don't really know your sequencing and timing any more. > Just, I would consider only waits that actually happened anyway. Of > course, we gotta consider the waits definitely even if any actual > deadlock doesn't happen since detecting potantial problem is more > important than doing on actual ones as you said. > > So no big objection to check dependencies aggressively. > > > Here we don't have a deadlock, but without the revert we will also not > > You misunderstand me. The commit should be reverted or acquire/release > pairs should be added in both flush functions. Ok, I thought you were arguing we shouldn't revert it :) I don't know whether to revert or just add the pairs in the flush functions, because I can't say I understand what the third part of the patch does. > Anyway the annotation should be placed in the path where > wait_for_completion() might be called. Yes, it should be called regardless of whether we actually wait or not, i.e. before most of the checking in the functions. My issue #3 that I outlined is related - we'd like to have lockdep understand that if this work was on the WQ it might deadlock, but we don't have a way to get the WQ unless the work is scheduled - and in the case that it is scheduled we might already hitting the deadlock (depending on the order of execution though I guess). > Absolutly true. You can find my opinion about that in > Documentation/locking/crossrelease.txt which has been removed because > crossrelease is strong at detecting *potential* deadlock problems. Ok, you know what, I'm going to read this now ... hang on........ I see. You were trying to solve the problem of completions/context transfers in lockdep. Given the revert of crossrelease though, we can't actually revert your patch anyway, and looking at it again I see what you mean by the "name" now ... So yeah, we should only re-add the two acquire/release pairs. I guess I'll make a patch for that, replacing my patch 2. I'm intrigued by the crossrelease - but that's a separate topic. johannes