Message-ID: <2cf4178e970d2737e7ba866ebc83a7ec30ca8ad1.camel@kernel.org>
Subject: Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"
From: Saeed Mahameed
To: David Miller
Cc: hkallweit1@gmail.com, geert+renesas@glider.be, f.fainelli@gmail.com,
    andrew@lunn.ch, kuba@kernel.org, gaku.inami.xh@renesas.com,
    yoshihiro.shimoda.uh@renesas.com, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Wed, 23 Sep 2020 22:49:37 -0700
In-Reply-To: <20200923.172349.872678515629678579.davem@davemloft.net>
References: <20200923.172125.1341776337290371000.davem@davemloft.net>
    <20200923.172349.872678515629678579.davem@davemloft.net>

On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
>
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
>
> Wait, is the sequence:
>
>     ->ndo_stop()
>         mark device not present and put into D3hot
>         triggers linkwatch event
>     ...
>     ->ndo_get_stats64()
>
> ???
>

I assume it is, since device drivers normally call carrier_off() in
ndo_stop().

1) One problematic sequence would be (for drivers doing D3hot in
ndo_stop()):

__dev_close_many()
    ->ndo_stop()
        netif_device_detach() // Mark !present;
        ...
        D3hot
        carrier_off()->linkwatch_event()
        ... // !present && IFF_UP

2) Another problematic scenario, which I see repeated in many drivers:

shutdown/suspend()
    rtnl_lock()
    netif_device_detach() // Mark !present;
    stop()->carrier_off()->linkwatch_event()
    // at this point the device is still IFF_UP and !present
    // due to the early detach above..
    rtnl_unlock();

Scenario 1) we can fix by clearing IFF_UP at the beginning, but for
2) I think we need to fix the drivers to detach only after stop :(

> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.