From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC221C10F11 for ; Wed, 24 Apr 2019 05:54:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7D525218B0 for ; Wed, 24 Apr 2019 05:54:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729629AbfDXFyI convert rfc822-to-8bit (ORCPT ); Wed, 24 Apr 2019 01:54:08 -0400 Received: from smtp.nue.novell.com ([195.135.221.5]:55229 "EHLO smtp.nue.novell.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729244AbfDXFyI (ORCPT ); Wed, 24 Apr 2019 01:54:08 -0400 Received: from [10.67.17.168] ([45.122.156.254]) by smtp.nue.novell.com with ESMTP (TLS encrypted); Wed, 24 Apr 2019 07:54:03 +0200 Subject: Re: [PATCH 1/1] be2net: Detach interface for avoiding a system crash To: saeedm@mellanox.com, sathya.perla@broadcom.com, sriharsha.basavapatna@broadcom.com, somnath.kotur@broadcom.com, ajit.khaparde@broadcom.com, davem@davemloft.net, Firo Yang Cc: netdev@vger.kernel.org References: <20190401122421.30116-1-fyang@suse.com> <786ef1d29facea3163b91ce160dd146f5759d160.camel@mellanox.com> <22d06c53-2cfc-6000-ba0d-044dd9457a10@suse.com> <761637d4ccc2ec265b7418fa31c36c0f37d574f1.camel@mellanox.com> <65b80fd5-7dcf-3cc6-b5e9-00a6a9d8706a@suse.com> <03607909a259d01b461d912ccff0042554871a37.camel@mellanox.com> From: Firo Message-ID: Date: Wed, 24 Apr 2019 13:53:58 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: <03607909a259d01b461d912ccff0042554871a37.camel@mellanox.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8BIT Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 4/20/19 6:31 AM, Saeed Mahameed wrote: > On Fri, 2019-04-19 at 21:07 +0800, Firo wrote: >> >> On 4/19/19 2:17 AM, Saeed Mahameed wrote: >>> On Thu, 2019-04-18 at 15:05 +0800, Firo wrote: >>>> On 4/2/19 12:25 AM, Saeed Mahameed wrote: >>>>> On Mon, 2019-04-01 at 20:24 +0800, Firo Yang wrote: >>>>>> This crash is triggered by a user-after-free since lake of >>>>>> the synchronization of a race condition between >>>>>> be_update_queues() modifying multi-purpose channels of >>>>>> network device and be_tx_timeout(). >>>>>> >>>>>> BUG: unable to handle kernel NULL pointer dereference at >>>>>> (null) >>>>>> Call Trace: >>>>>> be_tx_timeout+0xa5/0x360 [be2net] >>>>>> dev_watchdog+0x1d8/0x210 >>>>>> call_timer_fn+0x32/0x140 >>>>>> >>>>>> To fix it, detach the interface before modifying >>>>>> multi-purpose channels of network device. >>>>>> >>>>>> Signed-off-by: Firo Yang >>>>>> --- >>>>>> drivers/net/ethernet/emulex/benet/be_main.c | 12 ++++++++--- >>>>>> - >>>>>> 1 file changed, 8 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c >>>>>> b/drivers/net/ethernet/emulex/benet/be_main.c >>>>>> index d5026909dec5..25d0128bf684 100644 >>>>>> --- a/drivers/net/ethernet/emulex/benet/be_main.c >>>>>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c >>>>>> @@ -4705,6 +4705,8 @@ int be_update_queues(struct be_adapter >>>>>> *adapter) >>>>>> struct net_device *netdev = adapter->netdev; >>>>>> int status; >>>>>> >>>>>> + netif_device_detach(netdev); >>>>>> + >>>>> >>>>> This will reduce the probability, but will not do the trick. >>>>> since this will not guarantee that the dev_watchdog is >>>>> disabled. >>>> Hi Saeed, >>>> >>>> What about using dev_watchdog_down/up() to temporarily disable >>>> the >>>> dev_watchdog? >>>> >>>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c >>>> @@ -4697,6 +4697,8 @@ int be_update_queues(struct be_adapter >>>> *adapter) >>>> struct net_device *netdev = adapter->netdev; >>>> int status; >>>> >>>> + dev_watchdog_down(); >>>> + >>> >>> there is no such API, currently this is a static function, and i >>> don't >>> think it is a good idea to mess around with the watchdog. >>> >>> If you want to avoid deferred work and explicit locking, you need >>> something similar to what you did with the device_detach to flag to >>> the >>> watchdog to not look at your tx queues. >>> >>> by looking at be_close() I see that it calls >>> netif_tx_disable(netdev); >>> which provides some kind of state synchronization with the >>> watchdog. >>> >>> so maybe netif_device_detach() will work, but it is a very heavy >>> gun! >> >> Sorry, I cannot fully understand why you think netif_device_detach() >> is >> very heavy. Except clearing __LINK_STATE_PRESENT, >> netif_device_detach() >> almost does the same thing as netif_tx_disable(); could you please >> detail your thought? >> > > calling netif_tx_disable on its own is not enough to make the watchdog > back-off, you need the extra flag from carrier_off or detach_device. I think your first comment is right. Locking mechanism is necessary here. Consider the following situation. In dev_watchdog(), due to some reasons[1], netif_xmit_stopped(txq) && time_after(jiffies, (trans_start + dev->watchdog_timeo)) is true; and dev_watchdog() tests the outermost if-statement[2] just before be_update_queues() executes carrier_off or detach_device; then dev_watchdog() will be possible to hit a kernel oops. [1]: For example, a real problem causes that NIC stopped and timed out. [2]: if (!qdisc_tx_is_noop(dev)) { if (netif_device_present(dev) && netif_running(dev) && netif_carrier_ok(dev)) > > 1) the documentation of netif_device_detach says > "Mark device as removed from system and therefore no longer available." > which is not true in your case. Indeed. > > 2) netif_device_present(dev) check is used very widely in netdev > control paths, so if you detach the device while doing one config you > will loose all other configs and they would wrongfully return -ENODEV; I went through a few *ioctl and netlink code; it seems that they all under the protection of RTNL lock and won't be able to return -ENODEV. > > calling carrier_off will make the watch-dog back off but still allow > other configurations. > > the carrier is used to mark that the device is not available for tx, > which is exactly your case since you are resetting the rings, but still > available for anything else. > > looking at other drivers, i couldn't find anyone using the > detach_device approach to disable the watchdog while resetting the > rings, but almost most of the drivers are doing the carrier_off. > >>> netif_carrier_off(netdev) should do the work, and at the end you >>> will >>> need to restore the original carrier state. >> >> netif_carrier_off() might trigger a linkwatch event. From this point >> of >> view, maybe netif_device_detach() is better. >> > > Code-wise yes netif_device_detach is better, but the implications can > be worse, as explained above. > > My 2 cent is to copy the approach used by most drivers, and just use > netif_carrier_off(), and maybe in the future we will introduce a more > relaxed version of carrier off to shut-up the watchdog, and fix all > drivers at once :).> > I will leave the decision to you and to this driver maintainers. > > Thanks for following up on my comments. Thank you for such detailed explanation! // Firo > >> Thanks, >> Firo >> >>> >>> carrier_ok = netif_carrier_ok(netdev); >>> /* must be called before netif_tx_disable() */ >>> netif_carrier_off(netdev); >>> >>> // do stuff >>> >>> if (carrier_ok) >>> netif_carrier_on(netdev); >>> >>>> if (netif_running(netdev)) >>>> be_close(netdev); >>>> >>>> @@ -4711,21 +4713,21 @@ int be_update_queues(struct be_adapter >>>> *adapter) >>>> be_clear_queues(adapter); >>>> status = be_cmd_if_destroy(adapter, adapter- >>>>> if_handle, 0); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> >>>> if (!msix_enabled(adapter)) { >>>> status = be_msix_enable(adapter); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> } >>>> >>>> status = be_if_create(adapter); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> >>>> status = be_setup_queues(adapter); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> >>>> be_schedule_worker(adapter); >>>> >>>> @@ -4741,6 +4743,9 @@ int be_update_queues(struct be_adapter >>>> *adapter) >>>> if (netif_running(netdev)) >>>> status = be_open(netdev); >>>> >>>> +out: >>>> + dev_watchdog_up(); >>>> >>>> Thanks, >>>> Firo >>>> >>>>> what you need is proper locking mechanism and/or scheduling the >>>>> tx_timeout handling out of atomic context if a mutex will be >>>>> required. >>>>> >>>>> netif_device_detach is a too heavy hammer for such >>>>> synchronizations >>>>> tasks. >>>>> >>>>>> if (netif_running(netdev)) >>>>>> be_close(netdev); >>>>>> >>>>>> @@ -4719,21 +4721,21 @@ int be_update_queues(struct >>>>>> be_adapter >>>>>> *adapter) >>>>>> be_clear_queues(adapter); >>>>>> status = be_cmd_if_destroy(adapter, adapter- >>>>>>> if_handle, 0); >>>>>> if (status) >>>>>> - return status; >>>>>> + goto out; >>>>>> >>>>>> if (!msix_enabled(adapter)) { >>>>>> status = be_msix_enable(adapter); >>>>>> if (status) >>>>>> - return status; >>>>>> + goto out; >>>>>> } >>>>>> >>>>>> status = be_if_create(adapter); >>>>>> if (status) >>>>>> - return status; >>>>>> + goto out; >>>>>> >>>>>> status = be_setup_queues(adapter); >>>>>> if (status) >>>>>> - return status; >>>>>> + goto out; >>>>>> >>>>>> be_schedule_worker(adapter); >>>>>> >>>>>> @@ -4748,6 +4750,8 @@ int be_update_queues(struct be_adapter >>>>>> *adapter) >>>>>> if (netif_running(netdev)) >>>>>> status = be_open(netdev); >>>>>> >>>>>> +out: >>>>>> + netif_device_attach(netdev); >>>>>> return status; >>>>>> } >>>>>> >> >>