From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5A8AC282DF for ; Fri, 19 Apr 2019 19:07:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6C11620645 for ; Fri, 19 Apr 2019 19:07:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728399AbfDSTHU convert rfc822-to-8bit (ORCPT ); Fri, 19 Apr 2019 15:07:20 -0400 Received: from smtp.nue.novell.com ([195.135.221.5]:39805 "EHLO smtp.nue.novell.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727267AbfDSTHT (ORCPT ); Fri, 19 Apr 2019 15:07:19 -0400 Received: from [10.67.17.168] ([45.122.156.254]) by smtp.nue.novell.com with ESMTP (TLS encrypted); Fri, 19 Apr 2019 15:07:14 +0200 From: Firo Subject: Re: [PATCH 1/1] be2net: Detach interface for avoiding a system crash To: Saeed Mahameed , "sathya.perla@broadcom.com" , "sriharsha.basavapatna@broadcom.com" , "somnath.kotur@broadcom.com" , "ajit.khaparde@broadcom.com" , "davem@davemloft.net" Cc: "firogm@gmail.com" , "netdev@vger.kernel.org" References: <20190401122421.30116-1-fyang@suse.com> <786ef1d29facea3163b91ce160dd146f5759d160.camel@mellanox.com> <22d06c53-2cfc-6000-ba0d-044dd9457a10@suse.com> <761637d4ccc2ec265b7418fa31c36c0f37d574f1.camel@mellanox.com> Message-ID: <65b80fd5-7dcf-3cc6-b5e9-00a6a9d8706a@suse.com> Date: Fri, 19 Apr 2019 21:07:09 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: <761637d4ccc2ec265b7418fa31c36c0f37d574f1.camel@mellanox.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8BIT Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 4/19/19 2:17 AM, Saeed Mahameed wrote: > On Thu, 2019-04-18 at 15:05 +0800, Firo wrote: >> >> On 4/2/19 12:25 AM, Saeed Mahameed wrote: >>> On Mon, 2019-04-01 at 20:24 +0800, Firo Yang wrote: >>>> This crash is triggered by a user-after-free since lake of >>>> the synchronization of a race condition between >>>> be_update_queues() modifying multi-purpose channels of >>>> network device and be_tx_timeout(). >>>> >>>> BUG: unable to handle kernel NULL pointer dereference at (null) >>>> Call Trace: >>>> be_tx_timeout+0xa5/0x360 [be2net] >>>> dev_watchdog+0x1d8/0x210 >>>> call_timer_fn+0x32/0x140 >>>> >>>> To fix it, detach the interface before modifying >>>> multi-purpose channels of network device. >>>> >>>> Signed-off-by: Firo Yang >>>> --- >>>> drivers/net/ethernet/emulex/benet/be_main.c | 12 ++++++++---- >>>> 1 file changed, 8 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c >>>> b/drivers/net/ethernet/emulex/benet/be_main.c >>>> index d5026909dec5..25d0128bf684 100644 >>>> --- a/drivers/net/ethernet/emulex/benet/be_main.c >>>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c >>>> @@ -4705,6 +4705,8 @@ int be_update_queues(struct be_adapter >>>> *adapter) >>>> struct net_device *netdev = adapter->netdev; >>>> int status; >>>> >>>> + netif_device_detach(netdev); >>>> + >>> >>> This will reduce the probability, but will not do the trick. >>> since this will not guarantee that the dev_watchdog is disabled. >> Hi Saeed, >> >> What about using dev_watchdog_down/up() to temporarily disable the >> dev_watchdog? >> >> +++ b/drivers/net/ethernet/emulex/benet/be_main.c >> @@ -4697,6 +4697,8 @@ int be_update_queues(struct be_adapter >> *adapter) >> struct net_device *netdev = adapter->netdev; >> int status; >> >> + dev_watchdog_down(); >> + > > there is no such API, currently this is a static function, and i don't > think it is a good idea to mess around with the watchdog. > > If you want to avoid deferred work and explicit locking, you need > something similar to what you did with the device_detach to flag to the > watchdog to not look at your tx queues. > > by looking at be_close() I see that it calls netif_tx_disable(netdev); > which provides some kind of state synchronization with the watchdog. > > so maybe netif_device_detach() will work, but it is a very heavy gun! Sorry, I cannot fully understand why you think netif_device_detach() is very heavy. Except clearing __LINK_STATE_PRESENT, netif_device_detach() almost does the same thing as netif_tx_disable(); could you please detail your thought? > netif_carrier_off(netdev) should do the work, and at the end you will > need to restore the original carrier state. netif_carrier_off() might trigger a linkwatch event. From this point of view, maybe netif_device_detach() is better. Thanks, Firo > > carrier_ok = netif_carrier_ok(netdev); > /* must be called before netif_tx_disable() */ > netif_carrier_off(netdev); > > // do stuff > > if (carrier_ok) > netif_carrier_on(netdev); > >> if (netif_running(netdev)) >> be_close(netdev); >> >> @@ -4711,21 +4713,21 @@ int be_update_queues(struct be_adapter >> *adapter) >> be_clear_queues(adapter); >> status = be_cmd_if_destroy(adapter, adapter->if_handle, 0); >> if (status) >> - return status; >> + goto out; >> >> if (!msix_enabled(adapter)) { >> status = be_msix_enable(adapter); >> if (status) >> - return status; >> + goto out; >> } >> >> status = be_if_create(adapter); >> if (status) >> - return status; >> + goto out; >> >> status = be_setup_queues(adapter); >> if (status) >> - return status; >> + goto out; >> >> be_schedule_worker(adapter); >> >> @@ -4741,6 +4743,9 @@ int be_update_queues(struct be_adapter >> *adapter) >> if (netif_running(netdev)) >> status = be_open(netdev); >> >> +out: >> + dev_watchdog_up(); >> >> Thanks, >> Firo >> >>> what you need is proper locking mechanism and/or scheduling the >>> tx_timeout handling out of atomic context if a mutex will be >>> required. >>> >>> netif_device_detach is a too heavy hammer for such synchronizations >>> tasks. >>> >>>> if (netif_running(netdev)) >>>> be_close(netdev); >>>> >>>> @@ -4719,21 +4721,21 @@ int be_update_queues(struct be_adapter >>>> *adapter) >>>> be_clear_queues(adapter); >>>> status = be_cmd_if_destroy(adapter, adapter->if_handle, 0); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> >>>> if (!msix_enabled(adapter)) { >>>> status = be_msix_enable(adapter); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> } >>>> >>>> status = be_if_create(adapter); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> >>>> status = be_setup_queues(adapter); >>>> if (status) >>>> - return status; >>>> + goto out; >>>> >>>> be_schedule_worker(adapter); >>>> >>>> @@ -4748,6 +4750,8 @@ int be_update_queues(struct be_adapter >>>> *adapter) >>>> if (netif_running(netdev)) >>>> status = be_open(netdev); >>>> >>>> +out: >>>> + netif_device_attach(netdev); >>>> return status; >>>> } >>>>