From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=jfLS=S2=vger.kernel.org=netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AC221C10F11
	for <netdev@archiver.kernel.org>; Wed, 24 Apr 2019 05:54:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 7D525218B0
	for <netdev@archiver.kernel.org>; Wed, 24 Apr 2019 05:54:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1729629AbfDXFyI convert rfc822-to-8bit (ORCPT
        <rfc822;netdev@archiver.kernel.org>); Wed, 24 Apr 2019 01:54:08 -0400
Received: from smtp.nue.novell.com ([195.135.221.5]:55229 "EHLO
        smtp.nue.novell.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1729244AbfDXFyI (ORCPT
        <rfc822;groupwise-netdev@vger.kernel.org:0:0>);
        Wed, 24 Apr 2019 01:54:08 -0400
Received: from [10.67.17.168] ([45.122.156.254])
        by smtp.nue.novell.com with ESMTP (TLS encrypted); Wed, 24 Apr 2019 07:54:03 +0200
Subject: Re: [PATCH 1/1] be2net: Detach interface for avoiding a system crash
To:     saeedm@mellanox.com, sathya.perla@broadcom.com,
        sriharsha.basavapatna@broadcom.com, somnath.kotur@broadcom.com,
        ajit.khaparde@broadcom.com, davem@davemloft.net,
        Firo Yang <firogm@gmail.com>
Cc:     netdev@vger.kernel.org
References: <20190401122421.30116-1-fyang@suse.com>
 <786ef1d29facea3163b91ce160dd146f5759d160.camel@mellanox.com>
 <22d06c53-2cfc-6000-ba0d-044dd9457a10@suse.com>
 <761637d4ccc2ec265b7418fa31c36c0f37d574f1.camel@mellanox.com>
 <65b80fd5-7dcf-3cc6-b5e9-00a6a9d8706a@suse.com>
 <03607909a259d01b461d912ccff0042554871a37.camel@mellanox.com>
From:   Firo <fyang@suse.com>
Message-ID: <a6b9bbe1-3f3c-b798-56ab-8193d481e364@suse.com>
Date:   Wed, 24 Apr 2019 13:53:58 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.6.1
MIME-Version: 1.0
In-Reply-To: <03607909a259d01b461d912ccff0042554871a37.camel@mellanox.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8BIT
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org


On 4/20/19 6:31 AM, Saeed Mahameed wrote:
> On Fri, 2019-04-19 at 21:07 +0800, Firo wrote:
>>
>> On 4/19/19 2:17 AM, Saeed Mahameed wrote:
>>> On Thu, 2019-04-18 at 15:05 +0800, Firo wrote:
>>>> On 4/2/19 12:25 AM, Saeed Mahameed wrote:
>>>>> On Mon, 2019-04-01 at 20:24 +0800, Firo Yang wrote:
>>>>>> This crash is triggered by a user-after-free since lake of
>>>>>> the synchronization of a race condition between   
>>>>>> be_update_queues() modifying multi-purpose channels of
>>>>>> network device and be_tx_timeout().
>>>>>>
>>>>>> BUG: unable to handle kernel NULL pointer dereference at
>>>>>> (null)
>>>>>> Call Trace:
>>>>>> be_tx_timeout+0xa5/0x360 [be2net]
>>>>>> dev_watchdog+0x1d8/0x210
>>>>>> call_timer_fn+0x32/0x140
>>>>>>
>>>>>> To fix it, detach the interface before modifying
>>>>>> multi-purpose channels of network device.
>>>>>>
>>>>>> Signed-off-by: Firo Yang <fyang@suse.com>
>>>>>> ---
>>>>>>  drivers/net/ethernet/emulex/benet/be_main.c | 12 ++++++++---
>>>>>> -
>>>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c
>>>>>> b/drivers/net/ethernet/emulex/benet/be_main.c
>>>>>> index d5026909dec5..25d0128bf684 100644
>>>>>> --- a/drivers/net/ethernet/emulex/benet/be_main.c
>>>>>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
>>>>>> @@ -4705,6 +4705,8 @@ int be_update_queues(struct be_adapter
>>>>>> *adapter)
>>>>>>  	struct net_device *netdev = adapter->netdev;
>>>>>>  	int status;
>>>>>>    
>>>>>> +	netif_device_detach(netdev);
>>>>>> +
>>>>>
>>>>> This will reduce the probability, but will not do the trick.
>>>>> since this will not guarantee that the dev_watchdog is
>>>>> disabled.
>>>> Hi Saeed,
>>>>
>>>> What about using dev_watchdog_down/up() to temporarily disable
>>>> the
>>>> dev_watchdog?
>>>>
>>>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
>>>> @@ -4697,6 +4697,8 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>>         struct net_device *netdev = adapter->netdev;
>>>>         int status;
>>>>
>>>> +       dev_watchdog_down();
>>>> +
>>>   
>>> there is no such API, currently this is a static function, and i
>>> don't
>>> think it is a good idea to mess around with the watchdog.
>>>   
>>> If you want to avoid deferred work and explicit locking, you need
>>> something similar to what you did with the device_detach to flag to
>>> the
>>> watchdog to not look at your tx queues.
>>>   
>>> by looking at be_close() I see that it calls
>>> netif_tx_disable(netdev);
>>> which provides some kind of state synchronization with the
>>> watchdog.
>>>   
>>> so maybe netif_device_detach() will work, but it is a very heavy
>>> gun!
>>
>> Sorry, I cannot fully understand why you think netif_device_detach()
>> is
>> very heavy. Except clearing __LINK_STATE_PRESENT,
>> netif_device_detach()
>> almost does the same thing as  netif_tx_disable(); could you please
>> detail your thought?
>>
> 
> calling netif_tx_disable on its own is not enough to make the watchdog
> back-off, you need the extra flag from carrier_off or detach_device.
I think your first comment is right. Locking mechanism is necessary
here. Consider the following situation. In dev_watchdog(), due to some
reasons[1], netif_xmit_stopped(txq) && time_after(jiffies, (trans_start
+ dev->watchdog_timeo)) is true; and dev_watchdog() tests the outermost
if-statement[2] just before be_update_queues() executes carrier_off or
detach_device; then dev_watchdog() will be possible to hit a kernel oops.

[1]: For example, a real problem causes that NIC stopped and timed out.
[2]: if (!qdisc_tx_is_noop(dev)) {
                if (netif_device_present(dev) &&
                    netif_running(dev) &&
                    netif_carrier_ok(dev))
> 
> 1) the documentation of netif_device_detach says 
> "Mark device as removed from system and therefore no longer available."
> which is not true in your case.
Indeed.
> 
> 2) netif_device_present(dev) check is used very widely in netdev
> control paths, so if you detach the device while doing one config you
> will loose all other configs and they would wrongfully return -ENODEV;
I went through a few *ioctl and netlink code; it seems that they all
under the protection of RTNL lock and won't be able to return -ENODEV.

> 
> calling carrier_off will make the watch-dog back off but still allow
> other configurations.
> 
> the carrier is used to mark that the device is not available for tx,
> which is exactly your case since you are resetting the rings, but still
> available for anything else.
> 
> looking at other drivers, i couldn't find anyone using the
> detach_device approach to disable the watchdog while resetting the
> rings, but almost most of the drivers are doing the carrier_off.
> 
>>> netif_carrier_off(netdev) should do the work, and at the end you
>>> will
>>> need to restore the original carrier state.
>>
>> netif_carrier_off() might trigger a linkwatch event. From this point
>> of
>> view, maybe netif_device_detach() is better.
>>
> 
> Code-wise yes netif_device_detach is better, but the implications can
> be worse, as explained above.
> 
> My 2 cent is to copy the approach used by most drivers, and just use
> netif_carrier_off(), and maybe in the future we will introduce a more
> relaxed version of carrier off to shut-up the watchdog, and fix all
> drivers at once :).>
> I will leave the decision to you and to this driver maintainers.
> 
> Thanks for following up on my comments.
Thank you for such detailed explanation!
// Firo

> 
>> Thanks,
>> Firo
>>
>>>   
>>> carrier_ok = netif_carrier_ok(netdev);
>>> /* must be called before netif_tx_disable() */
>>> netif_carrier_off(netdev);   
>>>   
>>> // do stuff
>>>   
>>> if (carrier_ok)
>>> 	netif_carrier_on(netdev);
>>>   
>>>>         if (netif_running(netdev))
>>>>                 be_close(netdev);
>>>>
>>>> @@ -4711,21 +4713,21 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>>         be_clear_queues(adapter);
>>>>         status = be_cmd_if_destroy(adapter, adapter-
>>>>> if_handle,  0);
>>>>         if (status)
>>>> -               return status;
>>>> +               goto out;
>>>>
>>>>         if (!msix_enabled(adapter)) {
>>>>                 status = be_msix_enable(adapter);
>>>>                 if (status)
>>>> -                       return status;
>>>> +                       goto out;
>>>>         }
>>>>
>>>>         status = be_if_create(adapter);
>>>>         if (status)
>>>> -               return status;
>>>> +               goto out;
>>>>
>>>>         status = be_setup_queues(adapter);
>>>>         if (status)
>>>> -               return status;
>>>> +               goto out;
>>>>
>>>>         be_schedule_worker(adapter);
>>>>
>>>> @@ -4741,6 +4743,9 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>>         if (netif_running(netdev))
>>>>                 status = be_open(netdev);
>>>>
>>>> +out:
>>>> +       dev_watchdog_up();
>>>>
>>>> Thanks,
>>>> Firo
>>>>
>>>>> what you need is proper locking mechanism and/or scheduling the
>>>>> tx_timeout handling out of atomic context if a mutex will be
>>>>> required.
>>>>>
>>>>> netif_device_detach is a too heavy hammer for such
>>>>> synchronizations
>>>>> tasks.
>>>>>
>>>>>>  	if (netif_running(netdev))
>>>>>>  		be_close(netdev);
>>>>>>    
>>>>>> @@ -4719,21 +4721,21 @@ int be_update_queues(struct
>>>>>> be_adapter
>>>>>> *adapter)
>>>>>>  	be_clear_queues(adapter);
>>>>>>  	status = be_cmd_if_destroy(adapter, adapter-
>>>>>>> if_handle,  0);
>>>>>>  	if (status)
>>>>>> -		return status;
>>>>>> +		goto out;
>>>>>>    
>>>>>>  	if (!msix_enabled(adapter)) {
>>>>>>  		status = be_msix_enable(adapter);
>>>>>>  		if (status)
>>>>>> -			return status;
>>>>>> +			goto out;
>>>>>>  	}
>>>>>>    
>>>>>>  	status = be_if_create(adapter);
>>>>>>  	if (status)
>>>>>> -		return status;
>>>>>> +		goto out;
>>>>>>    
>>>>>>  	status = be_setup_queues(adapter);
>>>>>>  	if (status)
>>>>>> -		return status;
>>>>>> +		goto out;
>>>>>>    
>>>>>>  	be_schedule_worker(adapter);
>>>>>>    
>>>>>> @@ -4748,6 +4750,8 @@ int be_update_queues(struct be_adapter
>>>>>> *adapter)
>>>>>>  	if (netif_running(netdev))
>>>>>>  		status = be_open(netdev);
>>>>>>    
>>>>>> +out:
>>>>>> +	netif_device_attach(netdev);
>>>>>>  	return status;
>>>>>>  }
>>>>>>    
>>
>>