From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 549ADC38A2A for ; Fri, 8 May 2020 12:39:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2780121835 for ; Fri, 8 May 2020 12:39:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588941579; bh=XgqZ47eejARrn/IJvigT5qeeQFyfofjfC9J7rZ7c808=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=2PaPVYD7DJZ8iGt5qNbDUYPg42NzoRUw8Aq+Xe5N3K8SjtOZw1v8PD3Tuf/ly5m94 AMxSTCKFf1DBompU4u+LrtgaCDYIgFc3eKi06SkMvt94VwvYi6DIr3m0jxq50mF1M6 HmAXM9nRYZahsA0iGSzRE7RPetA7+czYrbdxPrYY= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728592AbgEHMjh (ORCPT ); Fri, 8 May 2020 08:39:37 -0400 Received: from mail.kernel.org ([198.145.29.99]:58402 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727108AbgEHMjW (ORCPT ); Fri, 8 May 2020 08:39:22 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id E1C1521835; Fri, 8 May 2020 12:39:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588941561; bh=XgqZ47eejARrn/IJvigT5qeeQFyfofjfC9J7rZ7c808=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tf7yUI+D60RwcdQ9bCvmWfvv03XP5e4LNFFzq18+ysJdwOyFvLKlwhhR3tuygAQQx 08C6bkd1orrOhDxQ6jM6fjkJaCWTv43l7M6kV/Pz6G020dCPsHtgiRHpjtdQj6jMH7 4Ai7FMBkU4ptcH3VxLZtoQv2eGxNwCPIoz7Pt4E8= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Mohamad Haj Yahia , Saeed Mahameed , "David S. Miller" Subject: [PATCH 4.4 081/312] net/mlx5: Avoid calling sleeping function by the health poll thread Date: Fri, 8 May 2020 14:31:12 +0200 Message-Id: <20200508123130.227210621@linuxfoundation.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200508123124.574959822@linuxfoundation.org> References: <20200508123124.574959822@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Mohamad Haj Yahia commit c1d4d2e92ad670168a17a57dfa182a5a5baa72d4 upstream. In internal error state the health poll thread will eventually call synchronize_irq() (to safely trigger command completions) which might sleep, so we are calling sleeping function from atomic context which is invalid. Here we move trigger_cmd_completions(dev) to enter error state which is the earliest stage in error state handling. This way we won't need to wait for next health poll to trigger command completions and will solve the scheduling while atomic issue. mlx5_enter_error_state can be called from two contexts, protect it with dev->intf_state_lock Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver') Signed-off-by: Mohamad Haj Yahia Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/health.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -108,15 +108,21 @@ static int in_fatal(struct mlx5_core_dev void mlx5_enter_error_state(struct mlx5_core_dev *dev) { + mutex_lock(&dev->intf_state_mutex); if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) - return; + goto unlock; mlx5_core_err(dev, "start\n"); - if (pci_channel_offline(dev->pdev) || in_fatal(dev)) + if (pci_channel_offline(dev->pdev) || in_fatal(dev)) { dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR; + trigger_cmd_completions(dev); + } mlx5_core_event(dev, MLX5_DEV_EVENT_SYS_ERROR, 0); mlx5_core_err(dev, "end\n"); + +unlock: + mutex_unlock(&dev->intf_state_mutex); } static void mlx5_handle_bad_state(struct mlx5_core_dev *dev) @@ -245,7 +251,6 @@ static void poll_health(unsigned long da u32 count; if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) { - trigger_cmd_completions(dev); mod_timer(&health->timer, get_next_poll_jiffies()); return; }