Date: Sun, 11 Aug 2019 18:31:07 +0200
From: Andrew Lunn
To: Russell King - ARM Linux admin
Cc: netdev@vger.kernel.org, Florian Fainelli, Fabio Estevam, linux-arm-kernel@lists.infradead.org, Heiner Kallweit
Subject: Re: [BUG] fec mdio times out under system stress
Message-ID: <20190811163107.GE14290@lunn.ch>
In-Reply-To: <20190811133707.GC13294@shell.armlinux.org.uk>

On Sun, Aug 11, 2019 at 02:37:07PM +0100, Russell King - ARM Linux admin wrote:
> Hi Fabio,
>
> When I woke up this morning, I found that one of the Hummingboards
> had gone offline (as in, lost network link) during the night.
> Investigating, I find that the system had gone into OOM, and at
> that time, triggered an unrelated:
>
> [4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout
> [4111697.712996] MII_DATA: 0x6006796d
> [4111697.729415] MII_SPEED: 0x0000001a
> [4111697.745232] IEVENT: 0x00000000
> [4111697.745242] IMASK: 0x0a8000aa
> [4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state change RUNNING -> HALTED
> [4111698.009882] fec 2188000.ethernet eth0: Link is Down
>
> This is on a dual-core iMX6.
>
> It looks like the read actually completed (since MII_DATA contains
> the register data) but we somehow lost the interrupt (or maybe
> received the interrupt after wait_for_completion_timeout() timed
> out.)

Hi Russell

The timeout is quite short,

#define FEC_MII_TIMEOUT		30000 /* us */

Looking at the Vybrid datasheet, there does not appear to be any way
to determine if the hardware is busy other than waiting for the
interrupt. There is no 'busy' bit which gets cleared on completion.

So about the only option is to make the timeout bigger.

Andrew