From: Sean Anderson
Date: Thu, 30 May 2024 12:56:18 -0400
Subject: Re: [BUG] SFP I2C timeout forces link down with PHY_ERROR
To: Andrew Lunn
Cc: "Russell King (Oracle)", Alex Williams, Andi Shyti,
 netdev@vger.kernel.org, linux-i2c@vger.kernel.org, Michal Simek,
 Heiner Kallweit, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
Message-ID: <93e8839d-e712-4708-a2ca-df81051b8360@linux.dev>
In-Reply-To: <1398a492-95aa-46d9-b52b-a374fd6e9e77@lunn.ch>
References: <90873b78-13ba-445e-890a-0b90a653721b@linux.dev> <1398a492-95aa-46d9-b52b-a374fd6e9e77@lunn.ch>

On 5/28/24 14:14, Andrew Lunn wrote:
> On Tue, May 28, 2024 at 01:52:56PM -0400, Sean Anderson wrote:
>> (forgot to CC Alex)
>>
>> On 5/28/24 13:50, Sean Anderson wrote:
>> > On 5/28/24 13:28, Russell King (Oracle) wrote:
>> >> First, note that phylib's policy is if it loses comms with the PHY,
>> >> then the link will be forced down. This is out of control of the SFP
>> >> or phylink code.
>> >>
>> >> I've seen bugs with the I2C emulation on some modules resulting in
>> >> problems with various I2C controllers.
>> >>
>> >> Sometimes the problem is due to a bad I2C level shifter. Some I2C
>> >> level shifter manufacturers will swear blind that their shifter
>> >> doesn't lock up, but strangely, one can prove with an oscilloscope
>> >> that it _does_ lock up - and in a way that the only way to recover
>> >> was to possibly unplug the module or power cycle the platform.
>> >
>> > Well, I haven't seen any case where the bus locks up. I've been able to
>> > recover just by doing
>> >
>> >     ip link set net0 down
>> >     ip link set net0 up
>> >
>> > which suggests that this is just a transient problem.
>
> If you look back over the history, I don't think you will find any
> reports of transient problems with real MDIO busses. Hence any error
> is considered fatal.
> Also, when you consider the design of MDIO, it is
> actually very hard for an error to be detected. It is basically a
> shift register, shifting out 64 bits for a write, or 48 bits for a
> read, followed by receiving 16 bits for a read. There is no protocol
> to indicate any sort of error. If there is no device at the address,
> the pullup means you receive 1s. End of story.

Yes, I would expect the only time there could be transient problems would
be with external MII (such as if someone jiggled the phy).

> With MDIO over I2C, it is I2C which has problems, not MDIO. Do you
> expect transient problems with I2C?

Well, I2C is known to have devices which can get stuck and hang the bus
(generally requiring some bit-banging from Linux to get things unstuck,
or a reset of the device). So while I2C (like MDIO) is supposed to be
completely reliable, there is a history of it being not quite perfect.

That said, I did not expect to see these kinds of errors at all. I'll have
a closer look at the controller driver when I have the time. Maybe there
are some errata for this...

> I would also point out that MDIO is not idempotent. Reading an
> interrupt status register often clears it. Reading the link status
> clears the latched link status. If you need to retry the read of the
> interrupt status register, you cannot, the interrupt has been cleared,
> you have lost it, and probably your hardware no longer works because
> you don't know what interrupt to handle.... If you need to re-read the
> link status, you have lost the latched version, and you have missed an
> up or down event.

Yes. Same thing with I2C.

>> >> My advice would be to investigate the hardware in the first instance.
>
> I agree with Russell. Figure out why I2C is flaky. Since this is an
> SFP it may be something as trivial as the contacts need cleaning. Or
> the resistors are wrong, or you have a cheap module which is out of
> spec.

OK, I'll try to dig into this a little more...

--Sean