From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wolfgang Grandegger Subject: Re: pch_can: Data transmission stops after dropped packet Date: Mon, 26 Nov 2012 19:13:54 +0100 Message-ID: <50B3B162.9020905@grandegger.com> References: <50AAA8C8.2080504@grandegger.com> <50ABABDE.8060503@grandegger.com> <50ABF09C.8040303@grandegger.com> <50ACABE2.2020306@grandegger.com> <50ACF9C0.8050206@grandegger.com> <50AD042B.3020305 @grandegger.com> <50AD319E.2000209@grandegger.com> <50AF8C01.6060 809@grandegger.com> <50AFABB1.7080 507@grandegger.com> <50AFAFF0.9030706@grandegger.com> <50B2449B.8060708@grandegger.com> <50B38AFB.70209@grandegger.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: Received: from ngcobalt02.manitu.net ([217.11.48.102]:51540 "EHLO ngcobalt02.manitu.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756577Ab2KZSN5 (ORCPT ); Mon, 26 Nov 2012 13:13:57 -0500 In-Reply-To: Sender: linux-can-owner@vger.kernel.org List-ID: To: Michael Pellegrini Cc: linux-can@vger.kernel.org On 11/26/2012 06:30 PM, Michael Pellegrini wrote: > Wolfgang Grandegger grandegger.com> writes: > >> Not too bad! The return does only happen at high load. When you >> "ifconfig up" the device some kernel messages are printed. Could you >> please show them. I want to understand if the reset really occurs by >> checking some register values. > > "ifconfig can0 down" followed by "ifconfig can0 up" produces the following dmesg > output: > > [10113.228189] CTRL_REG=0x1 > [10113.228213] BTR_REG =0x2301 > [10113.228230] TEST_REG =0x80 These are the correct reset values. Therefore, a reset did occur. > Note that I used v7 of the driver to get this data. > >>> I tried the PCH driver and hit the transmission failure within a minute. >> >> Ah. In the function pch_xmit(), could you please move >> >> spin_unlock_irqrestore(&priv->lock, flags); >> >> to the end of the function just before >> >> return NETDEV_TX_OK; >> >> and then retry. This would fix races with accessing the message ram as >> well (via pch_can_rw_msg_obj). I missed that. > > Alright, I applied the following patch: > > *** ../c-can-pci-v7/pch_can.c 2012-11-25 05:09:13.000000000 -0500 > --- ./pch_can.c 2012-11-26 11:29:11.350012074 -0500 > *************** static netdev_tx_t pch_xmit(struct sk_bu > *** 921,928 **** > priv->tx_obj++; > } > > - spin_unlock_irqrestore(&priv->lock, flags); > - > /* Setting the CMASK register. */ > pch_can_bit_set(&priv->regs->ifregs[1].cmask, PCH_CMASK_ALL); > > --- 921,926 ---- > *************** static netdev_tx_t pch_xmit(struct sk_bu > *** 957,962 **** > --- 955,962 ---- > > pch_can_rw_msg_obj(&priv->regs->ifregs[1].creq, tx_obj_no); > > + spin_unlock_irqrestore(&priv->lock, flags); > + > return NETDEV_TX_OK; > } > > The patched driver did not fail in the first few minutes, so that's a good sign. > I will run this driver overnight. OK, thank. I think the problem is fixed. >>> I'm happy to test out more changes to this driver if you think it is worth >>> pursuing. >> >> Remote debugging is slow, unfortunately. Thanks for your patience. > > No problem. I'm just thankful that the problem is getting addressed. > >>> I started a test with the new c_can driver. I'll check on it throughout >>> the day and let it run overnight as well. >> >> OK, apart from the return issue above the driver has not changed from >> the functional point of view. > > Alright, I will wait until more substantial changes are implemented before > re-running the long-term test on this driver. Yes, maybe we can come up with an even better solution. Wolfgang.