* J1939 crash after receiving BAM frame and no DT frame @ 2015-11-20 15:24 laurent vaudoit 2015-11-23 12:51 ` Kurt Van Dijck 2015-11-23 20:09 ` Kurt Van Dijck 0 siblings, 2 replies; 9+ messages in thread From: laurent vaudoit @ 2015-11-20 15:24 UTC (permalink / raw) To: linux-can Hi all, i'm facing some issue on my imx6 board, with j1939 patch. i setup my can interface with ip link command and then i setup this interface in j1939 ip link set can0 up type can bitrate 250000 ip link set can0 j1939 on and after i launch a candump on can0 On can0, i play an can bus trace (logged from a vehicule). at the end of the trace, i have a bam frame without the DT frame associated on the console, i see the log 1939tprx_task timeout on 2 (it seems normal) and after, i have a rando crash of my system (not in can stuffs, sometimes a reset, sometimes nand driver...) if i remove the BAM frame, problem does not appear. if i do not set can0 j1939 on, problem does not appear so it seems the problem is related to j1939 timeout Is someone yet encountered this problem? my kernel is a 3.10 kernel, with j1939 patch Thanks in advance Best regards Laurent ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-20 15:24 J1939 crash after receiving BAM frame and no DT frame laurent vaudoit @ 2015-11-23 12:51 ` Kurt Van Dijck 2015-11-23 20:09 ` Kurt Van Dijck 1 sibling, 0 replies; 9+ messages in thread From: Kurt Van Dijck @ 2015-11-23 12:51 UTC (permalink / raw) To: laurent vaudoit; +Cc: linux-can --- Original message --- > Date: Fri, 20 Nov 2015 15:24:57 +0000 (UTC) > From: laurent vaudoit <laurent.vaudoit@gmail.com> > To: linux-can@vger.kernel.org > Subject: J1939 crash after receiving BAM frame and no DT frame > User-Agent: Loom/3.14 (http://gmane.org/) > > Hi all, > i'm facing some issue on my imx6 board, with j1939 patch. > > i setup my can interface with ip link command > and then i setup this interface in j1939 > > ip link set can0 up type can bitrate 250000 > ip link set can0 j1939 on > > and after i launch a candump on can0 > > On can0, i play an can bus trace (logged from a vehicule). > at the end of the trace, i have a bam frame without the DT frame associated > > on the console, i see the log > 1939tprx_task timeout on 2 (it seems normal) > and after, i have a rando crash of my system (not in can stuffs, sometimes > a reset, sometimes nand driver...) > > if i remove the BAM frame, problem does not appear. > if i do not set can0 j1939 on, problem does not appear > so it seems the problem is related to j1939 timeout > > Is someone yet encountered this problem? I think I never tried this (sending only the BAM without the DATA). I will take a look one of these days. Kurt > > my kernel is a 3.10 kernel, with j1939 patch > > Thanks in advance > Best regards > > Laurent > > -- > To unsubscribe from this list: send the line "unsubscribe linux-can" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-20 15:24 J1939 crash after receiving BAM frame and no DT frame laurent vaudoit 2015-11-23 12:51 ` Kurt Van Dijck @ 2015-11-23 20:09 ` Kurt Van Dijck 2015-11-25 10:28 ` laurent vaudoit 1 sibling, 1 reply; 9+ messages in thread From: Kurt Van Dijck @ 2015-11-23 20:09 UTC (permalink / raw) To: laurent vaudoit; +Cc: linux-can > Hi all, > i'm facing some issue on my imx6 board, with j1939 patch. > > i setup my can interface with ip link command > and then i setup this interface in j1939 > > ip link set can0 up type can bitrate 250000 > ip link set can0 j1939 on > > and after i launch a candump on can0 > > On can0, i play an can bus trace (logged from a vehicule). > at the end of the trace, i have a bam frame without the DT frame associated > > on the console, i see the log > 1939tprx_task timeout on 2 (it seems normal) > and after, i have a rando crash of my system (not in can stuffs, sometimes > a reset, sometimes nand driver...) > > if i remove the BAM frame, problem does not appear. > if i do not set can0 j1939 on, problem does not appear > so it seems the problem is related to j1939 timeout I was not in a position to reproduce your scenario. I suspect a locking mistake of mine. Can you try this patch? diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index 5c0bdc5..0bc1edb 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -166,9 +166,9 @@ static void put_session(struct session *session) tasklet_disable_nosync(&session->txtask); if (in_interrupt()) { - spin_lock_bh(&s.del.lock); + spin_lock(&s.del.lock); list_add_tail(&session->list, &s.del.sessionq); - spin_unlock_bh(&s.del.lock); + spin_unlock(&s.del.lock); schedule_work(&s.del.work); } else { /* destroy session right here */ -- You may also enable CONFIG_DEBUG_SPINLOCK > > Is someone yet encountered this problem? > > my kernel is a 3.10 kernel, with j1939 patch > > Thanks in advance > Best regards > > Laurent > > -- > To unsubscribe from this list: send the line "unsubscribe linux-can" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-23 20:09 ` Kurt Van Dijck @ 2015-11-25 10:28 ` laurent vaudoit 2015-11-25 13:26 ` Kurt Van Dijck 0 siblings, 1 reply; 9+ messages in thread From: laurent vaudoit @ 2015-11-25 10:28 UTC (permalink / raw) To: linux-can, Kurt Van Dijck Hi Kurt On Mon, Nov 23, 2015 at 9:09 PM, Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> wrote: >> Hi all, >> i'm facing some issue on my imx6 board, with j1939 patch. >> >> i setup my can interface with ip link command >> and then i setup this interface in j1939 >> >> ip link set can0 up type can bitrate 250000 >> ip link set can0 j1939 on >> >> and after i launch a candump on can0 >> >> On can0, i play an can bus trace (logged from a vehicule). >> at the end of the trace, i have a bam frame without the DT frame associated >> >> on the console, i see the log >> 1939tprx_task timeout on 2 (it seems normal) >> and after, i have a rando crash of my system (not in can stuffs, sometimes >> a reset, sometimes nand driver...) >> >> if i remove the BAM frame, problem does not appear. >> if i do not set can0 j1939 on, problem does not appear >> so it seems the problem is related to j1939 timeout > > I was not in a position to reproduce your scenario. > > I suspect a locking mistake of mine. > Can you try this patch? > > diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c > index 5c0bdc5..0bc1edb 100644 > --- a/net/can/j1939/transport.c > +++ b/net/can/j1939/transport.c > @@ -166,9 +166,9 @@ static void put_session(struct session *session) > tasklet_disable_nosync(&session->txtask); > > if (in_interrupt()) { > - spin_lock_bh(&s.del.lock); > + spin_lock(&s.del.lock); > list_add_tail(&session->list, &s.del.sessionq); > - spin_unlock_bh(&s.del.lock); > + spin_unlock(&s.del.lock); > schedule_work(&s.del.work); > } else { > /* destroy session right here */ > > -- > > You may also enable CONFIG_DEBUG_SPINLOCK i've tried the patch, but in fact in_interrupt() return 0 so i do no enter in this code after more debug, the problem is in 1939session_destroy, during the first call to tasklet_disable() by the way problem occurs with rtpatch applied, and does not without. i have found a workaround, working with rtpatch, but not sure if this can be considered as a bug fix or not. in function j1939tp_rxtask, there is a call to j1939session_cancel, who call put_session and at the end of j1939tp_rxtask, we call put_session. if i comment this "second call" to put_session, the problem disappear. rt_patch makes lots of improvment in tasklet management. please let me know if this part of code is usefull (2 call to put_session instead of 1) Regards Laurent > >> >> Is someone yet encountered this problem? >> >> my kernel is a 3.10 kernel, with j1939 patch >> >> Thanks in advance >> Best regards >> >> Laurent >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-can" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-25 10:28 ` laurent vaudoit @ 2015-11-25 13:26 ` Kurt Van Dijck 2015-11-25 13:39 ` laurent vaudoit 0 siblings, 1 reply; 9+ messages in thread From: Kurt Van Dijck @ 2015-11-25 13:26 UTC (permalink / raw) To: laurent vaudoit; +Cc: linux-can --- Original message --- > Date: Wed, 25 Nov 2015 11:28:30 +0100 > From: laurent vaudoit <laurent.vaudoit@gmail.com> > To: linux-can@vger.kernel.org, Kurt Van Dijck > <dev.kurt@vandijck-laurijssen.be> > Subject: Re: J1939 crash after receiving BAM frame and no DT frame > > Hi Kurt > > On Mon, Nov 23, 2015 at 9:09 PM, Kurt Van Dijck > <dev.kurt@vandijck-laurijssen.be> wrote: > >> Hi all, > >> i'm facing some issue on my imx6 board, with j1939 patch. > >> > >> i setup my can interface with ip link command > >> and then i setup this interface in j1939 > >> > >> ip link set can0 up type can bitrate 250000 > >> ip link set can0 j1939 on > >> > >> and after i launch a candump on can0 > >> > >> On can0, i play an can bus trace (logged from a vehicule). > >> at the end of the trace, i have a bam frame without the DT frame associated > >> > >> on the console, i see the log > >> 1939tprx_task timeout on 2 (it seems normal) > >> and after, i have a rando crash of my system (not in can stuffs, sometimes > >> a reset, sometimes nand driver...) > >> > >> if i remove the BAM frame, problem does not appear. > >> if i do not set can0 j1939 on, problem does not appear > >> so it seems the problem is related to j1939 timeout > > > > I was not in a position to reproduce your scenario. > > > > I suspect a locking mistake of mine. > > Can you try this patch? > > > > diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c > > index 5c0bdc5..0bc1edb 100644 > > --- a/net/can/j1939/transport.c > > +++ b/net/can/j1939/transport.c > > @@ -166,9 +166,9 @@ static void put_session(struct session *session) > > tasklet_disable_nosync(&session->txtask); > > > > if (in_interrupt()) { > > - spin_lock_bh(&s.del.lock); > > + spin_lock(&s.del.lock); > > list_add_tail(&session->list, &s.del.sessionq); > > - spin_unlock_bh(&s.del.lock); > > + spin_unlock(&s.del.lock); > > schedule_work(&s.del.work); > > } else { > > /* destroy session right here */ > > > > -- > > > > You may also enable CONFIG_DEBUG_SPINLOCK > > i've tried the patch, but in fact in_interrupt() return 0 so > i do no enter in this code > after more debug, the problem is in > 1939session_destroy, during the first call to tasklet_disable() > > > by the way problem occurs with rtpatch applied, and does not without. Well, that explains me not having seen this. > > i have found a workaround, working with rtpatch, but not sure if this > can be considered as a bug fix or not. > in function j1939tp_rxtask, there is a call to j1939session_cancel, > who call put_session > and at the end of j1939tp_rxtask, we call put_session. > if i comment this "second call" to put_session, the problem disappear. It seems that you added a memory leak because put_session if a ref counter. You keep the session installed ... forever, or until a next similar session is about to be created. I'm not sure either that this is the correct way to proceed. > > rt_patch makes lots of improvment in tasklet management. > please let me know if this part of code is usefull (2 call to > put_session instead of 1) > > Regards > Laurent > > > >> > >> Is someone yet encountered this problem? > >> > >> my kernel is a 3.10 kernel, with j1939 patch > >> > >> Thanks in advance > >> Best regards > >> > >> Laurent > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-can" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-25 13:26 ` Kurt Van Dijck @ 2015-11-25 13:39 ` laurent vaudoit 2015-11-25 14:59 ` Marc Kleine-Budde 0 siblings, 1 reply; 9+ messages in thread From: laurent vaudoit @ 2015-11-25 13:39 UTC (permalink / raw) To: linux-can, Kurt Van Dijck On Wed, Nov 25, 2015 at 2:26 PM, Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> wrote: > > --- Original message --- >> Date: Wed, 25 Nov 2015 11:28:30 +0100 >> From: laurent vaudoit <laurent.vaudoit@gmail.com> >> To: linux-can@vger.kernel.org, Kurt Van Dijck >> <dev.kurt@vandijck-laurijssen.be> >> Subject: Re: J1939 crash after receiving BAM frame and no DT frame >> >> Hi Kurt >> >> On Mon, Nov 23, 2015 at 9:09 PM, Kurt Van Dijck >> <dev.kurt@vandijck-laurijssen.be> wrote: >> >> Hi all, >> >> i'm facing some issue on my imx6 board, with j1939 patch. >> >> >> >> i setup my can interface with ip link command >> >> and then i setup this interface in j1939 >> >> >> >> ip link set can0 up type can bitrate 250000 >> >> ip link set can0 j1939 on >> >> >> >> and after i launch a candump on can0 >> >> >> >> On can0, i play an can bus trace (logged from a vehicule). >> >> at the end of the trace, i have a bam frame without the DT frame associated >> >> >> >> on the console, i see the log >> >> 1939tprx_task timeout on 2 (it seems normal) >> >> and after, i have a rando crash of my system (not in can stuffs, sometimes >> >> a reset, sometimes nand driver...) >> >> >> >> if i remove the BAM frame, problem does not appear. >> >> if i do not set can0 j1939 on, problem does not appear >> >> so it seems the problem is related to j1939 timeout >> > >> > I was not in a position to reproduce your scenario. >> > >> > I suspect a locking mistake of mine. >> > Can you try this patch? >> > >> > diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c >> > index 5c0bdc5..0bc1edb 100644 >> > --- a/net/can/j1939/transport.c >> > +++ b/net/can/j1939/transport.c >> > @@ -166,9 +166,9 @@ static void put_session(struct session *session) >> > tasklet_disable_nosync(&session->txtask); >> > >> > if (in_interrupt()) { >> > - spin_lock_bh(&s.del.lock); >> > + spin_lock(&s.del.lock); >> > list_add_tail(&session->list, &s.del.sessionq); >> > - spin_unlock_bh(&s.del.lock); >> > + spin_unlock(&s.del.lock); >> > schedule_work(&s.del.work); >> > } else { >> > /* destroy session right here */ >> > >> > -- >> > >> > You may also enable CONFIG_DEBUG_SPINLOCK >> >> i've tried the patch, but in fact in_interrupt() return 0 so >> i do no enter in this code >> after more debug, the problem is in >> 1939session_destroy, during the first call to tasklet_disable() >> >> >> by the way problem occurs with rtpatch applied, and does not without. > > Well, that explains me not having seen this. > >> >> i have found a workaround, working with rtpatch, but not sure if this >> can be considered as a bug fix or not. >> in function j1939tp_rxtask, there is a call to j1939session_cancel, >> who call put_session >> and at the end of j1939tp_rxtask, we call put_session. >> if i comment this "second call" to put_session, the problem disappear. > > It seems that you added a memory leak because put_session if a ref > counter. You keep the session installed ... forever, or until a next > similar session is about to be created. > > I'm not sure either that this is the correct way to proceed. i'm not sure i understand well. this seems weird for me to have one get_session and 2 put_session should'nt we have one get_session and one put_session only? (this is what i tried)? > >> >> rt_patch makes lots of improvment in tasklet management. >> please let me know if this part of code is usefull (2 call to >> put_session instead of 1) >> >> Regards >> Laurent >> > >> >> >> >> Is someone yet encountered this problem? >> >> >> >> my kernel is a 3.10 kernel, with j1939 patch >> >> >> >> Thanks in advance >> >> Best regards >> >> >> >> Laurent >> >> >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe linux-can" in >> >> the body of a message to majordomo@vger.kernel.org >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-25 13:39 ` laurent vaudoit @ 2015-11-25 14:59 ` Marc Kleine-Budde 2015-11-25 15:16 ` laurent vaudoit 0 siblings, 1 reply; 9+ messages in thread From: Marc Kleine-Budde @ 2015-11-25 14:59 UTC (permalink / raw) To: laurent vaudoit, linux-can, Kurt Van Dijck [-- Attachment #1: Type: text/plain, Size: 1214 bytes --] On 11/25/2015 02:39 PM, laurent vaudoit wrote: >>> i have found a workaround, working with rtpatch, but not sure if this >>> can be considered as a bug fix or not. >>> in function j1939tp_rxtask, there is a call to j1939session_cancel, >>> who call put_session >>> and at the end of j1939tp_rxtask, we call put_session. >>> if i comment this "second call" to put_session, the problem disappear. >> >> It seems that you added a memory leak because put_session if a ref >> counter. You keep the session installed ... forever, or until a next >> similar session is about to be created. >> >> I'm not sure either that this is the correct way to proceed. > > i'm not sure i understand well. > this seems weird for me to have one get_session and 2 put_session > should'nt we have one get_session and one put_session only? (this is > what i tried)? Maybe copy/pasting your patch here make the discussion easier :) Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-25 14:59 ` Marc Kleine-Budde @ 2015-11-25 15:16 ` laurent vaudoit 2015-12-04 14:04 ` Kurt Van Dijck 0 siblings, 1 reply; 9+ messages in thread From: laurent vaudoit @ 2015-11-25 15:16 UTC (permalink / raw) To: Marc Kleine-Budde; +Cc: linux-can, Kurt Van Dijck Hi Marc, the "patch" tried below On Wed, Nov 25, 2015 at 3:59 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote: > On 11/25/2015 02:39 PM, laurent vaudoit wrote: >>>> i have found a workaround, working with rtpatch, but not sure if this >>>> can be considered as a bug fix or not. >>>> in function j1939tp_rxtask, there is a call to j1939session_cancel, >>>> who call put_session >>>> and at the end of j1939tp_rxtask, we call put_session. >>>> if i comment this "second call" to put_session, the problem disappear. >>> >>> It seems that you added a memory leak because put_session if a ref >>> counter. You keep the session installed ... forever, or until a next >>> similar session is about to be created. >>> >>> I'm not sure either that this is the correct way to proceed. >> >> i'm not sure i understand well. >> this seems weird for me to have one get_session and 2 put_session >> should'nt we have one get_session and one put_session only? (this is >> what i tried)? > > Maybe copy/pasting your patch here make the discussion easier :) diff --git a/net/can/sj1939/transport.c b/net/can/sj1939/transport.c index aa395b0..8d79949 100644 --- a/net/can/sj1939/transport.c +++ b/net/can/sj1939/transport.c @@ -165,6 +165,7 @@ static void put_session(struct session *session) tasklet_disable_nosync(&session->rxtask); tasklet_disable_nosync(&session->txtask); + if (in_interrupt()) { spin_lock_bh(&s.del.lock); list_add_tail(&session->list, &s.del.sessionq); @@ -511,7 +512,6 @@ static void j1939tp_rxtask(unsigned long val) get_session(session); pr_alert("%s: timeout on %i\n", __func__, session->cb->ifindex); j1939session_cancel(session, ABORT_TIMEOUT); - put_session(session); } /* > > Marc Laurent > > -- > Pengutronix e.K. | Marc Kleine-Budde | > Industrial Linux Solutions | Phone: +49-231-2826-924 | > Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | > Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | > ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: J1939 crash after receiving BAM frame and no DT frame 2015-11-25 15:16 ` laurent vaudoit @ 2015-12-04 14:04 ` Kurt Van Dijck 0 siblings, 0 replies; 9+ messages in thread From: Kurt Van Dijck @ 2015-12-04 14:04 UTC (permalink / raw) To: laurent vaudoit; +Cc: Marc Kleine-Budde, linux-can > On Wed, Nov 25, 2015 at 3:59 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote: > > On 11/25/2015 02:39 PM, laurent vaudoit wrote: > >>>> i have found a workaround, working with rtpatch, but not sure if this > >>>> can be considered as a bug fix or not. > >>>> in function j1939tp_rxtask, there is a call to j1939session_cancel, > >>>> who call put_session > >>>> and at the end of j1939tp_rxtask, we call put_session. > >>>> if i comment this "second call" to put_session, the problem disappear. > >>> > >>> It seems that you added a memory leak because put_session if a ref > >>> counter. You keep the session installed ... forever, or until a next > >>> similar session is about to be created. > >>> > >>> I'm not sure either that this is the correct way to proceed. > >> > >> i'm not sure i understand well. > >> this seems weird for me to have one get_session and 2 put_session > >> should'nt we have one get_session and one put_session only? (this is > >> what i tried)? When you come to the point where you want to get rid of the session, you need decrement the refcounter to become zero. This extra put_session compensates the initial 1 in the refcounter. I still don't see where it goes wrong. Can you experiment with disabling individual lines in j1939session_destroy()? Since problems occur only later, I suspect that the call to skb_free is causing the issue, but I'm not sure. Kurt ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-12-04 14:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-11-20 15:24 J1939 crash after receiving BAM frame and no DT frame laurent vaudoit 2015-11-23 12:51 ` Kurt Van Dijck 2015-11-23 20:09 ` Kurt Van Dijck 2015-11-25 10:28 ` laurent vaudoit 2015-11-25 13:26 ` Kurt Van Dijck 2015-11-25 13:39 ` laurent vaudoit 2015-11-25 14:59 ` Marc Kleine-Budde 2015-11-25 15:16 ` laurent vaudoit 2015-12-04 14:04 ` Kurt Van Dijck
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).