From: Oleksij Rempel <o.rempel@pengutronix.de>
To: "Hölzl, Alexander" <alexander.hoelzl@gmx.net>
Cc: robin@protonic.nl, linux-kernel@vger.kernel.org,
	kernel@pengutronix.de, linux-can@vger.kernel.org
Subject: Re: [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
Date: Thu, 23 Apr 2026 15:07:43 +0200
Message-ID: <aeoZn2BIOzZyCWo_@pengutronix.de>
In-Reply-To: <3e17efb4-ae71-4b5c-af23-7b5de9c5e03c@gmx.net>

On Thu, Apr 23, 2026 at 11:35:27AM +0200, Hölzl, Alexander wrote:
> 
> Hello Oleksij,
> thank you for your quick review!
> 
> On 23.04.2026 at 05:50, Oleksij Rempel wrote:
> > Hi Alexander,
> > 
> > On Tue, Apr 21, 2026 at 05:31:54PM +0200, Alexander Hölzl wrote:
> > > In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
> > > This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
> > > According to the J1939-21 standard, section 5.10.2.4, the timeout T4 after which a held-open session is invalidated is
> > > 1050 ms, not 550 ms as currently implemented.
> > > The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
> > > to resend the hold message.
> > > 
> > > This patch changes the T4 timeout used in the implementation from 550 ms to 1050 ms.
> > > 
> > > Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
> > 
> > LGTM. Thank you!
> > 
> > Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
> > 
> > 
> > Sashiko detected one more potential issue, not related to this patch:
> > https://sashiko.dev/#/patchset/20260421153152.87772-3-alexander.hoelzl%40gmx.net
> > 
> > If you have time, can you please verify it?
> I just tried it and to be honest it seems that holds are fundamentally
> broken currently. I don't think there is any way to restart normal
> communication as soon as a hold has been received.
> 
> When I send a hold with byte 3 set to FF and try to resume from sequence
> number 1 I get an abort with reason 08 which is "Duplicate sequence number"
> according to the spec:
> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
> (000.001166)  can0  18ECF931   [8]  11 00 FF FF FF 00 AB 00
> (000.101138)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
> (000.000685)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00
> 
> The same happens when setting byte 3 to 01:
> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
> (000.001077)  can0  18ECF931   [8]  11 00 01 FF FF 00 AB 00
> (000.100910)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
> (000.000657)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00
> 
> Setting it to 0 is disallowed as well, and the transmission is cancelled
> immediately with error 05, which is "Maximum retransmit request limit
> reached.":
> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
> (000.000941)  can0  18ECF931   [8]  11 00 00 FF FF 00 AB 00
> (000.000645)  can0  18EC31F9   [8]  FF 05 FF FF FF 00 AB 00
> 
> There is a check at the beginning of j1939_xtp_rx_cts_one for duplicate
> sequence numbers which targets byte 0, i.e. the command type byte, and checks
> that it is not equal to the last command.
> 
> 	if (session->last_cmd == dat[0]) {
> 		err = J1939_XTP_ABORT_DUP_SEQ;
> 		goto out_session_cancel;
> 	}
> 
> This means it is impossible to handle two consecutive CTS messages, which
> would be necessary to escape the hold.
> 
> The easiest way to fix this would probably be to move the check for a hold
> message all the way to the top of j1939_xtp_rx_cts_one and if a hold message
> has been received just set the rx-timeout timer and then bail?

From a quick look, it sounds plausible. Will you send a patch?
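
To make the ordering problem concrete, here is a rough user-space model of
it. This is only a sketch, not kernel code: struct model_session,
model_rx_cts() and the hold_check_first switch are hypothetical names made up
for illustration, and the assumption that a hold should leave last_cmd
untouched is just one possible reading of the proposal above. The abort codes
and frame bytes are taken from the traces quoted above, and 1050 ms is the T4
value from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as seen in the quoted traces (byte 0 of the TP.CM frames). */
#define MODEL_CMD_RTS        0x10
#define MODEL_CMD_CTS        0x11
#define MODEL_OK             0
#define MODEL_ABORT_FAULT    5    /* reason 05 in the third trace */
#define MODEL_ABORT_DUP_SEQ  8    /* reason 08, "Duplicate sequence number" */
#define MODEL_T4_MS          1050 /* hold timeout proposed by the patch */

/* Hypothetical, heavily simplified stand-in for the real session state. */
struct model_session {
	uint8_t last_cmd;           /* last control byte accepted */
	bool holding;               /* true while the peer holds the session */
	unsigned int rx_timeout_ms; /* timeout armed by the last hold */
	unsigned int next_pkt;      /* next packet number requested by CTS */
};

/*
 * Model of handling one CTS frame on the transmitting side.
 * dat[1] == 0 marks a hold, dat[2] is the next packet number.
 * With hold_check_first == false the duplicate check runs first, as in
 * the snippet quoted above. With hold_check_first == true a hold only
 * arms the timeout and returns (assumption: last_cmd stays unchanged).
 * Returns MODEL_OK or an abort reason.
 */
static int model_rx_cts(struct model_session *s, const uint8_t dat[8],
			bool hold_check_first)
{
	bool is_hold = (dat[1] == 0);

	if (hold_check_first && is_hold) {
		s->holding = true;
		s->rx_timeout_ms = MODEL_T4_MS;
		return MODEL_OK;
	}

	/* Duplicate check on the command byte, as in the quoted code. */
	if (s->last_cmd == dat[0])
		return MODEL_ABORT_DUP_SEQ;

	/* A next packet number of 0 is rejected, as in the third trace. */
	if (dat[2] == 0)
		return MODEL_ABORT_FAULT;

	s->last_cmd = dat[0];
	if (is_hold) {
		s->holding = true;
		s->rx_timeout_ms = MODEL_T4_MS;
	} else {
		s->holding = false;
		s->next_pkt = dat[2];
	}
	return MODEL_OK;
}
```

Feeding it the first trace (hold "11 00 FF ...", then "11 02 01 ...") with
hold_check_first == false ends in abort reason 08 on the second CTS, like on
the wire. With hold_check_first == true, both frames are accepted in this
model. Whether the real fix should also touch last_cmd or the packet counters
needs to be checked against the actual code.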

Hm... we need tests, preferably in the kernel source, to avoid regressions.

Would it be possible to implement this on top of the KUnit tests?
https://lore.kernel.org/all/20260420152228.581421-1-o.rempel@pengutronix.de/

It looks like there is also a more user-space-friendly testing approach in use:
https://lore.kernel.org/all/20260419144600.GA4091724@chcpu16/


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
