[PATCH] can: j1939: fix wrong rx timeout for CTS hold messages

Linux CAN drivers development
 help / color / mirror / Atom feed

* [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
@ 2026-04-21 15:31 Alexander Hölzl
  2026-04-23  3:50 ` Oleksij Rempel
  2026-05-06 12:54 ` Marc Kleine-Budde
  0 siblings, 2 replies; 6+ messages in thread
From: Alexander Hölzl @ 2026-04-21 15:31 UTC (permalink / raw)
  To: robin, o.rempel; +Cc: kernel, linux-can, linux-kernel, Alexander Hölzl

In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
According to the J1939-21 standard, section 5.10.2.4 the timeout T4 after which an held open session is invalidated is
1050 ms, not 550 as implemented currently.
The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
to resend the hold message.

This patch changes the T4 timeout used in the implementation from 550 ms to 1050.

Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
---
 net/can/j1939/transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index df93d57907da..7ad56b5f17b9 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -1479,7 +1479,7 @@ j1939_xtp_rx_cts_one(struct j1939_session *session, struct sk_buff *skb)
 		}
 	} else {
 		/* CTS(0) */
-		j1939_tp_set_rxtimeout(session, 550);
+		j1939_tp_set_rxtimeout(session, 1050);
 	}
 	return;

-- 
2.43.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
  2026-04-21 15:31 [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages Alexander Hölzl
@ 2026-04-23  3:50 ` Oleksij Rempel
  2026-04-23  9:35   ` Hölzl, Alexander
  2026-05-06 12:54 ` Marc Kleine-Budde
  1 sibling, 1 reply; 6+ messages in thread
From: Oleksij Rempel @ 2026-04-23  3:50 UTC (permalink / raw)
  To: Alexander Hölzl; +Cc: robin, linux-kernel, kernel, linux-can

Hi Alexander,

On Tue, Apr 21, 2026 at 05:31:54PM +0200, Alexander Hölzl wrote:
> In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
> This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
> According to the J1939-21 standard, section 5.10.2.4 the timeout T4 after which an held open session is invalidated is
> 1050 ms, not 550 as implemented currently.
> The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
> to resend the hold message.
> 
> This patch changes the T4 timeout used in the implementation from 550 ms to 1050.
> 
> Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>

LGTM. Thank you!

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>


Sashico detected one more potential issue, not related to this patch:
https://sashiko.dev/#/patchset/20260421153152.87772-3-alexander.hoelzl%40gmx.net

If you have time, can you please verify it?

Best Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
  2026-04-23  3:50 ` Oleksij Rempel
@ 2026-04-23  9:35   ` Hölzl, Alexander
  2026-04-23 13:07     ` Oleksij Rempel
  0 siblings, 1 reply; 6+ messages in thread
From: Hölzl, Alexander @ 2026-04-23  9:35 UTC (permalink / raw)
  To: Oleksij Rempel; +Cc: robin, linux-kernel, kernel, linux-can


Hello Oleksij,
thank you for your quick review!

Am 23.04.2026 um 05:50 schrieb Oleksij Rempel:
> Hi Alexander,
> 
> On Tue, Apr 21, 2026 at 05:31:54PM +0200, Alexander Hölzl wrote:
>> In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
>> This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
>> According to the J1939-21 standard, section 5.10.2.4 the timeout T4 after which an held open session is invalidated is
>> 1050 ms, not 550 as implemented currently.
>> The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
>> to resend the hold message.
>>
>> This patch changes the T4 timeout used in the implementation from 550 ms to 1050.
>>
>> Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
> 
> LGTM. Thank you!
> 
> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
> 
> 
> Sashico detected one more potential issue, not related to this patch:
> https://sashiko.dev/#/patchset/20260421153152.87772-3-alexander.hoelzl%40gmx.net
> 
> If you have time, can you please verify it?
I just tried it and to be honest it seems that holds are fundamentally 
broken currently. I don't think there is any way to restart normal 
communication as soon as a hold has been received.

When I send a hold with byte 3 set to FF and try to resume from sequence 
number 1 I get an abort with reason 08 which is "Duplicate sequence 
number" according to the spec:
(000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
(000.001166)  can0  18ECF931   [8]  11 00 FF FF FF 00 AB 00
(000.101138)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
(000.000685)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00

The same happens when setting byte 3 to 01:
(000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
(000.001077)  can0  18ECF931   [8]  11 00 01 FF FF 00 AB 00
(000.100910)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
(000.000657)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00

Setting it to 0 is disallowed as well and the transmission is cancelled 
immediatley with error 05 which is "Maximum retransmit request limit 
reached.":
(000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
(000.000941)  can0  18ECF931   [8]  11 00 00 FF FF 00 AB 00
(000.000645)  can0  18EC31F9   [8]  FF 05 FF FF FF 00 AB 00

There is a check at the beggining of j1939_xtp_rx_cts_one for duplicate 
sequence numbers which targets byte 0, so the command type byte, and 
checks that it is not equal to the last command.

if (session->last_cmd == dat[0]) {
		err = J1939_XTP_ABORT_DUP_SEQ;
		goto out_session_cancel;
	}

This means it is impossible to handle two directly succeeding CTS which 
would be necessary to escape the hold....

The easiest way to fix this would probably be to move the check for a 
hold message all the way to the top of j1939_xtp_rx_cts_one and if a 
hold message has been received just set the rx-timeout timer and then bail?

> 
> Best Regards,
> Oleksij

Kind regards,
Alexander


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
  2026-04-23  9:35   ` Hölzl, Alexander
@ 2026-04-23 13:07     ` Oleksij Rempel
  2026-04-23 13:34       ` Hölzl, Alexander
  0 siblings, 1 reply; 6+ messages in thread
From: Oleksij Rempel @ 2026-04-23 13:07 UTC (permalink / raw)
  To: Hölzl, Alexander; +Cc: robin, linux-kernel, kernel, linux-can

On Thu, Apr 23, 2026 at 11:35:27AM +0200, Hölzl, Alexander wrote:
> 
> Hello Oleksij,
> thank you for your quick review!
> 
> Am 23.04.2026 um 05:50 schrieb Oleksij Rempel:
> > Hi Alexander,
> > 
> > On Tue, Apr 21, 2026 at 05:31:54PM +0200, Alexander Hölzl wrote:
> > > In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
> > > This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
> > > According to the J1939-21 standard, section 5.10.2.4 the timeout T4 after which an held open session is invalidated is
> > > 1050 ms, not 550 as implemented currently.
> > > The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
> > > to resend the hold message.
> > > 
> > > This patch changes the T4 timeout used in the implementation from 550 ms to 1050.
> > > 
> > > Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
> > 
> > LGTM. Thank you!
> > 
> > Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
> > 
> > 
> > Sashico detected one more potential issue, not related to this patch:
> > https://sashiko.dev/#/patchset/20260421153152.87772-3-alexander.hoelzl%40gmx.net
> > 
> > If you have time, can you please verify it?
> I just tried it and to be honest it seems that holds are fundamentally
> broken currently. I don't think there is any way to restart normal
> communication as soon as a hold has been received.
> 
> When I send a hold with byte 3 set to FF and try to resume from sequence
> number 1 I get an abort with reason 08 which is "Duplicate sequence number"
> according to the spec:
> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
> (000.001166)  can0  18ECF931   [8]  11 00 FF FF FF 00 AB 00
> (000.101138)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
> (000.000685)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00
> 
> The same happens when setting byte 3 to 01:
> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
> (000.001077)  can0  18ECF931   [8]  11 00 01 FF FF 00 AB 00
> (000.100910)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
> (000.000657)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00
> 
> Setting it to 0 is disallowed as well and the transmission is cancelled
> immediatley with error 05 which is "Maximum retransmit request limit
> reached.":
> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
> (000.000941)  can0  18ECF931   [8]  11 00 00 FF FF 00 AB 00
> (000.000645)  can0  18EC31F9   [8]  FF 05 FF FF FF 00 AB 00
> 
> There is a check at the beggining of j1939_xtp_rx_cts_one for duplicate
> sequence numbers which targets byte 0, so the command type byte, and checks
> that it is not equal to the last command.
> 
> if (session->last_cmd == dat[0]) {
> 		err = J1939_XTP_ABORT_DUP_SEQ;
> 		goto out_session_cancel;
> 	}
> 
> This means it is impossible to handle two directly succeeding CTS which
> would be necessary to escape the hold....
> 
> The easiest way to fix this would probably be to move the check for a hold
> message all the way to the top of j1939_xtp_rx_cts_one and if a hold message
> has been received just set the rx-timeout timer and then bail?

From a quick lock, it sounds plausible. Will you send a patch?

Hm... we needs tests, preferably in kernel source to avoid regressions.

would it be possible to implement is on top of kunit tests?
https://lore.kernel.org/all/20260420152228.581421-1-o.rempel@pengutronix.de/

It looks like there is more user space friendly testing used:
https://lore.kernel.org/all/20260419144600.GA4091724@chcpu16/


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
  2026-04-23 13:07     ` Oleksij Rempel
@ 2026-04-23 13:34       ` Hölzl, Alexander
  0 siblings, 0 replies; 6+ messages in thread
From: Hölzl, Alexander @ 2026-04-23 13:34 UTC (permalink / raw)
  To: Oleksij Rempel; +Cc: robin, linux-kernel, kernel, linux-can



Am 23.04.2026 um 15:07 schrieb Oleksij Rempel:
> On Thu, Apr 23, 2026 at 11:35:27AM +0200, Hölzl, Alexander wrote:
>>
>> Hello Oleksij,
>> thank you for your quick review!
>>
>> Am 23.04.2026 um 05:50 schrieb Oleksij Rempel:
>>> Hi Alexander,
>>>
>>> On Tue, Apr 21, 2026 at 05:31:54PM +0200, Alexander Hölzl wrote:
>>>> In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
>>>> This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
>>>> According to the J1939-21 standard, section 5.10.2.4 the timeout T4 after which an held open session is invalidated is
>>>> 1050 ms, not 550 as implemented currently.
>>>> The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
>>>> to resend the hold message.
>>>>
>>>> This patch changes the T4 timeout used in the implementation from 550 ms to 1050.
>>>>
>>>> Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
>>>
>>> LGTM. Thank you!
>>>
>>> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
>>>
>>>
>>> Sashico detected one more potential issue, not related to this patch:
>>> https://sashiko.dev/#/patchset/20260421153152.87772-3-alexander.hoelzl%40gmx.net
>>>
>>> If you have time, can you please verify it?
>> I just tried it and to be honest it seems that holds are fundamentally
>> broken currently. I don't think there is any way to restart normal
>> communication as soon as a hold has been received.
>>
>> When I send a hold with byte 3 set to FF and try to resume from sequence
>> number 1 I get an abort with reason 08 which is "Duplicate sequence number"
>> according to the spec:
>> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
>> (000.001166)  can0  18ECF931   [8]  11 00 FF FF FF 00 AB 00
>> (000.101138)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
>> (000.000685)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00
>>
>> The same happens when setting byte 3 to 01:
>> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
>> (000.001077)  can0  18ECF931   [8]  11 00 01 FF FF 00 AB 00
>> (000.100910)  can0  18ECF931   [8]  11 02 01 FF FF 00 AB 00
>> (000.000657)  can0  18EC31F9   [8]  FF 08 FF FF FF 00 AB 00
>>
>> Setting it to 0 is disallowed as well and the transmission is cancelled
>> immediatley with error 05 which is "Maximum retransmit request limit
>> reached.":
>> (000.000000)  can0  18EC31F9   [8]  10 0A 00 02 02 00 AB 00
>> (000.000941)  can0  18ECF931   [8]  11 00 00 FF FF 00 AB 00
>> (000.000645)  can0  18EC31F9   [8]  FF 05 FF FF FF 00 AB 00
>>
>> There is a check at the beggining of j1939_xtp_rx_cts_one for duplicate
>> sequence numbers which targets byte 0, so the command type byte, and checks
>> that it is not equal to the last command.
>>
>> if (session->last_cmd == dat[0]) {
>> 		err = J1939_XTP_ABORT_DUP_SEQ;
>> 		goto out_session_cancel;
>> 	}
>>
>> This means it is impossible to handle two directly succeeding CTS which
>> would be necessary to escape the hold....
>>
>> The easiest way to fix this would probably be to move the check for a hold
>> message all the way to the top of j1939_xtp_rx_cts_one and if a hold message
>> has been received just set the rx-timeout timer and then bail?
> 
>  From a quick lock, it sounds plausible. Will you send a patch?
> 
> Hm... we needs tests, preferably in kernel source to avoid regressions.
> 
> would it be possible to implement is on top of kunit tests?
> https://lore.kernel.org/all/20260420152228.581421-1-o.rempel@pengutronix.de/
> 
> It looks like there is more user space friendly testing used:
> https://lore.kernel.org/all/20260419144600.GA4091724@chcpu16/
> 
> 
Sure I'll write a patch and tests!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages
  2026-04-21 15:31 [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages Alexander Hölzl
  2026-04-23  3:50 ` Oleksij Rempel
@ 2026-05-06 12:54 ` Marc Kleine-Budde
  1 sibling, 0 replies; 6+ messages in thread
From: Marc Kleine-Budde @ 2026-05-06 12:54 UTC (permalink / raw)
  To: Alexander Hölzl; +Cc: robin, o.rempel, kernel, linux-can, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1130 bytes --]

On 21.04.2026 17:31:54, Alexander Hölzl wrote:
> In J1939 segmented transport, a CTS message with data byte 2 set to zero is interpreted as a hold message.
> This instructs the transmitter of the segmented message to hold the connection open but to delay sending.
> According to the J1939-21 standard, section 5.10.2.4 the timeout T4 after which an held open session is invalidated is
> 1050 ms, not 550 as implemented currently.
> The 550 ms are problematic if a device uses hold messages and assumes it can wait for more than 550 ms before it has
> to resend the hold message.
>
> This patch changes the T4 timeout used in the implementation from 550 ms to 1050.
>
> Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>

Applied to linux-can + rewrap long lines in patch description, use
imperative mood in last section.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-06 12:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 15:31 [PATCH] can: j1939: fix wrong rx timeout for CTS hold messages Alexander Hölzl
2026-04-23  3:50 ` Oleksij Rempel
2026-04-23  9:35   ` Hölzl, Alexander
2026-04-23 13:07     ` Oleksij Rempel
2026-04-23 13:34       ` Hölzl, Alexander
2026-05-06 12:54 ` Marc Kleine-Budde

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox