From: Marc Kleine-Budde <mkl@pengutronix.de>
To: Thomas.Kopp@microchip.com
Cc: linux-can@vger.kernel.org, mark@baggywrinkle.co.uk
Subject: Re: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
Date: Thu, 8 Sep 2022 21:57:16 +0200 [thread overview]
Message-ID: <871qsmt3mq.fsf@hardanger.blackshift.org> (raw)
In-Reply-To: <BL3PR11MB64844BD45DE7FDE94BDE47BCFB419@BL3PR11MB6484.namprd11.prod.outlook.com>
[-- Attachment #1: Type: text/plain, Size: 12627 bytes --]
Hello Thomas,
On 07.09.2022 19:19:42, Thomas.Kopp@microchip.com wrote:
> > In other words this means the SJW parameter controls the tolerance of
> > the CAN controller against the frequency error compared to other CAN
> > controllers.
> >
> > If the user space doesn't provide a SJW parameter, the kernel chooses
> > a default value of 1. This proved to be a good default value for CAN
> > controllers, but not anymore with modern controllers.
> >
> > To get the same oscillator tolerances on controllers with wide bit
> > timing registers, choose a default SJW value of Phase Seg2 / 2. This
> > results in the following bit timing parameters:
>
> Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?
According to "Combining CANopen and SAE J1939 networks" [2], CANopen and
J1939 use a SJW of 1:
"The CANopen standard CiA 301 allows various bitrates in the range from
10 kBit/s to 1 MBit/s. The J1939-11 specification stipulates 250 kBit/s
and is used in the majority of applications. The J1939-14 standard
specifies 500 kBit/s for the physical layer.
Thereby the bitrate for a shared physical layer is limited to the
bitrates 250 kBit/s and 500 kBit/s. Fortunately both standards define
the same sample point location at 87,5 % together with a SJW value of
1."
I think this is why we implemented a default SJW of 1 in the kernel.
As discussed in my patch description a SJW of 1 for controllers that
were common at the time these standard were written, with narrow bit
timing registers, equals "Phase Seg2 / 2".
In "The Configuration of the CAN Bit Timing" [3] the following criteria
for the Oscillator Tolerance Range was given:
"The tolerance range df for an oscillator’s frequency f_osc around the
nominal frequency f_nom with:
(1 - df) * f_nom <= f_osc <= (1 + df) * f_nom
depends on the proportions of Phase_Seg1, Phase_Seg2, SJW, and the bit
time. The maximum tolerance df is the defined by two conditions (both
shall be met):
min(Phase_Seg1, Phase_Seg2)
I: df <= --------------------------------
2 * (13 * bit_time - Phase_Seg2)
SJW
II: df <= -------------
20 * bit_time
"
The 2nd criteria (see "df2" column) is influenced by SJW, so of
interest. Here a sja1000 controller:
| Bit timing parameters for sja1000 with 8.000000 MHz ref clock using algo 'v5.19'
| _----+--------------=> TSeg1: 1 … 16
| / / _---------=> TSeg2: 1 … 8
| | | / _-----=> SJW: 1 … 4
| | | | / _-=> BRP: 1 … 64 (inc: 1)
| | | | | /
| nominal | | | | | real Bitrt nom real SampP
| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error df1 df2 BTR0 BTR1
| 1000000 125 2 3 2 1 1 1000000 0.0% 75.0% 75.0% 0.0% 9.8‰ 6.2‰ 0x00 0x14
| 800000 125 3 4 2 1 1 800000 0.0% 80.0% 80.0% 0.0% 7.8‰ 5.0‰ 0x00 0x16
| 666666 125 4 4 3 1 1 666666 0.0% 80.0% 75.0% 6.2% 9.8‰ 4.2‰ 0x00 0x27
| 500000 125 6 7 2 1 1 500000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x00 0x1c
| 250000 250 6 7 2 1 2 250000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x01 0x1c
| 125000 500 6 7 2 1 4 125000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x03 0x1c
| 100000 625 6 7 2 1 5 100000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x04 0x1c
| 83333 750 6 7 2 1 6 83333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x05 0x1c
| 50000 1250 6 7 2 1 10 50000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x09 0x1c
| 33333 1875 6 7 2 1 15 33333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x0e 0x1c
| 20000 3125 6 7 2 1 25 20000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x18 0x1c
| 10000 6250 6 7 2 1 50 10000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x31 0x1c
"df2" is 3.1‰ for 500kbit/s or slower.
In contrast the mcp251xfd with the current mainline bit timing algorithm
(SJW=1):
| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'v5.19'
| _----+--------------=> TSeg1: 2 … 256
| / / _---------=> TSeg2: 1 … 128
| | | / _-----=> SJW: 1 … 128
| | | | / _-=> BRP: 1 … 256 (inc: 1)
| | | | | /
| nominal | | | | | real Bitrt nom real SampP
| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error df1 df2 NBTCFG
| 1000000 25 14 15 10 1 1 1000000 0.0% 75.0% 75.0% 0.0% 9.8‰ 1.2‰ 0x001c0900
| 800000 25 19 20 10 1 1 800000 0.0% 80.0% 80.0% 0.0% 7.8‰ 1.0‰ 0x00260900
| 666666 25 23 24 12 1 1 666666 0.0% 80.0% 80.0% 0.0% 7.8‰ 0.8‰ 0x002e0b00
| 500000 25 34 35 10 1 1 500000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.6‰ 0x00440900
| 250000 25 69 70 20 1 1 250000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.3‰ 0x008a1300
| 125000 50 69 70 20 1 2 125000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.3‰ 0x018a1300
| 100000 50 87 87 25 1 2 100000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x01ad1800
| 83333 50 104 105 30 1 2 83333 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x01d01d00
| 50000 100 87 87 25 1 4 50000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x03ad1800
| 33333 125 104 105 30 1 5 33333 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x04d01d00
| 20000 250 87 87 25 1 10 20000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x09ad1800
| 10000 500 87 87 25 1 20 10000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x13ad1800
for slow bit rates "df2" drops to 0.2‰.
Here the mcp251xfd with the updated algorithm (SJW = Phase Seg2 / 2):
| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'next'
| _----+--------------=> TSeg1: 2 … 256
| / / _---------=> TSeg2: 1 … 128
| | | / _-----=> SJW: 1 … 128
| | | | / _-=> BRP: 1 … 256 (inc: 1)
| | | | | /
| nominal | | | | | real Bitrt nom real SampP
| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error df1 df2 NBTCFG
| 1000000 25 14 15 10 5 1 1000000 0.0% 75.0% 75.0% 0.0% 9.8‰ 6.2‰ 0x001c0904
| 800000 25 19 20 10 5 1 800000 0.0% 80.0% 80.0% 0.0% 7.8‰ 5.0‰ 0x00260904
| 666666 25 23 24 12 6 1 666666 0.0% 80.0% 80.0% 0.0% 7.8‰ 5.0‰ 0x002e0b05
| 500000 25 34 35 10 5 1 500000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x00440904
| 250000 25 69 70 20 10 1 250000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x008a1309
| 125000 50 69 70 20 10 2 125000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x018a1309
| 100000 50 87 87 25 12 2 100000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x01ad180b
| 83333 50 104 105 30 15 2 83333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x01d01d0e
| 50000 100 87 87 25 12 4 50000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x03ad180b
| 33333 125 104 105 30 15 5 33333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x04d01d0e
| 20000 250 87 87 25 12 10 20000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x09ad180b
| 10000 500 87 87 25 12 20 10000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x13ad180b
The "df2" stays at ~3.1‰
[2] http://web.archive.org/https://www.can-cia.org/fileadmin/resources/documents/proceedings/2013_mmc_koppe2.pdf
[3] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf
And I tested with SJW=max from mcp251x (not the mcp251xfd) with SJA=max
to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN
frames:
| 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
| link/can promiscuity 0 minmtu 0 maxmtu 0
| can state ERROR-ACTIVE restart-ms 1000
| bitrate 500000 sample-point 0.850
| tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 3 brp 1
| mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp_inc 1
| clock 10000000
| re-started bus-errors arbit-lost error-warn error-pass bus-off
| 0 0 0 2 2 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus spi parentdev spi1.1
| 97: peakfd0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
| link/can promiscuity 0 minmtu 0 maxmtu 0
| can <FD> state ERROR-PASSIVE (berr-counter tx 0 rx 136) restart-ms 1000
| bitrate 500000 sample-point 0.875
| tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1 brp 1
| pcan_usb_fd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp_inc 1
| dbitrate 2000000 dsample-point 0.750
| dtq 12 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1 dbrp 1
| pcan_usb_fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp_inc 1
| clock 80000000
| re-started bus-errors arbit-lost error-warn error-pass bus-off
| 0 0 0 1 2 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 parentbus usb parentdev 2-2.2:1.0
| (1970-01-01 02:41:33.856148) peakfd0 RX - - 20000004 [8] 00 04 00 00 00 00 00 00 ERRORFRAME
| controller-problem{rx-error-warning}
| (1970-01-01 02:41:33.857168) peakfd0 RX - - 20000004 [8] 00 10 00 00 00 00 00 00 ERRORFRAME
| controller-problem{rx-error-passive}
With SJW = Phase Seg2 / 2, it works.
> In general the recommendation is to chose SJW as big as possible.
> Here's an excerpt from the CiA 601-3 (Unfortunately I can't share the
> entire document)
CiA 301 and J1939 seem to have a different opinion...
> "Recom2: Choose sjwA as large as possible.
> - The maximum possible value is sjwA = min(ps1A, ps2A).
> - During arbitration phase, a large sjwA allows a CAN node to resynchronize quickly on the leading transmitting node.
> - If the CAN controller does not allow configuring the maximum possible sjwA
> because the value range for sjwA is limited in the CAN controller, then consider
> increasing BRPA. This allows configuring a larger sjwA, unless this can impact
> phase error tolerance in an unwanted way, see Recom4."
>
> I think we discussed this a while ago already and the biggest
> reasoning against maxing SJW was breaking "compatibility" to old
> kernel versions.
ACK
> The compatibility in this is not being able to sync
> though... Now, if a change to a larger default values is agreed upon
> we might as well chose min(phaseseg1, phaseseg2) directly to maximize
> robustness for the user that doesn't set it explicitly. Given that
> most of the getting started with Socketcan results on your favorite
> Search engine don't even mention SJW that's probably a significant
> portion of the users. From a pure communication standpoint I can't see
> any downside to doing this, the only hypothetical case would be if a
> CAN controller had an erratum about higher SJWs and the driver not
> handling it.
>
> So, in essence, I'd propose to use min(p1,p2).
NACK - as I have a setup that doesn't work anymore when mixing an old
and a new kernel.
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2022-09-08 19:57 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3() Marc Kleine-Budde
2022-09-11 8:24 ` Vincent Mailhol
2022-09-12 5:56 ` Thomas.Kopp
2022-09-12 8:28 ` Vincent Mailhol
2023-01-31 16:04 ` Marc Kleine-Budde
2023-02-01 5:49 ` Vincent Mailhol
2023-02-01 8:58 ` Marc Kleine-Budde
2023-02-01 9:38 ` Vincent Mailhol
2023-02-02 9:55 ` Marc Kleine-Budde
2023-02-02 10:29 ` Vincent Mailhol
2023-02-01 9:05 ` Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 3/5] can: bittiming: can_calc_bittiming(): clean up SJW sanitizing Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 4/5] can: bittiming: can_calc_bittiming(): ensure that SJW is not longer than either Phase Buffer Segment Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW Marc Kleine-Budde
2022-09-07 19:19 ` Thomas.Kopp
2022-09-08 19:57 ` Marc Kleine-Budde [this message]
2022-09-09 12:29 ` Marc Kleine-Budde
2022-09-26 6:15 ` Thomas.Kopp
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qsmt3mq.fsf@hardanger.blackshift.org \
--to=mkl@pengutronix.de \
--cc=Thomas.Kopp@microchip.com \
--cc=linux-can@vger.kernel.org \
--cc=mark@baggywrinkle.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox