Linux CAN drivers development
 help / color / mirror / Atom feed
From: Marc Kleine-Budde <mkl@pengutronix.de>
To: Thomas.Kopp@microchip.com
Cc: linux-can@vger.kernel.org, mark@baggywrinkle.co.uk
Subject: Re: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
Date: Thu, 8 Sep 2022 21:57:16 +0200	[thread overview]
Message-ID: <871qsmt3mq.fsf@hardanger.blackshift.org> (raw)
In-Reply-To: <BL3PR11MB64844BD45DE7FDE94BDE47BCFB419@BL3PR11MB6484.namprd11.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 12627 bytes --]

Hello Thomas,

On 07.09.2022 19:19:42, Thomas.Kopp@microchip.com wrote:
> > In other words this means the SJW parameter controls the tolerance of
> > the CAN controller against the frequency error compared to other CAN
> > controllers.
> >
> > If the user space doesn't provide a SJW parameter, the kernel chooses
> > a default value of 1. This proved to be a good default value for CAN
> > controllers, but not anymore with modern controllers.
> >
> > To get the same oscillator tolerances on controllers with wide bit
> > timing registers, choose a default SJW value of Phase Seg2 / 2. This
> > results in the following bit timing parameters:
>
> Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?

According to "Combining CANopen and SAE J1939 networks" [2], CANopen and
J1939 use a SJW of 1:

"The CANopen standard CiA 301 allows various bitrates in the range from
 10 kBit/s to 1 MBit/s. The J1939-11 specification stipulates 250 kBit/s
 and is used in the majority of applications. The J1939-14 standard
 specifies 500 kBit/s for the physical layer.

 Thereby the bitrate for a shared physical layer is limited to the
 bitrates 250 kBit/s and 500 kBit/s. Fortunately both standards define
 the same sample point location at 87,5 % together with a SJW value of
 1."

I think this is why we implemented a default SJW of 1 in the kernel.

As discussed in my patch description a SJW of 1 for controllers that
were common at the time these standard were written, with narrow bit
timing registers, equals "Phase Seg2 / 2".

In "The Configuration of the CAN Bit Timing" [3] the following criteria
for the Oscillator Tolerance Range was given:

"The tolerance range df for an oscillator’s frequency f_osc around the
 nominal frequency f_nom with:

 (1 - df) * f_nom <= f_osc <= (1 + df) * f_nom

 depends on the proportions of Phase_Seg1, Phase_Seg2, SJW, and the bit
 time. The maximum tolerance df is the defined by two conditions (both
 shall be met):

              min(Phase_Seg1, Phase_Seg2)
 I:  df <= --------------------------------
           2 * (13 * bit_time - Phase_Seg2)

                SJW
 II: df <= -------------
           20 * bit_time
"

The 2nd criteria (see "df2" column) is influenced by SJW, so of
interest. Here a sja1000 controller:

| Bit timing parameters for sja1000 with 8.000000 MHz ref clock using algo 'v5.19'
|                     _----+--------------=> TSeg1: 1 …   16
|                    /    /     _---------=> TSeg2: 1 …    8
|                   |    |     /    _-----=> SJW:   1 …    4
|                   |    |    |    /    _-=> BRP:   1 …   64 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2   BTR0 BTR1
|  1000000    125   2    3    2   1   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   6.2‰   0x00 0x14
|   800000    125   3    4    2   1   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x00 0x16
|   666666    125   4    4    3   1   1   666666   0.0%  80.0%  75.0%   6.2%   9.8‰   4.2‰   0x00 0x27
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x00 0x1c
|   250000    250   6    7    2   1   2   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x01 0x1c
|   125000    500   6    7    2   1   4   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x03 0x1c
|   100000    625   6    7    2   1   5   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x04 0x1c
|    83333    750   6    7    2   1   6    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x05 0x1c
|    50000   1250   6    7    2   1  10    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x09 0x1c
|    33333   1875   6    7    2   1  15    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x0e 0x1c
|    20000   3125   6    7    2   1  25    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x18 0x1c
|    10000   6250   6    7    2   1  50    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x31 0x1c

"df2" is 3.1‰ for 500kbit/s or slower.

In contrast the mcp251xfd with the current mainline bit timing algorithm
(SJW=1):

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'v5.19'
|                     _----+--------------=> TSeg1: 2 …  256
|                    /    /     _---------=> TSeg2: 1 …  128
|                   |    |     /    _-----=> SJW:   1 …  128
|                   |    |    |    /    _-=> BRP:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2       NBTCFG
|  1000000     25  14   15   10   1   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   1.2‰   0x001c0900
|   800000     25  19   20   10   1   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   1.0‰   0x00260900
|   666666     25  23   24   12   1   1   666666   0.0%  80.0%  80.0%   0.0%   7.8‰   0.8‰   0x002e0b00
|   500000     25  34   35   10   1   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.6‰   0x00440900
|   250000     25  69   70   20   1   1   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.3‰   0x008a1300
|   125000     50  69   70   20   1   2   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.3‰   0x018a1300
|   100000     50  87   87   25   1   2   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x01ad1800
|    83333     50 104  105   30   1   2    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x01d01d00
|    50000    100  87   87   25   1   4    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x03ad1800
|    33333    125 104  105   30   1   5    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x04d01d00
|    20000    250  87   87   25   1  10    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x09ad1800
|    10000    500  87   87   25   1  20    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x13ad1800

for slow bit rates "df2" drops to 0.2‰.

Here the mcp251xfd with the updated algorithm (SJW = Phase Seg2 / 2):

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'next'
|                     _----+--------------=> TSeg1: 2 …  256
|                    /    /     _---------=> TSeg2: 1 …  128
|                   |    |     /    _-----=> SJW:   1 …  128
|                   |    |    |    /    _-=> BRP:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2       NBTCFG
|  1000000     25  14   15   10   5   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   6.2‰   0x001c0904
|   800000     25  19   20   10   5   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x00260904
|   666666     25  23   24   12   6   1   666666   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x002e0b05
|   500000     25  34   35   10   5   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x00440904
|   250000     25  69   70   20  10   1   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x008a1309
|   125000     50  69   70   20  10   2   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x018a1309
|   100000     50  87   87   25  12   2   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x01ad180b
|    83333     50 104  105   30  15   2    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x01d01d0e
|    50000    100  87   87   25  12   4    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x03ad180b
|    33333    125 104  105   30  15   5    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x04d01d0e
|    20000    250  87   87   25  12  10    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x09ad180b
|    10000    500  87   87   25  12  20    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x13ad180b

The "df2" stays at ~3.1‰

[2] http://web.archive.org/https://www.can-cia.org/fileadmin/resources/documents/proceedings/2013_mmc_koppe2.pdf
[3] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf


And I tested with SJW=max from mcp251x (not the mcp251xfd) with SJA=max
to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN
frames:

| 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can state ERROR-ACTIVE restart-ms 1000
|           bitrate 500000 sample-point 0.850
|           tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 3 brp 1
|           mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp_inc 1
|           clock 10000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          2          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus spi parentdev spi1.1

| 97: peakfd0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD> state ERROR-PASSIVE (berr-counter tx 0 rx 136) restart-ms 1000
|           bitrate 500000 sample-point 0.875
|           tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1 brp 1
|           pcan_usb_fd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp_inc 1
|           dbitrate 2000000 dsample-point 0.750
|           dtq 12 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1 dbrp 1
|           pcan_usb_fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp_inc 1
|           clock 80000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          1          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 parentbus usb parentdev 2-2.2:1.0

|  (1970-01-01 02:41:33.856148)  peakfd0  RX - -  20000004   [8]  00 04 00 00 00 00 00 00   ERRORFRAME
|         controller-problem{rx-error-warning}
|  (1970-01-01 02:41:33.857168)  peakfd0  RX - -  20000004   [8]  00 10 00 00 00 00 00 00   ERRORFRAME
|         controller-problem{rx-error-passive}

With SJW = Phase Seg2 / 2, it works.

> In general the recommendation is to chose SJW as big as possible.
> Here's an excerpt from the CiA 601-3 (Unfortunately I can't share the
> entire document)

CiA 301 and J1939 seem to have a different opinion... 

> "Recom2: Choose sjwA as large as possible.
> - The maximum possible value is sjwA = min(ps1A, ps2A).
> - During arbitration phase, a large sjwA allows a CAN node to resynchronize quickly on the leading transmitting node.
> - If the CAN controller does not allow configuring the maximum possible sjwA
> because the value range for sjwA is limited in the CAN controller, then consider
> increasing BRPA. This allows configuring a larger sjwA, unless this can impact
> phase error tolerance in an unwanted way, see Recom4."
>
> I think we discussed this a while ago already and the biggest
> reasoning against maxing SJW was breaking "compatibility" to old
> kernel versions.

ACK

> The compatibility in this is not being able to sync
> though... Now, if a change to a larger default values is agreed upon
> we might as well chose min(phaseseg1, phaseseg2) directly to maximize
> robustness for the user that doesn't set it explicitly. Given that
> most of the getting started with Socketcan results on your favorite
> Search engine don't even mention SJW that's probably a significant
> portion of the users. From a pure communication standpoint I can't see
> any downside to doing this, the only hypothetical case would be if a
> CAN controller had an erratum about higher SJWs and the driver not
> handling it.
>
> So, in essence, I'd propose to use min(p1,p2).

NACK - as I have a setup that doesn't work anymore when mixing an old
and a new kernel.

regards,
Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2022-09-08 19:57 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3() Marc Kleine-Budde
2022-09-11  8:24   ` Vincent Mailhol
2022-09-12  5:56     ` Thomas.Kopp
2022-09-12  8:28       ` Vincent Mailhol
2023-01-31 16:04         ` Marc Kleine-Budde
2023-02-01  5:49           ` Vincent Mailhol
2023-02-01  8:58             ` Marc Kleine-Budde
2023-02-01  9:38               ` Vincent Mailhol
2023-02-02  9:55                 ` Marc Kleine-Budde
2023-02-02 10:29                   ` Vincent Mailhol
2023-02-01  9:05         ` Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 3/5] can: bittiming: can_calc_bittiming(): clean up SJW sanitizing Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 4/5] can: bittiming: can_calc_bittiming(): ensure that SJW is not longer than either Phase Buffer Segment Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW Marc Kleine-Budde
2022-09-07 19:19   ` Thomas.Kopp
2022-09-08 19:57     ` Marc Kleine-Budde [this message]
2022-09-09 12:29       ` Marc Kleine-Budde
2022-09-26  6:15         ` Thomas.Kopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871qsmt3mq.fsf@hardanger.blackshift.org \
    --to=mkl@pengutronix.de \
    --cc=Thomas.Kopp@microchip.com \
    --cc=linux-can@vger.kernel.org \
    --cc=mark@baggywrinkle.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox