MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?

public inbox for linux-can@vger.kernel.org

* MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
@ 2021-06-17 12:22 Harald Mommer
  2021-06-18  9:16 ` Marc Kleine-Budde
  0 siblings, 1 reply; 14+ messages in thread
From: Harald Mommer @ 2021-06-17 12:22 UTC (permalink / raw)
  To: linux-can

Hello,

we are currently in the process of developing a draft specification for 
Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux 
driver and a Virtio CAN Linux device running on top of our hypervisor 
solution.

The Virtio CAN Linux device forwards an existing SocketCAN CAN device 
(currently vcan) via Virtio to the Virtio driver guest so that the 
virtual driver guest can send and receive CAN frames via SocketCAN.

What was originally planned (probably with too much AUTOSAR CAN driver 
semantics in my head and too little SocketCAN knowledge) was to mark a 
transmission request as used (done) when it has finally been sent on the 
CAN bus (vs. when it has been given to SocketCAN, not really done but 
still pending somewhere in the protocol stack).

I thought this was doable with some implementation effort using

setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating 
the MSG_CONFIRM bit on received messages.

This works fine with

cangen -g 0 -i can0

on the driver side sending CAN messages to the device guest. No 
confirmation is lost testing for several minutes.

Now adding on the device side a

cangen -g 0 -i vcan0

sending messages like crazy from the device side guest to the driver 
side guest in parallel, I'm losing TX confirmations in the Linux CAN 
stack. There also seems to be no other error indication (CAN_ERR_FLAG) 
that something like this happened. The virtio CAN device runs out of 
resources and TX becomes stuck, which is not really acceptable even 
for such a heavy load situation (-g 0 on both sides).

Is CAN_RAW_RECV_OWN_MSGS / MSG_CONFIRM known as being unreliable (means 
MSG_CONFIRM messages are dropped) under extreme load situations? If so, 
is there a way to detect reliably that this happened so that somehow a 
recovery mechanism for the pending TX acknowledgements could be implemented?
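
For readers following along, the CAN_RAW_RECV_OWN_MSGS / MSG_CONFIRM 
approach described above can be sketched roughly as follows. This is a 
minimal illustration, not the actual device code (which is not public): 
the helper names enable_recv_own_msgs(), is_tx_confirmation() and 
recv_frame() are made up for this sketch, and the socket is assumed to 
be an already bound CAN_RAW socket.

```c
#include <linux/can.h>
#include <linux/can/raw.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Ask the raw socket to also deliver frames that were sent on this socket. */
int enable_recv_own_msgs(int sock)
{
    int on = 1;

    return setsockopt(sock, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
                      &on, sizeof(on));
}

/* MSG_CONFIRM in msg_flags marks a frame that was sent via this socket. */
int is_tx_confirmation(int msg_flags)
{
    return (msg_flags & MSG_CONFIRM) != 0;
}

/*
 * Receive one classic CAN frame.
 * Returns 1 for an echo of our own transmission, 0 for a frame from
 * another sender, and -1 on error or short read.
 */
int recv_frame(int sock, struct can_frame *frame)
{
    struct iovec iov = { .iov_base = frame, .iov_len = sizeof(*frame) };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(sock, &msg, 0);
    if (n != (ssize_t)sizeof(*frame))
        return -1;

    return is_tx_confirmation(msg.msg_flags);
}
```

A caller would invoke enable_recv_own_msgs() once after bind() and then 
treat every recv_frame() result of 1 as the completion of one pending 
transmission request. Note that recvmsg() is needed here, since plain 
read() does not expose msg_flags.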

I'm aware that "normal" RX messages from other nodes may be dropped due 
to overload. No problem with this.

The timing requirement originally set (done when sent on CAN bus) has to 
be weakened or put under a feature flag if it's not reliably 
implementable in all environments. But before declaring it as "not 
reliably implementable with Linux SocketCAN" I would like to be sure that 
it's really that way and absolutely nothing can be done about it. It 
could even be that I missed an additional setting I'm not aware of. But 
the observed behavior may as well be something which is known to everyone 
except me.

Of course there may still be a bug in my software, but I checked 
this carefully and I'm now convinced that under heavy load 
MSG_CONFIRM messages are lost somewhere in the Linux SocketCAN protocol 
stack. If there's no way to recover from this situation I have to weaken 
the next Virtio CAN draft specification regarding the TX ACK 
timing. As this has some additional impact on the specification, I would 
like to be really sure before doing so that the TX ACK timing cannot be 
implemented reliably the way it was originally planned.

Regards
Harald
-- 
Dipl.-Ing. Harald Mommer
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone:  +49 (30) 60 98 540-0 <== Switchboard
Fax:    +49 (30) 60 98 540-99
E-Mail: harald.mommer@opensynergy.com

www.opensynergy.com

Handelsregister: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Regis Adjamah


* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-17 12:22 MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load? Harald Mommer
@ 2021-06-18  9:16 ` Marc Kleine-Budde
  2021-06-18 18:23   ` Oliver Hartkopp
  2021-06-24 15:21   ` Harald Mommer
  0 siblings, 2 replies; 14+ messages in thread
From: Marc Kleine-Budde @ 2021-06-18  9:16 UTC (permalink / raw)
  To: Harald Mommer; +Cc: linux-can


On 17.06.2021 14:22:03, Harald Mommer wrote:
> we are currently in the process of developing a draft specification for
> Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
> driver and a Virtio CAN Linux device

Oh that sounds interesting. Please keep the linux-can mailing list in
the loop. Do you have a first draft version for review, yet?

> running on top of our hypervisor solution.
> 
> The Virtio CAN Linux device forwards an existing SocketCAN CAN device
> (currently vcan) via Virtio to the Virtio driver guest so that the virtual
> driver guest can send and receive CAN frames via SocketCAN.
> 
> What was originally planned (probably with too much AUTOSAR CAN driver
> semantics in my head and too few SocketCAN knowledge) is to mark a
> transmission request as used (done) when it's sent finally on the CAN bus
> (vs. when it's given to SocketCAN not really done but still pending
> somewhere in the protocol stack).

Makes sense.

> Thought this was doable with some implementation effort using
> 
> setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating the
> MSG_CONFIRM bit on received messages.

Where does that code run? Would that be part of qemu running on the host
of an open source solution?

Can you sketch a quick block diagram showing guest, host, Virtio device,
Virtio driver, etc.?

> This works fine with
> 
> cangen -g 0 -i can0
> 
> on the driver side sending CAN messages to the device guest. No confirmation
> is lost testing for several minutes.

Where's the driver side? On the host or the guest?

> Adding now on the device side a
> 
> cangen -g 0 -i vcan0
> 
> sending messages like crazy from the device side guest to the driver side
> guest in parallel I'm losing TX confirmations in the Linux CAN stack. Seems
> also there is no other error indication (CAN_ERR_FLAG) that something like

CAN_ERR_FLAG frames are only for real CAN errors on the bus or controller
problems. The vcan interface doesn't generate any.

> this happened. The virtio CAN device gets out of resources and TX will
> become stuck. Which is not really acceptable even for such a heavy load
> situation (-g0 on both sides).
> 
> Is CAN_RAW_RECV_OWN_MSGS / MSG_CONFIRM known as being unreliable (means
> MSG_CONFIRM messages are dropped) under extreme load situations? If so, is
> there a way to detect reliably that this happened so that somehow a recovery
> mechanism for the pending TX acknowledgements could be implemented?

Have you activated SO_RXQ_OVFL?
With recvmsg() you get the number of dropped messages in the socket.
Have a look at:
https://github.com/linux-can/can-utils/blob/master/cansequence.c

> I'm aware that "normal" RX messages from other nodes may be dropped due to
> overload. No problem with this.
> 
> The timing requirement originally set (done when sent on CAN bus) has to be
> weakened or put under a feature flag when it's not reliably implementable in
> all environments.

Even if the Linux Kernel doesn't drop any messages, not all CAN
controllers support that feature. On the Linux side we try our best, but
some USB attached devices don't report a TX complete event back, so the
driver triggers the CAN echo skb after the USB transfer has been
completed.

We don't have a feature flag to query if the Linux driver supports proper
CAN echo on TX complete notification.

> But before declaring as "not reliably implementable with
> Linux SocketCAN" I would like to be sure that it's really that way and
> absolutely nothing can be done about it. Could even be that I missed an
> additional setting I'm not aware of. But the observed behavior may as well
> be something which is known to everyone except me.
> 
> Of course it can be that there is still a bug in my software but checked
> this carefully and I'm now convinced that under heavy load situations
> MSG_CONFIRM messages are lost somewhere in the Linux SocketCAN protocol
> stack. If there's no way to recover from this situation I've to weaken the
> next draft Virtio CAN draft specification regarding the TX ACK timing. As
> this has some additional impact on the specification before doing so I would
> like to be really sure that the TX ACK timing cannot be done reliably the
> way it was originally planned.

Do you have some code available yet?

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |



* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-18  9:16 ` Marc Kleine-Budde
@ 2021-06-18 18:23   ` Oliver Hartkopp
  2021-06-19 21:42     ` Marc Kleine-Budde
  2021-06-24 15:21   ` Harald Mommer
  1 sibling, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2021-06-18 18:23 UTC (permalink / raw)
  To: Marc Kleine-Budde, Harald Mommer; +Cc: linux-can



On 18.06.21 11:16, Marc Kleine-Budde wrote:

> 
> Even if the Linux Kernel doesn't drop any messages, not all CAN
> controllers support that feature. On the Linux side we try our best, but
> some USB attached devices don't report a TX complete event back, so the
> driver triggers the CAN echo skb after the USB transfer has been
> completed.
> 
> We don't have a feature flag to query if the Linux driver supports proper
> CAN echo on TX complete notification.
> 

We have. It is set in the flags field of struct net_device and called IFF_ECHO.

https://elixir.bootlin.com/linux/v5.12.11/source/net/can/af_can.c#L257

E.g. the slcan driver does not have this bit set.
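
A hedged sketch of how user space might check this flag: IFF_ECHO is 
bit 18 of the interface flags, which to my understanding does not fit 
into the 16-bit ifr_flags returned by the SIOCGIFFLAGS ioctl, so this 
example reads the hexadecimal value from /sys/class/net/<ifname>/flags 
instead. The helper names flags_have_echo() and netdev_has_echo() and 
the local ECHO_FLAG constant are assumptions made for this 
illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Mirrors IFF_ECHO (1 << 18) from <linux/if.h>. */
#define ECHO_FLAG (1UL << 18)

/* Parse a hex flags string such as "0x40081\n" and test the echo bit. */
int flags_have_echo(const char *hex_flags)
{
    unsigned long flags = strtoul(hex_flags, NULL, 16);

    return (flags & ECHO_FLAG) != 0;
}

/*
 * Returns 1 if the interface reports IFF_ECHO, 0 if it does not,
 * and -1 if the sysfs flags file cannot be read.
 */
int netdev_has_echo(const char *ifname)
{
    char path[128];
    char buf[32];
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    return flags_have_echo(buf);
}
```

A result of 1 only says that the driver generates the echo frames 
itself. It does not by itself say at which point in time the driver 
does so.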

Regards,
Oliver


>> But before declaring as "not reliably implementable with
>> Linux SocketCAN" I would like to be sure that it's really that way and
>> absolutely nothing can be done about it. Could even be that I missed an
>> additional setting I'm not aware of. But the observed behavior may as well
>> be something which is known to everyone except me.
>>
>> Of course it can be that there is still a bug in my software but checked
>> this carefully and I'm now convinced that under heavy load situations
>> MSG_CONFIRM messages are lost somewhere in the Linux SocketCAN protocol
>> stack. If there's no way to recover from this situation I've to weaken the
>> next draft Virtio CAN draft specification regarding the TX ACK timing. As
>> this has some additional impact on the specification before doing so I would
>> like to be really sure that the TX ACK timing cannot be done reliably the
>> way it was originally planned.
> 
> Do you have some code available yet?
> 
> regards,
> Marc
> 


* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-18 18:23   ` Oliver Hartkopp
@ 2021-06-19 21:42     ` Marc Kleine-Budde
  0 siblings, 0 replies; 14+ messages in thread
From: Marc Kleine-Budde @ 2021-06-19 21:42 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: Harald Mommer, linux-can


On 18.06.2021 20:23:39, Oliver Hartkopp wrote:
> > Even if the Linux Kernel doesn't drop any messages, not all CAN
> > controllers support that feature. On the Linux side we try our best, but
> > some USB attached devices don't report a TX complete event back, so the
> > driver triggers the CAN echo skb after the USB transfer has been
> > completed.
> > 
> > We don't have a feature flag to query if the Linux driver supports proper
> > CAN echo on TX complete notification.
> 
> We have. It is set in the flags field of struct net_device and called IFF_ECHO.
> 
> https://elixir.bootlin.com/linux/v5.12.11/source/net/can/af_can.c#L257

The flag tells the rest of the stack that the driver takes care of
generating the CAN echo frames.

Several USB-based drivers set the IFF_ECHO flag, but the USB devices don't
signal the TX-complete event to the host. These drivers generate the CAN
echo frame after the successful USB TX transfer. This is better than
letting the networking stack generate the CAN echo frame, but it's not
100% perfect.

> E.g. the slcan driver does not have this bit set.

Marc



* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-18  9:16 ` Marc Kleine-Budde
  2021-06-18 18:23   ` Oliver Hartkopp
@ 2021-06-24 15:21   ` Harald Mommer
  2021-06-24 18:45     ` Oliver Hartkopp
                       ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Harald Mommer @ 2021-06-24 15:21 UTC (permalink / raw)
  To: linux-can

Hello,

On 18.06.21 at 11:16, Marc Kleine-Budde wrote:
> On 17.06.2021 14:22:03, Harald Mommer wrote:
>> we are currently in the process of developing a draft specification for
>> Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
>> driver and a Virtio CAN Linux device
> Oh that sounds interesting. Please keep the linux-can mailing list in
> the loop. Do you have a first draft version for review, yet?

First draft went to virtio-comment@lists.oasis-open.org and 
virtio-dev@lists.oasis-open.org.

https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results

Link should reveal the short conversation. Currently working on the next 
draft which incorporates the review comments I got so far but the next 
draft will also address the "TX ACK" problem we are discussing here.

In the future I will put the Linux-CAN list in the loop.

>> running on top of our hypervisor solution.
>>
>> The Virtio CAN Linux device forwards an existing SocketCAN CAN device
>> (currently vcan) via Virtio to the Virtio driver guest so that the virtual
>> driver guest can send and receive CAN frames via SocketCAN.
>>
>> What was originally planned (probably with too much AUTOSAR CAN driver
>> semantics in my head and too few SocketCAN knowledge) is to mark a
>> transmission request as used (done) when it's sent finally on the CAN bus
>> (vs. when it's given to SocketCAN not really done but still pending
>> somewhere in the protocol stack).
> Makes sense.

I read the "Makes sense". But reading the rest of the e-mail (and the 
thread) as well, I get the impression that making this timing requirement 
mandatory with SocketCAN is asking for trouble.

- The timing requirement could be removed. This is the easy solution. But 
there is the "Makes sense".

- The original strict timing requirement becomes an option, so it's not a 
mandatory requirement.

The second is my favorite (but I tend to over-engineer on the first 
attempt, so the first option may indeed be the better one).

Dropping this timing behavior implies that some other things in the next 
virtio draft spec have to be changed, which in this case means 
simplified.

>> Thought this was doable with some implementation effort using
>>
>> setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating the
>> MSG_CONFIRM bit on received messages.
> Where does that code run? Would that be part of qemu running on the host
> of an open source solution?
The device application is closed source and runs under the COQOS 
hypervisor, which is also closed source. A qemu device implementation is 
not planned as of now. The virtio CAN driver is a Linux device driver and 
will be open sourced at some point, in the hope of getting it upstreamed 
in the more distant future. Currently the driver is on an internal 
development branch that outsiders cannot see (still better for everyone), 
and colleagues are reviewing it and helping to bring it into an 
acceptable shape.
> Can you sketch a quick block diagram showing guest, host, Virtio device,
> Virtio driver, etc...

I hope this arrives on the list as it was sent and not garbled:

      Guest 2                    | Guest3
----------------                | ----------------
! cangen,      !                | ! cangen,      !
! candump,     !                | ! candump,     !
! cansend      !                | ! cansend      !
! using vcan0  !                | ! using can0   !
----------------                | ----------------
  ^                              |             ^
  !  ---------------------       |             !
  !  ! Service process   !       |             !
  !  ! in user space     !       |             !
  !  ! virtio-can device !       |             !
  !  ! forwarding vcan0  !       |             !
  !  ---------------------       |             !
  !    ^               ^         |             !
  !    !               !         |             !
--------------------------------------------------
  !    !   Device side ! kernel  | Driver side ! kernel
  v    v               v         |             v
---------------- -------------- | ----------------
! Device Linux ! ! HV support ! | ! Driver Linux !
!    VCan      ! !   module   ! | !  Virtio CAN  !
!    vcan0     ! ! on device  ! | !     can0     !
!              ! !   side     ! | !              !
---------------- -------------- | ----------------
        ^               ^        |        ^
        !               !        |        !
--------------------------------------------------
        !               !                 ! Hypervisor
        v               v                 v
--------------------------------------------------
!                     COQOS-HV                   !
--------------------------------------------------

>> This works fine with
>>
>> cangen -g 0 -i can0
>>
>> on the driver side sending CAN messages to the device guest. No confirmation
>> is lost testing for several minutes.
> Where's the driver side? On the host or the guest?

Both sides are guests of the hypervisor in our architecture. There is no 
host in this sense, COQOS-HV is a type 1 hypervisor. The hypervisor does 
not provide devices directly on its own, the devices are provided with 
the support of a device (provider) guest which is also only a guest of 
the hypervisor.

>
> Have you activated SO_RXQ_OVFL?
> With recvmsg() you get the number of dropped messages in the socket.
> Have a look at:
> https://github.com/linux-can/can-utils/blob/master/cansequence.c

I had no idea about SO_RXQ_OVFL. This looks useful for implementing an 
emergency recovery mechanism so as not to get stuck: if loss of received 
frames is detected, the controller is still active, and TX messages have 
been pending for too long, then mark the pending TX messages as used 
(done) to cope with the situation and avoid getting stuck (for too long). 
This might be acceptable if it is something that normally does not happen 
except in really exceptional situations.

This is nothing that should be done now; it gets far too complicated for 
a first attempt at implementing a Virtio CAN device.
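
To make the recovery idea above concrete, here is a purely 
illustrative sketch of the timeout logic. It is not taken from any 
existing implementation. The struct pending_tx layout, the function 
name release_stale_tx() and the millisecond timestamps are all 
assumptions, and the "controller is still active" check is left out 
for brevity. The loss_detected argument is expected to come from an 
increase of the SO_RXQ_OVFL drop counter.

```c
#include <stddef.h>
#include <stdint.h>

/* One outstanding transmission request waiting for its confirmation. */
struct pending_tx {
    uint64_t sent_ms; /* time the frame was handed to SocketCAN */
    int in_use;       /* 1 while the MSG_CONFIRM echo is still awaited */
};

/*
 * If frame loss was detected, mark every request that has been pending
 * for at least timeout_ms as done. Without detected loss nothing is
 * released, so slow confirmations are not mistaken for lost ones.
 * Returns the number of requests released.
 */
size_t release_stale_tx(struct pending_tx *slots, size_t n,
                        int loss_detected, uint64_t now_ms,
                        uint64_t timeout_ms)
{
    size_t released = 0;
    size_t i;

    if (!loss_detected)
        return 0;

    for (i = 0; i < n; i++) {
        if (slots[i].in_use && now_ms - slots[i].sent_ms >= timeout_ms) {
            slots[i].in_use = 0;
            released++;
        }
    }
    return released;
}
```

Releasing a request this way only frees the resource. The device can 
no longer claim that the frame was really sent on the bus, which is 
why this is only suitable as an emergency path.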

> We don't have a feature flag to query if the Linux driver supports proper
> CAN echo on TX complete notification.

Not so nice. But the device integrator should know which backend is used, 
and with a command line option for the device application the issue can 
be handled. I need the command line switch now anyway to do experiments.

Regards
Harald



* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-24 15:21   ` Harald Mommer
@ 2021-06-24 18:45     ` Oliver Hartkopp
  2021-06-28 13:47       ` Harald Mommer
  2021-06-25  9:19     ` review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?) Marc Kleine-Budde
  2021-06-25  9:39     ` MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load? Marc Kleine-Budde
  2 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2021-06-24 18:45 UTC (permalink / raw)
  To: Harald Mommer, linux-can

Hello Harald,

On 24.06.21 17:21, Harald Mommer wrote:

> The device application is closed source, runs under the COQOS hypervisor 
> which is also closed source.

What is this 'device application' in the sketch below?

>> Can you sketch a quick block diagram showing guest, host, Virtio device,
>> Virtio driver, etc...
> 
> I hope this arrives on the list as is been sent and not garbled:
> 
>       Guest 2                    | Guest3
> ----------------                | ----------------
> ! cangen,      !                | ! cangen,      !
> ! candump,     !                | ! candump,     !
> ! cansend      !                | ! cansend      !
> ! using vcan0  !                | ! using can0   !
> ----------------                | ----------------
>   ^                              |             ^
>   !  ---------------------       |             !
>   !  ! Service process   !       |             !
>   !  ! in user space     !       |             !
>   !  ! virtio-can device !       |             !
>   !  ! forwarding vcan0  !       |             !
>   !  ---------------------       |             !

Hopefully not this "Service process in user space" ???

If so, this is a very questionable approach!

To route/forward/manipulate CAN frames between CAN network interfaces 
there is a CAN gateway module 'can-gw' which can be controlled over 
PF_NETLINK.

The can-gw runs super efficiently and fast inside kernel space in the 
SOFTIRQ context.

E.g. 22,000 CAN frames/s with 6% sys load on a 2-core i7 from 2012, 
here: https://youtu.be/O3eOjfTl1yk?t=89

Just type cangw from the can-utils to get an impression of the powerful 
options.

You can even calculate E2E CRCs and XOR checksums after doing content 
mods on the fly.

>   !    ^               ^         |             !
>   !    !               !         |             !
> --------------------------------------------------
>   !    !   Device side ! kernel  | Driver side ! kernel
>   v    v               v         |             v
> ---------------- -------------- | ----------------
> ! Device Linux ! ! HV support ! | ! Driver Linux !
> !    VCan      ! !   module   ! | !  Virtio CAN  !
> !    vcan0     ! ! on device  ! | !     can0     !
> !              ! !   side     ! | !              !
> ---------------- -------------- | ----------------
>         ^               ^        |        ^
>         !               !        |        !
> --------------------------------------------------
>         !               !                 ! Hypervisor
>         v               v                 v
> --------------------------------------------------
> !                     COQOS-HV                   !
> --------------------------------------------------
> 

(..)

> can be handled. Need the command line switch anyway now to do experiments.

Now with cangw ?!? ;-)

Regards,
Oliver


* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-24 18:45     ` Oliver Hartkopp
@ 2021-06-28 13:47       ` Harald Mommer
  0 siblings, 0 replies; 14+ messages in thread
From: Harald Mommer @ 2021-06-28 13:47 UTC (permalink / raw)
  To: Oliver Hartkopp, linux-can

Hello Oliver,

On 24.06.21 at 20:45, Oliver Hartkopp wrote:
>
> What is this 'device application' in the sketch below?
The device application provides the virtio CAN device, using an existing 
CAN device (here vcan) as its backend.
>
>>> Can you sketch a quick block diagram showing guest, host, Virtio 
>>> device,
>>> Virtio driver, etc...
>>
>> I hope this arrives on the list as is been sent and not garbled:
>>
>>       Guest 2                    | Guest3
>> ----------------                | ----------------
>> ! cangen,      !                | ! cangen,      !
>> ! candump,     !                | ! candump,     !
>> ! cansend      !                | ! cansend      !
>> ! using vcan0  !                | ! using can0   !
>> ----------------                | ----------------
>>   ^                              |             ^
>>   !  ---------------------       |             !
>>   !  ! Service process   !       |             !
>>   !  ! in user space     !       |             !
>>   !  ! virtio-can device !       |             !
>>   !  ! forwarding vcan0  !       |             !
>>   !  ---------------------       |             !
>
> Hopefully not this "Service process in user space" ???
The virtio CAN device is the "Service process in user space".
>
> If so, this is a very questionable approach!
>
> To route/forward/manipulate CAN frames between CAN network interfaces
> there is a CAN gateway module 'can-gw' which can be controlled over
> PF_NETLINK.
>
> The can-gw runs super efficiently and fast inside kernel space in the
> SOFTIRQ context.
>
> E.g. 22,000 CAN frames/s with 6% sys load on a 2-core i7 from 2012,
> here: https://youtu.be/O3eOjfTl1yk?t=89
>
> Just type cangw from the can-utils to get an impression of the powerful
> options.
>
> You can even calculate E2E CRCs and XOR checksums after doing content
> mods on the fly.
>
>>   !    ^               ^         |             !
>>   !    !               !         |             !
>> --------------------------------------------------
>>   !    !   Device side ! kernel  | Driver side ! kernel
>>   v    v               v         |             v
>> ---------------- -------------- | ----------------
>> ! Device Linux ! ! HV support ! | ! Driver Linux !
>> !    VCan      ! !   module   ! | !  Virtio CAN  !
>> !    vcan0     ! ! on device  ! | !     can0     !
>> !              ! !   side     ! | !              !
>> ---------------- -------------- | ----------------
>>         ^               ^        |        ^
>>         !               !        |        !
>> --------------------------------------------------
>>         !               !                 ! Hypervisor
>>         v               v                 v
>> --------------------------------------------------
>> !                     COQOS-HV                   !
>> --------------------------------------------------
>>
>
> (..)
>
>> can be handled. Need the command line switch anyway now to do 
>> experiments.
>
> Now with cangw ?!? ;-)

No. We cannot do this here with something which already exists like CAN 
GW. We are not talking about user processes running on the same Linux 
instance which want to communicate with each other. This might have been 
the misunderstanding here.

We are talking about two different virtual machines, both running 
different OS instances under a hypervisor! And one or both VMs may not 
even run Linux as the OS. The device VM could in a future setup run 
under an RTOS, maybe using an AUTOSAR CAN driver as backend which might 
even come from a 3rd party.

In the current setup we have 2 VMs running different instances of Linux 
on the same physical machine under hypervisor control. Only the left VM, 
the device VM, has access to any hardware (like a CAN controller). The 
right VM has no direct access to any hardware at all. To be able to send 
and receive frames in the right (driver) VM we have to do something to 
get out to the external world. Currently nothing exists to do this for 
CAN, so we have to create the new virtio CAN device, which allows access 
to a (physical) CAN controller by Virtio means.

>
> Regards,
> Oliver
>
Regards
Harald


* review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?)
  2021-06-24 15:21   ` Harald Mommer
  2021-06-24 18:45     ` Oliver Hartkopp
@ 2021-06-25  9:19     ` Marc Kleine-Budde
  2021-06-29 17:14       ` Harald Mommer
  2021-07-14  7:15       ` [virtio-dev] " Michael S. Tsirkin
  2021-06-25  9:39     ` MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load? Marc Kleine-Budde
  2 siblings, 2 replies; 14+ messages in thread
From: Marc Kleine-Budde @ 2021-06-25  9:19 UTC (permalink / raw)
  To: Harald Mommer; +Cc: linux-can, virtio-dev


On 24.06.2021 17:21:15, Harald Mommer wrote:
> First draft went to virtio-comment@lists.oasis-open.org and
> virtio-dev@lists.oasis-open.org.
> 
> https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results

> [virtio-dev] [PATCH 1/1] [RFC] virtio-can: Add the device specification.
> Harald Mommer Thu, 01 Apr 2021 08:21:09 -0700
> 
> virtio-can is a virtual CAN device. It provides a way to give access to
> a CAN controller from a driver guest. The device is aimed to be used by
> driver guests running a HLOS as well as by driver guests running a
> typical RTOS as used in controller environments.

Let's widen the focus of this driver and not limit ourselves to RTOSes.

> ---
>  content.tex      |   1 +
>  introduction.tex |   3 +
>  virtio-can.tex   | 245 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 249 insertions(+)
>  create mode 100644 virtio-can.tex
> 
> diff --git a/content.tex b/content.tex
> index e536fd4..c1604db 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -6564,6 +6564,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
>  \input{virtio-mem.tex}
>  \input{virtio-i2c.tex}
>  \input{virtio-scmi.tex}
> +\input{virtio-can.tex}
>  
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
> diff --git a/introduction.tex b/introduction.tex
> index 7204b24..84ea5c0 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -79,6 +79,9 @@ \section{Normative References}\label{sec:Normative References}
>         \phantomsection\label{intro:SCMI}\textbf{[SCMI]} &
>         Arm System Control and Management Interface, DEN0056,
>         \newline\url{https://developer.arm.com/docs/den0056/c}, version C and any future revisions\\
> +       \phantomsection\label{intro:CAN_Driver}\textbf{[CAN Driver]} &
> +       Specification of CAN Driver -- AUTOSAR CP R20-11,

As mentioned before, don't make this AUTOSAR specific.

> +       \newline\url{https://www.autosar.org/fileadmin/user_upload/standards/classic/20-11/AUTOSAR_SWS_CANDriver.pdf}\\
>  
>  \end{longtable}
>  
> diff --git a/virtio-can.tex b/virtio-can.tex
> new file mode 100644
> index 0000000..c343759
> --- /dev/null
> +++ b/virtio-can.tex
> @@ -0,0 +1,245 @@
> +\section{CAN Device}\label{sec:Device Types / CAN Device}
> +
> +virtio-can is a virtio based CAN (Controller Area Network) device. It is
> +used to give a virtual machine access to a CAN bus. The CAN bus may
> +either be a physical CAN bus or a virtual CAN bus between virtual
> +machines or a combination of both.
> +
> +This section relies on definitions made by the AUTOSAR
> +\hyperref[intro:CAN_Driver]{CAN Driver} specification.

Please refer to the ISO CAN specs.

> +
> +\subsection{Device ID}\label{sec:Device Types / CAN Device / Device ID}
> +
> +36
> +
> +\subsection{Virtqueues}\label{sec:Device Types / CAN Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] Txq
> +\item[1] Rxq
> +\item[2] Controlq
> +\item[3] Indicationq
> +\end{description}
> +
> +The \field{Txq} is used to send CAN packets to the CAN bus.
> +
> +The \field{Rxq} is used to receive CAN packets from the CAN bus.
> +
> +The \field{Controlq} is used to control the state of the CAN controller.
> +
> +The \field{Indicationq} is used to receive unsolicited indications of
> +CAN controller state changes.
> +
> +\subsection{Feature Bits}\label{sec:Device Types / CAN Device / Feature Bits}
> +
> +The virtio-can device always supports classic CAN frames with a maximum
> +payload size of 8 bytes.
> +
> +Actual CAN controllers support Extended CAN IDs with 29 bits (CAN~2.0B)
> +as well as Standard CAN IDs with 11 bits (CAN~2.0A). The support of
> +CAN~2.0B Extended CAN IDs is considered as mandatory for this
> +specification.

Let's make Classical CAN a feature just like CAN-FD. There might be
Controller Implementations that only support CAN-FD.

> +
> +\begin{description}
> +
> +\item[VIRTIO_CAN_F_CAN_FD (0)]
> +
> +In addition to classic CAN frames the device supports CAN FD frames with
> +a maximum payload size of 64 bytes.
>

OK

> +\end{description}
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / CAN Device / 
> Device configuration layout}
> +
> +All fields of this configuration are always available and read-only for
> +the driver.
> +
> +\begin{lstlisting}
> +struct virtio_can_config {
> +        le16 lo_prio_count; 
> +        le16 hi_prio_count;
> +};
> +\end{lstlisting}

Have you had a look at the virtio-net? There is already support for
multiple queue pairs. Though, I haven't found any notion of priorities
among the queues.

> +
> +To operate the Virtio CAN device it may be necessary to know some basic
> +properties of the underlying physical CAN controller hardware and its
> +configuration.
> +
> +Physical CAN controllers may support transmission by putting messages
> +into FIFOs first and / or by using transmit buffers directly. The user
> +of the Virtio CAN driver may need to know
> +
> +\begin{itemize}
> +\item Number of TX FIFO places for non time critical CAN messages
> +\item Number of TX buffers for high priority CAN messages
> +\end{itemize}

IMHO the FIFO depth is optional and should be per queue.

> +
> +to schedule an optimal transmission of CAN messages. Non time critical
> +messages may be sent via a FIFO where they may suffer "Inner Priority
> +Inversion" (\hyperref[intro:CAN_Driver]{CAN Driver} chapter 2.1). High
> +priority messages are preferably sent directly to a transmit buffer
> +where they immediately participate in CAN bus arbitration.
> +

Let's use multiple queues like Ethernet has.

> +\subsection{Device Initialization}\label{sec:Device Types / CAN Device / 
> Device Initialization}
> +
> +\begin{enumerate}
> +
> +\item Read the feature bits and negotiate with the device.
> +
> +\item Fill the virtqueue \field{Rxq} with empty buffers to be ready for
> +the reception of CAN messages.
> +
> +\item Fill the virtqueue \field{Indicationq} with empty buffers so that
> +the CAN device is able to provide status change indications to the
> +virtio CAN driver.
> +
> +\item Read the CAN controller properties using the \field{Controlq}.
> +
> +\item Start the CAN controller using the \field{Controlq}.

How does this work on Ethernet?

> +
> +\end{enumerate}
> +
> +\subsection{Device Operation}\label{sec:Device Types / CAN Device / Device 
> Operation}
> +
> +A device operation has an outcome which is described by one of the
> +following values:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CAN_RESULT_OK     0u
> +#define VIRTIO_CAN_RESULT_NOT_OK 1u
> +\end{lstlisting}
> +
> +The type of a CAN message identifier is identified by the most
> +significant 2 bits of the internally used 32 bit value. This matches the
> +definition for Can_IdType in

I'm missing RTR messages.

Please don't cram too much information, or better said any additional
information, into the remaining 3 bits of the 32 bit CAN-id.

You defined a struct for the CAN messages, so add all needed flags to a
proper flags field.

> +\hyperref[intro:CAN_Driver]{CAN Driver} chapter 8.2.3.
> +
> +\begin{lstlisting}
> +#define VIRTIO_CAN_ID_TYPE_STANDARD    0x00000000U
> +#define VIRTIO_CAN_ID_TYPE_STANDARD_FD 0x40000000U
> +#define VIRTIO_CAN_ID_TYPE_EXTENDED    0x80000000U
> +#define VIRTIO_CAN_ID_TYPE_EXTENDED_FD 0xC0000000U
> +\end{lstlisting}
> +
> +\subsubsection{Controller Mode}\label{sec:Device Types / CAN Device / Device 
> Operation / Controller Mode}
> +
> +The general format of a request in the \field{Controlq} is
> +
> +\begin{lstlisting}
> +struct virtio_can_control_out {
> +#define VIRTIO_CAN_SET_CTRL_MODE_START  0x0201u
> +#define VIRTIO_CAN_SET_CTRL_MODE_STOP   0x0202u
> +        le16 msg_type; 
> +};
> +\end{lstlisting}

How does Ethernet handle this?

> +
> +To participate in bus communication the CAN controller must be started
> +by sending a VIRTIO_CAN_SET_CTRL_MODE_START control message,
> +to stop participating in bus communication it must be stopped by sending
> +a VIRTIO_CAN_SET_CTRL_MODE_STOP control message. Both requests are
> +confirmed by the result of the operation.
> +
> +\begin{lstlisting}
> +struct virtio_can_set_ctrl_mode_in {
> +        u8 result;
> +};
> +\end{lstlisting}
> +
> +If the transition succeeded the result shall be VIRTIO_CAN_RESULT_OK
> +otherwise it shall be VIRTIO_CAN_RESULT_NOT_OK. The request shall be put
> +into the used queue when the CAN controller finalized the transition to
> +the requested controller mode.
> +
> +A transition to STOPPED state cancels all CAN messages pending for
> +transmission. A state transition to STOPPED state shall trigger to put
> +all CAN messages pending for transmission into the used queue with
> +result VIRTIO_CAN_RESULT_NOT_OK.
> +
> +Initially the CAN controller is in STOPPED state.
> +
> +\subsubsection{CAN Message Transmission}\label{sec:Device Types / CAN Device / 
> Device Operation / CAN Message Transmission}
> +
> +Messages may be transmitted by placing outgoing CAN messages in the
> +virtqueue \field{Txq}.
> +
> +\begin{lstlisting}
> +struct virtio_can_tx_out {
> +#define VIRTIO_CAN_TX 0x0001u
> +        le16 msg_type;

make this 32 bit and add flags for extended CAN messages, RTR, CAN-FD.
We have to discuss if we need a bit rate switch flag.

> +        le16 priority;

Let's handle priority by using different queues.

> +        le32 can_id;
> +        u8 sdu[];

Where's the dlc or len information?

We have to discuss if we want to pass dlc (0x0...0xf for both Classical
CAN and CAN-FD) or len (0...8 for Classical CAN and 0...64 for CAN-FD).
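
To make the dlc versus len question concrete, here is a minimal sketch of the
mapping between the two, using the standard CAN FD DLC table. The function and
table names are made up for illustration and do not come from the draft.

```c
#include <stdint.h>

/* Payload length in bytes for each CAN FD DLC value 0x0..0xF. */
static const uint8_t canfd_dlc_to_len[16] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64
};

/* Convert a DLC to a payload length. Classical CAN never carries more
 * than 8 bytes, so DLC values 9..15 are capped at 8 there. */
static uint8_t dlc_to_len(uint8_t dlc, int is_fd)
{
    dlc &= 0x0F;
    if (!is_fd)
        return dlc > 8 ? 8 : dlc;
    return canfd_dlc_to_len[dlc];
}

/* Smallest DLC whose payload can hold len bytes, or -1 if len exceeds
 * the maximum of the frame type (8 for Classical CAN, 64 for CAN FD). */
static int len_to_dlc(unsigned len, int is_fd)
{
    unsigned max = is_fd ? 64u : 8u;

    if (len > max)
        return -1;
    for (int dlc = 0; dlc < 16; dlc++) {
        if (canfd_dlc_to_len[dlc] >= len)
            return dlc;
    }
    return -1;
}
```

Passing len keeps the payload size explicit for the receiver; passing dlc keeps
the on-wire value, including the Classical CAN codes 9..15 that all mean 8 bytes.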

> +};
> +
> +struct virtio_can_tx_in {
> +        u8 result;
> +};
> +\end{lstlisting}
> +
> +Priority is 0 for low priority and 1 for high priority CAN messages.
> +
> +The actual length of the SDU can be calculated from the length of the device
> +read-only descriptor.
> +
> +To avoid internal priority inversion in the \field{Txq} the user of the
> +driver may do a book keeping of in flight transmission requests and
> +defer sending of TX messages until the chosen transmission resource
> +becomes available.
> +
> +If priority, can_id or SDU length are out of range or the CAN controller
> +is in an invalid state result shall be set to VIRTIO_CAN_RESULT_NOT_OK
> +and the message shall not be scheduled for transmission. Sending a CAN
> +message with a priority with 0 transmission places configured shall
> +be considered as priority being out of range.
> +
> +If the parameters are valid the message is scheduled for transmission
> +and result is set to VIRTIO_CAN_OK. The transmission request should be
> +put into the used queue after the physical CAN controller acknowledged
> +the transmission on the CAN bus (may have to be put under a feature flag
> +as there may be non AUTOSAR CAN driver backends which don't provide a
> +trigger to do this correctly).

I think this should be mandatory, but we might have a flag or feature
flag to indicate the "quality" of the tx done information. Some HW CAN
devices only provide feedback that they have queued the CAN message, but
no feedback that they have actually transmitted the message.

> +
> +\subsubsection{CAN Message Reception}\label{sec:Device Types / CAN Device / 
> Device Operation / CAN Message Reception}
> +
> +Messages can be received by providing empty incoming buffers to the
> +virtqueue \field{Rxq}.
> +
> +\begin{lstlisting}
> +struct virtio_can_rx {
> +#define VIRTIO_CAN_RX 0x0101u
> +        le16 msg_type;
> +        le16 reserved;
> +        le32 can_id;
> +        u8 sdu[];
> +};
> +\end{lstlisting}
> +
> +The structure element reserved may in the future be used to forward an
> +AUTOSAR hoh, see (\hyperref[intro:CAN_Driver]{CAN Driver} chapter 7.6).
> +The value should be set to 0xFFFF.

Please remove any AUTOSAR references.

> +
> +If the feature \field{VIRTIO_CAN_F_CAN_FD} has been negotiated the
> +maximal possible SDU length is 64, if the feature has not been
> +negotiated the maximal possible SDU length is 8.
> +
> +The actual length of the SDU can be calculated from the length of the
> +driver read-only descriptor.
> +
> +\subsubsection{BusOff Indication}\label{sec:Device Types / CAN Device / Device 
> Operation / BusOff Indication}
> +
> +There are certain error conditions so that the physical CAN controller
> +has to stop participating in CAN communication on the bus. If such an
> +error condition occurs the device informs the driver about the
> +unsolicited CAN controller state change by a CAN BusOff indication.
> +
> +\begin{lstlisting}
> +struct virtio_can_busoff_ind {
> +#define VIRTIO_CAN_BUSOFF_IND 0x0301u
> +        le16 msg_type;
> +};
> +\end{lstlisting}
> +
> +After bus-off detection the CAN controller is in STOPPED state. The CAN
> +module does not participate in bus communication any more so all CAN
> +messages pending for transmission must be put into the used queue with
> +result VIRTIO_CAN_RESULT_NOT_OK.
> -- 
> 2.17.1

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


* Re: review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?)
  2021-06-25  9:19     ` review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?) Marc Kleine-Budde
@ 2021-06-29 17:14       ` Harald Mommer
  2021-07-14  7:15       ` [virtio-dev] " Michael S. Tsirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Harald Mommer @ 2021-06-29 17:14 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can, virtio-dev

Hello,

On 25.06.21 at 11:19, Marc Kleine-Budde wrote:
> On 24.06.2021 17:21:15, Harald Mommer wrote:
>> First draft went to virtio-comment@lists.oasis-open.org and
>> virtio-dev@lists.oasis-open.org.
>>
>> https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results
>> [virtio-dev] [PATCH 1/1] [RFC] virtio-can: Add the device specification.
>> Harald Mommer Thu, 01 Apr 2021 08:21:09 -0700
>>
>> virtio-can is a virtual CAN device. It provides a way to give access to
>> a CAN controller from a driver guest. The device is aimed to be used by
>> driver guests running a HLOS as well as by driver guests running a
>> typical RTOS as used in controller environments.
> Let's open the focus of this driver and not limit us to RTOSes.
Not limited to. But usable for the RTOS audience as well.
>> ---
>>   content.tex      |   1 +
>>   introduction.tex |   3 +
>>   virtio-can.tex   | 245 +++++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 249 insertions(+)
>>   create mode 100644 virtio-can.tex
>>
>> diff --git a/content.tex b/content.tex
>> index e536fd4..c1604db 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -6564,6 +6564,7 @@ \subsubsection{Legacy Interface: Framing
>> Requirements}\label{sec:Device
>>   \input{virtio-mem.tex}
>>   \input{virtio-i2c.tex}
>>   \input{virtio-scmi.tex}
>> +\input{virtio-can.tex}
>>   
>>   \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   
>> diff --git a/introduction.tex b/introduction.tex
>> index 7204b24..84ea5c0 100644
>> --- a/introduction.tex
>> +++ b/introduction.tex
>> @@ -79,6 +79,9 @@ \section{Normative References}\label{sec:Normative References}
>>          \phantomsection\label{intro:SCMI}\textbf{[SCMI]} &
>>          Arm System Control and Management Interface, DEN0056,
>>          \newline\url{https://developer.arm.com/docs/den0056/c}, version C and
>> any future revisions\\
>> +       \phantomsection\label{intro:CAN_Driver}\textbf{[CAN Driver]} &
>> +       Specification of CAN Driver -- AUTOSAR CP R20-11,
> As mentioned before don't make this AUTOSAR specific.

Not sure yet whether the AUTOSAR reference should be removed. Going to 
simplify so some things may disappear and this may result in being able 
to do so. Working in this direction.

>> +
>> \newline\url{https://www.autosar.org/fileadmin/user_upload/standards/classic/20-11/AUTOSAR_SWS_CANDriver.pdf}\\
>>   
>>   \end{longtable}
>>   
>> diff --git a/virtio-can.tex b/virtio-can.tex
>> new file mode 100644
>> index 0000000..c343759
>> --- /dev/null
>> +++ b/virtio-can.tex
>> @@ -0,0 +1,245 @@
>> +\section{CAN Device}\label{sec:Device Types / CAN Device}
>> +
>> +virtio-can is a virtio based CAN (Controller Area Network) device. It is
>> +used to give a virtual machine access to a CAN bus. The CAN bus may
>> +either be a physical CAN bus or a virtual CAN bus between virtual
>> +machines or a combination of both.
>> +
>> +This section relies on definitions made by the AUTOSAR
>> +\hyperref[intro:CAN_Driver]{CAN Driver} specification.
> Please refer to the ISO CAN specs.
The next version will additionally contain the ISO specification in the 
references. In the meantime I have looked heavily into the ISO spec as 
well, so this reference belongs in the document anyway.
>> +
>> +\subsection{Device ID}\label{sec:Device Types / CAN Device / Device ID}
>> +
>> +36
>> +
>> +\subsection{Virtqueues}\label{sec:Device Types / CAN Device / Virtqueues}
>> +
>> +\begin{description}
>> +\item[0] Txq
>> +\item[1] Rxq
>> +\item[2] Controlq
>> +\item[3] Indicationq
>> +\end{description}
>> +
>> +The \field{Txq} is used to send CAN packets to the CAN bus.
>> +
>> +The \field{Rxq} is used to receive CAN packets from the CAN bus.
>> +
>> +The \field{Controlq} is used to control the state of the CAN controller.
>> +
>> +The \field{Indicationq} is used to receive unsolicited indications of
>> +CAN controller state changes.
>> +
>> +\subsection{Feature Bits}\label{sec:Device Types / CAN Device / Feature Bits}
>> +
>> +The virtio-can device always supports classic CAN frames with a maximum
>> +payload size of 8 bytes.
>> +
>> +Actual CAN controllers support Extended CAN IDs with 29 bits (CAN~2.0B)
>> +as well as Standard CAN IDs with 11 bits (CAN~2.0A). The support of
>> +CAN~2.0B Extended CAN IDs is considered as mandatory for this
>> +specification.
> Let's make Classical CAN a feature just like CAN-FD. There might be
> Controller Implementations that only support CAN-FD.

Originally it was there in a very first internal version. ISO says in "1 
Scope" that there are 3 implementation options. In this chapter 
classical CAN is always supported. So I thought "This is not a feature". 
But "10.9.10 Disabling of frame formats" says that a frame format can be 
disabled by configuration. So this is indeed needed as a feature. I was 
misled by ISO "1 Scope".

>> +
>> +\begin{description}
>> +
>> +\item[VIRTIO_CAN_F_CAN_FD (0)]
>> +
>> +In addition to classic CAN frames the device supports CAN FD frames with
>> +a maximum payload size of 64 bytes.
>>
> OK
>
>> +\end{description}
>> +
>> +\subsection{Device configuration layout}\label{sec:Device Types / CAN Device /
>> Device configuration layout}
>> +
>> +All fields of this configuration are always available and read-only for
>> +the driver.
>> +
>> +\begin{lstlisting}
>> +struct virtio_can_config {
>> +        le16 lo_prio_count;
>> +        le16 hi_prio_count;
>> +};
>> +\end{lstlisting}
> Have you had a look at the virtio-net? There is already support for
> multiple queue pairs. Though, I haven't found any notion of priorities
> among the queues.

I had a look into the network device. And I have also not found any 
notion that those multiple queues are associated with different priorities.

The block device has optionally also multiple queues. The purpose in the 
block device is that different CPUs can have ideally their own queue 
assigned. This avoids lock operations on the queues avoiding 
bottlenecks. The network device and the block device have in common that 
they are both devices which have to transport an incredible amount of 
data so I suspect that the reason for optional multiple queue support in 
the network device is exactly the same: Performance improvement avoiding 
lock bottlenecks.

>> +
>> +To operate the Virtio CAN device it may be necessary to know some basic
>> +properties of the underlying physical CAN controller hardware and its
>> +configuration.
>> +
>> +Physical CAN controllers may support transmission by putting messages
>> +into FIFOs first and / or by using transmit buffers directly. The user
>> +of the Virtio CAN driver may need to know
>> +
>> +\begin{itemize}
>> +\item Number of TX FIFO places for non time critical CAN messages
>> +\item Number of TX buffers for high priority CAN messages
>> +\end{itemize}
> IMHO the FIFO depth is optional and should be per queue.

In the meantime I believe this part here was over-engineered for a first 
specification draft.

- I see in SocketCAN no support of priorities. Looked into the m_can.c 
driver heavily and also there no indication that priorities are supported.

- I see in m_can.c also no support that I could query for the properties 
of anything (queue depth etc.)

Having an AUTOSAR driver as backend from which the properties are known 
under an RTOS this all could be done. But this is not what can be done 
in the Linux/SocketCAN environment which is used now.

I'm going to remove this, this has no place in a first draft specification.

Not sure whether the usage of different virtio queues for different 
priorities would be beneficial. But if I now remove that stuff (and maybe 
keep the priority thing open by requiring a reserved field to be set to 0) 
there is no need to decide this now.

>> +
>> +to schedule an optimal transmission of CAN messages. Non time critical
>> +messages may be sent via a FIFO where they may suffer "Inner Priority
>> +Inversion" (\hyperref[intro:CAN_Driver]{CAN Driver} chapter 2.1). High
>> +priority messages are preferably sent directly to a transmit buffer
>> +where they immediately participate in CAN bus arbitration.
>> +
> Let's use multiple queues like Ethernet has.
Ethernet does it (most probably also like block device) for a different 
purpose, not for priorities.
>> +\subsection{Device Initialization}\label{sec:Device Types / CAN Device /
>> Device Initialization}
>> +
>> +\begin{enumerate}
>> +
>> +\item Read the feature bits and negotiate with the device.
>> +
>> +\item Fill the virtqueue \field{Rxq} with empty buffers to be ready for
>> +the reception of CAN messages.
>> +
>> +\item Fill the virtqueue \field{Indicationq} with empty buffers so that
>> +the CAN device is able to provide status change indications to the
>> +virtio CAN driver.
>> +
>> +\item Read the CAN controller properties using the \field{Controlq}.
>> +
>> +\item Start the CAN controller using the \field{Controlq}.
> How does this work on Ethernet?

The part with property reading from the Control Queue is wrong, it's the 
config space. Besides this initialization works always the same using 
virtio. It's a standardized initialisation sequence.

https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.pdf

3.1.1 "Driver Requirements: Device Initialization" is about the feature 
negotiation.

And then is

5.1.5 "Device Initialization" for the network device.

Reading feature bits, filling the queues on which something is 
transmitted from the device => driver with empty buffers so that the 
device has something so that it can put in data. Always the same scheme.

Starting and Stopping the CAN controller: I had support of the AUTOSAR 
CAN spec in mind, means Can_SetControllerMode() which allows to start 
and to stop a CAN controller. At first glance the Ethernet device does 
not have anything similar to stop a whole Ethernet controller. The 
Ethernet device supports optionally the stop of reception of all kinds 
of frames, chapter 5.1.6.5.1 "Packet Receive Filtering".

In m_can.c I saw m_can_start() and m_can_stop(). So far so good.

But in m_can_set_mode(), and not only there, I saw that only 
CAN_MODE_START is supported. So this is not a bug in m_can.c but 
something intentional. I guess I will receive comments on this attempt 
to support AUTOSAR Can_SetControllerMode(), especially on the stopping 
part. Smells like there is something I have not understood.

>> +
>> +\end{enumerate}
>> +
>> +\subsection{Device Operation}\label{sec:Device Types / CAN Device / Device
>> Operation}
>> +
>> +A device operation has an outcome which is described by one of the
>> +following values:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_CAN_RESULT_OK     0u
>> +#define VIRTIO_CAN_RESULT_NOT_OK 1u
>> +\end{lstlisting}
>> +
>> +The type of a CAN message identifier is identified by the most
>> +significant 2 bits of the internally used 32 bit value. This matches the
>> +definition for Can_IdType in
> I'm missing RTR messages.
>
> Please don't cramp to much information, or better say any additional
> information in the remaining 3 bits of the 32 bit CAN-id.
>
> You defined a struct for the CAN messages, so add all needed flags to a
> proper flags field.

You are missing RTR frames because my brain is AUTOSAR CAN polluted.

- AUTOSAR CAN does not support RTR frames

- RTR frames were removed from CAN FD, they exist only in classic CAN

=> Are RTR frames an obsolete or a needed feature? I just don't know, 
therefore I ask.

If anyone implements a virtio CAN device on a controller using an 
AUTOSAR CAN driver as backend there is neither support nor need for RTR 
frames. So RTR frames must be considered as an optional feature (flag) 
because they are not doable in all environments.

The bits were defined as in AUTOSAR CAN. Now the CAN FD bit in AUTOSAR 
clashes with the RTR bit in SocketCAN. Adding all needed flags in a 
separate field is fine. No clashes, no problems and one reference less 
to the AUTOSAR CAN driver specification.
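
A minimal sketch of what moving the type bits into a separate flags field could
look like. The two ID type masks are taken from the draft; the EX_FLAG_* names,
their values and the struct are assumptions made up for illustration only.

```c
#include <stdint.h>

/* ID type bits as defined in the draft (top 2 bits of the 32 bit value). */
#define VIRTIO_CAN_ID_TYPE_STANDARD_FD 0x40000000U
#define VIRTIO_CAN_ID_TYPE_EXTENDED    0x80000000U

/* Hypothetical flag bits for a separate flags field (not from any spec). */
#define EX_FLAG_EXTENDED 0x0001U
#define EX_FLAG_FD       0x0002U
#define EX_FLAG_RTR      0x0004U  /* has no equivalent in the draft encoding */

/* Hypothetical result type: plain identifier plus separate flags. */
struct ex_can_id {
    uint32_t can_id;  /* identifier only, top 3 bits cleared */
    uint16_t flags;   /* EX_FLAG_* bits */
};

/* Split a draft-style identifier into a plain ID and a flags word.
 * Range checking of standard (11 bit) IDs is left to the caller. */
static struct ex_can_id ex_split_id(uint32_t draft_id)
{
    struct ex_can_id out;

    out.flags = 0;
    if (draft_id & VIRTIO_CAN_ID_TYPE_EXTENDED)
        out.flags |= EX_FLAG_EXTENDED;
    if (draft_id & VIRTIO_CAN_ID_TYPE_STANDARD_FD)
        out.flags |= EX_FLAG_FD;
    out.can_id = draft_id & 0x1FFFFFFFU;
    return out;
}
```

With the flags in their own field, an RTR bit can be added without touching the
identifier encoding at all.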
>> +\hyperref[intro:CAN_Driver]{CAN Driver} chapter 8.2.3.
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_CAN_ID_TYPE_STANDARD    0x00000000U
>> +#define VIRTIO_CAN_ID_TYPE_STANDARD_FD 0x40000000U
>> +#define VIRTIO_CAN_ID_TYPE_EXTENDED    0x80000000U
>> +#define VIRTIO_CAN_ID_TYPE_EXTENDED_FD 0xC0000000U
>> +\end{lstlisting}
>> +
>> +\subsubsection{Controller Mode}\label{sec:Device Types / CAN Device / Device
>> Operation / Controller Mode}
>> +
>> +The general format of a request in the \field{Controlq} is
>> +
>> +\begin{lstlisting}
>> +struct virtio_can_control_out {
>> +#define VIRTIO_CAN_SET_CTRL_MODE_START  0x0201u
>> +#define VIRTIO_CAN_SET_CTRL_MODE_STOP   0x0202u
>> +        le16 msg_type;
>> +};
>> +\end{lstlisting}
> How does Ethernet handle this?
As mentioned above, Virtio Ethernet seems not to support a full off of 
the Ethernet controller. But the more interesting question is: How 
should virtio CAN handle this? If this AUTOSAR Can_SetControllerMode() 
start and stop support brings us into trouble then here something has to 
happen. Always on?
>> +
>> +To participate in bus communication the CAN controller must be started
>> +by sending a VIRTIO_CAN_SET_CTRL_MODE_START control message,
>> +to stop participating in bus communication it must be stopped by sending
>> +a VIRTIO_CAN_SET_CTRL_MODE_STOP control message. Both requests are
>> +confirmed by the result of the operation.
>> +
>> +\begin{lstlisting}
>> +struct virtio_can_set_ctrl_mode_in {
>> +        u8 result;
>> +};
>> +\end{lstlisting}
>> +
>> +If the transition succeeded the result shall be VIRTIO_CAN_RESULT_OK
>> +otherwise it shall be VIRTIO_CAN_RESULT_NOT_OK. The request shall be put
>> +into the used queue when the CAN controller finalized the transition to
>> +the requested controller mode.
>> +
>> +A transition to STOPPED state cancels all CAN messages pending for
>> +transmission. A state transition to STOPPED state shall trigger to put
>> +all CAN messages pending for transmission into the used queue with
>> +result VIRTIO_CAN_RESULT_NOT_OK.
>> +
>> +Initially the CAN controller is in STOPPED state.
>> +
>> +\subsubsection{CAN Message Transmission}\label{sec:Device Types / CAN Device /
>> Device Operation / CAN Message Transmission}
>> +
>> +Messages may be transmitted by placing outgoing CAN messages in the
>> +virtqueue \field{Txq}.
>> +
>> +\begin{lstlisting}
>> +struct virtio_can_tx_out {
>> +#define VIRTIO_CAN_TX 0x0001u
>> +        le16 msg_type;
> make this 32 bit and add flags for extended CAN messages, RTR, CAN-FD.
> We have to discuss if we need a bit rate switch flag.
>
>> +        le16 priority;
> Let's handle priority by using different queues.
>
>> +        le32 can_id;
>> +        u8 sdu[];
> Where's the dlc or len information?

I will think about the details, msg_type is everywhere le16 and just 
there in case we need some day to support a totally different message on 
this queue we have not even an idea of today. Somehow I'm preferring to 
spend an own field for the flags.

As 2 priorities should be sufficient this can be a bit in the flags if 
not to be implemented by different TX queues. So instead of "le16 
priority" "le16 flags".

The CAN len is not needed as Virtio provides the message length of the 
whole transmitted message.

CAN len = Virtio message length - offset of sdu[0].
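
As a minimal sketch of this calculation: the struct below follows the draft's
virtio_can_tx_out with "priority" replaced by "flags" as proposed above, while
the ex_ names and the fd_negotiated parameter are assumptions for illustration.
Endianness conversion of the le16/le32 fields is ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout modelled on the draft's TX request (hypothetical ex_ name). */
struct ex_can_tx_out {
    uint16_t msg_type;
    uint16_t flags;
    uint32_t can_id;
    uint8_t  sdu[];   /* payload, length implied by the message length */
};

/* Derive the CAN payload length from the total virtio message length.
 * Returns -1 if the message is shorter than the header or the payload
 * exceeds the maximum (8 bytes, or 64 if CAN FD was negotiated). */
static int ex_sdu_len(size_t msg_len, int fd_negotiated)
{
    size_t hdr = offsetof(struct ex_can_tx_out, sdu);
    size_t max = fd_negotiated ? 64 : 8;

    if (msg_len < hdr)
        return -1;
    if (msg_len - hdr > max)
        return -1;
    return (int)(msg_len - hdr);
}
```

Note that for CAN FD not every length between 9 and 64 is a valid frame length,
so a receiver would additionally have to decide how to treat e.g. 9 bytes.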

>
> We have to discuss if we want to pass dlc (0x0...0xf for both Classical
> CAN and CAN-FD) or len (0...8 for Classical CAN and 0...64 for CAN-FD).
>
>> +};
>> +
>> +struct virtio_can_tx_in {
>> +        u8 result;
>> +};
>> +\end{lstlisting}
>> +
>> +Priority is 0 for low priority and 1 for high priority CAN messages.
>> +
>> +The actual length of the SDU can be calculated from the length of the device
>> +read-only descriptor.
>> +
>> +To avoid internal priority inversion in the \field{Txq} the user of the
>> +driver may do a book keeping of in flight transmission requests and
>> +defer sending of TX messages until the chosen transmission resource
>> +becomes available.
>> +
>> +If priority, can_id or SDU length are out of range or the CAN controller
>> +is in an invalid state result shall be set to VIRTIO_CAN_RESULT_NOT_OK
>> +and the message shall not be scheduled for transmission. Sending a CAN
>> +message with a priority with 0 transmission places configured shall
>> +be considered as priority being out of range.
>> +
>> +If the parameters are valid the message is scheduled for transmission
>> +and result is set to VIRTIO_CAN_OK. The transmission request should be
>> +put into the used queue after the physical CAN controller acknowledged
>> +the transmission on the CAN bus (may have to be put under a feature flag
>> +as there may be non AUTOSAR CAN driver backends which don't provide a
>> +trigger to do this correctly).
> I think this should me mandatory, but we might have a flag or feature
> flag to indicate the "quality" if the tx done information. Some HW CAN
> devices only provide feedback that they have queued the CAN message, but
> no feedback that they have actually transmitted the message.

Feature flag to be added: VIRTIO_CAN_F_LATE_TX_ACK

And all this config field and priority stuff will be removed. This is 
nothing for the 1st version. Especially because it cannot be tested in 
the SocketCAN environment as there is no support. Let's keep it in mind 
as feature for a subsequent version but forget now about it. 
Overengineered for the 1st shot, wanted too much.
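
A minimal sketch of how a driver could interpret TX completions depending on
such a feature flag. VIRTIO_CAN_F_CAN_FD (0) is from the draft; the bit number
chosen for VIRTIO_CAN_F_LATE_TX_ACK and all ex_ names are assumptions, as no
value has been assigned yet.

```c
#include <stdint.h>

#define VIRTIO_CAN_F_CAN_FD         0  /* from the draft */
#define EX_VIRTIO_CAN_F_LATE_TX_ACK 1  /* hypothetical bit number */

/* What a used TX buffer tells the driver. */
enum ex_tx_done {
    EX_TX_DONE_QUEUED,  /* message was accepted by the backend only */
    EX_TX_DONE_ON_BUS   /* message was actually sent on the CAN bus */
};

/* Test a single feature bit in the negotiated feature set. */
static int ex_has_feature(uint64_t features, unsigned bit)
{
    return bit < 64 && ((features >> bit) & 1u);
}

/* Meaning of a TX completion for the negotiated feature set. */
static enum ex_tx_done ex_tx_done_meaning(uint64_t negotiated)
{
    if (ex_has_feature(negotiated, EX_VIRTIO_CAN_F_LATE_TX_ACK))
        return EX_TX_DONE_ON_BUS;
    return EX_TX_DONE_QUEUED;
}
```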

>> +
>> +\subsubsection{CAN Message Reception}\label{sec:Device Types / CAN Device /
>> Device Operation / CAN Message Reception}
>> +
>> +Messages can be received by providing empty incoming buffers to the
>> +virtqueue \field{Rxq}.
>> +
>> +\begin{lstlisting}
>> +struct virtio_can_rx {
>> +#define VIRTIO_CAN_RX 0x0101u
>> +        le16 msg_type;
>> +        le16 reserved;
>> +        le32 can_id;
>> +        u8 sdu[];
>> +};
>> +\end{lstlisting}
>> +
>> +The structure element reserved may in the future be used to forward an
>> +AUTOSAR hoh, see (\hyperref[intro:CAN_Driver]{CAN Driver} chapter 7.6).
>> +The value should be set to 0xFFFF.
> Please remove any AUTOSAR references.
Last remaining. And it's not needed if no rationale for the requirement 
to set the reserved field to 0xFFFF is given. Just wanted to have it out 
of the expected value range if someone wants to use it for that purpose 
later.
>> +
>> +If the feature \field{VIRTIO_CAN_F_CAN_FD} has been negotiated the
>> +maximal possible SDU length is 64, if the feature has not been
>> +negotiated the maximal possible SDU length is 8.
>> +
>> +The actual length of the SDU can be calculated from the length of the
>> +driver read-only descriptor.
>> +
>> +\subsubsection{BusOff Indication}\label{sec:Device Types / CAN Device / Device
>> Operation / BusOff Indication}
>> +
>> +There are certain error conditions so that the physical CAN controller
>> +has to stop participating in CAN communication on the bus. If such an
>> +error condition occurs the device informs the driver about the
>> +unsolicited CAN controller state change by a CAN BusOff indication.
>> +
>> +\begin{lstlisting}
>> +struct virtio_can_busoff_ind {
>> +#define VIRTIO_CAN_BUSOFF_IND 0x0301u
>> +        le16 msg_type;
>> +};
>> +\end{lstlisting}
>> +
>> +After bus-off detection the CAN controller is in STOPPED state. The CAN
>> +module does not participate in bus communication any more so all CAN
>> +messages pending for transmission must be put into the used queue with
>> +result VIRTIO_CAN_RESULT_NOT_OK.
>> -- 
>> 2.17.1

I expected a fat comment about this indication queue here. Using a 
virtio queue just to be able to send a single rare event. We discussed 
internally whether this could be done by writing a status byte to the 
config space but decided against it because of Appendix B.2 in the 
virtio specification which advises against doing so.

And I had additionally AUTOSAR in mind and the polling functions there:

Can_MainFunctionRead() could poll the RX queue while 
Can_MainFunction_BusOff() could poll the indication queue. Clear 
distribution of Can_MainFunctionXXX() duties, nice.

But having seen that a BusOff condition in SocketCAN is done by 
receiving a special error message indicating the BusOff condition 
(CAN_ERR_FLAG / CAN_ERR_BUSOFF) I'm not sure whether having this last 
virtio queue was the best idea. Maybe I should simplify and send this 
indication message via the RX queue and the indication queue is gone. 
Could be that this queue was over-engineering. But this does not decide 
the war now, this is just a detail to think about.
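
For comparison, a minimal sketch of how the bus-off condition is recognised in
a received SocketCAN frame. The two constants mirror the values of CAN_ERR_FLAG
(linux/can.h) and CAN_ERR_BUSOFF (linux/can/error.h) but are redefined locally
under hypothetical EX_ names so the snippet does not depend on Linux headers.

```c
#include <stdint.h>

/* Local copies of the SocketCAN values (see linux/can.h, linux/can/error.h). */
#define EX_CAN_ERR_FLAG   0x20000000U  /* frame is an error message frame */
#define EX_CAN_ERR_BUSOFF 0x00000040U  /* error class: bus off */

/* Returns 1 if can_id marks an error message frame reporting bus-off. */
static int ex_is_busoff(uint32_t can_id)
{
    return (can_id & EX_CAN_ERR_FLAG) && (can_id & EX_CAN_ERR_BUSOFF);
}
```

A virtio CAN device that forwards this condition over the RX queue would need
an equivalent marker in its RX message, which is not defined in the draft.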

Harald




* Re: [virtio-dev] review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?)
  2021-06-25  9:19     ` review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?) Marc Kleine-Budde
  2021-06-29 17:14       ` Harald Mommer
@ 2021-07-14  7:15       ` Michael S. Tsirkin
  2021-07-15 16:04         ` Harald Mommer
  1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2021-07-14  7:15 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Harald Mommer, linux-can, virtio-dev

On Fri, Jun 25, 2021 at 11:19:38AM +0200, Marc Kleine-Budde wrote:
> On 24.06.2021 17:21:15, Harald Mommer wrote:
> > First draft went to virtio-comment@lists.oasis-open.org and
> > virtio-dev@lists.oasis-open.org.
> > 
> > https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results
> 
> > [virtio-dev] [PATCH 1/1] [RFC] virtio-can: Add the device specification.
> > Harald Mommer Thu, 01 Apr 2021 08:21:09 -0700
> > 
> > virtio-can is a virtual CAN device. It provides a way to give access to
> > a CAN controller from a driver guest. The device is aimed to be used by
> > driver guests running a HLOS as well as by driver guests running a
> > typical RTOS as used in controller environments.
> 
> Let's open the focus of this driver and not limit us to RTOSes.
> 
> > ---
> >  content.tex      |   1 +
> >  introduction.tex |   3 +
> >  virtio-can.tex   | 245 +++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 249 insertions(+)
> >  create mode 100644 virtio-can.tex
> > 
> > diff --git a/content.tex b/content.tex
> > index e536fd4..c1604db 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -6564,6 +6564,7 @@ \subsubsection{Legacy Interface: Framing 
> > Requirements}\label{sec:Device
> >  \input{virtio-mem.tex}
> >  \input{virtio-i2c.tex}
> >  \input{virtio-scmi.tex}
> > +\input{virtio-can.tex}
> >  
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  
> > diff --git a/introduction.tex b/introduction.tex
> > index 7204b24..84ea5c0 100644
> > --- a/introduction.tex
> > +++ b/introduction.tex
> > @@ -79,6 +79,9 @@ \section{Normative References}\label{sec:Normative References}
> >         \phantomsection\label{intro:SCMI}\textbf{[SCMI]} &
> >         Arm System Control and Management Interface, DEN0056,
> >         \newline\url{https://developer.arm.com/docs/den0056/c}, version C and 
> > any future revisions\\
> > +       \phantomsection\label{intro:CAN_Driver}\textbf{[CAN Driver]} &
> > +       Specification of CAN Driver -- AUTOSAR CP R20-11,
> 
> As mentioned before don't make this AUTOSAR specific.

If the specs are more or less identical it might be worth it to link
to AUTOSAR too just because it can be downloaded for free.

-- 
MST



* Re: [virtio-dev] review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?)
  2021-07-14  7:15       ` [virtio-dev] " Michael S. Tsirkin
@ 2021-07-15 16:04         ` Harald Mommer
  0 siblings, 0 replies; 14+ messages in thread
From: Harald Mommer @ 2021-07-15 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marc Kleine-Budde; +Cc: linux-can, virtio-dev

On 14.07.21 at 09:15, Michael S. Tsirkin wrote:
>
>> As mentioned before don't make this AUTOSAR specific.
> If the specs are more or less identical it might be worth it to link
> to AUTOSAR too just because it can be downloaded for free.
>
> --
> MST

The specs are not identical. But the specifications do not contradict 
each other, we are talking about the same CAN. From this point of view 
inclusion would be OK. But there was already a comment about the AUTOSAR 
specifications highlighting a problem. While the AUTOSAR spec can be 
downloaded for free, it cannot be used for free for all purposes. This 
may be a problem or even a trap for people.

AUTOSAR CP R20-11 says on page 9 chapter "Disclaimer":

"This work (specification and/or software implementation) and the 
material contained in it, as released by AUTOSAR, is for the purpose of 
information only. ..."

"The material contained in this work is protected by copyright and other 
types of intellectual property rights. The commercial exploitation of 
the material contained in this work requires a license to such 
intellectual property rights.
This work may be utilized or reproduced without any modification, in any 
form or by any means, for informational purposes only. ..."

I cannot judge the exact impact of those sentences; I'm an engineer, not 
a lawyer, and this is not my field of expertise. Freely downloadable does 
not seem to mean freely usable, so it may indeed be better to get rid of 
this reference here. Poisoned? I don't know, but it looks that way.

Regards
Harald

-- 

Dipl.-Ing. Harald Mommer
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone:  +49 (30) 60 98 540-0 <== Zentrale
Fax:    +49 (30) 60 98 540-99
E-Mail: harald.mommer@opensynergy.com

www.opensynergy.com

Handelsregister: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Regis Adjamah


* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-24 15:21   ` Harald Mommer
  2021-06-24 18:45     ` Oliver Hartkopp
  2021-06-25  9:19     ` review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?) Marc Kleine-Budde
@ 2021-06-25  9:39     ` Marc Kleine-Budde
  2 siblings, 0 replies; 14+ messages in thread
From: Marc Kleine-Budde @ 2021-06-25  9:39 UTC (permalink / raw)
  To: Harald Mommer; +Cc: linux-can


On 24.06.2021 17:21:15, Harald Mommer wrote:
> Hello,
> 
> On 18.06.21 at 11:16, Marc Kleine-Budde wrote:
> > On 17.06.2021 14:22:03, Harald Mommer wrote:
> > > we are currently in the process of developing a draft specification for
> > > Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
> > > driver and a Virtio CAN Linux device
> > Oh that sounds interesting. Please keep the linux-can mailing list in
> > the loop. Do you have a first draft version for review, yet?
> 
> First draft went to virtio-comment@lists.oasis-open.org and
> virtio-dev@lists.oasis-open.org.
> 
> https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results
> 
> Link should reveal the short conversation. Currently working on the next
> draft which incorporates the review comments I got so far but the next draft
> will also address the "TX ACK" problem we are discussing here.
> 
> In the future I will put the Linux-CAN list in the loop.
> 
> > > running on top of our hypervisor solution.
> > > 
> > > The Virtio CAN Linux device forwards an existing SocketCAN CAN device
> > > (currently vcan) via Virtio to the Virtio driver guest so that the virtual
> > > driver guest can send and receive CAN frames via SocketCAN.
> > > 
> > > What was originally planned (probably with too much AUTOSAR CAN driver
> > > semantics in my head and too few SocketCAN knowledge) is to mark a
> > > transmission request as used (done) when it's sent finally on the CAN bus
> > > (vs. when it's given to SocketCAN not really done but still pending
> > > somewhere in the protocol stack).
> > Makes sense.
> 
> Reading the "Makes sense". But reading also the rest of the E-Mail (and the
> thread) it makes the impression that making this timing requirement
> mandatory using SocketCAN is calling for trouble.

It makes sense to have a TX done notification. You probably need this
for proper queue handling and throttling.
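
A minimal sketch of how such a TX done notification is told apart from ordinary traffic on a CAN_RAW socket: with CAN_RAW_RECV_OWN_MSGS enabled, recvmsg() reports MSG_CONFIRM in msg_flags for frames sent via the receiving socket and MSG_DONTROUTE for frames created on the local host. The enum and the function name are made up for this example, and whether MSG_CONFIRM really means "sent on the bus" depends on the CAN driver echoing frames at driver level.

```c
#include <sys/socket.h>

/* Hypothetical classification of a frame returned by recvmsg(). */
enum rx_kind {
        RX_FROM_BUS,            /* frame received from another node */
        RX_LOCAL_OTHER_SOCKET,  /* sent on this host, but by another socket */
        RX_OWN_TX_ECHO          /* echo of a frame sent via this socket */
};

/* msg_flags is the msg_flags member of struct msghdr after recvmsg(). */
static enum rx_kind classify_rx(int msg_flags)
{
        if (msg_flags & MSG_CONFIRM)
                return RX_OWN_TX_ECHO;
        if (msg_flags & MSG_DONTROUTE)
                return RX_LOCAL_OTHER_SOCKET;
        return RX_FROM_BUS;
}
```

Only RX_OWN_TX_ECHO would be a candidate for marking a pending transmission request as used.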

> - Could remove the timing requirement. This is the easy solution. But there
> is the "Makes sense".
> 
> - The original strict timing requirement becomes an option so it's not a
> mandatory requirement.
> 
> 2nd is my favorite (but I tend to do over engineering in the first shot so
> the option before may be indeed the better one).
> 
> Not having this timing behavior has the implication that in the next virtio
> draft spec some other things have to be changed and this means now
> simplified.
> 
> > > Thought this was doable with some implementation effort using
> > > 
> > > setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluatiing the
> > > MSG_CONFIRM bit on received messages.

> > Where does that code run? Would that be part of qemu running on the host
> > of an open source solution?

> The device application is closed source, runs under the COQOS hypervisor
> which is also closed source.

Ok

> A qemu device implementation is not planned as of now. The virtio CAN
> driver is a Linux device driver and will be open sourced at some point
> in time in the hope to get it upstreamed in a more far away future.

I suggest posting the code as early as possible, probably along with the
next round of the virtio-can spec RFC.

> Currently the driver is on an internal development branch, outsiders
> cannot see it (still better for everyone)

I doubt that :) I think the Linux community has seen a lot of code that
has been cooking for too long before trying to bring it mainline.

> and the colleagues are reviewing helping to bring it into an
> acceptable shape.

You have to pass the review here anyways :D

> > Can you sketch a quick block diagram showing guest, host, Virtio device,
> > Virtio driver, etc...
> 
> I hope this arrives on the list as is been sent and not garbled:
> 
>      Guest 2                    | Guest3
> ----------------                | ----------------
> ! cangen,      !                | ! cangen,      !
> ! candump,     !                | ! candump,     !
> ! cansend      !                | ! cansend      !
> ! using vcan0  !                | ! using can0   !
> ----------------                | ----------------
>  ^                              |             ^
>  !  ---------------------       |             !
>  !  ! Service process   !       |             !
>  !  ! in user space     !       |             !

Oliver has already commented on this :) Getting feedback from the
community early could have saved you some work :)

>  !  ! virtio-can device !       |             !
>  !  ! forwarding vcan0  !       |             !
>  !  ---------------------       |             !
>  !    ^               ^         |             !
>  !    !               !         |             !
> --------------------------------------------------
>  !    !   Device side ! kernel  | Driver side ! kernel
>  v    v               v         |             v
> ---------------- -------------- | ----------------
> ! Device Linux ! ! HV support ! | ! Driver Linux !
> !    VCan      ! !   module   ! | !  Virtio CAN  !
> !    vcan0     ! ! on device  ! | !     can0     !
> !              ! !   side     ! | !              !
> ---------------- -------------- | ----------------
>        ^               ^        |        ^
>        !               !        |        !
> --------------------------------------------------
>        !               !                 ! Hypervisor
>        v               v                 v
> --------------------------------------------------
> !                     COQOS-HV                   !
> --------------------------------------------------
> 
> > > This works fine with
> > > 
> > > cangen -g 0 -i can0
> > > 
> > > on the driver side sending CAN messages to the device guest. No confirmation
> > > is lost testing for several minutes.
>
> > Where's the driver side? On the host or the guest?
> 
> Both sides are guests of the hypervisor in our architecture. There is no
> host in this sense, COQOS-HV is a type 1 hypervisor. The hypervisor does not
> provide devices directly on its own, the devices are provided with the
> support of a device (provider) guest which is also only a guest of the
> hypervisor.

I see. As I'm not interested in closed source solutions, I'd focus on
the qemu use case. The good thing is that virtio-can must handle both use
cases anyway.

> > Have you activated SO_RXQ_OVFL?
> > With recvmsg() you get the number of dropped messages in the socket.
> > Have a look at:
> > https://github.com/linux-can/can-utils/blob/master/cansequence.c
> 
> I had no idea about SO_RXQ_OVFL. This looks to be useful to implement an
> emergency recovery mechanism not to get stuck. If detecting loss of received
> frames and the controller is still active and TX messages are pending for a
> too long time then marking the pending TX messages as used (done) to cope
> with the situation and not getting stuck (for too long). Might be acceptable
> if this was something which normally does not happen besides in really
> exceptional situations.

Your user space bridge is the wrong solution here. See Oliver's mail.

> Nothing which should be done now, getting far too complicated for a 1st shot
> to implement a Virtio CAN device.
> 
> > We don't have a feature flag to query if the Linux driver support proper
> > CAN echo on TX complete notification.
> 
> Not so nice. But the device integrator should know which backend is used and
> having a command line option for the device application the issue can be
> handled. Need the command line switch anyway now to do experiments.

If needed we can add flags to the CAN drivers so that they are
introspectable, maybe via the ethtool interface.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |



* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
@ 2021-06-29 19:39 Harald Mommer
  2021-06-30  7:27 ` Oliver Hartkopp
  0 siblings, 1 reply; 14+ messages in thread
From: Harald Mommer @ 2021-06-29 19:39 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oliver Hartkopp; +Cc: linux-can

[Re-sent because some mechanism on the mailing list thought this was 
SPAM and rejected it. It looks like the list does not like it when 
Thunderbird composes an HTML e-mail. Setting changed, retrying.]

Hello,

On 25.06.21 at 11:39, Marc Kleine-Budde wrote:
> It makes sense to have a TX done notification. You probably need this
> for proper queue handling and throttling.
Yes. But these acknowledgements must be 100% reliable under all possible 
load conditions, otherwise testers will prove that the solution only 
works when the sun is shining but not during bad weather.
>
>>> Can you sketch a quick block diagram showing guest, host, Virtio device,
>>> Virtio driver, etc...
>> I hope this arrives on the list as is been sent and not garbled:
>>
>>       Guest 2                    | Guest3
>> ----------------                | ----------------
>> ! cangen,      !                | ! cangen,      !
>> ! candump,     !                | ! candump,     !
>> ! cansend      !                | ! cansend      !
>> ! using vcan0  !                | ! using can0   !
>> ----------------                | ----------------
>>   ^                              |             ^
>>   !  ---------------------       |             !
>>   !  ! Service process   !       |             !
>>   !  ! in user space     !       |             !
> Oliver has already commented on this :) Getting feedback from the
> community early could have saved you some work :)

I still don't get it. This service process is the virtio device itself. 
All our virtio devices are user land processes. There is no problem, 
this works that way.

The problem may be that the virtio device should not have used vcan0 
to get CAN access and should have used something different instead. 
CAN GW? Is that what you have been trying to tell me all along? "Do 
not use vcan0 to exchange CAN messages but use CAN GW"? In that case the 
box "Device Linux / VCAN / vcan0" in the picture changes, but not the 
userland virtio CAN device service process box.

If that is it, I'll look into CAN GW to understand what all this means 
and how to use it.

But anyway, if so, this should not have any impact on the driver or the 
spec. It would be an issue of the device implementation itself, which 
is closed source and should not be that interesting here.

>>   !  ! virtio-can device !       |             !
>>   !  ! forwarding vcan0  !       |             !
>>   !  ---------------------       |             !
>>   !    ^               ^         |             !
>>   !    !               !         |             !
>> --------------------------------------------------
>>   !    !   Device side ! kernel  | Driver side ! kernel
>>   v    v               v         |             v
>> ---------------- -------------- | ----------------
>> ! Device Linux ! ! HV support ! | ! Driver Linux !
>> !    VCan      ! !   module   ! | !  Virtio CAN  !
>> !    vcan0     ! ! on device  ! | !     can0     !
>> !              ! !   side     ! | !              !
>> ---------------- -------------- | ----------------
>>         ^               ^        |        ^
>>         !               !        |        !
>> --------------------------------------------------
>>         !               !                 ! Hypervisor
>>         v               v                 v
>> --------------------------------------------------
>> !                     COQOS-HV                   !
>> --------------------------------------------------
>>
>>
> IC - as I'm not interested in closed source solution I'd focus on the
> qemu use case. Good thing is, the virtio-can must handle both use cases
> anyways.
For me qemu is at the moment an unknown environment to develop for. 
There are already some challenges in this project, and at some point 
there are too many challenges. We have to discuss if/how qemu is to be 
addressed.
> Your user space bridge is the wrong solution here.....See Oliver's mail.
The virtio devices are always user land processes in our architecture. 
Only what exactly is to be bridged is the question.
>> Nothing which should be done now, getting far too complicated for a 1st shot
>> to implement a Virtio CAN device.
>>
>>> We don't have a feature flag to query if the Linux driver support proper
>>> CAN echo on TX complete notification.
>> Not so nice. But the device integrator should know which backend is used and
>> having a command line option for the device application the issue can be
>> handled. Need the command line switch anyway now to do experiments.
> If needed we can add flags to the CAN drivers so that they are
> introspectable, maybe via the ethtool interface.
I understand from this that nothing is set in stone for all time. I did 
not expect that something like this could be possible.
> Marc

Harald




* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
  2021-06-29 19:39 Harald Mommer
@ 2021-06-30  7:27 ` Oliver Hartkopp
  0 siblings, 0 replies; 14+ messages in thread
From: Oliver Hartkopp @ 2021-06-30  7:27 UTC (permalink / raw)
  To: Harald Mommer, Marc Kleine-Budde; +Cc: linux-can

On 29.06.21 21:39, Harald Mommer wrote:

> I still don't get it. This service process is the virtio device itself. 
> All our virtio devices are user land processes. There is no problem, 
> this works that way.

Works this way ... well, AFAIK virtio devices are usually not user space 
implementations.

> The problem may be that the virtio device should better not have used 
> vcan0 to get CAN access and that it should have used something different 
> instead. CAN GW? Is it that what you want to tell me all the time? "Do 
> not use vcan0 to exchange CAN messages but use CAN GW"?

You would still use vcan0 or whatever you name it. But the 
"routing between CAN interfaces" can be done more efficiently inside the 
kernel.

> In this case in 
> the picture the box "Device Linux / VCAN / vcan0" changes but not the 
> userland virtio CAN device service process box.

My suggestion is more like: Create a virtual CAN device that exposes the 
virtio net driver as a CAN device inside kernel space.

And then you can use can-gw to do filtering/firewalling/forwarding to 
different application-specific vcans.

> If it's this I'll get into CAN GW to understand what all this means now 
> and how to use it.

Just try this (as root):

modprobe can-gw

cangw -A -s vcan0 -d vcan1 -e
cangw -A -s vcan0 -d vcan2 -e -m OR:ID:400.8.8888888888888888

cangen vcan0

(and candump -c -c any on a second terminal)

This should give an impression. No filtering shown.
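
To illustrate what the second rule does to each forwarded frame, here is a small user-space sketch. It assumes the usual reading of the cangw -m syntax, in which the element string "ID" selects the 'I'dentifier and the 'D'ata, so OR:ID:400.8.8888888888888888 ORs 0x400 into the CAN ID and 0x88 into every data byte while the length stays untouched. The function name is made up, and this is not the kernel's can-gw code.

```c
#include <stdint.h>
#include <linux/can.h>

/*
 * Apply an "OR" modification to identifier and data of a CAN frame,
 * mimicking the effect of a cangw rule "-m OR:ID:<id>.<dlc>.<data>".
 * The length field is deliberately left as it is.
 */
static void gw_or_id_data(struct can_frame *cf, canid_t or_id,
                          const uint8_t or_data[CAN_MAX_DLEN])
{
        int i;

        cf->can_id |= or_id;
        for (i = 0; i < CAN_MAX_DLEN; i++)
                cf->data[i] |= or_data[i];
}
```

With the values from the rule above, a frame 123#0110 sent on vcan0 would be expected to show up on vcan2 as 523#8998.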

> But anyway, if so this should not have any impact on the driver or the 
> spec, this would be an issue of the device implementation itself which 
> is closed source and should now not be this interesting.

IMO a CAN virtio driver can be of public interest - and it has no USP. 
So why put such a simple thing under closed source?

Regards,
Oliver

ps. Some can-gw / CAN net namespace slideware: 
https://wiki.automotivelinux.org/_media/agl-distro/agl2018-socketcan.pdf


Thread overview: 14+ messages
2021-06-17 12:22 MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load? Harald Mommer
2021-06-18  9:16 ` Marc Kleine-Budde
2021-06-18 18:23   ` Oliver Hartkopp
2021-06-19 21:42     ` Marc Kleine-Budde
2021-06-24 15:21   ` Harald Mommer
2021-06-24 18:45     ` Oliver Hartkopp
2021-06-28 13:47       ` Harald Mommer
2021-06-25  9:19     ` review of virtio-can (was: Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?) Marc Kleine-Budde
2021-06-29 17:14       ` Harald Mommer
2021-07-14  7:15       ` [virtio-dev] " Michael S. Tsirkin
2021-07-15 16:04         ` Harald Mommer
2021-06-25  9:39     ` MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load? Marc Kleine-Budde
  -- strict thread matches above, loose matches on Subject: below --
2021-06-29 19:39 Harald Mommer
2021-06-30  7:27 ` Oliver Hartkopp
