netdev.vger.kernel.org archive mirror
* Re: Question about to KMSAN: uninit-value in can_receive
       [not found] ` <0c98b1c4-3975-4bf5-9049-9d7f10d22a6d@hartkopp.net>
@ 2025-11-30 12:44   ` Oliver Hartkopp
  2025-11-30 17:29     ` Prithvi Tambewagh
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Hartkopp @ 2025-11-30 12:44 UTC
  To: Prithvi Tambewagh, mkl; +Cc: linux-can, linux-kernel, syzkaller-bugs, netdev



On 29.11.25 18:04, Oliver Hartkopp wrote:
> Hello Prithvi,
> 
> thanks for picking up this topic!
> 
> I had your mail in my open tabs and I was reading some code several 
> times without having a really good idea how to continue.
> 
> On 17.11.25 18:30, Prithvi Tambewagh wrote:
> 
> >> The call trace suggests that the bug is caused by the change in
> >> headroom made by pskb_expand_head(). The new headroom remains
> >> uninitialized, and when can_receive() accesses
> >> can_skb_prv(skb)->skbcnt it indirectly reads skb->head, which
> >> triggers the KMSAN uninit-value report.
> 
> Yes.
> 
> If you take a look at the KMSAN message:
> 
> https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/T/#m0372e223746b9da19cbf39348ab1cda52a5cfadc
> 
> I wonder why anybody is obviously fiddling with skb->head here.
> 
> When initially creating skb for the CAN subsystem we use 
> can_skb_reserve() which does a
> 
> skb_reserve(skb, sizeof(struct can_skb_priv));
> 
> so that we get some headroom for struct can_skb_priv.
> 
> Then we access this struct by referencing skb->head:
> 
> static inline struct can_skb_priv *can_skb_prv(struct sk_buff *skb)
> {
>      return (struct can_skb_priv *)(skb->head);
> }
> 
> If anybody is now extending the headroom, skb->head will likely no
> longer point to struct can_skb_priv, right?
> 
>> To fix this bug, I think we can call can_dropped_invalid_skb() in 
>> can_rcv()
>> just before calling can_receive(). Further, we can add a condition for 
>> these
>> sk_buff with uninitialized headroom to initialize the skb, the way it had
>> been done in the patch for an earlier packet injection case in a similar
>> KMSAN bug:
> >> https://lore.kernel.org/linux-can/20191207183418.28868-1-socketcan@hartkopp.net/
> 
> No. This is definitely a wrong approach. You can not wildly poke values 
> behind skb->head, when the correctly initialized struct can_skb_priv 
> just sits somewhere else.
> 
> In contrast to the case in your referenced patch we do not get a skb
> from PF_PACKET but we handle a skb that has been properly created in
> isotp_sendmsg(), including can_skb_reserve() and an initialized struct
> can_skb_priv.
> 
>> However, I am not getting on what basis can I filter the sk_buff so that
>> only those with an uninitialized headroom will be initialized via this 
>> path.
>> Is this the correct approach?
> 
> No.
> 
> When we are creating CAN skbs with [can_]skb_reserve(), the struct 
> can_skb_priv is located directly "before" the struct can_frame which is 
> at skb->data.
> 
> I'm therefore currently thinking in the direction of using skb->data 
> instead of skb->head as reference to struct can_skb_priv:
> 
> diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
> index 1abc25a8d144..8822d7d2e3df 100644
> --- a/include/linux/can/skb.h
> +++ b/include/linux/can/skb.h
> @@ -60,11 +60,11 @@ struct can_skb_priv {
>          struct can_frame cf[];
>   };
> 
>   static inline struct can_skb_priv *can_skb_prv(struct sk_buff *skb)
>   {
> -       return (struct can_skb_priv *)(skb->head);
> +       return (struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv));
>   }
> 
>   static inline void can_skb_reserve(struct sk_buff *skb)
>   {
>          skb_reserve(skb, sizeof(struct can_skb_priv));
> 
> I have not checked what effect this might have on this patch
> 
> https://lore.kernel.org/linux-can/20191207183418.28868-1-socketcan@hartkopp.net/
> 
> when we initialize struct can_skb_priv inside skbs we did not create in 
> the CAN subsystem. The difference would be that we access struct 
> can_skb_priv via skb->data and not via skb->head. The effect on the 
> system should be similar.
> 
> What do you think about such approach?
> 
> Best regards,
> Oliver
> 

Hello Prithvi,

I'm answering in this mail thread as you answered on the other thread 
which does not preserve the discussion above.

On 30.11.25 13:04, Prithvi Tambewagh wrote:
 > Hello Oliver,
 >
 > Thanks for the feedback! I now understand how struct can_skb_priv is
 > reserved in the headroom, more clearly, given that I am relatively new
 > to kernel development. I agree on your patch.
 >
 > I tested it locally using the reproducer program for this bug provided by
 > syzbot and it didn't crash the kernel. Also, I checked the patch here
 >
 > https://lore.kernel.org/linux-can/20191207183418.28868-1-socketcan@hartkopp.net/
 >
 > looking at it, I think your patch will work fine with the above patch as
 > well, since data will be accessed at
 >
 > skb->data - sizeof(struct can_skb_priv)
 >
 > which is the intended place for it, according to the action of
 > can_skb_reserve() which increases headroom by length
 > sizeof(struct can_skb_priv), reserving the space just before skb->data.
 >
 > I think it solves this specific KMSAN bug. Kindly correct me if I am wrong.

Yes. It solves that specific bug. But IMO we need to fix the root cause 
of this issue.

The CAN skb is passed to NAPI and XDP code

  kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
  pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
  netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
  netif_receive_generic_xdp net/core/dev.c:5112 [inline]
  do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
  __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524

which invoked pskb_expand_head() which manipulates skb->head and 
therefore removes the reference to our struct can_skb_priv.
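
To illustrate the effect, here is a minimal userspace sketch (not kernel
code). struct fake_skb, struct fake_priv and all helper functions are
hypothetical stand-ins. fake_expand_head() only models the idea that a
head expansion allocates a new buffer and places the old content behind
the additional headroom; the 0xAA fill stands in for the bytes that would
be uninitialized in the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fixed part of struct can_skb_priv. */
struct fake_priv {
	int ifindex;
	int skbcnt;
	unsigned int frame_len;
};

/* Hypothetical stand-in for the sk_buff pointers that matter here. */
struct fake_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the CAN frame */
	size_t len;		/* payload length */
};

/* Allocate a buffer and reserve room for the private data in front of
 * the payload, similar in spirit to can_skb_reserve(). */
static int fake_skb_alloc(struct fake_skb *skb, int ifindex, size_t len)
{
	struct fake_priv priv = { .ifindex = ifindex };

	skb->head = calloc(1, sizeof(priv) + len);
	if (!skb->head)
		return -1;
	skb->data = skb->head + sizeof(priv);
	skb->len = len;
	memcpy(skb->head, &priv, sizeof(priv));
	return 0;
}

/* Model of a head expansion: nhead new bytes in front, old content
 * copied behind them. 0xAA marks the would-be uninitialized bytes. */
static int fake_expand_head(struct fake_skb *skb, size_t nhead)
{
	size_t old_headroom = (size_t)(skb->data - skb->head);
	size_t used = old_headroom + skb->len;
	unsigned char *buf = malloc(nhead + used);

	if (!buf)
		return -1;
	memset(buf, 0xAA, nhead);
	memcpy(buf + nhead, skb->head, used);
	free(skb->head);
	skb->head = buf;
	skb->data = buf + nhead + old_headroom;
	return 0;
}

/* Current lookup: private data assumed at skb->head. */
static int ifindex_via_head(const struct fake_skb *skb)
{
	struct fake_priv priv;

	memcpy(&priv, skb->head, sizeof(priv));
	return priv.ifindex;
}

/* Proposed lookup: private data assumed directly in front of skb->data. */
static int ifindex_via_data(const struct fake_skb *skb)
{
	struct fake_priv priv;

	memcpy(&priv, skb->data - sizeof(priv), sizeof(priv));
	return priv.ifindex;
}
```

With ifindex 7 and 256 bytes of extra headroom, ifindex_via_data() still
returns 7 after fake_expand_head(), while ifindex_via_head() reads the
0xAA fill. The model says nothing about who else may write into the
headroom, which is the open question here.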
 > Would you like to fix this bug by sending your patch upstream? Or else
 > shall I send this patch upstream and mention your name in Suggested-by tag?

No. Neither of that - as it will not fix the root cause.

IMO we need to check who is using the headroom in CAN skbs and for what 
reason first. And when we are not able to safely control the headroom 
for our struct can_skb_priv content we might need to find another way to 
store that content.
E.g. by creating this space behind skb->data or add new attributes to 
struct sk_buff.

Best regards,
Oliver


* Re: Question about to KMSAN: uninit-value in can_receive
  2025-11-30 12:44   ` Question about to KMSAN: uninit-value in can_receive Oliver Hartkopp
@ 2025-11-30 17:29     ` Prithvi Tambewagh
  2025-11-30 19:09       ` Oliver Hartkopp
  0 siblings, 1 reply; 10+ messages in thread
From: Prithvi Tambewagh @ 2025-11-30 17:29 UTC
  To: Oliver Hartkopp, mkl; +Cc: linux-can, linux-kernel, syzkaller-bugs, netdev

On Sun, Nov 30, 2025 at 01:44:32PM +0100, Oliver Hartkopp wrote:
>
>
>On 29.11.25 18:04, Oliver Hartkopp wrote:
>>Hello Prithvi,
>>
>>thanks for picking up this topic!
>>
>>I had your mail in my open tabs and I was reading some code several 
>>times without having a really good idea how to continue.
>>
>>On 17.11.25 18:30, Prithvi Tambewagh wrote:
>>
>>>The call trace suggests that the bug is caused by the change in
>>>headroom made by pskb_expand_head(). The new headroom remains
>>>uninitialized, and when can_receive() accesses
>>>can_skb_prv(skb)->skbcnt it indirectly reads skb->head, which
>>>triggers the KMSAN uninit-value report.
>>
>>Yes.
>>
>>If you take a look at the KMSAN message:
>>
>>https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/T/#m0372e223746b9da19cbf39348ab1cda52a5cfadc
>>
>>I wonder why anybody is obviously fiddling with skb->head here.
>>
>>When initially creating skb for the CAN subsystem we use 
>>can_skb_reserve() which does a
>>
>>skb_reserve(skb, sizeof(struct can_skb_priv));
>>
>>so that we get some headroom for struct can_skb_priv.
>>
>>Then we access this struct by referencing skb->head:
>>
>>static inline struct can_skb_priv *can_skb_prv(struct sk_buff *skb)
>>{
>>     return (struct can_skb_priv *)(skb->head);
>>}
>>
>>If anybody is now extending the headroom, skb->head will likely no
>>longer point to struct can_skb_priv, right?
>>
>>>To fix this bug, I think we can call can_dropped_invalid_skb() in 
>>>can_rcv()
>>>just before calling can_receive(). Further, we can add a condition 
>>>for these
>>>sk_buff with uninitialized headroom to initialize the skb, the way it had
>>>been done in the patch for an earlier packet injection case in a similar
>>>KMSAN bug:
>>>https://lore.kernel.org/linux-can/20191207183418.28868-1-socketcan@hartkopp.net/
>>
>>No. This is definitely a wrong approach. You can not wildly poke 
>>values behind skb->head, when the correctly initialized struct 
>>can_skb_priv just sits somewhere else.
>>
>>In contrast to the case in your referenced patch we do not get a skb 
>>from PF_PACKET but we handle a skb that has been properly created in 
>>isotp_sendmsg(), including can_skb_reserve() and an initialized 
>>struct can_skb_priv.
>>
>>>However, I am not getting on what basis can I filter the sk_buff so that
>>>only those with an uninitialized headroom will be initialized via 
>>>this path.
>>>Is this the correct approach?
>>
>>No.
>>
>>When we are creating CAN skbs with [can_]skb_reserve(), the struct 
>>can_skb_priv is located directly "before" the struct can_frame which 
>>is at skb->data.
>>
>>I'm therefore currently thinking in the direction of using skb->data 
>>instead of skb->head as reference to struct can_skb_priv:
>>
>>diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
>>index 1abc25a8d144..8822d7d2e3df 100644
>>--- a/include/linux/can/skb.h
>>+++ b/include/linux/can/skb.h
>>@@ -60,11 +60,11 @@ struct can_skb_priv {
>>         struct can_frame cf[];
>>  };
>>
>>  static inline struct can_skb_priv *can_skb_prv(struct sk_buff *skb)
>>  {
>>-       return (struct can_skb_priv *)(skb->head);
>>+       return (struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv));
>>  }
>>
>>  static inline void can_skb_reserve(struct sk_buff *skb)
>>  {
>>         skb_reserve(skb, sizeof(struct can_skb_priv));
>>
>>I have not checked what effect this might have on this patch
>>
>>https://lore.kernel.org/linux-can/20191207183418.28868-1-socketcan@hartkopp.net/
>>
>>when we initialize struct can_skb_priv inside skbs we did not create 
>>in the CAN subsystem. The difference would be that we access struct 
>>can_skb_priv via skb->data and not via skb->head. The effect on the 
>>system should be similar.
>>
>>What do you think about such approach?
>>
>>Best regards,
>>Oliver
>>
>
>Hello Prithvi,
>
>I'm answering in this mail thread as you answered on the other thread 
>which does not preserve the discussion above.

Hello Oliver,

Apologies for this, I was using git send-email and probably messed up the
Message-ID. I have just set up mutt, so this should be correct now.

>
>On 30.11.25 13:04, Prithvi Tambewagh wrote:
>> Hello Oliver,
>>
>> Thanks for the feedback! I now understand how struct can_skb_priv is
>> reserved in the headroom, more clearly, given that I am relatively new
>> to kernel development. I agree on your patch.
>>
>> I tested it locally using the reproducer program for this bug provided by
>> syzbot and it didn't crash the kernel. Also, I checked the patch here
>>
>> https://lore.kernel.org/linux-can/20191207183418.28868-1-socketcan@hartkopp.net/
>>
>> looking at it, I think your patch will work fine with the above patch as
>> well, since data will be accessed at
>>
>> skb->data - sizeof(struct can_skb_priv)
>>
>> which is the intended place for it, according to the action of
>> can_skb_reserve() which increases headroom by length
>> sizeof(struct can_skb_priv), reserving the space just before skb->data.
>>
>> I think it solves this specific KMSAN bug. Kindly correct me if I am wrong.
>
>Yes. It solves that specific bug. But IMO we need to fix the root 
>cause of this issue.
>
>The CAN skb is passed to NAPI and XDP code
>
> kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
> pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
> netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
> netif_receive_generic_xdp net/core/dev.c:5112 [inline]
> do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
> __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
>
>which invoked pskb_expand_head() which manipulates skb->head and 
>therefore removes the reference to our struct can_skb_priv.
>
>> Would you like to fix this bug by sending your patch upstream? Or else
>> shall I send this patch upstream and mention your name in Suggested-by tag?
>
>No. Neither of that - as it will not fix the root cause.
>
>IMO we need to check who is using the headroom in CAN skbs and for 
>what reason first. And when we are not able to safely control the 
>headroom for our struct can_skb_priv content we might need to find 
>another way to store that content.
>E.g. by creating this space behind skb->data or add new attributes to 
>struct sk_buff.

I will work in this direction. Just to confirm, what you mean is
that first it should be checked where the headroom is used while also
checking whether the data from region covered by struct can_skb_priv is 
intact, and if not then we need to ensure that it is intact by other 
measures, right? 

>
>Best regards,
>Oliver

Thank You,
Prithvi


* Re: Question about to KMSAN: uninit-value in can_receive
  2025-11-30 17:29     ` Prithvi Tambewagh
@ 2025-11-30 19:09       ` Oliver Hartkopp
  2025-12-07 18:45         ` Prithvi
  2025-12-20 17:33         ` Prithvi
  0 siblings, 2 replies; 10+ messages in thread
From: Oliver Hartkopp @ 2025-11-30 19:09 UTC
  To: Prithvi Tambewagh, Marc Kleine-Budde
  Cc: linux-can, linux-kernel, syzkaller-bugs, netdev

Hi Prithvi,

On 30.11.25 18:29, Prithvi Tambewagh wrote:
> On Sun, Nov 30, 2025 at 01:44:32PM +0100, Oliver Hartkopp wrote:

>>> shall I send this patch upstream and mention your name in
>>> Suggested-by tag?
>>
>> No. Neither of that - as it will not fix the root cause.
>>
>> IMO we need to check who is using the headroom in CAN skbs and for 
>> what reason first. And when we are not able to safely control the 
>> headroom for our struct can_skb_priv content we might need to find 
>> another way to store that content.
>> E.g. by creating this space behind skb->data or add new attributes to 
>> struct sk_buff.
> 
> I will work in this direction. Just to confirm, what you mean is
> that first it should be checked where the headroom is used while also
> checking whether the data from region covered by struct can_skb_priv is 
> intact, and if not then we need to ensure that it is intact by other 
> measures, right?

I have added skb_dump(KERN_WARNING, skb, true) in my local dummy_can.c
and sent some CAN frames with cansend.

CAN CC:

[ 3351.708018] skb len=16 headroom=16 headlen=16 tailroom=288
                mac=(16,0) mac_len=0 net=(16,0) trans=16
                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x000c pkttype=5 iif=0
                priority=0x0 mark=0x0 alloc_cpu=5 vlan_all=0x0
                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 3351.708151] dev name=can0 feat=0x0000000000004008
[ 3351.708159] sk family=29 type=3 proto=0
[ 3351.708166] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3351.708173] skb linear:   00000000: 23 01 00 00 04 00 00 00 11 22 33 44 00 00 00 00

(..)

CAN FD:

[ 3557.069471] skb len=72 headroom=16 headlen=72 tailroom=232
                mac=(16,0) mac_len=0 net=(16,0) trans=16
                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x000d pkttype=5 iif=0
                priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 3557.069499] dev name=can0 feat=0x0000000000004008
[ 3557.069507] sk family=29 type=3 proto=0
[ 3557.069513] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3557.069520] skb linear:   00000000: 33 03 00 00 10 05 00 00 00 11 22 33 44 55 66 77
[ 3557.069526] skb linear:   00000010: 88 aa bb cc dd ee ff 00 00 00 00 00 00 00 00 00

(..)

CAN XL:

[ 5477.498205] skb len=908 headroom=16 headlen=908 tailroom=804
                mac=(16,0) mac_len=0 net=(16,0) trans=16
                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x000e pkttype=5 iif=0
                priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 5477.498236] dev name=can0 feat=0x0000000000004008
[ 5477.498244] sk family=29 type=3 proto=0
[ 5477.498251] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5477.498258] skb linear:   00000000: b0 05 92 00 81 cd 80 03 cd b4 92 58 4c a1 f6 0c
[ 5477.498264] skb linear:   00000010: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a 4c a1 f6 0c
[ 5477.498269] skb linear:   00000020: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a 4c a1 f6 0c
[ 5477.498275] skb linear:   00000030: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a 4c a1 f6 0c


I will also add skb_dump(KERN_WARNING, skb, true) in the CAN receive 
path to see what's going on there.
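
For reference, the CAN CC dump above can be decoded by hand. The
following userspace sketch does that under assumptions that are not
stated in the dump itself: the host is little endian, struct
can_skb_priv starts with ifindex, and struct can_frame has a 32 bit
can_id at offset 0, the length byte at offset 4 and the data at offset 8.
struct decoded_cc and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for a decoded Classical CAN frame. */
struct decoded_cc {
	uint32_t can_id;
	uint8_t len;
	uint8_t data[8];
};

/* "skb headroom" line of the CAN CC dump, byte for byte. */
static const uint8_t cc_headroom[16] = {
	0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* "skb linear" line of the CAN CC dump, byte for byte. */
static const uint8_t cc_linear[16] = {
	0x23, 0x01, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
	0x11, 0x22, 0x33, 0x44, 0x00, 0x00, 0x00, 0x00,
};

/* Read a 32 bit little endian value (assumption: LE host wrote it). */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* First member of the assumed struct can_skb_priv layout. */
static uint32_t decode_ifindex(const uint8_t headroom[16])
{
	return le32(headroom);
}

/* Decode the assumed struct can_frame layout. */
static struct decoded_cc decode_cc(const uint8_t raw[16])
{
	struct decoded_cc f;

	f.can_id = le32(raw);
	f.len = raw[4];
	memcpy(f.data, raw + 8, sizeof(f.data));
	return f;
}
```

Under these assumptions the headroom holds ifindex 7 and the frame is
CAN ID 0x123 with the 4 data bytes 11 22 33 44, so on the transmit side
the struct can_skb_priv content sits directly in front of skb->data.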

My main problem with the KMSAN message
https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/
is that it uses NAPI, XDP and therefore pskb_expand_head():

  kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
  pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
  netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
  netif_receive_generic_xdp net/core/dev.c:5112 [inline]
  do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
  __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
  __netif_receive_skb_one_core net/core/dev.c:5702 [inline]
  __netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
  process_backlog+0x4ad/0xa50 net/core/dev.c:6149
  __napi_poll+0xe7/0x980 net/core/dev.c:6902
  napi_poll net/core/dev.c:6971 [inline]

As you can see in
https://syzkaller.appspot.com/x/log.txt?x=144ece64580000

[pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
[pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0

they are using the vxcan driver which is mainly derived from vcan.c and 
veth.c (~2017). The veth.c driver supports all those GRO, NAPI and XDP 
features today which vxcan.c still does NOT support.

Therefore I wonder how the NAPI and XDP code can be used together with 
vxcan. And if this is still the case today, as the syzkaller kernel 
6.13.0-rc7-syzkaller-00039-gc3812b15000c is already one year old.
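
One possible explanation, to be verified: generic (skb mode) XDP runs in
the core receive path and does not need driver support, which would
match do_xdp_generic() in the trace. As far as
netif_skb_check_for_xdp() in net/core/dev.c can be read, it requests
extra headroom from pskb_expand_head() when the skb has less than
XDP_PACKET_HEADROOM bytes. A sketch of that arithmetic follows; the
value 64 for NET_SKB_PAD is an assumption (it is architecture
dependent), and the function names are hypothetical.

```c
#include <assert.h>

#define XDP_PACKET_HEADROOM_ASSUMED	256	/* value from the uapi bpf.h header */
#define NET_SKB_PAD_ASSUMED		64	/* assumption, arch dependent */

/* Round v up to a multiple of a (a must be a power of two). */
static unsigned int align_up(unsigned int v, unsigned int a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Extra headroom that generic XDP would request for a given headroom,
 * following the reading of the kernel code described above. */
static unsigned int xdp_extra_headroom(unsigned int headroom)
{
	int hroom = XDP_PACKET_HEADROOM_ASSUMED - (int)headroom;

	if (hroom <= 0)
		return 0;
	return align_up((unsigned int)hroom, NET_SKB_PAD_ASSUMED);
}
```

For the 16 bytes of headroom seen in the dumps above this yields 256
additional bytes, i.e. 272 bytes of headroom afterwards, with skb->head
pointing at the start of the new, uninitialized part.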

Many questions ...

Best regards,
Oliver


* Re: Question about to KMSAN: uninit-value in can_receive
  2025-11-30 19:09       ` Oliver Hartkopp
@ 2025-12-07 18:45         ` Prithvi
  2025-12-20 17:33         ` Prithvi
  1 sibling, 0 replies; 10+ messages in thread
From: Prithvi @ 2025-12-07 18:45 UTC
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, linux-kernel, syzkaller-bugs,
	netdev

On Sun, Nov 30, 2025 at 08:09:48PM +0100, Oliver Hartkopp wrote:
> Hi Prithvi,
> 
> On 30.11.25 18:29, Prithvi Tambewagh wrote:
> > On Sun, Nov 30, 2025 at 01:44:32PM +0100, Oliver Hartkopp wrote:
> 
> > > > shall I send this patch upstream and mention your name in
> > > > Suggested-by tag?
> > > 
> > > No. Neither of that - as it will not fix the root cause.
> > > 
> > > IMO we need to check who is using the headroom in CAN skbs and for
> > > what reason first. And when we are not able to safely control the
> > > headroom for our struct can_skb_priv content we might need to find
> > > another way to store that content.
> > > E.g. by creating this space behind skb->data or add new attributes
> > > to struct sk_buff.
> > 
> > I will work in this direction. Just to confirm, what you mean is
> > that first it should be checked where the headroom is used while also
> > checking whether the data from region covered by struct can_skb_priv is
> > intact, and if not then we need to ensure that it is intact by other
> > measures, right?
> 
> I have added skb_dump(KERN_WARNING, skb, true) in my local dummy_can.c
> and sent some CAN frames with cansend.
> 
> CAN CC:
> 
> [ 3351.708018] skb len=16 headroom=16 headlen=16 tailroom=288
>                mac=(16,0) mac_len=0 net=(16,0) trans=16
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> level=0)
>                hash(0x0 sw=0 l4=0) proto=0x000c pkttype=5 iif=0
>                priority=0x0 mark=0x0 alloc_cpu=5 vlan_all=0x0
>                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> [ 3351.708151] dev name=can0 feat=0x0000000000004008
> [ 3351.708159] sk family=29 type=3 proto=0
> [ 3351.708166] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 3351.708173] skb linear:   00000000: 23 01 00 00 04 00 00 00 11 22 33 44
> 00 00 00 00
> 
> (..)
> 
> CAN FD:
> 
> [ 3557.069471] skb len=72 headroom=16 headlen=72 tailroom=232
>                mac=(16,0) mac_len=0 net=(16,0) trans=16
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> level=0)
>                hash(0x0 sw=0 l4=0) proto=0x000d pkttype=5 iif=0
>                priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
>                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> [ 3557.069499] dev name=can0 feat=0x0000000000004008
> [ 3557.069507] sk family=29 type=3 proto=0
> [ 3557.069513] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 3557.069520] skb linear:   00000000: 33 03 00 00 10 05 00 00 00 11 22 33
> 44 55 66 77
> [ 3557.069526] skb linear:   00000010: 88 aa bb cc dd ee ff 00 00 00 00 00
> 00 00 00 00
> 
> (..)
> 
> CAN XL:
> 
> [ 5477.498205] skb len=908 headroom=16 headlen=908 tailroom=804
>                mac=(16,0) mac_len=0 net=(16,0) trans=16
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> level=0)
>                hash(0x0 sw=0 l4=0) proto=0x000e pkttype=5 iif=0
>                priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
>                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> [ 5477.498236] dev name=can0 feat=0x0000000000004008
> [ 5477.498244] sk family=29 type=3 proto=0
> [ 5477.498251] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 5477.498258] skb linear:   00000000: b0 05 92 00 81 cd 80 03 cd b4 92 58
> 4c a1 f6 0c
> [ 5477.498264] skb linear:   00000010: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> 4c a1 f6 0c
> [ 5477.498269] skb linear:   00000020: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> 4c a1 f6 0c
> [ 5477.498275] skb linear:   00000030: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> 4c a1 f6 0c
> 
> 
> I will also add skb_dump(KERN_WARNING, skb, true) in the CAN receive path to
> see what's going on there.
> 
> My main problem with the KMSAN message
> https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/
> is that it uses
> 
> NAPI, XDP and therefore pskb_expand_head():
> 
>  kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
>  pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
>  netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
>  netif_receive_generic_xdp net/core/dev.c:5112 [inline]
>  do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
>  __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
>  __netif_receive_skb_one_core net/core/dev.c:5702 [inline]
>  __netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
>  process_backlog+0x4ad/0xa50 net/core/dev.c:6149
>  __napi_poll+0xe7/0x980 net/core/dev.c:6902
>  napi_poll net/core/dev.c:6971 [inline]
> 
> As you can see in
> https://syzkaller.appspot.com/x/log.txt?x=144ece64580000
> 
> [pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
> [pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0
> 
> they are using the vxcan driver which is mainly derived from vcan.c and
> veth.c (~2017). The veth.c driver supports all those GRO, NAPI and XDP
> features today which vxcan.c still does NOT support.
> 
> Therefore I wonder how the NAPI and XDP code can be used together with
> vxcan. And if this is still the case today, as the syzkaller kernel
> 6.13.0-rc7-syzkaller-00039-gc3812b15000c is already one year old.
> 
> Many questions ...
> 
> Best regards,
> Oliver

Hello Oliver,

First, I apologize for not having been able to get back to the conversation.
I have my exams going on right now and unfortunately my PC developed a
hardware issue, so I am using another, old PC which doesn't work very well.
Hence I am not able to work on this right now.

However, I look forward to continuing to test this bug ASAP. There are
several things to analyse here.

Best regards,
Prithvi


* Re: Question about to KMSAN: uninit-value in can_receive
  2025-11-30 19:09       ` Oliver Hartkopp
  2025-12-07 18:45         ` Prithvi
@ 2025-12-20 17:33         ` Prithvi
  2025-12-21 18:29           ` [bpf, xdp] headroom - was: " Oliver Hartkopp
  1 sibling, 1 reply; 10+ messages in thread
From: Prithvi @ 2025-12-20 17:33 UTC
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, linux-kernel, syzkaller-bugs,
	netdev

On Sun, Nov 30, 2025 at 08:09:48PM +0100, Oliver Hartkopp wrote:
> Hi Prithvi,
> 
> On 30.11.25 18:29, Prithvi Tambewagh wrote:
> > On Sun, Nov 30, 2025 at 01:44:32PM +0100, Oliver Hartkopp wrote:
> 
> > > > shall I send this patch upstream and mention your name in
> > > > Suggested-by tag?
> > > 
> > > No. Neither of that - as it will not fix the root cause.
> > > 
> > > IMO we need to check who is using the headroom in CAN skbs and for
> > > what reason first. And when we are not able to safely control the
> > > headroom for our struct can_skb_priv content we might need to find
> > > another way to store that content.
> > > E.g. by creating this space behind skb->data or add new attributes
> > > to struct sk_buff.
> > 
> > I will work in this direction. Just to confirm, what you mean is
> > that first it should be checked where the headroom is used while also
> > checking whether the data from region covered by struct can_skb_priv is
> > intact, and if not then we need to ensure that it is intact by other
> > measures, right?
> 
> I have added skb_dump(KERN_WARNING, skb, true) in my local dummy_can.c
> and sent some CAN frames with cansend.
> 
> CAN CC:
> 
> [ 3351.708018] skb len=16 headroom=16 headlen=16 tailroom=288
>                mac=(16,0) mac_len=0 net=(16,0) trans=16
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> level=0)
>                hash(0x0 sw=0 l4=0) proto=0x000c pkttype=5 iif=0
>                priority=0x0 mark=0x0 alloc_cpu=5 vlan_all=0x0
>                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> [ 3351.708151] dev name=can0 feat=0x0000000000004008
> [ 3351.708159] sk family=29 type=3 proto=0
> [ 3351.708166] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 3351.708173] skb linear:   00000000: 23 01 00 00 04 00 00 00 11 22 33 44
> 00 00 00 00
> 
> (..)
> 
> CAN FD:
> 
> [ 3557.069471] skb len=72 headroom=16 headlen=72 tailroom=232
>                mac=(16,0) mac_len=0 net=(16,0) trans=16
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> level=0)
>                hash(0x0 sw=0 l4=0) proto=0x000d pkttype=5 iif=0
>                priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
>                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> [ 3557.069499] dev name=can0 feat=0x0000000000004008
> [ 3557.069507] sk family=29 type=3 proto=0
> [ 3557.069513] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 3557.069520] skb linear:   00000000: 33 03 00 00 10 05 00 00 00 11 22 33
> 44 55 66 77
> [ 3557.069526] skb linear:   00000010: 88 aa bb cc dd ee ff 00 00 00 00 00
> 00 00 00 00
> 
> (..)
> 
> CAN XL:
> 
> [ 5477.498205] skb len=908 headroom=16 headlen=908 tailroom=804
>                mac=(16,0) mac_len=0 net=(16,0) trans=16
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> level=0)
>                hash(0x0 sw=0 l4=0) proto=0x000e pkttype=5 iif=0
>                priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
>                encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> [ 5477.498236] dev name=can0 feat=0x0000000000004008
> [ 5477.498244] sk family=29 type=3 proto=0
> [ 5477.498251] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 5477.498258] skb linear:   00000000: b0 05 92 00 81 cd 80 03 cd b4 92 58
> 4c a1 f6 0c
> [ 5477.498264] skb linear:   00000010: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> 4c a1 f6 0c
> [ 5477.498269] skb linear:   00000020: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> 4c a1 f6 0c
> [ 5477.498275] skb linear:   00000030: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> 4c a1 f6 0c
> 
> 
> I will also add skb_dump(KERN_WARNING, skb, true) in the CAN receive path to
> see what's going on there.
> 
> My main problem with the KMSAN message
> https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/
> is that it uses
> 
> NAPI, XDP and therefore pskb_expand_head():
> 
>  kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
>  pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
>  netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
>  netif_receive_generic_xdp net/core/dev.c:5112 [inline]
>  do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
>  __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
>  __netif_receive_skb_one_core net/core/dev.c:5702 [inline]
>  __netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
>  process_backlog+0x4ad/0xa50 net/core/dev.c:6149
>  __napi_poll+0xe7/0x980 net/core/dev.c:6902
>  napi_poll net/core/dev.c:6971 [inline]
> 
> As you can see in
> https://syzkaller.appspot.com/x/log.txt?x=144ece64580000
> 
> [pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
> [pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0
> 
> they are using the vxcan driver which is mainly derived from vcan.c and
> veth.c (~2017). The veth.c driver supports all those GRO, NAPI and XDP
> features today which vxcan.c still does NOT support.
> 
> Therefore I wonder how the NAPI and XDP code can be used together with
> vxcan. And if this is still the case today, as the syzkaller kernel
> 6.13.0-rc7-syzkaller-00039-gc3812b15000c is already one year old.
> 
> Many questions ...
> 
> Best regards,
> Oliver

Hello Oliver,

I tried investigating further why the XDP path was chosen in spite of using
vxcan. I looked for dummy_can.c in the upstream tree but could not find
it; I might be missing something here. Could you please tell me where I can
find it? Meanwhile, I used GDB for the analysis.

I observed in the bug's strace log:

[pid  5804] bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=3, insns=0x200000c0, license="syzkaller", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_XDP, prog_btf_fd=-1, func_info_rec_size=8, func_info=NULL, func_info_cnt=0, line_info_rec_size=16, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL, ...}, 144) = 3
[pid  5804] socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) = 4
[pid  5804] sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\x34\x00\x00\x00\x10\x00\x01\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x40\x01\x00\x00\x00\x01\x00\x0c\x00\x2b\x80\x08\x00\x01\x00\x03\x00\x00\x00\x08\x00\x1b\x00\x00\x00\x00\x00", iov_len=52}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_DONTWAIT|MSG_FASTOPEN}, 0) = 52
[pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
[pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0

Notably, before binding vxcan0 to the CAN socket, a BPF program is loaded. 
I then tried using GDB to check and got the following insights:

(gdb) b vxcan_xmit
Breakpoint 23 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
(gdb) delete 23
(gdb) b __sys_bpf
Breakpoint 24 at 0xffffffff81d2653e: file kernel/bpf/syscall.c, line 5752.
(gdb) b bpf_prog_load
Breakpoint 25 at 0xffffffff81d2cd80: file kernel/bpf/syscall.c, line 2736.
(gdb) b vxcan_xmit if (oskb->dev->name[0]=='v' && ((oskb->dev->name[1]=='x' && oskb->dev->name[2]=='c' && oskb->dev->name[3]=='a' && oskb->dev->name[4]=='n') || (oskb->dev->name[1]=='c' && oskb->dev->name[2]=='a' && oskb->dev->name[3]=='n')))
Breakpoint 26 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
(gdb) b __netif_receive_skb if (skb->dev->name[0]=='v' && ((skb->dev->name[1]=='x' && skb->dev->name[2]=='c' && skb->dev->name[3]=='a' && skb->dev->name[4]=='n') || (skb->dev->name[1]=='c' && skb->dev->name[2]=='a' && skb->dev->name[3]=='n')))
Breakpoint 27 at 0xffffffff8ce3c310: file net/core/dev.c, line 5798.
(gdb) b do_xdp_generic if (pskb->dev->name[0]=='v' && ((pskb->dev->name[1]=='x' && pskb->dev->name[2]=='c' && pskb->dev->name[3]=='a' && pskb->dev->name[4]=='n') || (pskb->dev->name[1]=='c' && pskb->dev->name[2]=='a' && pskb->dev->name[3]=='n')))
Breakpoint 28 at 0xffffffff8cdfccd7: file net/core/dev.c, line 5171.
(gdb) b dev_xdp_attach if (dev->name[0]=='v' && ((dev->name[1]=='x' && dev->name[2]=='c' && dev->name[3]=='a' && dev->name[4]=='n') || (dev->name[1]=='c' && dev->name[2]=='a' && dev->name[3]=='n')))
Breakpoint 29 at 0xffffffff8ce18b4e: file net/core/dev.c, line 9610.

Thread 2 hit Breakpoint 24, __sys_bpf (cmd=cmd@entry=BPF_PROG_LOAD, uattr=..., size=size@entry=144) at kernel/bpf/syscall.c:5752
5752    {
(gdb) c
Continuing.

Thread 2 hit Breakpoint 25, bpf_prog_load (attr=attr@entry=0xffff88811c987d60, uattr=..., uattr_size=144) at kernel/bpf/syscall.c:2736
2736    {
(gdb) c
Continuing.
[Switching to Thread 1.1]

Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff888124e78000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
9610    {
(gdb) p dev->name
$104 = "vcan0\000\000\000\000\000\000\000\000\000\000"
(gdb) p dev->xdp_prog
$105 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
(gdb) c
Continuing.

Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e918000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
9610    {
(gdb) p dev->name
$106 = "vxcan0\000\000\000\000\000\000\000\000\000"
(gdb) p dev->xdp_prog
$107 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
(gdb) c
Continuing.

Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e910000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
9610    {
(gdb) p dev->name
$108 = "vxcan1\000\000\000\000\000\000\000\000\000"
(gdb) p dev->xdp_prog
$109 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
(gdb) c
Continuing.
[Switching to Thread 1.2]

Here, the kernel attempts to attach the BPF program loaded earlier to each of 
the CAN devices present (I checked only the CAN devices, since we are dealing 
with the effect of XDP on the CAN networking stack). Before this point, none of 
them had a BPF program attached (dev->xdp_prog is NULL in each case), which is 
why XDP was not attempted for these CAN devices until now.

Thread 2 hit Breakpoint 26, vxcan_xmit (oskb=0xffff888115d8a400, dev=0xffff88818e918000) at drivers/net/can/vxcan.c:38
38      {
(gdb) p oskb->dev->name
$110 = "vxcan0\000\000\000\000\000\000\000\000\000"
(gdb) p oskb->dev->xdp_prog
$111 = (struct bpf_prog *) 0xffffc9000a516000
(gdb) c
Continuing.

Thread 2 hit Breakpoint 27, __netif_receive_skb (skb=skb@entry=0xffff888115d8ab00) at net/core/dev.c:5798
5798    {
(gdb) p skb->dev->name
$112 = "vxcan1\000\000\000\000\000\000\000\000\000"
(gdb) p skb->dev->xdp_prog
$113 = (struct bpf_prog *) 0xffffc9000a516000
(gdb) c
Continuing.

Thread 2 hit Breakpoint 28, do_xdp_generic (xdp_prog=0xffffc9000a516000, pskb=0xffff88843fc05af8) at net/core/dev.c:5171
5171    {
(gdb) p pskb->dev->name
$114 = "vxcan1\000\000\000\000\000\000\000\000\000"
(gdb) p pskb->dev->xdp_prog
$115 = (struct bpf_prog *) 0xffffc9000a516000
(gdb) c
Continuing.

After this, the KMSAN bug is triggered. Hence, we can conclude that, because of the
BPF program loaded earlier, the CAN device takes the generic XDP path during RX, 
which is reachable even though vxcan does not support XDP itself.

It seems that the way CAN devices use the headroom for storing private skb-related
data is incompatible with the XDP path: the generic networking stack expands the 
head at RX, and it does so in such a way that the still uninitialized expanded 
headroom is accessed by can_skb_prv() through skb->head.

So, I think we can solve this bug in the following ways:

1. As you suggested earlier, access struct can_skb_priv using: 
(struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv))
This keeps working for the rest of the CAN networking stack, which expects can_skb_priv
just before skb->data, and it also stays compatible with headroom expansion during
generic XDP.

2. Find some way for CAN devices to reject the XDP path at the very beginning, 
for example in the function dev_xdp_attach():

/* don't call drivers if the effective program didn't change */
if (new_prog != cur_prog) {
	bpf_op = dev_xdp_bpf_op(dev, mode);
	if (!bpf_op) {
		NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode");
		return -EOPNOTSUPP;
	}

	err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog);
	if (err)
		return err;
}

or in some other appropriate way.
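
To illustrate option 1, here is a minimal userspace sketch (not kernel code).
struct toy_skb, struct toy_priv, toy_alloc(), toy_expand_head() and the two
accessors are hypothetical names invented for this illustration.
toy_expand_head() only imitates the way pskb_expand_head() keeps the old
headroom contents directly in front of skb->data, and the 0xAA fill is an
assumed stand-in for the uninitialized new headroom:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct can_skb_priv (16 bytes, like the CAN headroom). */
struct toy_priv {
	int ifindex;
	int skbcnt;
	unsigned int frame_len;
	unsigned int pad;
};

/* Toy stand-in for struct sk_buff: only the pointers that matter here. */
struct toy_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the payload */
	size_t len;		/* payload length in bytes */
};

/* Allocate a buffer with the private area in front of the payload. */
int toy_alloc(struct toy_skb *skb, int ifindex, size_t len)
{
	struct toy_priv priv = { .ifindex = ifindex };

	skb->head = calloc(1, sizeof(priv) + len);
	if (!skb->head)
		return -1;
	memcpy(skb->head, &priv, sizeof(priv));
	skb->data = skb->head + sizeof(priv);
	skb->len = len;
	return 0;
}

/* Grow the headroom by nhead bytes; old contents stay right before data. */
int toy_expand_head(struct toy_skb *skb, size_t nhead)
{
	size_t old_headroom, old_size;
	unsigned char *buf;

	assert(skb->head != NULL);
	old_headroom = (size_t)(skb->data - skb->head);
	old_size = old_headroom + skb->len;

	buf = malloc(nhead + old_size);
	if (!buf)
		return -1;
	memset(buf, 0xAA, nhead);	/* stand-in for uninitialized bytes */
	memcpy(buf + nhead, skb->head, old_size);
	free(skb->head);
	skb->head = buf;
	skb->data = buf + nhead + old_headroom;
	return 0;
}

/* Current scheme: private data is assumed to sit at skb->head. */
struct toy_priv priv_via_head(const struct toy_skb *skb)
{
	struct toy_priv p;

	memcpy(&p, skb->head, sizeof(p));
	return p;
}

/* Option 1: private data is located relative to skb->data. */
struct toy_priv priv_via_data(const struct toy_skb *skb)
{
	struct toy_priv p;

	memcpy(&p, skb->data - sizeof(p), sizeof(p));
	return p;
}
```

With a buffer set up by toy_alloc(&skb, 7, 16) and then grown by
toy_expand_head(&skb, 256), priv_via_data() still returns ifindex 7, while
priv_via_head() returns the 0xAA filler from the new headroom.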

What do you think should be done next?

Best Regards,
Prithvi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [bpf, xdp] headroom - was: Re: Question about to KMSAN: uninit-value in can_receive
  2025-12-20 17:33         ` Prithvi
@ 2025-12-21 18:29           ` Oliver Hartkopp
  2025-12-21 19:06             ` Marc Kleine-Budde
  2026-01-02 15:36             ` Prithvi
  0 siblings, 2 replies; 10+ messages in thread
From: Oliver Hartkopp @ 2025-12-21 18:29 UTC (permalink / raw)
  To: Andrii Nakryiko, Prithvi
  Cc: Marc Kleine-Budde, linux-can, linux-kernel, syzkaller-bugs,
	netdev

Hello Andrii,

we have a "KMSAN: uninit value" problem which is created by 
netif_skb_check_for_xdp() and later pskb_expand_head().

The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the 
CAN bus related skbs allocate 16 bytes of private headroom.

Although CAN netdevs don't support XDP the KMSAN issue shows that the 
headroom is expanded for CAN skbs and a following access to the CAN skb 
private data via skb->head now reads from the beginning of the XDP 
expanded head which is (of course) uninitialized.

Prithvi thankfully did some investigation (see below!) which confirmed my 
suspicion that "someone is expanding our CAN skb headroom".

Prithvi also proposed two ways to solve the issue (at the end of his 
mail below), where I think the first one is a bad hack (although it was 
my idea).

The second idea is a change for dev_xdp_attach() where your expertise 
would be necessary.

My suggestion would rather be to extend dev_xdp_mode()

https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170

in a way that allows XDP to be disabled completely for CAN skbs, e.g. 
with a new XDP_FLAGS_DISABLED that keeps its hands off such skbs entirely.
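
As a rough illustration of such a gate, the following userspace sketch rejects
the attach for CAN devices up front. struct toy_netdev, struct toy_prog and
toy_xdp_attach() are hypothetical names and not kernel API; only the numeric
values of ARPHRD_ETHER and ARPHRD_CAN are taken from <linux/if_arp.h>. Where
such a check would really belong in net/core/dev.c is exactly the open
question:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_ARPHRD_ETHER	1	/* same value as ARPHRD_ETHER */
#define TOY_ARPHRD_CAN		280	/* same value as ARPHRD_CAN */

/* Toy stand-in for struct bpf_prog. */
struct toy_prog {
	int id;
};

/* Toy stand-in for struct net_device: link type and attached program. */
struct toy_netdev {
	unsigned short type;
	const struct toy_prog *xdp_prog;
};

/*
 * Refuse to attach an XDP program to a CAN device, so that the generic
 * XDP receive path (and its headroom expansion) is never entered for it.
 */
int toy_xdp_attach(struct toy_netdev *dev, const struct toy_prog *prog)
{
	assert(dev != NULL);

	if (dev->type == TOY_ARPHRD_CAN)
		return -EOPNOTSUPP;

	dev->xdp_prog = prog;
	return 0;
}
```

In this sketch a device of type TOY_ARPHRD_CAN keeps xdp_prog == NULL and the
caller sees -EOPNOTSUPP, while any other device type gets the program attached.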

Do you have any (better) idea how to preserve the private data in the 
skb->head of CAN related skbs?

Many thanks and best regards,
Oliver

ps. original mail thread at 
https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/

On 20.12.25 18:33, Prithvi wrote:
> On Sun, Nov 30, 2025 at 08:09:48PM +0100, Oliver Hartkopp wrote:
>> Hi Prithvi,
>>
>> On 30.11.25 18:29, Prithvi Tambewagh wrote:
>>> On Sun, Nov 30, 2025 at 01:44:32PM +0100, Oliver Hartkopp wrote:
>>
>>>>> shall I send this patch upstream and mention your name in
>>>> Suggested-by tag?
>>>>
>>>> No. Neither of that - as it will not fix the root cause.
>>>>
>>>> IMO we need to check who is using the headroom in CAN skbs and for
>>>> what reason first. And when we are not able to safely control the
>>>> headroom for our struct can_skb_priv content we might need to find
>>>> another way to store that content.
>>>> E.g. by creating this space behind skb->data or add new attributes
>>>> to struct sk_buff.
>>>
>>> I will work in this direction. Just to confirm, what you mean is
>>> that first it should be checked where the headroom is used while also
>>> checking whether the data from region covered by struct can_skb_priv is
>>> intact, and if not then we need to ensure that it is intact by other
>>> measures, right?
>>
>> I have added skb_dump(KERN_WARNING, skb, true) in my local dummy_can.c
>> an sent some CAN frames with cansend.
>>
>> CAN CC:
>>
>> [ 3351.708018] skb len=16 headroom=16 headlen=16 tailroom=288
>>                 mac=(16,0) mac_len=0 net=(16,0) trans=16
>>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>>                 csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
>> level=0)
>>                 hash(0x0 sw=0 l4=0) proto=0x000c pkttype=5 iif=0
>>                 priority=0x0 mark=0x0 alloc_cpu=5 vlan_all=0x0
>>                 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
>> [ 3351.708151] dev name=can0 feat=0x0000000000004008
>> [ 3351.708159] sk family=29 type=3 proto=0
>> [ 3351.708166] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
>> [ 3351.708173] skb linear:   00000000: 23 01 00 00 04 00 00 00 11 22 33 44
>> 00 00 00 00
>>
>> (..)
>>
>> CAN FD:
>>
>> [ 3557.069471] skb len=72 headroom=16 headlen=72 tailroom=232
>>                 mac=(16,0) mac_len=0 net=(16,0) trans=16
>>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>>                 csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
>> level=0)
>>                 hash(0x0 sw=0 l4=0) proto=0x000d pkttype=5 iif=0
>>                 priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
>>                 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
>> [ 3557.069499] dev name=can0 feat=0x0000000000004008
>> [ 3557.069507] sk family=29 type=3 proto=0
>> [ 3557.069513] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
>> [ 3557.069520] skb linear:   00000000: 33 03 00 00 10 05 00 00 00 11 22 33
>> 44 55 66 77
>> [ 3557.069526] skb linear:   00000010: 88 aa bb cc dd ee ff 00 00 00 00 00
>> 00 00 00 00
>>
>> (..)
>>
>> CAN XL:
>>
>> [ 5477.498205] skb len=908 headroom=16 headlen=908 tailroom=804
>>                 mac=(16,0) mac_len=0 net=(16,0) trans=16
>>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>>                 csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
>> level=0)
>>                 hash(0x0 sw=0 l4=0) proto=0x000e pkttype=5 iif=0
>>                 priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
>>                 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
>> [ 5477.498236] dev name=can0 feat=0x0000000000004008
>> [ 5477.498244] sk family=29 type=3 proto=0
>> [ 5477.498251] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
>> [ 5477.498258] skb linear:   00000000: b0 05 92 00 81 cd 80 03 cd b4 92 58
>> 4c a1 f6 0c
>> [ 5477.498264] skb linear:   00000010: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
>> 4c a1 f6 0c
>> [ 5477.498269] skb linear:   00000020: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
>> 4c a1 f6 0c
>> [ 5477.498275] skb linear:   00000030: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
>> 4c a1 f6 0c
>>
>>
>> I will also add skb_dump(KERN_WARNING, skb, true) in the CAN receive path to
>> see what's going on there.
>>
>> My main problem with the KMSAN message
>> https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/
>> is that it uses
>>
>> NAPI, XDP and therefore pskb_expand_head():
>>
>>   kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
>>   pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
>>   netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
>>   netif_receive_generic_xdp net/core/dev.c:5112 [inline]
>>   do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
>>   __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
>>   __netif_receive_skb_one_core net/core/dev.c:5702 [inline]
>>   __netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
>>   process_backlog+0x4ad/0xa50 net/core/dev.c:6149
>>   __napi_poll+0xe7/0x980 net/core/dev.c:6902
>>   napi_poll net/core/dev.c:6971 [inline]
>>
>> As you can see in
>> https://syzkaller.appspot.com/x/log.txt?x=144ece64580000
>>
>> [pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
>> [pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0
>>
>> they are using the vxcan driver which is mainly derived from vcan.c and
>> veth.c (~2017). The veth.c driver supports all those GRO, NAPI and XDP
>> features today which vxcan.c still does NOT support.
>>
>> Therefore I wonder how the NAPI and XDP code can be used together with
>> vxcan. And if this is still the case today, as the syzcaller kernel
>> 6.13.0-rc7-syzkaller-00039-gc3812b15000c is already one year old.
>>
>> Many questions ...
>>
>> Best regards,
>> Oliver
> 
> Hello Oliver,
> 
> I tried investigating further why the XDP path was chosen inspite of using
> vxcan. I tried looking for dummy_can.c in upstream tree but could not find
> it; I might be missing something here - could you please tell where can I
> find it? Meanwhile, I tried using GDB for the analysis.
> 
> I observed in the bug's strace log:
> 
> [pid  5804] bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=3, insns=0x200000c0, license="syzkaller", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_XDP, prog_btf_fd=-1, func_info_rec_size=8, func_info=NULL, func_info_cnt=0, line_info_rec_size=16, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL, ...}, 144) = 3
> [pid  5804] socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) = 4
> [pid  5804] sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\x34\x00\x00\x00\x10\x00\x01\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x40\x01\x00\x00\x00\x01\x00\x0c\x00\x2b\x80\x08\x00\x01\x00\x03\x00\x00\x00\x08\x00\x1b\x00\x00\x00\x00\x00", iov_len=52}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_DONTWAIT|MSG_FASTOPEN}, 0) = 52
> [pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
> [pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0
> 
> Notably, before binding vxcan0 to the CAN socket, a BPF program is loaded.
> I then tried using GDB to check and got the following insights:
> 
> (gdb) b vxcan_xmit
> Breakpoint 23 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
> (gdb) delete 23
> (gdb) b __sys_bpf
> Breakpoint 24 at 0xffffffff81d2653e: file kernel/bpf/syscall.c, line 5752.
> (gdb) b bpf_prog_load
> Breakpoint 25 at 0xffffffff81d2cd80: file kernel/bpf/syscall.c, line 2736.
> (gdb) b vxcan_xmit if (oskb->dev->name[0]=='v' && ((oskb->dev->name[1]=='x' && oskb->dev->name[2]=='c' && oskb->dev->name[3]=='a' && oskb->dev->name[4]=='n') || (oskb->dev->name[1]=='c' && oskb->dev->name[2]=='a' && oskb->dev->name[3]=='n')))
> Breakpoint 26 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
> (gdb) b __netif_receive_skb if (skb->dev->name[0]=='v' && ((skb->dev->name[1]=='x' && skb->dev->name[2]=='c' && skb->dev->name[3]=='a' && skb->dev->name[4]=='n') || (skb->dev->name[1]=='c' && skb->dev->name[2]=='a' && skb->dev->name[3]=='n')))
> Breakpoint 27 at 0xffffffff8ce3c310: file net/core/dev.c, line 5798.
> (gdb) b do_xdp_generic if (pskb->dev->name[0]=='v' && ((pskb->dev->name[1]=='x' && pskb->dev->name[2]=='c' && pskb->dev->name[3]=='a' && pskb->dev->name[4]=='n') || (pskb->dev->name[1]=='c' && pskb->dev->name[2]=='a' && pskb->dev->name[3]=='n')))
> Breakpoint 28 at 0xffffffff8cdfccd7: file net/core/dev.c, line 5171.
> (gdb) b dev_xdp_attach if (dev->name[0]=='v' && ((dev->name[1]=='x' && dev->name[2]=='c' && dev->name[3]=='a' && dev->name[4]=='n') || (dev->name[1]=='c' && dev->name[2]=='a' && dev->name[3]=='n')))
> Breakpoint 29 at 0xffffffff8ce18b4e: file net/core/dev.c, line 9610.
> 
> Thread 2 hit Breakpoint 24, __sys_bpf (cmd=cmd@entry=BPF_PROG_LOAD, uattr=..., size=size@entry=144) at kernel/bpf/syscall.c:5752
> 5752    {
> (gdb) c
> Continuing.
> 
> Thread 2 hit Breakpoint 25, bpf_prog_load (attr=attr@entry=0xffff88811c987d60, uattr=..., uattr_size=144) at kernel/bpf/syscall.c:2736
> 2736    {
> (gdb) c
> Continuing.
> [Switching to Thread 1.1]
> 
> Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff888124e78000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
> 9610    {
> (gdb) p dev->name
> $104 = "vcan0\000\000\000\000\000\000\000\000\000\000"
> (gdb) p dev->xdp_prog
> $105 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
> (gdb) c
> Continuing.
> 
> Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e918000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
> 9610    {
> (gdb) p dev->name
> $106 = "vxcan0\000\000\000\000\000\000\000\000\000"
> (gdb) p dev->xdp_prog
> $107 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
> (gdb) c
> Continuing.
> 
> Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e910000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
> 9610    {
> (gdb) p dev->name
> $108 = "vxcan1\000\000\000\000\000\000\000\000\000"
> (gdb) p dev->xdp_prog
> $109 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
> (gdb) c
> Continuing.
> [Switching to Thread 1.2]
> 
> Here, it is attempted to attach the eariler BPF program to each of the CAN
> devices present (I checked only for CAN devices since we are dealing with
> effect of XDP in CAN networing stack). Earlier they didn't seem to have any
> BPF program attached due to which  XDP wasn't attempted for these CAN devices
> earlier.
> 
> Thread 2 hit Breakpoint 26, vxcan_xmit (oskb=0xffff888115d8a400, dev=0xffff88818e918000) at drivers/net/can/vxcan.c:38
> 38      {
> (gdb) p oskb->dev->name
> $110 = "vxcan0\000\000\000\000\000\000\000\000\000"
> (gdb) p oskb->dev->xdp_prog
> $111 = (struct bpf_prog *) 0xffffc9000a516000
> (gdb) c
> Continuing.
> 
> Thread 2 hit Breakpoint 27, __netif_receive_skb (skb=skb@entry=0xffff888115d8ab00) at net/core/dev.c:5798
> 5798    {
> (gdb) p skb->dev->name
> $112 = "vxcan1\000\000\000\000\000\000\000\000\000"
> (gdb) p skb->dev->xdp_prog
> $113 = (struct bpf_prog *) 0xffffc9000a516000
> (gdb) c
> Continuing.
> 
> Thread 2 hit Breakpoint 28, do_xdp_generic (xdp_prog=0xffffc9000a516000, pskb=0xffff88843fc05af8) at net/core/dev.c:5171
> 5171    {
> (gdb) p pskb->dev->name
> $114 = "vxcan1\000\000\000\000\000\000\000\000\000"
> (gdb) p pskb->dev->xdp_prog
> $115 = (struct bpf_prog *) 0xffffc9000a516000
> (gdb) c
> Continuing.
> 
> After this, the KMSAN bug is triggered. Hence, we can conclude that due to the
> BPF program loaded earlier, the CAN device undertakes generic XDP path during RX,
> which is accessible even if vxcan doesn't support XDP by itself.
> 
> It seems that the way CAN devices use the headroom for storing private skb related
> data might be incompatible for XPD path, due to which the generic networking stack
> at RX requires to expand the head, and it is done in such a way that the yet
> uninitialized expanded headroom is accesssed by can_skb_prv() using skb->head.
> 
> So, I think we can solve this bug in the following ways:
> 
> 1. As you suggested earlier, access struct can_skb_priv using:
> struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv)
> This method ensures that the remaining CAN networking stack, which expects can_skb_priv
> just before skb->data, as well as maintain compatibility with headroom expamnsion during
> generic XDP.
> 
> 2. Try to find some way so that XDP pathway is rejected by CAN devices at the beginning
> itself, like for example in function dev_xdp_attach():
> 
> /* don't call drivers if the effective program didn't change */
> if (new_prog != cur_prog) {
> 	bpf_op = dev_xdp_bpf_op(dev, mode);
> 	if (!bpf_op) {
> 		NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode");
> 		return -EOPNOTSUPP;
> 	}
> 
> 	err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog);
> 	if (err)
> 		return err;
> }
> 
> or in some other appropriate way.
> 
> What do you think what should be done ahead?
> 
> Best Regards,
> Prithvi
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bpf, xdp] headroom - was: Re: Question about to KMSAN: uninit-value in can_receive
  2025-12-21 18:29           ` [bpf, xdp] headroom - was: " Oliver Hartkopp
@ 2025-12-21 19:06             ` Marc Kleine-Budde
  2025-12-21 19:42               ` Oliver Hartkopp
  2026-01-02 15:36             ` Prithvi
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Kleine-Budde @ 2025-12-21 19:06 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Andrii Nakryiko, Prithvi, linux-can, linux-kernel, syzkaller-bugs,
	netdev

On 21.12.2025 19:29:37, Oliver Hartkopp wrote:
> we have a "KMSAN: uninit value" problem which is created by
> netif_skb_check_for_xdp() and later pskb_expand_head().
>
> The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the CAN
> bus related skbs allocate 16 bytes of pricate headroom.
>
> Although CAN netdevs don't support XDP the KMSAN issue shows that the
> headroom is expanded for CAN skbs and a following access to the CAN skb
> private data via skb->head now reads from the beginning of the XDP expanded
> head which is (of course) uninitialized.
>
> Prithvi thankfully did some investigation (see below!) which proved my
> estimation about "someone is expanding our CAN skb headroom".
>
> Prithvi also proposed two ways to solve the issue (at the end of his mail
> below), where I think the first one is a bad hack (although it was my idea).
>
> The second idea is a change for dev_xdp_attach() where your expertise would
> be necessary.
>
> My sugestion would rather go into the direction to extend dev_xdp_mode()
>
> https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170
>
> in a way that it allows to completely disable XDP for CAN skbs, e.g. with a
> new XDP_FLAGS_DISABLED that completely keeps the hands off such skbs.

That does not sound like a good idea to me.

> Do you have any (better) idea how to preserve the private data in the
> skb->head of CAN related skbs?

We probably have to place the data somewhere else.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bpf, xdp] headroom - was: Re: Question about to KMSAN: uninit-value in can_receive
  2025-12-21 19:06             ` Marc Kleine-Budde
@ 2025-12-21 19:42               ` Oliver Hartkopp
  0 siblings, 0 replies; 10+ messages in thread
From: Oliver Hartkopp @ 2025-12-21 19:42 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Andrii Nakryiko, Prithvi, linux-can, linux-kernel, syzkaller-bugs,
	netdev



On 21.12.25 20:06, Marc Kleine-Budde wrote:
> On 21.12.2025 19:29:37, Oliver Hartkopp wrote:
>> we have a "KMSAN: uninit value" problem which is created by
>> netif_skb_check_for_xdp() and later pskb_expand_head().
>>
>> The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the CAN
>> bus related skbs allocate 16 bytes of pricate headroom.
>>
>> Although CAN netdevs don't support XDP the KMSAN issue shows that the
>> headroom is expanded for CAN skbs and a following access to the CAN skb
>> private data via skb->head now reads from the beginning of the XDP expanded
>> head which is (of course) uninitialized.
>>
>> Prithvi thankfully did some investigation (see below!) which proved my
>> estimation about "someone is expanding our CAN skb headroom".
>>
>> Prithvi also proposed two ways to solve the issue (at the end of his mail
>> below), where I think the first one is a bad hack (although it was my idea).
>>
>> The second idea is a change for dev_xdp_attach() where your expertise would
>> be necessary.
>>
>> My sugestion would rather go into the direction to extend dev_xdp_mode()
>>
>> https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170
>>
>> in a way that it allows to completely disable XDP for CAN skbs, e.g. with a
>> new XDP_FLAGS_DISABLED that completely keeps the hands off such skbs.
> 
> That sounds not like a good idea to me.
> 
>> Do you have any (better) idea how to preserve the private data in the
>> skb->head of CAN related skbs?
> 
> We probably have to place the data somewhere else.

Maybe in the tail room or inside struct sk_buff with some #ifdef 
CONFIG_CAN handling?

But let's wait for Andrii's feedback first, to see whether he is generally 
aware of the effect this XDP behavior has on CAN skbs.

Best regards,
Oliver


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bpf, xdp] headroom - was: Re: Question about to KMSAN: uninit-value in can_receive
  2025-12-21 18:29           ` [bpf, xdp] headroom - was: " Oliver Hartkopp
  2025-12-21 19:06             ` Marc Kleine-Budde
@ 2026-01-02 15:36             ` Prithvi
  2026-01-02 20:04               ` Jakub Kicinski
  1 sibling, 1 reply; 10+ messages in thread
From: Prithvi @ 2026-01-02 15:36 UTC (permalink / raw)
  To: andrii; +Cc: socketcan, mkl, linux-can, linux-kernel, syzkaller-bugs, netdev

On Sun, Dec 21, 2025 at 07:29:37PM +0100, Oliver Hartkopp wrote:
> Hello Andrii,
> 
> we have a "KMSAN: uninit value" problem which is created by
> netif_skb_check_for_xdp() and later pskb_expand_head().
> 
> The CAN netdev interfaces (ARPHRD_CAN) don't have XDP support and the CAN
> bus related skbs allocate 16 bytes of pricate headroom.
> 
> Although CAN netdevs don't support XDP the KMSAN issue shows that the
> headroom is expanded for CAN skbs and a following access to the CAN skb
> private data via skb->head now reads from the beginning of the XDP expanded
> head which is (of course) uninitialized.
> 
> Prithvi thankfully did some investigation (see below!) which proved my
> estimation about "someone is expanding our CAN skb headroom".
> 
> Prithvi also proposed two ways to solve the issue (at the end of his mail
> below), where I think the first one is a bad hack (although it was my idea).
> 
> The second idea is a change for dev_xdp_attach() where your expertise would
> be necessary.
> 
> My sugestion would rather go into the direction to extend dev_xdp_mode()
> 
> https://elixir.bootlin.com/linux/v6.19-rc1/source/net/core/dev.c#L10170
> 
> in a way that it allows to completely disable XDP for CAN skbs, e.g. with a
> new XDP_FLAGS_DISABLED that completely keeps the hands off such skbs.
> 
> Do you have any (better) idea how to preserve the private data in the
> skb->head of CAN related skbs?
> 
> Many thanks and best regards,
> Oliver
> 
> ps. original mail thread at https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/
> 
> On 20.12.25 18:33, Prithvi wrote:
> > On Sun, Nov 30, 2025 at 08:09:48PM +0100, Oliver Hartkopp wrote:
> > > Hi Prithvi,
> > > 
> > > On 30.11.25 18:29, Prithvi Tambewagh wrote:
> > > > On Sun, Nov 30, 2025 at 01:44:32PM +0100, Oliver Hartkopp wrote:
> > > 
> > > > > > shall I send this patch upstream and mention your name in
> > > > > Suggested-by tag?
> > > > > 
> > > > > No. Neither of that - as it will not fix the root cause.
> > > > > 
> > > > > IMO we need to check who is using the headroom in CAN skbs and for
> > > > > what reason first. And when we are not able to safely control the
> > > > > headroom for our struct can_skb_priv content we might need to find
> > > > > another way to store that content.
> > > > > E.g. by creating this space behind skb->data or add new attributes
> > > > > to struct sk_buff.
> > > > 
> > > > I will work in this direction. Just to confirm, what you mean is
> > > > that first it should be checked where the headroom is used while also
> > > > checking whether the data from region covered by struct can_skb_priv is
> > > > intact, and if not then we need to ensure that it is intact by other
> > > > measures, right?
> > > 
> > > I have added skb_dump(KERN_WARNING, skb, true) in my local dummy_can.c
> > > an sent some CAN frames with cansend.
> > > 
> > > CAN CC:
> > > 
> > > [ 3351.708018] skb len=16 headroom=16 headlen=16 tailroom=288
> > >                 mac=(16,0) mac_len=0 net=(16,0) trans=16
> > >                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> > >                 csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> > > level=0)
> > >                 hash(0x0 sw=0 l4=0) proto=0x000c pkttype=5 iif=0
> > >                 priority=0x0 mark=0x0 alloc_cpu=5 vlan_all=0x0
> > >                 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> > > [ 3351.708151] dev name=can0 feat=0x0000000000004008
> > > [ 3351.708159] sk family=29 type=3 proto=0
> > > [ 3351.708166] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> > > 00 00 00 00
> > > [ 3351.708173] skb linear:   00000000: 23 01 00 00 04 00 00 00 11 22 33 44
> > > 00 00 00 00
> > > 
> > > (..)
> > > 
> > > CAN FD:
> > > 
> > > [ 3557.069471] skb len=72 headroom=16 headlen=72 tailroom=232
> > >                 mac=(16,0) mac_len=0 net=(16,0) trans=16
> > >                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> > >                 csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> > > level=0)
> > >                 hash(0x0 sw=0 l4=0) proto=0x000d pkttype=5 iif=0
> > >                 priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
> > >                 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> > > [ 3557.069499] dev name=can0 feat=0x0000000000004008
> > > [ 3557.069507] sk family=29 type=3 proto=0
> > > [ 3557.069513] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> > > 00 00 00 00
> > > [ 3557.069520] skb linear:   00000000: 33 03 00 00 10 05 00 00 00 11 22 33
> > > 44 55 66 77
> > > [ 3557.069526] skb linear:   00000010: 88 aa bb cc dd ee ff 00 00 00 00 00
> > > 00 00 00 00
> > > 
> > > (..)
> > > 
> > > CAN XL:
> > > 
> > > [ 5477.498205] skb len=908 headroom=16 headlen=908 tailroom=804
> > >                 mac=(16,0) mac_len=0 net=(16,0) trans=16
> > >                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> > >                 csum(0x0 start=0 offset=0 ip_summed=1 complete_sw=0 valid=0
> > > level=0)
> > >                 hash(0x0 sw=0 l4=0) proto=0x000e pkttype=5 iif=0
> > >                 priority=0x0 mark=0x0 alloc_cpu=6 vlan_all=0x0
> > >                 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> > > [ 5477.498236] dev name=can0 feat=0x0000000000004008
> > > [ 5477.498244] sk family=29 type=3 proto=0
> > > [ 5477.498251] skb headroom: 00000000: 07 00 00 00 00 00 00 00 00 00 00 00
> > > 00 00 00 00
> > > [ 5477.498258] skb linear:   00000000: b0 05 92 00 81 cd 80 03 cd b4 92 58
> > > 4c a1 f6 0c
> > > [ 5477.498264] skb linear:   00000010: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> > > 4c a1 f6 0c
> > > [ 5477.498269] skb linear:   00000020: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> > > 4c a1 f6 0c
> > > [ 5477.498275] skb linear:   00000030: 1a c9 6d 0a 4c a1 f6 0c 1a c9 6d 0a
> > > 4c a1 f6 0c
> > > 
> > > 
> > > I will also add skb_dump(KERN_WARNING, skb, true) in the CAN receive path to
> > > see what's going on there.
> > > 
> > > My main problem with the KMSAN message
> > > https://lore.kernel.org/linux-can/68bae75b.050a0220.192772.0190.GAE@google.com/
> > > is that it uses
> > > 
> > > NAPI, XDP and therefore pskb_expand_head():
> > > 
> > >   kmalloc_reserve+0x23e/0x4a0 net/core/skbuff.c:609
> > >   pskb_expand_head+0x226/0x1a60 net/core/skbuff.c:2275
> > >   netif_skb_check_for_xdp net/core/dev.c:5081 [inline]
> > >   netif_receive_generic_xdp net/core/dev.c:5112 [inline]
> > >   do_xdp_generic+0x9e3/0x15a0 net/core/dev.c:5180
> > >   __netif_receive_skb_core+0x25c3/0x6f10 net/core/dev.c:5524
> > >   __netif_receive_skb_one_core net/core/dev.c:5702 [inline]
> > >   __netif_receive_skb+0xca/0xa00 net/core/dev.c:5817
> > >   process_backlog+0x4ad/0xa50 net/core/dev.c:6149
> > >   __napi_poll+0xe7/0x980 net/core/dev.c:6902
> > >   napi_poll net/core/dev.c:6971 [inline]
> > > 
> > > As you can see in
> > > https://syzkaller.appspot.com/x/log.txt?x=144ece64580000
> > > 
> > > [pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
> > > [pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0
> > > 
> > > they are using the vxcan driver which is mainly derived from vcan.c and
> > > veth.c (~2017). The veth.c driver supports all those GRO, NAPI and XDP
> > > features today which vxcan.c still does NOT support.
> > > 
> > > Therefore I wonder how the NAPI and XDP code can be used together with
> > > vxcan. And if this is still the case today, as the syzkaller kernel
> > > 6.13.0-rc7-syzkaller-00039-gc3812b15000c is already one year old.
> > > 
> > > Many questions ...
> > > 
> > > Best regards,
> > > Oliver
> > 
> > Hello Oliver,
> > 
> > I tried investigating further why the XDP path was chosen in spite of using
> > vxcan. I looked for dummy_can.c in the upstream tree but could not find
> > it; I might be missing something here - could you please tell me where I
> > can find it? Meanwhile, I tried using GDB for the analysis.
> > 
> > I observed in the bug's strace log:
> > 
> > [pid  5804] bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=3, insns=0x200000c0, license="syzkaller", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_XDP, prog_btf_fd=-1, func_info_rec_size=8, func_info=NULL, func_info_cnt=0, line_info_rec_size=16, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL, ...}, 144) = 3
> > [pid  5804] socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) = 4
> > [pid  5804] sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\x34\x00\x00\x00\x10\x00\x01\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x40\x01\x00\x00\x00\x01\x00\x0c\x00\x2b\x80\x08\x00\x01\x00\x03\x00\x00\x00\x08\x00\x1b\x00\x00\x00\x00\x00", iov_len=52}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_DONTWAIT|MSG_FASTOPEN}, 0) = 52
> > [pid  5804] socket(AF_CAN, SOCK_DGRAM, CAN_ISOTP) = 5
> > [pid  5804] ioctl(5, SIOCGIFINDEX, {ifr_name="vxcan0", ifr_ifindex=20}) = 0
> > 
> > Notably, before binding vxcan0 to the CAN socket, a BPF program is loaded.
> > I then tried using GDB to check and got the following insights:
> > 
> > (gdb) b vxcan_xmit
> > Breakpoint 23 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
> > (gdb) delete 23
> > (gdb) b __sys_bpf
> > Breakpoint 24 at 0xffffffff81d2653e: file kernel/bpf/syscall.c, line 5752.
> > (gdb) b bpf_prog_load
> > Breakpoint 25 at 0xffffffff81d2cd80: file kernel/bpf/syscall.c, line 2736.
> > (gdb) b vxcan_xmit if (oskb->dev->name[0]=='v' && ((oskb->dev->name[1]=='x' && oskb->dev->name[2]=='c' && oskb->dev->name[3]=='a' && oskb->dev->name[4]=='n') || (oskb->dev->name[1]=='c' && oskb->dev->name[2]=='a' && oskb->dev->name[3]=='n')))
> > Breakpoint 26 at 0xffffffff88ca899e: file drivers/net/can/vxcan.c, line 38.
> > (gdb) b __netif_receive_skb if (skb->dev->name[0]=='v' && ((skb->dev->name[1]=='x' && skb->dev->name[2]=='c' && skb->dev->name[3]=='a' && skb->dev->name[4]=='n') || (skb->dev->name[1]=='c' && skb->dev->name[2]=='a' && skb->dev->name[3]=='n')))
> > Breakpoint 27 at 0xffffffff8ce3c310: file net/core/dev.c, line 5798.
> > (gdb) b do_xdp_generic if (pskb->dev->name[0]=='v' && ((pskb->dev->name[1]=='x' && pskb->dev->name[2]=='c' && pskb->dev->name[3]=='a' && pskb->dev->name[4]=='n') || (pskb->dev->name[1]=='c' && pskb->dev->name[2]=='a' && pskb->dev->name[3]=='n')))
> > Breakpoint 28 at 0xffffffff8cdfccd7: file net/core/dev.c, line 5171.
> > (gdb) b dev_xdp_attach if (dev->name[0]=='v' && ((dev->name[1]=='x' && dev->name[2]=='c' && dev->name[3]=='a' && dev->name[4]=='n') || (dev->name[1]=='c' && dev->name[2]=='a' && dev->name[3]=='n')))
> > Breakpoint 29 at 0xffffffff8ce18b4e: file net/core/dev.c, line 9610.
> > 
> > Thread 2 hit Breakpoint 24, __sys_bpf (cmd=cmd@entry=BPF_PROG_LOAD, uattr=..., size=size@entry=144) at kernel/bpf/syscall.c:5752
> > 5752    {
> > (gdb) c
> > Continuing.
> > 
> > Thread 2 hit Breakpoint 25, bpf_prog_load (attr=attr@entry=0xffff88811c987d60, uattr=..., uattr_size=144) at kernel/bpf/syscall.c:2736
> > 2736    {
> > (gdb) c
> > Continuing.
> > [Switching to Thread 1.1]
> > 
> > Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff888124e78000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
> > 9610    {
> > (gdb) p dev->name
> > $104 = "vcan0\000\000\000\000\000\000\000\000\000\000"
> > (gdb) p dev->xdp_prog
> > $105 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
> > (gdb) c
> > Continuing.
> > 
> > Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e918000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
> > 9610    {
> > (gdb) p dev->name
> > $106 = "vxcan0\000\000\000\000\000\000\000\000\000"
> > (gdb) p dev->xdp_prog
> > $107 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
> > (gdb) c
> > Continuing.
> > 
> > Thread 1 hit Breakpoint 29, dev_xdp_attach (dev=dev@entry=0xffff88818e910000, extack=extack@entry=0xffff88811c987858, link=link@entry=0x0 <fixed_percpu_data>, new_prog=new_prog@entry=0xffffc9000a516000, old_prog=old_prog@entry=0x0 <fixed_percpu_data>, flags=flags@entry=0) at net/core/dev.c:9610
> > 9610    {
> > (gdb) p dev->name
> > $108 = "vxcan1\000\000\000\000\000\000\000\000\000"
> > (gdb) p dev->xdp_prog
> > $109 = (struct bpf_prog *) 0x0 <fixed_percpu_data>
> > (gdb) c
> > Continuing.
> > [Switching to Thread 1.2]
> > 
> > Here, the kernel attempts to attach the earlier BPF program to each of the
> > CAN devices present (I checked only CAN devices since we are dealing with
> > the effect of XDP on the CAN networking stack). Before this point they had
> > no BPF program attached (dev->xdp_prog is NULL above), which is why XDP was
> > not attempted for these CAN devices earlier.
> > 
> > Thread 2 hit Breakpoint 26, vxcan_xmit (oskb=0xffff888115d8a400, dev=0xffff88818e918000) at drivers/net/can/vxcan.c:38
> > 38      {
> > (gdb) p oskb->dev->name
> > $110 = "vxcan0\000\000\000\000\000\000\000\000\000"
> > (gdb) p oskb->dev->xdp_prog
> > $111 = (struct bpf_prog *) 0xffffc9000a516000
> > (gdb) c
> > Continuing.
> > 
> > Thread 2 hit Breakpoint 27, __netif_receive_skb (skb=skb@entry=0xffff888115d8ab00) at net/core/dev.c:5798
> > 5798    {
> > (gdb) p skb->dev->name
> > $112 = "vxcan1\000\000\000\000\000\000\000\000\000"
> > (gdb) p skb->dev->xdp_prog
> > $113 = (struct bpf_prog *) 0xffffc9000a516000
> > (gdb) c
> > Continuing.
> > 
> > Thread 2 hit Breakpoint 28, do_xdp_generic (xdp_prog=0xffffc9000a516000, pskb=0xffff88843fc05af8) at net/core/dev.c:5171
> > 5171    {
> > (gdb) p pskb->dev->name
> > $114 = "vxcan1\000\000\000\000\000\000\000\000\000"
> > (gdb) p pskb->dev->xdp_prog
> > $115 = (struct bpf_prog *) 0xffffc9000a516000
> > (gdb) c
> > Continuing.
> > 
> > After this, the KMSAN bug is triggered. Hence, we can conclude that due to the
> > BPF program loaded earlier, the CAN device takes the generic XDP path during RX,
> > which is available even though vxcan itself does not support XDP.
> > 
> > It seems that the way CAN devices use the headroom for storing private skb related
> > data might be incompatible with the XDP path: the generic networking stack needs to
> > expand the head at RX, and it does so in such a way that the still uninitialized
> > expanded headroom is accessed by can_skb_prv() through skb->head.
> > 
> > So, I think we can solve this bug in the following ways:
> > 
> > 1. As you suggested earlier, access struct can_skb_priv using:
> > (struct can_skb_priv *)(skb->data - sizeof(struct can_skb_priv))
> > This keeps the rest of the CAN networking stack working, since it expects
> > can_skb_priv just before skb->data, and stays compatible with headroom expansion
> > during generic XDP.
> > 
> > 2. Find a way to reject the XDP path for CAN devices up front, for example in
> > the function dev_xdp_attach():
> > 
> > /* don't call drivers if the effective program didn't change */
> > if (new_prog != cur_prog) {
> > 	bpf_op = dev_xdp_bpf_op(dev, mode);
> > 	if (!bpf_op) {
> > 		NL_SET_ERR_MSG(extack, "Underlying driver does not support XDP in native mode");
> > 		return -EOPNOTSUPP;
> > 	}
> > 
> > 	err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog);
> > 	if (err)
> > 		return err;
> > }
> > 
> > or in some other appropriate way.
> > 
> > What do you think should be done next?
> > 
> > Best Regards,
> > Prithvi
> > 
> 

Hello Andrii,

Just a gentle ping on this thread 

Thanks, 
Prithvi


* Re: [bpf, xdp] headroom - was: Re: Question about to KMSAN: uninit-value in can_receive
  2026-01-02 15:36             ` Prithvi
@ 2026-01-02 20:04               ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2026-01-02 20:04 UTC (permalink / raw)
  To: Prithvi
  Cc: andrii, socketcan, mkl, linux-can, linux-kernel, syzkaller-bugs,
	netdev

On Fri, 2 Jan 2026 21:06:11 +0530 Prithvi wrote:
> Just a gentle ping on this thread 

You're asking the wrong person, IIUC Andrii is tangentially involved
in XDP (via bpf links?):

XDP (eXpress Data Path)
M:	Alexei Starovoitov <ast@kernel.org>
M:	Daniel Borkmann <daniel@iogearbox.net>
M:	David S. Miller <davem@davemloft.net>
M:	Jakub Kicinski <kuba@kernel.org>
M:	Jesper Dangaard Brouer <hawk@kernel.org>
M:	John Fastabend <john.fastabend@gmail.com>
R:	Stanislav Fomichev <sdf@fomichev.me>
L:	netdev@vger.kernel.org
L:	bpf@vger.kernel.org

Without looking too deeply - XDP has historically left the new space
uninitialized after push, expecting programs to immediately write the
headers in that space. syzbot had run into this in the past but I can't
find any references to past threads quickly :(
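
The head-relative versus data-relative lookup discussed in the quoted
message can be illustrated with a small userspace model. This is only a
sketch, not kernel code: fake_skb, fake_priv, fake_skb_init() and
fake_expand_head() are hypothetical stand-ins for struct sk_buff,
struct can_skb_priv, the CAN skb allocation and pskb_expand_head(). The
0xAA fill is an assumption that stands in for the uninitialized memory
KMSAN complains about.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of struct can_skb_priv (not the real layout). */
struct fake_priv {
	int ifindex;
	int skbcnt;
};

/* Hypothetical model of the few sk_buff fields that matter here. */
struct fake_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the payload */
	size_t len;		/* payload length */
};

/* Reserve headroom for the private data and store it at head. */
static int fake_skb_init(struct fake_skb *skb, int skbcnt, size_t payload_len)
{
	struct fake_priv p = { .ifindex = 7, .skbcnt = skbcnt };

	skb->head = calloc(1, sizeof(p) + payload_len);
	if (!skb->head)
		return -1;
	memcpy(skb->head, &p, sizeof(p));
	skb->data = skb->head + sizeof(p);
	skb->len = payload_len;
	return 0;
}

/*
 * Model of a headroom expansion: nhead new bytes are added in front,
 * the old buffer (old headroom + payload) is copied behind them.
 * The new bytes are poisoned with 0xAA to model uninitialized memory.
 */
static int fake_expand_head(struct fake_skb *skb, size_t nhead)
{
	size_t old_headroom = (size_t)(skb->data - skb->head);
	size_t old_size = old_headroom + skb->len;
	unsigned char *n = malloc(nhead + old_size);

	if (!n)
		return -1;
	memset(n, 0xAA, nhead);
	memcpy(n + nhead, skb->head, old_size);
	free(skb->head);
	skb->head = n;
	skb->data = n + nhead + old_headroom;
	return 0;
}

/* Lookup as in the current can_skb_prv(): relative to head. */
static struct fake_priv read_prv_head(const struct fake_skb *skb)
{
	struct fake_priv p;

	memcpy(&p, skb->head, sizeof(p));
	return p;
}

/* Lookup as in proposal 1: directly in front of data. */
static struct fake_priv read_prv_data(const struct fake_skb *skb)
{
	struct fake_priv p;

	memcpy(&p, skb->data - sizeof(p), sizeof(p));
	return p;
}
```

In this model, after fake_expand_head() the head-relative read returns
the poison bytes, while the data-relative read still returns the values
stored by fake_skb_init(). Whether the same holds on every real kernel
path that modifies the headroom would still need to be checked.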


end of thread, other threads:[~2026-01-02 20:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20251117173012.230731-1-activprithvi@gmail.com>
     [not found] ` <0c98b1c4-3975-4bf5-9049-9d7f10d22a6d@hartkopp.net>
2025-11-30 12:44   ` Question about to KMSAN: uninit-value in can_receive Oliver Hartkopp
2025-11-30 17:29     ` Prithvi Tambewagh
2025-11-30 19:09       ` Oliver Hartkopp
2025-12-07 18:45         ` Prithvi
2025-12-20 17:33         ` Prithvi
2025-12-21 18:29           ` [bpf, xdp] headroom - was: " Oliver Hartkopp
2025-12-21 19:06             ` Marc Kleine-Budde
2025-12-21 19:42               ` Oliver Hartkopp
2026-01-02 15:36             ` Prithvi
2026-01-02 20:04               ` Jakub Kicinski
