* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
[not found] ` <fff0b3eb-ea42-4475-970d-30622dc25dca@free.fr>
@ 2025-08-16 18:45 ` Bernard Pidoux
2025-08-18 10:00 ` Bernard Pidoux
0 siblings, 1 reply; 36+ messages in thread
From: Bernard Pidoux @ 2025-08-16 18:45 UTC (permalink / raw)
To: David Ranch, linux-hams, netdev
David,
For some reason my messages are not accepted by vger.kernel.org, even though
I configured Thunderbird not to send HTML.
I just compiled and loaded kernel 6.15.1.
So far FPAC 4.1.4 is running fine and making connections with
neighbouring ROSE nodes.
I will let it run for a while before progressively applying the
AX25 and ROSE patches committed in kernels 6.15.2 to 6.15.10.
I will start with the ax25 ones and see what happens.
73 de Bernard f6bvp / ai7bg
On 16/08/2025 at 19:49, Bernard Pidoux wrote:
> Hi David,
>
> Actually Ubuntu stops responding without any message. There is no more
> response from the keyboard or mouse; the only way out is to power-cycle.
>
> I am working on activating kernel messages on oops.
>
> The bug is already present in 6.15.10 so there is no reason to look at a
> more recent version.
>
> I will report any progress if I find something interesting.
>
> This is quite a challenge for me as I have not done this kind of
> kernel investigation for nearly a decade... and I am not getting any younger!
>
> 73 de Bernard, f6bvp / ai7bg
>
>
> On 16/08/2025 at 19:32, David Ranch wrote:
>>
>> Hey Bernard,
>>
>> Thanks for posting this issue. Can you copy/paste in the Oops you're
>> seeing? I did see a recent ROSE issue on
>> 6.16.0-rc6-next-20250718-syzkaller and I wonder if that could have
>> created this issue:
>>
>> https://groups.google.com/g/syzkaller-bugs/c/0TmBbcJ2PKE
>>
>> Btw, I would say that posting this to netdev@vger.kernel.org would
>> probably be more important than this Debian list since this is most
>> likely a kernel issue and not a distro issue per se.
>>
>> --David
>> KI6ZHD
>>
>>
>> On 08/16/2025 10:02 AM, Bernard Pidoux wrote:
>>> Hi,
>>>
>>> I have been working continuously on an AX25 ROSE/FPAC node for decades,
>>> running a number of Raspberry Pi boards (Raspi OS 64-bit) plus Ubuntu
>>> LTS on a mini PC.
>>>
>>> Stable FPAC version 4.1.4 performs packet switching quite well,
>>> although some improvements are underway.
>>>
>>> FPAC runs flawlessly with kernel 6.14.11.
>>>
>>> However, running FPAC under stable kernel 6.15.10 resulted in a frozen
>>> system when issuing some commands, such as a connect request.
>>>
>>> Investigations seem to show that ax25 connect is fine and that the
>>> bug is probably in the ROSE module.
>>>
>>> I am presently trying to find the change that triggers the kernel
>>> oops by compiling and installing earlier kernel versions, starting
>>> with 6.15.1.
>>>
>>> 73s de Bernard, f6bvp / ai7bg
>>>
>>> http://f6bvp.org
>>>
>>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-16 18:45 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops Bernard Pidoux
@ 2025-08-18 10:00 ` Bernard Pidoux
2025-08-18 10:04 ` Folkert van Heusden
2025-08-18 16:30 ` Dan Cross
0 siblings, 2 replies; 36+ messages in thread
From: Bernard Pidoux @ 2025-08-18 10:00 UTC (permalink / raw)
To: David Ranch, linux-hams, netdev
[-- Attachment #1: Type: text/plain, Size: 2805 bytes --]
Hi,
I captured a screenshot of a kernel panic in linux-6.16.0 that
shows [mkiss]. See the attached picture.
Bernard
[-- Attachment #2: kernel_panic_6.16.0.jpg --]
[-- Type: image/jpeg, Size: 3034185 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 10:00 ` Bernard Pidoux
@ 2025-08-18 10:04 ` Folkert van Heusden
2025-08-18 14:19 ` F6BVP
2025-08-18 16:30 ` Dan Cross
1 sibling, 1 reply; 36+ messages in thread
From: Folkert van Heusden @ 2025-08-18 10:04 UTC (permalink / raw)
To: Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
Looks like the issue I reported a few weeks ago.
In my case it was not related to ROSE, only to new connections.
--
www.vanheusden.com
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 10:04 ` Folkert van Heusden
@ 2025-08-18 14:19 ` F6BVP
0 siblings, 0 replies; 36+ messages in thread
From: F6BVP @ 2025-08-18 14:19 UTC (permalink / raw)
To: Folkert van Heusden, Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
[-- Attachment #1: Type: text/plain, Size: 300 bytes --]
Here are two screenshots of a kernel panic that seems to be
related to [mkiss] and is already present in the 6.15.6 kernel.
Bernard
On 18/08/2025 at 12:04, Folkert van Heusden wrote:
> Looks like the issue I reported a few weeks ago.
> In my case not related to rose, only new connections.
>
[-- Attachment #2: 6.15-6-panic1.jpg --]
[-- Type: image/jpeg, Size: 823671 bytes --]
[-- Attachment #3: 6.15-6-panic2.jpg --]
[-- Type: image/jpeg, Size: 905106 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 10:00 ` Bernard Pidoux
2025-08-18 10:04 ` Folkert van Heusden
@ 2025-08-18 16:30 ` Dan Cross
2025-08-18 18:28 ` F6BVP
1 sibling, 1 reply; 36+ messages in thread
From: Dan Cross @ 2025-08-18 16:30 UTC (permalink / raw)
To: Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
On Mon, Aug 18, 2025 at 6:02 AM Bernard Pidoux <bernard.pidoux@free.fr> wrote:
> Hi,
>
> I captured a screen picture of kernel panic in linux-6.16.0 that
> displays [mkiss]. See included picture.
Hi Bernard,
This is the same issue that I and a few other folks have run into.
Please see the analysis in
https://lore.kernel.org/linux-hams/CAEoi9W4FGoEv+2FUKs7zc=XoLuwhhLY8f8t_xQ6MgTJyzQPxXA@mail.gmail.com/#R
There, I traced the issue far enough to see that it comes from
`skb->dev` being NULL on these connections. I haven't had time to look
further into why that is, or what changed that made that the case. I
now think that this occurs on the _first_ of the two loops I
mentioned, not the second, however.
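For readers unfamiliar with this failure mode, here is a minimal userspace sketch (not kernel code) of what a NULL `dev` pointer on a socket buffer means for a receive path. The names `demo_net_device`, `demo_sk_buff` and `demo_rx_dev_name` are made up for illustration, the real `struct sk_buff` is far larger, and this is not a proposed fix for the oops:

```c
#include <stddef.h>

/* Userspace mock-ups; the real kernel structures are far larger. */
struct demo_net_device {
	const char *name;
};

struct demo_sk_buff {
	struct demo_net_device *dev;	/* receiving device, expected non-NULL */
	size_t len;
};

/*
 * Return the device name for a received buffer, or NULL when the buffer
 * has no device attached. Dereferencing skb->dev without such a check is
 * the kind of access that faults when dev is NULL.
 */
static const char *demo_rx_dev_name(const struct demo_sk_buff *skb)
{
	if (!skb || !skb->dev)
		return NULL;
	return skb->dev->name;
}
```

A buffer whose `dev` points at a device named "ax0" yields that name, while a buffer with `dev == NULL` yields NULL instead of crashing. Whether a guard like this belongs anywhere in the real code path is an open question in this thread.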
- Dan C.
(Aside: I'm pretty sure that `linux-hams@vger.kernel.org` is not a
Debian-specific list.)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 16:30 ` Dan Cross
@ 2025-08-18 18:28 ` F6BVP
2025-08-18 22:11 ` Dan Cross
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: F6BVP @ 2025-08-18 18:28 UTC (permalink / raw)
To: Dan Cross, Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
Hi Dan,
I agree that it must be the same bug, and the mkiss module is involved in
both cases, although the environments are quite different.
I am using ROSE/FPAC nodes on different machines for AX25 message
routing with the LinFBB BBS.
Nowadays I no longer have a radio, and all nodes are interconnected via
the Internet using AX25-over-IP encapsulation with ax25ipd (UDP ports).
I am running two Raspberry Pi 3B+ boards with RaspiOS 64-bit and kernel 6.12.14.
AX25 configuration is performed via kissattach to create the ax0 device.
The ROSE/FPAC suite of applications manages the ROSE, NetRom and AX25
protocols for communications. FBB BBS forwards via the rose0 port and TCP
port 23 (telnet).
I do not observe any issue on those RasPiOS systems.
Another machine, a mini PC with Ubuntu 24.04 LTS and kernel
6.14.0-27-generic, is configured identically with an FPAC/ROSE node and has
absolutely no issues with mkiss, ROSE or NetRom.
A few years ago I was quite active in debugging the ROSE module. As I
wanted to resume AX25 debugging, I installed the Linux 6.15.10 stable
kernel. This was the beginning of my kernel panic hunt...
My strategy is to find the most recent kernel that does not have any issue
with mkiss and progressively add AX25 patches in order to find the
guilty instruction. I will use a bunch of printk calls in order to localize
the faulty code. We will see if it works.
Bernard
f6bvp / ai7bg
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 18:28 ` F6BVP
@ 2025-08-18 22:11 ` Dan Cross
2025-08-18 22:31 ` F6BVP
2025-08-19 21:17 ` [OT] " Miroslav Skoric
2 siblings, 0 replies; 36+ messages in thread
From: Dan Cross @ 2025-08-18 22:11 UTC (permalink / raw)
To: F6BVP; +Cc: Bernard Pidoux, David Ranch, linux-hams, netdev
On Mon, Aug 18, 2025 at 2:29 PM F6BVP <f6bvp@free.fr> wrote:
> I agree that it must be the same bug, and the mkiss module is involved in
> both cases, although the environments are quite different.
> I am using ROSE/FPAC nodes on different machines for AX25 message
> routing with the LinFBB BBS.
> Nowadays I no longer have a radio, and all nodes are interconnected via
> the Internet using AX25-over-IP encapsulation with ax25ipd (UDP ports).
>
> I am running two Raspberry Pi 3B+ boards with RaspiOS 64-bit and kernel 6.12.14.
> AX25 configuration is performed via kissattach to create the ax0 device.
> The ROSE/FPAC suite of applications manages the ROSE, NetRom and AX25
> protocols for communications. FBB BBS forwards via the rose0 port and TCP
> port 23 (telnet).
>
> I do not observe any issue on those RasPiOS systems.
>
> Another machine, a mini PC with Ubuntu 24.04 LTS and kernel
> 6.14.0-27-generic, is configured identically with an FPAC/ROSE node and has
> absolutely no issues with mkiss, ROSE or NetRom.
>
> A few years ago I was quite active in debugging the ROSE module. As I
> wanted to resume AX25 debugging, I installed the Linux 6.15.10 stable
> kernel. This was the beginning of my kernel panic hunt...
>
> My strategy is to find the most recent kernel that does not have any issue
> with mkiss and progressively add AX25 patches in order to find the
> guilty instruction. I will use a bunch of printk calls in order to localize
> the faulty code. We will see if it works.
Bernard,
Very good. A caveat is that the bug seems to manifest itself in the
`skbuff` infrastructure, independent of the specific AX.25/NETROM/ROSE
code: it may be that some other change elsewhere in the kernel was
incompatible with AX.25 and gave rise to this bug.
I've found the oops to be very reproducible. Given that you seem
to have a known working kernel version, you may get more mileage out
of using `git bisect` to narrow things down to a specific failing
commit, instead of trying to forward-apply AX.25-specific commits.
- Dan C.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 18:28 ` F6BVP
2025-08-18 22:11 ` Dan Cross
@ 2025-08-18 22:31 ` F6BVP
2025-08-20 8:50 ` F6BVP
2025-08-21 11:28 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops F6BVP
2025-08-19 21:17 ` [OT] " Miroslav Skoric
2 siblings, 2 replies; 36+ messages in thread
From: F6BVP @ 2025-08-18 22:31 UTC (permalink / raw)
To: Dan Cross, Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
[-- Attachment #1: Type: text/plain, Size: 505 bytes --]
Hi,
I just ran linux-6.15.4, with the fpacnode FPAC client making good
connections with adjacent nodes.
After a couple of hours I added the netromd daemon to send and receive
NET/ROM routing messages.
Soon this triggered the same Oops as before with kernel 6.15.6
(a screenshot is attached).
It says: Comm: kworker/u16:3 Not tainted
and inside the Call Trace one can see mkiss_receive_buf+0x301/0x3f0 [mkiss]
All this is exactly the same Oops report as the one already reported with
kernel 6.15.6.
Bernard
[-- Attachment #2: OOps-6.15.4.jpg --]
[-- Type: image/jpeg, Size: 773826 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* [OT] Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 18:28 ` F6BVP
2025-08-18 22:11 ` Dan Cross
2025-08-18 22:31 ` F6BVP
@ 2025-08-19 21:17 ` Miroslav Skoric
2 siblings, 0 replies; 36+ messages in thread
From: Miroslav Skoric @ 2025-08-19 21:17 UTC (permalink / raw)
To: F6BVP, Dan Cross, Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
On 8/18/25 8:28 PM, F6BVP wrote:
> Hi Dan,
>
> I agree that it must be the same bug, and the mkiss module is involved in
> both cases, although the environments are quite different.
> I am using ROSE/FPAC nodes on different machines for AX25 message
> routing with the LinFBB BBS.
> Nowadays I no longer have a radio, and all nodes are interconnected via
> the Internet using AX25-over-IP encapsulation with ax25ipd (UDP ports).
>
Hi Bernard, et al.
Sorry for hijacking the thread. I have a question: As an experimental
station based on Ubuntu 18.04 LTS (not connected to the Internet for
several years, using kernel Linux ubuntu 4.15.0-212-generic #223-Ubuntu
SMP Tue May 23 13:08:22 UTC 2023 i686 i686 i686 GNU/Linux), I run a
rather old FPAC-Node v 4.0.3 (built Jan 3 2016), and on top of it an
FBB bbs V7.0.10 (Feb 28 2021).
All works well for my basic packet needs. However, I wonder
whether it would be of any use to try upgrading FPAC and FBB, bearing in
mind that upgrading the distro is not possible.
Best regards, 73
Misko YT7MPB
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 22:31 ` F6BVP
@ 2025-08-20 8:50 ` F6BVP
2025-08-20 19:33 ` kworker/u16 Not tainted F6BVP
2025-08-21 11:28 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops F6BVP
1 sibling, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-20 8:50 UTC (permalink / raw)
To: Dan Cross, Bernard Pidoux; +Cc: David Ranch, linux-hams, netdev
Hi All,
As linux-6.15.1 showed the same Oops kernel panic, I jumped to the 6.14
branch, which showed no issue up to 6.14.4.
I observed that kworker/u16 was always cited in the panic report.
A grep -r for kworker/u16 found the following text in
~.drivers/gpu/drm/ci/xfails/msm-apq8096-skips.txt
I am not sure if it is relevant to our present problem.
.....................
# Whole machine hangs
kms_cursor_legacy@all-pipes-torture-move
# Skip driver specific tests
^amdgpu.*
nouveau_.*
^panfrost.*
^v3d.*
^vc4.*
^vmwgfx*
# Skip intel specific tests
gem_.*
i915_.*
tools_test.*
# Currently fails and causes coverage loss for other tests
# since core_getversion also fails.
core_hotunplug.*
# gpu fault
# [IGT] msm_mapping: executing
# [IGT] msm_mapping: starting subtest shadow
# *** gpu fault: ttbr0=00000001030ea000 iova=0000000001074000 dir=WRITE
type=PERMISSION source=1f030000 (0,0,0,0)
# msm_mdp 901000.display-controller: RBBM | ME master split |
status=0x701000B0
# watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u16:3:46]
msm/msm_mapping@shadow
.........................
Bernard
^ permalink raw reply [flat|nested] 36+ messages in thread
* kworker/u16 Not tainted
2025-08-20 8:50 ` F6BVP
@ 2025-08-20 19:33 ` F6BVP
0 siblings, 0 replies; 36+ messages in thread
From: F6BVP @ 2025-08-20 19:33 UTC (permalink / raw)
To: netdev; +Cc: linux-hams, Bernard Pidoux
[-- Attachment #1: Type: text/plain, Size: 542 bytes --]
[kworker/u16 Not tainted]
A kernel panic is occurring in the up-to-date net-next kernel as well as
with ALL kernels since linux-6.15.1.
As shown in the attached picture, the Oops is always correlated with mkiss.
The mkiss driver is used in ham radio communications, and in this specific
case it is associated with the AX25 and rose drivers in FPAC, a ROSE/FPAC
network application.
The bug is triggered by a connect request in FPAC to a neighbour node
using the axudp device created by ax25ipd for encapsulating AX25 frames
over IP.
Bernard Pidoux, f6bvp / ai7bg
[-- Attachment #2: Ooops_kworker-6.17.0-rc2_1.jpg --]
[-- Type: image/jpeg, Size: 137888 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-18 22:31 ` F6BVP
2025-08-20 8:50 ` F6BVP
@ 2025-08-21 11:28 ` F6BVP
2025-08-21 22:39 ` F6BVP
1 sibling, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-21 11:28 UTC (permalink / raw)
To: linux-hams, netdev; +Cc: Dan Cross, Folkert van Heusden, David Ranch
[-- Attachment #1: Type: text/plain, Size: 1001 bytes --]
Hi,
I confirm that the linux-6.14 branch up to linux-6.14.11 is fine, while ALL
following kernels Oops with kernel panics when using the mkiss driver for AX25.
I did not find out how to see the changes from 6.14.11 to 6.15.1, which Oopses!
Having read that 6.15.1 should not be used and should be replaced by 6.15.2, I
built and installed it and found something more interesting during
kernel panics.
Actually a new event is triggered by mkiss. As shown in the attached picture,
a new kind of message says:
Workqueue: events_unbound flush_to_ldisc
RIP: 0010:__netif_receive_skb_core.constprop.0+0x1051/0x1330
I grepped the kernel net files and found __netif_receive_skb_core in
net/core/dev.c
Could it be that the bug is inside this function when receiving some
kind of unexpected buffer from mkiss?
Not being a software engineer (I was actually an MD before retiring at 75
from Université Pitié-Salpêtrière as associate professor and hospital...),
I need some help to perform further investigations.
Bernard,
hamradio f6bvp /ai7bg
[-- Attachment #2: netif_receive_6.15.2-1.jpg --]
[-- Type: image/jpeg, Size: 173845 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-21 11:28 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops F6BVP
@ 2025-08-21 22:39 ` F6BVP
2025-08-22 3:10 ` Folkert van Heusden
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-21 22:39 UTC (permalink / raw)
To: linux-hams, netdev; +Cc: Dan Cross, Folkert van Heusden, David Ranch
As I already reported, mkiss never triggered any Oops kernel panic up to
linux-6.14.11.
In that version I put a number of printk calls inside mkiss.c in order to
follow the normal behaviour and content outside and during FPAC
operation, especially when issuing a connect request.
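As an illustration of this kind of instrumentation, here is a minimal userspace sketch of a "Here I am" marker that records the function name and line number. It uses `snprintf`/`puts` so it can be compiled outside the kernel; in kernel code the equivalent would be a `printk()` or `pr_info()` call with the same format. `TRACE_HERE`, `trace_format` and `demo_receive_buf` are hypothetical names, not taken from mkiss.c:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format a marker of the form "Here I am: <function>:<line>".
 * Returns the value of snprintf(), i.e. the length that was needed.
 */
static int trace_format(char *buf, size_t len, const char *func, int line)
{
	return snprintf(buf, len, "Here I am: %s:%d", func, line);
}

/* Print a marker for the current function and source line. */
#define TRACE_HERE() do {                                                 \
	char trace_buf_[128];                                             \
	trace_format(trace_buf_, sizeof(trace_buf_), __func__, __LINE__); \
	puts(trace_buf_);                                                 \
} while (0)

/* Hypothetical receive path instrumented with the marker. */
static int demo_receive_buf(const unsigned char *data, int count)
{
	TRACE_HERE();
	(void)data;	/* payload is not inspected in this sketch */
	return count;
}
```

Placing one marker at the entry of each function on the suspected path gives an ordered trace of the calls leading up to a crash, provided the messages reach a console or netconsole before the machine freezes.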
By contrast, an FPAC connect request systematically triggers a kernel
panic with linux-6.15.2 and following kernels.
In 6.14.11 I observe that when mkiss runs, core/dev.c is never activated,
i.e. neither __netif_receive_skb nor __netif_receive_skb_one_core.
These functions appear in kernel 6.15.2 panics after mkiss_receive_buf.
One can guess that mkiss_receive_buf() is triggering something wrong in
kernel 6.15.2 and all following kernels up to net-next.
Locating the bug is quite difficult, as I did not find a
way to see the relevant code differences between the two kernels in the
absence of an incremental patch...
I sincerely regret not knowing how to go further.
Bernard,
hamradio f6bvp /ai7bg
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-21 22:39 ` F6BVP
@ 2025-08-22 3:10 ` Folkert van Heusden
2025-08-24 14:04 ` F6BVP
0 siblings, 1 reply; 36+ messages in thread
From: Folkert van Heusden @ 2025-08-22 3:10 UTC (permalink / raw)
To: F6BVP; +Cc: linux-hams, netdev, Dan Cross, David Ranch
Bernard,
I skimmed over the diff between the latest 6.14.y and latest 6.15.y tags
of the Raspberry Pi Linux kernel and didn't see any relevant
changes. Although changes in 'arch' could in theory affect everything.
--
www.vanheusden.com
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-22 3:10 ` Folkert van Heusden
@ 2025-08-24 14:04 ` F6BVP
2025-08-25 12:40 ` Dan Carpenter
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-24 14:04 UTC (permalink / raw)
To: linux-hams, netdev
Cc: Dan Cross, David Ranch, Eric Dumazet, Folkert van Heusden
Hi All,
I suspect I finally found the bug that has triggered a kernel panic from
linux-6.15.1 up to net-next.
Actually I found a report from
syzbot+dca31068cff20d2ad44d@syzkaller.appspotmail.com
that directed me to the solution.
A pointer *p to a buffer was declared in tty_buffer_alloc() but not
initialized.
Explanation:
- Sometimes AX25 can make connections via a kissattached Ethernet port.
- In that case, when an application sends a connect request from a
console, tty_port is used by mkiss.
All kernel panic reports I sent earlier show that mkiss_receive_buf was
involved together with tty_port_default and tty_ldisc_receive_buf.
It was syzbot's detailed report of a KMSAN uninit-value in mkiss_receive_buf
that led me to the solution, although it took me a while to understand
the report, as this is totally new to me...
Looking at the code I found :
static struct tty_buffer *tty_buffer_alloc(struct tty_port *port, size_t size)
{
	struct llist_node *free;
	struct tty_buffer *p;
I first introduced a call to kmalloc in order to initialize pointer p,
as is done elsewhere in the function.
This worked well and the Oops disappeared.
Then I tried initializing *p to NULL where it is declared:
struct tty_buffer *p=NULL;
With this added, it also behaved correctly.
Finally I removed the early kmalloc call and kept only the
*p=NULL initialization.
Since then, I have checked this simple initialization on both 6.15.2 and
6.17-rc2 and there was no more Oops.
I will submit the following patch against net-next in due form if there
is no objection.
diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index 67271fc0b223..33e7f675b06d 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -159,7 +159,7 @@ void tty_buffer_free_all(struct tty_port *port)
 static struct tty_buffer *tty_buffer_alloc(struct tty_port *port, size_t size)
 {
 	struct llist_node *free;
-	struct tty_buffer *p;
+	struct tty_buffer *p=NULL;

 	/* Round the buffer size out */
 	size = __ALIGN_MASK(size, TTYB_ALIGN_MASK);
Bernard
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-24 14:04 ` F6BVP
@ 2025-08-25 12:40 ` Dan Carpenter
2025-08-26 13:31 ` F6BVP
0 siblings, 1 reply; 36+ messages in thread
From: Dan Carpenter @ 2025-08-25 12:40 UTC (permalink / raw)
To: F6BVP
Cc: linux-hams, netdev, Dan Cross, David Ranch, Eric Dumazet,
Folkert van Heusden
No, this patch doesn't do anything. "p" is never used without being
initialized. Plus, I bet that if you do:
grep CONFIG_INIT_STACK_ALL_ZERO .config
You will find it is set to =y which means the compiler is already
initializing pointers to NULL anyway.
Perhaps you're worried about this line:
p = kmalloc(struct_size(p, data, 2 * size), GFP_ATOMIC | __GFP_NOWARN);
where it seems to call "struct_size(p" where p is not initialized? On
that line the compiler is just doing a sizeof(*p) and not really using
the value of p at all.
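
The sizeof point can be tried out in userspace. The sketch below is only an
illustration under stated assumptions: struct demo_buf, demo_struct_size()
and demo_alloc() are hypothetical names, and the macro is a simplified
stand-in for the kernel's struct_size(), which additionally saturates on
overflow. It shows that the pointer only supplies a type inside sizeof and
is never read before the assignment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel struct ending in a flexible array. */
struct demo_buf {
	size_t used;
	unsigned char data[];
};

/*
 * Simplified userspace analogue of struct_size(p, member, n).
 * 'p' appears only inside sizeof, so only its type is inspected and its
 * value is never evaluated. Unlike the kernel macro, this sketch does not
 * guard against arithmetic overflow.
 */
#define demo_struct_size(p, member, n) \
	(sizeof(*(p)) + (n) * sizeof((p)->member[0]))

/* Mirrors the shape of the discussed line: p is sized before it is set. */
struct demo_buf *demo_alloc(size_t size)
{
	struct demo_buf *p; /* deliberately left uninitialized */

	p = malloc(demo_struct_size(p, data, 2 * size));
	if (p)
		p->used = 0;
	return p;
}
```

Calling demo_alloc(8) allocates sizeof(struct demo_buf) + 16 bytes; the
caller releases the result with free().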
regards,
dan carpenter
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-25 12:40 ` Dan Carpenter
@ 2025-08-26 13:31 ` F6BVP
2025-08-26 13:36 ` Eric Dumazet
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-26 13:31 UTC (permalink / raw)
To: Dan Carpenter
Cc: linux-hams, netdev, Dan Cross, David Ranch, Eric Dumazet,
Folkert van Heusden
[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]
Dan, thank you for explaining why the patch did not actually fix the bug.

Via netconsole I captured two occurrences of the kernel panic that did
not follow exactly the same chain.

I hope these files help to find where things go wrong.

The bug is systematically triggered when running the netromd daemon and
making a connection with ax25_call().

A [syzbot] mail reported that KMSAN found an uninit-value both in
kiss_unesc() (mkiss.c:303) and in mkiss_receive_buf() (mkiss.c:901).

However, I have not identified the bug.
Regards,
Bernard
Le 25/08/2025 à 14:40, Dan Carpenter a écrit :
> No, this patch doesn't do anything. "p" is never used without being
> initialized. Plus, I bet that if you do:
>
> grep CONFIG_INIT_STACK_ALL_ZERO .config
>
> You will find it is set to =y which means the compiler is already
> initializing pointers to NULL anyway.
>
> Perhaps you're worried about this line:
>
> p = kmalloc(struct_size(p, data, 2 * size), GFP_ATOMIC | __GFP_NOWARN);
>
> where it seems to call "struct_size(p" where p is not initialized? On
> that line the compiler is just doing a sizeof(*p) and not really using
> the value of p at all.
>
> regards,
> dan carpenter
>
[-- Attachment #2: netconsole_1.log --]
[-- Type: text/plain, Size: 6970 bytes --]
[ 5410.128592] Here I am: flush_to_ldisc:494
[ 5410.128953] Here I am: receive_buf:467
[ 5410.129319] Here I am: tty_port_default_receive_buf:29
[ 5410.129688] Here I am: tty_ldisc_receive_buf:415 count:24 bytes processed
[ 5411.100089] Here I am: __tty_insert_flip_string_flags:351 8 copied
[ 5411.100468] Here I am: flush_to_ldisc:494
[ 5411.100940] Here I am: receive_buf:467
[ 5411.101357] Here I am: tty_port_default_receive_buf:29
[ 5411.101717] Here I am: tty_ldisc_receive_buf:415 count:8 bytes processed
[ 5412.131968] Here I am: __tty_insert_flip_string_flags:351 68 copied
[ 5412.133016] Here I am: flush_to_ldisc:494
[ 5412.134007] Here I am: receive_buf:467
[ 5412.134862] Here I am: tty_port_default_receive_buf:29
[ 5412.135967] Here I am: tty_ldisc_receive_buf:415 count:68 bytes processed
[ 5412.147321] Here I am: __tty_insert_flip_string_flags:351 68 copied
[ 5412.148471] Here I am: flush_to_ldisc:494
[ 5412.149639] Here I am: receive_buf:467
[ 5412.150656] Here I am: tty_port_default_receive_buf:29
[ 5412.151844] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[ 5412.151982] Here I am: __tty_insert_flip_string_flags:351 67 copied
[ 5412.152832] #PF: supervisor read access in kernel mode
[ 5412.153886] Here I am: flush_to_ldisc:494
[ 5412.154390] #PF: error_code(0x0000) - not-present page
[ 5412.154393] PGD 0 P4D 0
[ 5412.154396] Oops: Oops: 0000 [#1] SMP PTI
[ 5412.154399] CPU: 0 UID: 0 PID: 4760 Comm: kworker/u16:0 Not tainted 6.17.0-rc2-f6bvp+ #23 PREEMPT(voluntary)
[ 5412.154832] Here I am: receive_buf:467
[ 5412.155125] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[ 5412.155566] Here I am: tty_port_default_receive_buf:29
[ 5412.155889] Workqueue: events_unbound flush_to_ldisc
[ 5412.156349] Here I am: tty_ldisc_receive_buf:415 count:67 bytes processed
[ 5412.156648]
[ 5412.158431] RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
[ 5412.158786] Code: 6c 0f 82 24 01 00 00 48 01 93 c0 00 00 00 e9 52 f5 ff ff 48 89 df 4d 89 f5 e8 e7 b4 fd ff e9 c9 fd ff ff 4c 8d 88 d0 00 00 00 <48> 8b 80 d0 00 00 00 4c 8d 78 c8 49 39 c1 0f 84 6c fa ff ff 44 88
[ 5412.159182] RSP: 0018:ffffcd1940003c98 EFLAGS: 00010286
[ 5412.159606] RAX: 0000000000000000 RBX: ffff88a36fef0900 RCX: 0000000000000000
[ 5412.160022] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 5412.160441] RBP: ffffcd1940003da8 R08: 0000000000000200 R09: 00000000000000d0
[ 5412.160861] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88a37059ed40
[ 5412.161279] R13: 0000000000000000 R14: 0000000000000200 R15: ffff88a371fca0d0
[ 5412.161700] FS: 0000000000000000(0000) GS:ffff88a4ecabb000(0000) knlGS:0000000000000000
[ 5412.162128] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5412.162553] CR2: 00000000000000d0 CR3: 000000013aa40006 CR4: 00000000001726f0
[ 5412.162975] Call Trace:
[ 5412.163399] <IRQ>
[ 5412.163803] ? default_wake_function+0x1a/0x40
[ 5412.164210] ? ep_autoremove_wake_function+0x12/0x40
[ 5412.164614] ? __wake_up_common+0x79/0xb0
[ 5412.165019] ? __wake_up+0x45/0x70
[ 5412.165425] __netif_receive_skb_one_core+0x3d/0xa0
[ 5412.165825] __netif_receive_skb+0x15/0x60
[ 5412.166227] process_backlog+0x90/0x160
[ 5412.166624] __napi_poll+0x33/0x230
[ 5412.167018] net_rx_action+0x20b/0x3f0
[ 5412.167414] ? update_process_times+0x89/0xd0
[ 5412.167812] handle_softirqs+0xe7/0x340
[ 5412.168211] __do_softirq+0x10/0x18
[ 5412.168602] do_softirq.part.0+0x3f/0x80
[ 5412.168994] </IRQ>
[ 5412.169383] <TASK>
[ 5412.169772] __local_bh_enable_ip+0x6e/0x70
[ 5412.170166] _raw_spin_unlock_bh+0x1d/0x30
[ 5412.170565] mkiss_receive_buf+0x36b/0x4b0 [mkiss]
[ 5412.170961] tty_ldisc_receive_buf+0x78/0x80
[ 5412.171361] tty_port_default_receive_buf+0x5e/0xa0
[ 5412.171760] flush_to_ldisc+0xf9/0x1f0
[ 5412.172158] process_one_work+0x191/0x3e0
[ 5412.172548] worker_thread+0x2e3/0x420
[ 5412.172941] ? __pfx_worker_thread+0x10/0x10
[ 5412.173332] kthread+0x10d/0x230
[ 5412.173722] ? __pfx_kthread+0x10/0x10
[ 5412.174111] ret_from_fork+0x1a4/0x1d0
[ 5412.174501] ? __pfx_kthread+0x10/0x10
[ 5412.174888] ret_from_fork_asm+0x1a/0x30
[ 5412.175280] </TASK>
[ 5412.175695] Modules linked in: netrom mkiss rose ax25 netconsole snd_seq_dummy snd_hrtimer cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils netfs cifs_md4 snd_hda_codec_intelhdmi qrtr snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp i915 snd_hda_codec_alc269 snd_hda_scodec_component snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel kvm_intel snd_intel_dspcfg spi_nor snd_hda_codec kvm snd_hwdep mtd binfmt_misc snd_hda_core mei_pxp processor_thermal_device_pci_legacy spi_intel_platform irqbypass intel_soc_dts_iosf snd_pcm at24 mei_hdcp intel_rapl_msr spi_intel processor_thermal_device polyval_clmulni processor_thermal_wt_hint ghash_clmulni_intel platform_temperature_control processor_thermal_rfim snd_seq aesni_intel processor_thermal_rapl i2c_algo_bit intel_rapl_common drm_buddy rapl nls_iso8859_1 snd_seq_device snd_timer i2c_i801 intel_cstate processor_thermal_wt_req ttm snd mei_me i2c_smbus processor_thermal_power_floor intel_pch_thermal lpc_ich drm_display_helper mei processor_thermal_mbox
[ 5412.175760] soundcore intel_pmc_core int340x_thermal_zone pmt_telemetry video pmt_discovery pmt_class wmi intel_pmc_ssram_telemetry acpi_pad intel_vsec input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs autofs4 hid_generic usbhid hid r8169 ahci libahci realtek uas usb_storage [last unloaded: mkiss]
[ 5412.179250] CR2: 00000000000000d0
[ 5412.179757] ---[ end trace 0000000000000000 ]---
[ 5412.258374] pstore: backend (efi_pstore) writing error (-5)
[ 5412.258977] RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
[ 5412.259599] Code: 6c 0f 82 24 01 00 00 48 01 93 c0 00 00 00 e9 52 f5 ff ff 48 89 df 4d 89 f5 e8 e7 b4 fd ff e9 c9 fd ff ff 4c 8d 88 d0 00 00 00 <48> 8b 80 d0 00 00 00 4c 8d 78 c8 49 39 c1 0f 84 6c fa ff ff 44 88
[ 5412.260252] RSP: 0018:ffffcd1940003c98 EFLAGS: 00010286
[ 5412.260926] RAX: 0000000000000000 RBX: ffff88a36fef0900 RCX: 0000000000000000
[ 5412.261600] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 5412.262276] RBP: ffffcd1940003da8 R08: 0000000000000200 R09: 00000000000000d0
[ 5412.262922] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88a37059ed40
[ 5412.263535] R13: 0000000000000000 R14: 0000000000000200 R15: ffff88a371fca0d0
[ 5412.264150] FS: 0000000000000000(0000) GS:ffff88a4ecabb000(0000) knlGS:0000000000000000
[ 5412.264756] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5412.265362] CR2: 00000000000000d0 CR3: 000000013aa40006 CR4: 00000000001726f0
[ 5412.265974] Kernel panic - not syncing: Fatal exception in interrupt
[ 5412.266515] Kernel Offset: 0x1ee00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 5412.344532] Rebooting in 60 seconds..
[-- Attachment #3: netconsole_2.log --]
[-- Type: text/plain, Size: 7465 bytes --]
[ 4405.445220] Here I am: receive_buf:467
[ 4405.445221] Here I am: tty_port_default_receive_buf:29
[ 4405.445222] Here I am: tty_ldisc_receive_buf:415 count:26 bytes processed
[ 4405.445233] mkiss: ax0: Trying crc-smack
[ 4405.445238] Here I am: __tty_buffer_request_room:304 size:20
[ 4405.445240] Here I am: __tty_insert_flip_string_flags:351 20 copied
[ 4405.445245] Here I am: flush_to_ldisc:494
[ 4405.445246] Here I am: tty_buffer_free:210
[ 4405.445247] Here I am: receive_buf:467
[ 4405.445248] Here I am: tty_port_default_receive_buf:29
[ 4405.445251] Here I am: tty_ldisc_receive_buf:415 count:20 bytes processed
[ 4405.446487] Here I am: __tty_buffer_request_room:304 size:18
[ 4405.450656] Here I am: __tty_insert_flip_string_flags:351 18 copied
[ 4405.450740] Here I am: flush_to_ldisc:494
[ 4405.451637] Here I am: tty_buffer_free:210
[ 4405.451639] Here I am: receive_buf:467
[ 4405.451640] Here I am: tty_port_default_receive_buf:29
[ 4405.451654] Here I am: tty_ldisc_receive_buf:415 count:18 bytes processed
[ 4405.451670] Here I am: __tty_insert_flip_string_flags:351 14 copied
[ 4405.453616] Here I am: flush_to_ldisc:494
[ 4405.453987] Here I am: receive_buf:467
[ 4405.454380] Here I am: tty_port_default_receive_buf:29
[ 4405.454460] Here I am: __tty_insert_flip_string_flags:351 272 copied
[ 4405.454825] Here I am: tty_ldisc_receive_buf:415 count:14 bytes processed
[ 4405.455218] Here I am: flush_to_ldisc:494
[ 4405.455228] Here I am: __tty_insert_flip_string_flags:351 215 copied
[ 4405.456669] Here I am: receive_buf:467
[ 4405.456670] Here I am: tty_port_default_receive_buf:29
[ 4405.456687] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[ 4405.456711] Here I am: __tty_insert_flip_string_flags:351 253 copied
[ 4405.456723] Here I am: flush_to_ldisc:494
[ 4405.456726] Here I am: receive_buf:467
[ 4405.456728] Here I am: tty_port_default_receive_buf:29
[ 4405.456730] Here I am: tty_ldisc_receive_buf:415 count:253 bytes processed
[ 4405.459672] #PF: supervisor read access in kernel mode
[ 4405.459676] #PF: error_code(0x0000) - not-present page
[ 4405.459678] PGD 0 P4D 0
[ 4405.459682] Oops: Oops: 0000 [#1] SMP PTI
[ 4405.461239] CPU: 1 UID: 0 PID: 2953 Comm: kworker/u16:1 Not tainted 6.17.0-rc2-f6bvp+ #23 PREEMPT(voluntary)
[ 4405.461243] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[ 4405.461245] Workqueue: events_unbound flush_to_ldisc
[ 4405.461252] RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
[ 4405.461258] Code: 6c 0f 82 24 01 00 00 48 01 93 c0 00 00 00 e9 52 f5 ff ff 48 89 df 4d 89 f5 e8 e7 b4 fd ff e9 c9 fd ff ff 4c 8d 88 d0 00 00 00 <48> 8b 80 d0 00 00 00 4c 8d 78 c8 49 39 c1 0f 84 6c fa ff ff 44 88
[ 4405.461261] RSP: 0018:ffffd254c0016c98 EFLAGS: 00010286
[ 4405.461264] RAX: 0000000000000000 RBX: ffff8aca56b4c300 RCX: 0000000000000000
[ 4405.461266] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 4405.461268] RBP: ffffd254c0016da8 R08: 0000000000000200 R09: 00000000000000d0
[ 4405.461270] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aca6d5b6d40
[ 4405.461272] R13: 0000000000000000 R14: 0000000000000200 R15: ffff8aca522870d0
[ 4405.461274] FS: 0000000000000000(0000) GS:ffff8acbec13b000(0000) knlGS:0000000000000000
[ 4405.461277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4405.461279] CR2: 00000000000000d0 CR3: 00000001a7c40003 CR4: 00000000001726f0
[ 4405.461281] Call Trace:
[ 4405.461283] <IRQ>
[ 4405.461286] ? attach_entity_load_avg+0x1d1/0x1f0
[ 4405.461293] ? psi_group_change+0x201/0x4e0
[ 4405.461298] ? skb_free_head+0xa4/0xd0
[ 4405.468661] ? skb_release_data+0x186/0x210
[ 4405.468667] __netif_receive_skb_one_core+0x3d/0xa0
[ 4405.468670] __netif_receive_skb+0x15/0x60
[ 4405.470017] process_backlog+0x90/0x160
[ 4405.470400] __napi_poll+0x33/0x230
[ 4405.470780] net_rx_action+0x20b/0x3f0
[ 4405.471156] handle_softirqs+0xe7/0x340
[ 4405.471528] __do_softirq+0x10/0x18
[ 4405.471903] do_softirq.part.0+0x3f/0x80
[ 4405.472305] </IRQ>
[ 4405.472702] <TASK>
[ 4405.473104] __local_bh_enable_ip+0x6e/0x70
[ 4405.473503] _raw_spin_unlock_bh+0x1d/0x30
[ 4405.473907] mkiss_receive_buf+0x36b/0x4b0 [mkiss]
[ 4405.474310] tty_ldisc_receive_buf+0x78/0x80
[ 4405.474710] tty_port_default_receive_buf+0x5e/0xa0
[ 4405.475099] flush_to_ldisc+0xf9/0x1f0
[ 4405.475482] process_one_work+0x191/0x3e0
[ 4405.475875] worker_thread+0x2e3/0x420
[ 4405.476273] ? __pfx_worker_thread+0x10/0x10
[ 4405.476657] kthread+0x10d/0x230
[ 4405.477025] ? __pfx_kthread+0x10/0x10
[ 4405.477387] ret_from_fork+0x1a4/0x1d0
[ 4405.477750] ? __pfx_kthread+0x10/0x10
[ 4405.478110] ret_from_fork_asm+0x1a/0x30
[ 4405.478471] </TASK>
[ 4405.478827] Modules linked in: netrom netconsole mkiss ax25 snd_seq_dummy snd_hrtimer cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils netfs cifs_md4 qrtr snd_hda_codec_intelhdmi snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp i915 coretemp snd_hda_codec_alc269 kvm_intel snd_hda_scodec_component snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel kvm snd_intel_dspcfg spi_nor processor_thermal_device_pci_legacy snd_hda_codec intel_soc_dts_iosf mtd irqbypass snd_hwdep binfmt_misc processor_thermal_device snd_hda_core processor_thermal_wt_hint snd_pcm polyval_clmulni at24 spi_intel_platform mei_hdcp mei_pxp ghash_clmulni_intel intel_rapl_msr platform_temperature_control spi_intel aesni_intel processor_thermal_rfim snd_seq processor_thermal_rapl nls_iso8859_1 snd_seq_device rapl intel_rapl_common i2c_algo_bit snd_timer drm_buddy mei_me intel_pmc_core processor_thermal_wt_req i2c_i801 snd ttm processor_thermal_power_floor pmt_telemetry mei intel_cstate intel_pch_thermal i2c_smbus pmt_discovery
[ 4405.478885] drm_display_helper lpc_ich processor_thermal_mbox soundcore int340x_thermal_zone video pmt_class wmi input_leds intel_pmc_ssram_telemetry intel_vsec acpi_pad mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs autofs4 hid_generic usbhid hid r8169 ahci realtek libahci uas usb_storage
[ 4405.481860] CR2: 00000000000000d0
[ 4405.482371] ---[ end trace 0000000000000000 ]---
[ 4405.563737] pstore: backend (efi_pstore) writing error (-5)
[ 4405.564259] RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
[ 4405.564776] Code: 6c 0f 82 24 01 00 00 48 01 93 c0 00 00 00 e9 52 f5 ff ff 48 89 df 4d 89 f5 e8 e7 b4 fd ff e9 c9 fd ff ff 4c 8d 88 d0 00 00 00 <48> 8b 80 d0 00 00 00 4c 8d 78 c8 49 39 c1 0f 84 6c fa ff ff 44 88
[ 4405.565314] RSP: 0018:ffffd254c0016c98 EFLAGS: 00010286
[ 4405.565841] RAX: 0000000000000000 RBX: ffff8aca56b4c300 RCX: 0000000000000000
[ 4405.566369] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 4405.566890] RBP: ffffd254c0016da8 R08: 0000000000000200 R09: 00000000000000d0
[ 4405.567423] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aca6d5b6d40
[ 4405.567958] R13: 0000000000000000 R14: 0000000000000200 R15: ffff8aca522870d0
[ 4405.568479] FS: 0000000000000000(0000) GS:ffff8acbec13b000(0000) knlGS:0000000000000000
[ 4405.569013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4405.569538] CR2: 00000000000000d0 CR3: 00000001a7c40003 CR4: 00000000001726f0
[ 4405.570073] Kernel panic - not syncing: Fatal exception in interrupt
[ 4405.570562] Kernel Offset: 0x1f800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 4405.651489] Rebooting in 30 seconds..
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-26 13:31 ` F6BVP
@ 2025-08-26 13:36 ` Eric Dumazet
2025-08-27 14:16 ` F6BVP
0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2025-08-26 13:36 UTC (permalink / raw)
To: F6BVP
Cc: Dan Carpenter, linux-hams, netdev, Dan Cross, David Ranch,
Folkert van Heusden
On Tue, Aug 26, 2025 at 6:31 AM F6BVP <f6bvp@free.fr> wrote:
>
> Dan, thank you for explaining why the patch did not actually fix the bug.
>
> Via netconsole I captured two occurrences of the kernel panic that did
> not follow exactly the same chain.
>
> I hope these files help to find where things go wrong.
>
> The bug is systematically triggered when running the netromd daemon and
> making a connection with ax25_call().
>
> A [syzbot] mail reported that KMSAN found an uninit-value both in
> kiss_unesc() (mkiss.c:303) and in mkiss_receive_buf() (mkiss.c:901).
>
> However, I have not identified the bug.
>
> Regards,
>
Make sure to add symbols to these logs, otherwise we can not really help.
cat CRASH | scripts/decode_stacktrace.sh ./vmlinux
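
For readers unfamiliar with the raw format: each call-trace entry in the
logs above has the shape symbol+0xoffset/0xsize, optionally followed by
[module], and a leading "? " marks a frame the unwinder is unsure about.
As far as I understand, the script resolves symbol+offset against the debug
info in vmlinux to obtain file and line. The C sketch below only splits one
such entry into its parts; struct frame and frame_parse() are hypothetical
names, and it expects the text after the "[ time ]" prefix.

```c
#include <stdio.h>

/* Parts of one call-trace entry; names are illustrative. */
struct frame {
	char sym[128];      /* function symbol */
	unsigned long off;  /* offset of the address inside the function */
	unsigned long size; /* total size of the function */
	char mod[64];       /* module name, empty for built-in code */
};

/*
 * Parse "symbol+0xoff/0xsize [module]" (the module part is optional).
 * Returns 0 on success, -1 if the text is not a call-trace entry.
 */
int frame_parse(const char *s, struct frame *f)
{
	int n;

	f->mod[0] = '\0';
	while (*s == ' ' || *s == '\t')
		s++;
	if (s[0] == '?' && s[1] == ' ') /* unreliable-frame marker */
		s += 2;
	n = sscanf(s, "%127[^+ ]+%lx/%lx [%63[^]]]",
		   f->sym, &f->off, &f->size, f->mod);
	return n >= 3 ? 0 : -1;
}
```

Fed "mkiss_receive_buf+0x36b/0x4b0 [mkiss]", it yields the symbol
mkiss_receive_buf, offset 0x36b, size 0x4b0 and module mkiss.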
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-26 13:36 ` Eric Dumazet
@ 2025-08-27 14:16 ` F6BVP
2025-08-27 17:30 ` Florian Westphal
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-27 14:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: Dan Carpenter, linux-hams, netdev, Dan Cross, David Ranch,
Folkert van Heusden
[-- Attachment #1: Type: text/plain, Size: 3074 bytes --]
Hi Eric,

I finally found the instruction that triggers the bug in
tty_ldisc_receive_buf().

Being completely new to kernel debugging, I read
Documentation/admin-guide/bug-hunting.rst to see what I needed to do:

    ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO

I also set up the netconsole driver to capture the Oops and received it
on a local Raspberry Pi with socat:

    nohup socat -u udp-recv:6666 ./netconsole.log < /dev/null > /dev/null 2>&1 &

In tty_ldisc_receive_buf(), the call to receive_buf() goes well as long
as count is small, whereas with a large number of bytes there is a
kernel BUG: NULL pointer dereference.

Attached is the last netconsole log I captured; I kept only the last pages.

If we analyze netconsole.log from 4,19279,153777986 onwards, we see three
sequences from tty_port_default_receive_buf to tty_ldisc_receive_buf,
each giving the number of bytes processed.
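
A note for anyone reading the attached netconsole.log: it appears to be in
the extended console format, which uses the same record layout as
/dev/kmsg. Each line starts with "prio,seq,timestamp_us,flag;" followed by
the message text. The C sketch below splits that header; struct kmsg_rec,
kmsg_parse() and kmsg_gap() are hypothetical helper names, not part of any
kernel tool.

```c
#include <stdio.h>
#include <string.h>

/* One parsed record header; field names are illustrative. */
struct kmsg_rec {
	unsigned int prio;        /* syslog priority: facility << 3 | level */
	unsigned long long seq;   /* message sequence number */
	unsigned long long ts_us; /* monotonic timestamp in microseconds */
	char flag;                /* '-' by default, 'c' for a line fragment */
	const char *text;         /* points into the caller's line, after ';' */
};

/* Parse "prio,seq,ts,flag;text". Returns 0 on success, -1 otherwise. */
int kmsg_parse(const char *line, struct kmsg_rec *r)
{
	const char *semi = strchr(line, ';');

	if (!semi)
		return -1;
	if (sscanf(line, "%u,%llu,%llu,%c", &r->prio, &r->seq, &r->ts_us,
		   &r->flag) != 4)
		return -1;
	r->text = semi + 1;
	return 0;
}

/* Number of records missing between two consecutive sequence numbers. */
unsigned long long kmsg_gap(unsigned long long prev, unsigned long long cur)
{
	return cur > prev ? cur - prev - 1 : 0;
}
```

The sequence number is useful when reading the attachment: it jumps from
19353 to 19379 and from 19420 to 19429, so 25 and 8 records respectively
are absent from the capture at those points.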
If we concentrate on the __tty_insert_flip_string_flags lines, which give
the number of bytes copied, we see that just before the BUG something
goes differently when the byte count is relatively high, i.e. 272 in our
case.

There is yet another tty_ldisc_deref after flush_to_ldisc and before
receive_buf.

In netconsole.log (4,19284,153778017) the call to receive_buf() is fine
with a count value of 18 bytes.

With the line sequence (see below) 411, 416, 427, everything goes
well when the byte count is <= 28 in our report.

In contrast, with the sequence 416, 421, when the byte count is
larger (272), line 427 is not reached, which means that

    ld->ops->receive_buf(ld->tty, p, f, count);

never returns.

As a proof, I commented out this line of code and the BUG disappeared. Of
course the application then could not complete the AX25 connection and
kept waiting for a reply.

Here I am. The next step is probably to discover why the call to
receive_buf() fails when the byte count is not small and tty_ldisc_deref()
runs after flush_to_ldisc, probably leading to an error. Which value is
wrong: ld->tty, p or f?
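
One possible reading of the register dump in the attached log, offered as
a sketch and not as a diagnosis: the faulting bytes "<48> 8b 80 d0 00 00 00"
decode as a 64-bit load from RAX + 0xd0, RAX is 0, and CR2 is 0xd0. That is
what reading a structure member at offset 0xd0 through a NULL pointer looks
like, and it happens inside __netif_receive_skb_core, which suggests the
NULL pointer lives in the network receive path and is not necessarily one
of ld->tty, p or f. Which structure is involved cannot be told from the
dump alone. The userspace C below only illustrates the arithmetic; struct
demo_obj, its layout and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Hypothetical layout: padding chosen so that 'chain' lands at offset 0xd0. */
struct demo_obj {
	unsigned char pad[0xd0];
	struct demo_list_head chain;
};

/* Address the CPU touches when loading obj->chain.next for an object at 'base'. */
uintptr_t demo_member_addr(uintptr_t base)
{
	return base + offsetof(struct demo_obj, chain);
}

/*
 * Extract the displacement from the single encoding seen in the log:
 * 48 8b 80 <disp32>, i.e. "mov disp32(%rax),%rax". The displacement is
 * stored little-endian. Returns 0 on success, -1 for any other pattern.
 */
int demo_mov_rax_disp(const unsigned char *insn, int32_t *disp)
{
	if (insn[0] != 0x48 || insn[1] != 0x8b || insn[2] != 0x80)
		return -1;
	*disp = (int32_t)((uint32_t)insn[3] | (uint32_t)insn[4] << 8 |
			  (uint32_t)insn[5] << 16 | (uint32_t)insn[6] << 24);
	return 0;
}
```

With base 0, demo_member_addr() returns 0xd0, matching the reported fault
address; a non-NULL base would give an address far from zero.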
Regards,
Bernard
Here is tty_ldisc_receive_buf() with my printk() calls; the comments give
the line numbers that show up in the log:

size_t tty_ldisc_receive_buf(struct tty_ldisc *ld, const u8 *p, const u8 *f,
			     size_t count)
{
	if (ld->ops->receive_buf2) {
		count = ld->ops->receive_buf2(ld->tty, p, f, count);
		/* line 411 */
		printk("Here I am: %s:%d count:%zu bytes buf2\n",
		       __func__, __LINE__, count);
	} else {
		count = min_t(size_t, count, ld->tty->receive_room);
		/* line 416 */
		printk("Here I am: %s:%d count:%zu bytes tty receive_room\n",
		       __func__, __LINE__, count);
		if (count && ld->ops->receive_buf) {
			/* line 421 */
			printk("Here I am: %s:%d count:%zu bytes ---> receive_buf\n",
			       __func__, __LINE__, count);
			ld->ops->receive_buf(ld->tty, p, f, count);
		}
	}
	/* line 427 */
	printk("Here I am: %s:%d count:%zu bytes processed\n",
	       __func__, __LINE__, count);
	return count;
}
EXPORT_SYMBOL_GPL(tty_ldisc_receive_buf);
Le 26/08/2025 à 15:36, Eric Dumazet a écrit :
>
> Make sure to add symbols to these logs, otherwise we can not really help.
>
> cat CRASH | scripts/decode_stacktrace.sh ./vmlinux
[-- Attachment #2: netconsole.log --]
[-- Type: text/plain, Size: 8519 bytes --]
4,19278,153777981,-;Here I am: tty_ldisc_receive_buf:427 count:18 bytes processed
4,19279,153777986,-;Here I am: tty_port_default_receive_buf:46 count:18
4,19280,153777991,-;Here I am: tty_ldisc_deref:283 !tty
4,19281,153777996,-;Here I am: __tty_insert_flip_string_flags:351 14 copied
4,19282,153778009,-;Here I am: tty_ldisc_deref:283 !tty
4,19283,153778010,-;Here I am: flush_to_ldisc:506
4,19284,153778017,-;Here I am: receive_buf:479
4,19285,153778023,-;Here I am: n_tty_receive_buf_common:1686
4,19286,153778028,-;Here I am: tty_ldisc_deref:283 !tty
4,19287,153778037,-;Here I am: tty_ldisc_receive_buf:411 count:14 bytes buf2
4,19288,153778043,-;Here I am: tty_ldisc_receive_buf:427 count:14 bytes processed
4,19289,153778048,-;Here I am: tty_port_default_receive_buf:46 count:14
4,19290,153778053,-;Here I am: tty_ldisc_deref:283 !tty
4,19291,153778067,-;Here I am: tty_ldisc_deref:283 !tty
4,19292,153778072,-;Here I am: tty_ldisc_deref:283 !tty
4,19293,153778093,-;Here I am: tty_ldisc_deref:283 !tty
4,19294,153778105,-;Here I am: tty_ldisc_deref:283 !tty
4,19295,153778109,-;Here I am: tty_ldisc_deref:283 !tty
4,19296,153778112,-;Here I am: tty_ldisc_deref:283 !tty
4,19297,153778126,-;Here I am: __tty_insert_flip_string_flags:351 14 copied
4,19298,153778130,-;Here I am: tty_ldisc_deref:283 !tty
4,19299,153778134,-;Here I am: flush_to_ldisc:506
4,19300,153778136,-;Here I am: receive_buf:479
4,19301,153778138,-;Here I am: n_tty_receive_buf_common:1686
4,19302,153778147,-;Here I am: tty_ldisc_receive_buf:411 count:14 bytes buf2
4,19303,153778152,-;Here I am: tty_ldisc_receive_buf:427 count:14 bytes processed
4,19304,153778157,-;Here I am: tty_port_default_receive_buf:46 count:14
4,19305,153778161,-;Here I am: tty_ldisc_deref:283 !tty
4,19306,153778162,-;Here I am: tty_ldisc_deref:283 !tty
4,19307,153778169,-;Here I am: tty_ldisc_deref:283 !tty
4,19308,153778425,-;Here I am: tty_ldisc_deref:283 !tty
4,19309,153778450,-;Here I am: tty_ldisc_deref:283 !tty
4,19310,153778457,-;Here I am: tty_ldisc_deref:283 !tty
4,19311,153778469,-;Here I am: tty_ldisc_deref:283 !tty
4,19312,153778480,-;Here I am: tty_ldisc_deref:283 !tty
6,19313,153778507,-;mkiss: ax0: Trying crc-flexnet
4,19314,153778518,-;Here I am: __tty_insert_flip_string_flags:351 28 copied
4,19315,153778551,-;Here I am: flush_to_ldisc:506
4,19316,153778557,-;Here I am: receive_buf:479
4,19317,153778566,-;Here I am: n_tty_receive_buf_common:1686
4,19318,153778579,-;Here I am: tty_ldisc_receive_buf:411 count:28 bytes buf2
4,19319,153778584,-;Here I am: tty_ldisc_receive_buf:427 count:28 bytes processed
4,19320,153778592,-;Here I am: tty_port_default_receive_buf:46 count:28
4,19321,153778600,-;Here I am: tty_ldisc_deref:283 !tty
4,19322,153778615,-;Here I am: tty_ldisc_deref:283 !tty
4,19323,153778637,-;Here I am: tty_ldisc_deref:283 !tty
4,19324,153778643,-;Here I am: tty_ldisc_deref:283 !tty
4,19325,153778879,-;Here I am: tty_ldisc_deref:283 !tty
4,19326,153779283,-;Here I am: tty_ldisc_deref:283 !tty
4,19327,153779328,-;Here I am: tty_ldisc_deref:283 !tty
4,19328,153786849,-;Here I am: tty_ldisc_deref:283 !tty
4,19329,153786875,-;Here I am: __tty_insert_flip_string_flags:351 272 copied
4,19330,153786882,-;Here I am: tty_ldisc_deref:283 !tty
4,19331,153786888,-;Here I am: flush_to_ldisc:506
4,19332,153786891,-;Here I am: tty_ldisc_deref:283 !tty
4,19333,153786897,-;Here I am: receive_buf:479
4,19334,153786902,-;Here I am: tty_ldisc_receive_buf:416 count:272 bytes tty receive_room
4,19335,153786906,-;Here I am: tty_ldisc_receive_buf:421 count:272 bytes ---> receive_buf
1,19336,153786932,-;BUG: kernel NULL pointer dereference, address: 00000000000000d0
1,19337,153786937,-;#PF: supervisor read access in kernel mode
1,19338,153786955,-;#PF: error_code(0x0000) - not-present page
6,19339,153786960,-;PGD 0
4,19340,153786960,-;Here I am: tty_ldisc_deref:283 !tty
4,19341,153786967,c;P4D 0
4,19342,153786972,-;Oops: Oops: 0000 [#1] SMP PTI
4,19343,153786972,-;Here I am: tty_ldisc_deref:283 !tty
4,19344,153786981,-;CPU: 3 UID: 0 PID: 48 Comm: kworker/u16:2 Not tainted 6.17.0-rc2-f6bvp+ #25 PREEMPT(voluntary)
4,19345,153786986,-;Here I am: __tty_insert_flip_string_flags:351 215 copied
4,19346,153786988,-;Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
4,19347,153786990,-;Workqueue: events_unbound flush_to_ldisc
4,19348,153786998,-;Here I am: tty_ldisc_deref:283 !tty
4,19349,153787003,-;Here I am: tty_ldisc_deref:283 !tty
4,19350,153787005,-;RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
4,19351,153787009,-;Here I am: __tty_buffer_request_room:304 size:253
4,19352,153787011,-;Code: 6c 0f 82 24 01 00 00 48 01 93 c0 00 00 00 e9 52 f5 ff ff 48 89 df 4d 89 f5 e8 e7 b4 fd ff e9 c9 fd ff ff 4c 8d 88 d0 00 00 00 <48> 8b 80 d0 00 00 00 4c 8d 78 c8 49 39 c1 0f 84 6c fa ff ff 44 88
4,19353,153787015,-;RSP: 0018:ffffd13040184c98 EFLAGS: 00010286
4,19379,153787121,-;CR2: 00000000000000d0 CR3: 00000001a5440002 CR4: 00000000001726f0
4,19380,153787124,-;Call Trace:
4,19381,153787128,-;Here I am: __tty_insert_flip_string_flags:351 253 copied
4,19382,153787131,-;Here I am: tty_ldisc_deref:283 !tty
4,19383,153787133,-;Here I am: flush_to_ldisc:506
4,19384,153787135,-;Here I am: receive_buf:479
4,19385,153787137,-;Here I am: n_tty_receive_buf_common:1686
4,19386,153787140,-;Here I am: tty_ldisc_receive_buf:411 count:253 bytes buf2
4,19387,153787142,-;Here I am: tty_ldisc_receive_buf:427 count:253 bytes processed
4,19388,153787144,-;Here I am: tty_port_default_receive_buf:46 count:253
4,19389,153787145,-;Here I am: tty_ldisc_deref:283 !tty
4,19390,153787149,-;Here I am: tty_ldisc_deref:283 !tty
4,19391,153787164,-;Here I am: tty_ldisc_deref:283 !tty
4,19392,153787167,-;Here I am: tty_ldisc_deref:283 !tty
4,19393,153787176,-;Here I am: tty_ldisc_deref:283 !tty
4,19394,153787183,-;Here I am: tty_ldisc_deref:283 !tty
4,19395,153787189,-; <IRQ>
4,19396,153787257,-;Here I am: tty_ldisc_deref:283 !tty
4,19397,153787259,-;Here I am: tty_ldisc_deref:283 !tty
4,19398,153787265,-; __netif_receive_skb_one_core+0x3d/0xa0
4,19399,153787407,-; __netif_receive_skb+0x15/0x60
4,19400,153787414,-; process_backlog+0x90/0x160
4,19401,153787421,-; __napi_poll+0x33/0x230
4,19402,153787485,-; net_rx_action+0x20b/0x3f0
4,19403,153787493,-; ? update_process_times+0x89/0xd0
4,19404,153787504,-; handle_softirqs+0xe7/0x340
4,19405,153787512,-; __do_softirq+0x10/0x18
4,19406,153787519,-; do_softirq.part.0+0x3f/0x80
4,19407,153787526,-; </IRQ>
4,19408,153787530,-; <TASK>
4,19409,153787535,-; __local_bh_enable_ip+0x6e/0x70
4,19410,153787542,-; _raw_spin_unlock_bh+0x1d/0x30
4,19411,153787553,-; mkiss_receive_buf+0x36b/0x4b0 [mkiss]
4,19412,153787562,-; tty_ldisc_receive_buf+0xed/0x100
4,19413,153787569,-; tty_port_default_receive_buf+0x43/0xd0
4,19414,153787576,-; flush_to_ldisc+0xf9/0x1f0
4,19415,153787582,-; ? queue_delayed_work_on+0x81/0x90
4,19416,153787603,-; process_one_work+0x191/0x3e0
4,19417,153787609,-; worker_thread+0x2e3/0x420
4,19418,153787615,-; ? __pfx_worker_thread+0x10/0x10
4,19419,153787621,-; kthread+0x10d/0x230
4,19420,153787627,-; ? __pfx_kthread+0x10/0x10
3,19429,153871289,-;pstore: backend (efi_pstore) writing error (-5)
4,19430,153871296,-;RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
4,19431,153871302,-;Code: 6c 0f 82 24 01 00 00 48 01 93 c0 00 00 00 e9 52 f5 ff ff 48 89 df 4d 89 f5 e8 e7 b4 fd ff e9 c9 fd ff ff 4c 8d 88 d0 00 00 00 <48> 8b 80 d0 00 00 00 4c 8d 78 c8 49 39 c1 0f 84 6c fa ff ff 44 88
4,19432,153871306,-;RSP: 0018:ffffd13040184c98 EFLAGS: 00010286
4,19433,153871311,-;RAX: 0000000000000000 RBX: ffff8d64683f8900 RCX: 0000000000000000
4,19434,153871314,-;RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
4,19435,153871317,-;RBP: ffffd13040184da8 R08: 0000000000000200 R09: 00000000000000d0
4,19436,153871374,-;R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d64681b3540
4,19437,153871377,-;R13: 0000000000000000 R14: 0000000000000200 R15: ffff8d6459d170d0
4,19438,153871381,-;FS: 0000000000000000(0000) GS:ffff8d65f5a3b000(0000) knlGS:0000000000000000
4,19439,153871385,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,19440,153871388,-;CR2: 00000000000000d0 CR3: 00000001a5440002 CR4: 00000000001726f0
0,19441,153871391,-;Kernel panic - not syncing: Fatal exception in interrupt
0,19442,153871407,-;Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
0,19443,153951562,-;Rebooting in 30 seconds..
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-27 14:16 ` F6BVP
@ 2025-08-27 17:30 ` Florian Westphal
2025-08-28 16:39 ` F6BVP
0 siblings, 1 reply; 36+ messages in thread
From: Florian Westphal @ 2025-08-27 17:30 UTC (permalink / raw)
To: F6BVP
Cc: Eric Dumazet, Dan Carpenter, linux-hams, netdev, Dan Cross,
David Ranch, Folkert van Heusden
F6BVP <f6bvp@free.fr> wrote:
> Here I am. The next step is probably to discover why the call to
> receive_buf() fails when the byte count is not small and tty_ldisc_deref()
> runs after flush_to_ldisc, probably leading to an error. Which value is
> wrong: ld->tty, p or f?
Did you enable CONFIG_KASAN?
Also, since you seem to be able to reproduce this easily, did you
try a 'git bisect' to identify the breaking change?
That would make it possible to CC the author of that change.
> 4,19346,153786988,-;Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
> 4,19347,153786990,-;Workqueue: events_unbound flush_to_ldisc
> 4,19348,153786998,-;Here I am: tty_ldisc_deref:283 !tty
> 4,19349,153787003,-;Here I am: tty_ldisc_deref:283 !tty
> 4,19350,153787005,-;RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
> 4,19398,153787265,-; __netif_receive_skb_one_core+0x3d/0xa0
as Eric noted, you need to pipe this through
scripts/decode_stacktrace.sh so this gets translated to line numbers.
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-27 17:30 ` Florian Westphal
@ 2025-08-28 16:39 ` F6BVP
2025-08-30 23:37 ` F6BVP
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-28 16:39 UTC (permalink / raw)
To: Florian Westphal
Cc: Eric Dumazet, Dan Carpenter, linux-hams, netdev, Dan Cross,
David Ranch, Folkert van Heusden
[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]
Florian, thanks a lot for the suggestions, which I followed once I
understood more clearly what to do.

Here is the linux-6.15.1 kernel panic, captured remotely by netconsole and
decoded with the decode_stacktrace.sh script.

I guess the next step is bisecting between 6.14.11 (good) and 6.15.1 (bad).
Regards,
Bernard
Le 27/08/2025 à 19:30, Florian Westphal a écrit :
> F6BVP <f6bvp@free.fr> wrote:
>> Here I am. The next step is probably to discover why the call to
>> receive_buf() fails when the byte count is not small and tty_ldisc_deref()
>> runs after flush_to_ldisc, probably leading to an error. Which value is
>> wrong: ld->tty, p or f?
>
> Did you enable CONFIG_KASAN?
>
> Also, since you seem to be able to reproduce this easily, did you
> try a 'git bisect' to identify the breaking change?
>
> That would allow to CC the author of that change.
>
>> 4,19346,153786988,-;Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
>> 4,19347,153786990,-;Workqueue: events_unbound flush_to_ldisc
>> 4,19348,153786998,-;Here I am: tty_ldisc_deref:283 !tty
>> 4,19349,153787003,-;Here I am: tty_ldisc_deref:283 !tty
>> 4,19350,153787005,-;RIP: 0010:__netif_receive_skb_core.constprop.0+0xfe5/0x12d0
>> 4,19398,153787265,-; __netif_receive_skb_one_core+0x3d/0xa0
>
> as Eric noted, you need to pipe this through
>
> scripts/decode_stacktrace.sh so this gets translated to line numbers.
[-- Attachment #2: linux-6.15.1_stacktrace.txt --]
[-- Type: text/plain, Size: 12602 bytes --]
6,1088,187160578,-;NET: Registered PF_AX25 protocol family
6,1089,187171606,-;mkiss: AX.25 Multikiss, Hans Albas PE1AYX
6,1090,189184303,-;mkiss: ax0: crc mode is auto.
6,1091,243510195,-;printk: legacy console [netcon_ext0] disabled
6,1094,243576964,-;netpoll: netconsole: local port 4444
6,1095,243576979,-;netpoll: netconsole: local IPv4 address 44.168.19.9
6,1096,243576985,-;netpoll: netconsole: interface name 'enp2s0'
6,1097,243576989,-;netpoll: netconsole: local ethernet address 'ff:ff:ff:ff:ff:ff'
6,1098,243576994,-;netpoll: netconsole: remote port 6666
6,1099,243576997,-;netpoll: netconsole: remote IPv4 address 44.168.19.6
6,1100,243577001,-;netpoll: netconsole: remote ethernet address b8:27:eb:16:10:a5
6,1101,243577182,-;printk: legacy console [netcon_ext0] enabled
6,1102,243577280,-;printk: legacy console [netcon0] enabled
6,1103,243577297,-;netconsole: network logging started
6,1104,264640762,-;NET: Registered PF_NETROM protocol family
6,1105,292182165,-;mkiss: ax0: Trying crc-smack
6,1106,292185839,-;mkiss: ax0: Trying crc-flexnet
4,1107,292188215,-;Oops: general protection fault, probably for non-canonical address 0xdffffc000000001a: 0000 [#1] SMP KASAN PTI
1,1108,292188298,-;KASAN: null-ptr-deref in range [0x00000000000000d0-0x00000000000000d7]
4,1109,292188328,-;CPU: 0 UID: 0 PID: 197 Comm: kworker/u16:4 Not tainted 6.15.1-f6bvp #1 PREEMPT(voluntary)
4,1110,292188357,-;Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
4,1111,292188395,-;Workqueue: events_unbound flush_to_ldisc
4,1112,292188459,-;RIP: 0010:__netif_receive_skb_core.constprop.0 (net/core/dev.c:2430 (discriminator 2) net/core/dev.c:5847 (discriminator 2))
4,1113,292188505,-;Code: e8 cc 11 f9 ff e9 6f fe ff ff 45 31 f6 e9 c7 ee ff ff 4d 8d 85 d0 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 c1 48 c1 e9 03 <80> 3c 01 00 0f 85 3a 11 00 00 49 8b 85 d0 00 00 00 4c 8d 68 c8 49
All code
========
0: e8 cc 11 f9 ff call 0xfffffffffff911d1
5: e9 6f fe ff ff jmp 0xfffffffffffffe79
a: 45 31 f6 xor %r14d,%r14d
d: e9 c7 ee ff ff jmp 0xffffffffffffeed9
12: 4d 8d 85 d0 00 00 00 lea 0xd0(%r13),%r8
19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
20: fc ff df
23: 4c 89 c1 mov %r8,%rcx
26: 48 c1 e9 03 shr $0x3,%rcx
2a:* 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1) <-- trapping instruction
2e: 0f 85 3a 11 00 00 jne 0x116e
34: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
3b: 4c 8d 68 c8 lea -0x38(%rax),%r13
3f: 49 rex.WB
Code starting with the faulting instruction
===========================================
0: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
4: 0f 85 3a 11 00 00 jne 0x1144
a: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
11: 4c 8d 68 c8 lea -0x38(%rax),%r13
15: 49 rex.WB
4,1114,292188537,-;RSP: 0018:ffffc90000007930 EFLAGS: 00010202
4,1115,292188572,-;RAX: dffffc0000000000 RBX: 0000000000000200 RCX: 000000000000001a
4,1116,292188604,-;RDX: ffff888113851690 RSI: ffff88813ecc2000 RDI: ffff88812d792578
4,1117,292188632,-;RBP: ffffc90000007b78 R08: 00000000000000d0 R09: 0000000000000000
4,1118,292188656,-;R10: ffff888113851680 R11: 0000000000000000 R12: 0000000000000000
4,1119,292188672,-;R13: 0000000000000000 R14: ffff88812d792540 R15: ffff88813ecc2098
4,1120,292188688,-;FS: 0000000000000000(0000) GS:ffff88825629b000(0000) knlGS:0000000000000000
4,1121,292188704,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1122,292188719,-;CR2: 00007fa3b50ee230 CR3: 000000011849c006 CR4: 00000000001726f0
4,1123,292188736,-;Call Trace:
4,1124,292188751,-; <IRQ>
4,1125,292188768,-; ? stack_depot_save_flags (lib/stackdepot.c:610)
4,1126,292188793,-; ? __pfx___netif_receive_skb_core.constprop.0 (net/core/dev.c:5659)
4,1127,292188816,-; ? kasan_save_stack (mm/kasan/common.c:49)
4,1128,292188837,-; ? kasan_save_stack (mm/kasan/common.c:48)
4,1129,292188854,-; ? __pfx_sched_balance_find_src_group (kernel/sched/fair.c:11296)
4,1130,292188877,-; ? __pfx_sugov_get_util (kernel/sched/cpufreq_schedutil.c:225)
4,1131,292188896,-; ? sched_clock_noinstr (arch/x86/kernel/tsc.c:271)
4,1132,292188926,-; ? sched_clock (./arch/x86/include/asm/preempt.h:95 (discriminator 1) arch/x86/kernel/tsc.c:288 (discriminator 1))
4,1133,292188947,-; __netif_receive_skb_one_core (net/core/dev.c:5886)
4,1134,292188966,-; ? __pfx___netif_receive_skb_one_core (net/core/dev.c:5880)
4,1135,292188984,-; ? sched_balance_rq (kernel/sched/fair.c:11769)
4,1136,292189001,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1137,292189017,-; ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:120 (discriminator 4) kernel/locking/spinlock.c:170 (discriminator 4))
4,1138,292189034,-; ? __pfx__raw_spin_lock_irq (kernel/locking/spinlock.c:169)
4,1139,292189051,-; __netif_receive_skb (net/core/dev.c:6003)
4,1140,292189070,-; process_backlog (./include/linux/rcupdate.h:873 net/core/dev.c:6353)
4,1141,292189086,-; __napi_poll (net/core/dev.c:7324)
4,1142,292189102,-; net_rx_action (net/core/dev.c:7390 net/core/dev.c:7510)
4,1143,292189117,-; ? __pfx_sugov_get_util (kernel/sched/cpufreq_schedutil.c:225)
4,1144,292189135,-; ? __pfx_net_rx_action (net/core/dev.c:7472)
4,1145,292189151,-; ? __pfx_sched_balance_domains (kernel/sched/fair.c:12187)
4,1146,292189169,-; ? sched_clock_cpu (kernel/sched/clock.c:394 (discriminator 1))
4,1147,292189185,-; ? sched_balance_softirq (kernel/sched/fair.c:12928)
4,1148,292189201,-; handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580)
4,1149,292189219,-; ? __pfx_handle_softirqs (kernel/softirq.c:537)
4,1150,292189235,-; ? tick_nohz_irq_exit (kernel/time/tick-sched.c:1296)
4,1151,292189253,-; __do_softirq (kernel/softirq.c:614)
4,1152,292189269,-; do_softirq.part.0 (kernel/softirq.c:480 (discriminator 32))
4,1153,292189285,-; </IRQ>
4,1154,292189299,-; <TASK>
4,1155,292189312,-; __local_bh_enable_ip (./arch/x86/include/asm/preempt.h:27 (discriminator 1) kernel/softirq.c:407 (discriminator 1))
4,1156,292189329,-; _raw_spin_unlock_bh (kernel/locking/spinlock.c:211)
4,1157,292189344,-; mkiss_receive_buf (./include/linux/spinlock.h:397 drivers/net/hamradio/mkiss.c:298 drivers/net/hamradio/mkiss.c:310 drivers/net/hamradio/mkiss.c:901) mkiss
4,1158,292189362,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1159,292189377,-; ? ldsem_down_read_trylock (./arch/x86/include/asm/atomic64_64.h:101 (discriminator 1) ./include/linux/atomic/atomic-arch-fallback.h:4256 (discriminator 1) ./include/linux/atomic/atomic-long.h:1458 (discriminator 1) ./include/linux/atomic/atomic-instrumented.h:4436 (discriminator 1) drivers/tty/tty_ldsem.c:351 (discriminator 1))
4,1160,292189396,-; tty_ldisc_receive_buf (drivers/tty/tty_buffer.c:391)
4,1161,292189414,-; tty_port_default_receive_buf (drivers/tty/tty_port.c:39)
4,1162,292189431,-; flush_to_ldisc (drivers/tty/tty_buffer.c:446 drivers/tty/tty_buffer.c:495)
4,1163,292189448,-; ? __pfx___kasan_check_read (??:?)
4,1164,292189464,-; process_one_work (kernel/workqueue.c:3243)
4,1165,292189481,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1166,292189498,-; worker_thread (kernel/workqueue.c:3313 (discriminator 2) kernel/workqueue.c:3400 (discriminator 2))
4,1167,292189513,-; ? __pfx_try_to_wake_up (kernel/sched/core.c:4175)
4,1168,292189531,-; ? __pfx_worker_thread (kernel/workqueue.c:3346)
4,1169,292189547,-; kthread (kernel/kthread.c:464)
4,1170,292189562,-; ? __pfx__raw_spin_lock_irq (kernel/locking/spinlock.c:169)
4,1171,292189578,-; ? __pfx_kthread (kernel/kthread.c:413)
4,1172,292189598,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1173,292189613,-; ? recalc_sigpending (./arch/x86/include/asm/bitops.h:75 ./include/asm-generic/bitops/instrumented-atomic.h:42 ./include/linux/thread_info.h:102 kernel/signal.c:181 kernel/signal.c:177)
4,1174,292189630,-; ? ret_from_fork (arch/x86/kernel/process.c:152 (discriminator 1))
4,1175,292189647,-; ? calculate_sigpending (kernel/signal.c:195)
4,1176,292189663,-; ? __pfx_kthread (kernel/kthread.c:413)
4,1177,292189678,-; ret_from_fork (arch/x86/kernel/process.c:159)
4,1178,292189693,-; ? __pfx_kthread (kernel/kthread.c:413)
4,1179,292189708,-; ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
4,1180,292189726,-; </TASK>
4,1181,292189739,-;Modules linked in: netrom netconsole mkiss ax25 snd_seq_dummy snd_hrtimer cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils netfs cifs_md4 qrtr snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp i915 coretemp kvm_intel kvm snd_hda_codec_realtek spi_nor snd_hda_codec_generic snd_hda_scodec_component mtd snd_hda_intel snd_intel_dspcfg mei_hdcp mei_pxp snd_hda_codec at24 spi_intel_platform intel_rapl_msr spi_intel irqbypass snd_hwdep polyval_clmulni snd_hda_core snd_pcm polyval_generic ghash_clmulni_intel processor_thermal_device_pci_legacy aesni_intel crypto_simd intel_soc_dts_iosf processor_thermal_device cryptd processor_thermal_wt_hint snd_seq processor_thermal_rfim mei_me i2c_algo_bit rapl binfmt_misc processor_thermal_rapl drm_buddy intel_cstate snd_seq_device ttm i2c_i801 intel_rapl_common snd_timer intel_pch_thermal i2c_smbus mei processor_thermal_wt_req lpc_ich processor_thermal_power_floor drm_display_helper snd processor_thermal_mbox int340x_thermal_zone soundcore intel_pmc_core video pmt_telemetry
4,1182,292189921,c; wmi pmt_class acpi_pad intel_vsec nls_iso8859_1 input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs autofs4 r8169 ahci realtek libahci hid_generic usbhid hid uas usb_storage [last unloaded: netconsole]
4,1183,292190085,-;---[ end trace 0000000000000000 ]---
3,1184,292227984,-;pstore: backend (efi_pstore) writing error (-5)
4,1185,292228068,-;RIP: 0010:__netif_receive_skb_core.constprop.0 (net/core/dev.c:2430 (discriminator 2) net/core/dev.c:5847 (discriminator 2))
4,1186,292228114,-;Code: e8 cc 11 f9 ff e9 6f fe ff ff 45 31 f6 e9 c7 ee ff ff 4d 8d 85 d0 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 c1 48 c1 e9 03 <80> 3c 01 00 0f 85 3a 11 00 00 49 8b 85 d0 00 00 00 4c 8d 68 c8 49
All code
========
0: e8 cc 11 f9 ff call 0xfffffffffff911d1
5: e9 6f fe ff ff jmp 0xfffffffffffffe79
a: 45 31 f6 xor %r14d,%r14d
d: e9 c7 ee ff ff jmp 0xffffffffffffeed9
12: 4d 8d 85 d0 00 00 00 lea 0xd0(%r13),%r8
19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
20: fc ff df
23: 4c 89 c1 mov %r8,%rcx
26: 48 c1 e9 03 shr $0x3,%rcx
2a:* 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1) <-- trapping instruction
2e: 0f 85 3a 11 00 00 jne 0x116e
34: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
3b: 4c 8d 68 c8 lea -0x38(%rax),%r13
3f: 49 rex.WB
Code starting with the faulting instruction
===========================================
0: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
4: 0f 85 3a 11 00 00 jne 0x1144
a: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
11: 4c 8d 68 c8 lea -0x38(%rax),%r13
15: 49 rex.WB
4,1187,292228147,-;RSP: 0018:ffffc90000007930 EFLAGS: 00010202
4,1188,292228181,-;RAX: dffffc0000000000 RBX: 0000000000000200 RCX: 000000000000001a
4,1189,292228208,-;RDX: ffff888113851690 RSI: ffff88813ecc2000 RDI: ffff88812d792578
4,1190,292228235,-;RBP: ffffc90000007b78 R08: 00000000000000d0 R09: 0000000000000000
4,1191,292228264,-;R10: ffff888113851680 R11: 0000000000000000 R12: 0000000000000000
4,1192,292228291,-;R13: 0000000000000000 R14: ffff88812d792540 R15: ffff88813ecc2098
4,1193,292228319,-;FS: 0000000000000000(0000) GS:ffff88825629b000(0000) knlGS:0000000000000000
4,1194,292228348,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1195,292228375,-;CR2: 00007fa3b50ee230 CR3: 00000001136d0005 CR4: 00000000001726f0
0,1196,292228427,-;Kernel panic - not syncing: Fatal exception in interrupt
0,1197,292228467,-;Kernel Offset: 0x32200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
0,1198,292263791,-;Rebooting in 30 seconds..
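The two addresses in the Oops header above (0xdffffc000000001a and the range 0xd0-0xd7) can be related with a little arithmetic taken directly from the disassembly: the code shifts the target address right by 3 (shr $0x3,%rcx) and adds the constant loaded into RAX before probing the KASAN shadow byte. The sketch below only reproduces that arithmetic from the register dump; the constant names and helper functions are illustrative, not kernel identifiers.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# Constant loaded by 'movabs $0xdffffc0000000000,%rax' in the trace.
SHADOW_OFFSET = 0xDFFFFC0000000000
# Shift applied by 'shr $0x3,%rcx': one shadow byte covers 8 bytes of memory.
SHADOW_SCALE_SHIFT = 3


def shadow_address(addr: int) -> int:
    """Address of the shadow byte checked before touching 'addr'."""
    return ((addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET) & MASK64


def covered_range(shadow: int) -> tuple:
    """Inclusive range of real addresses described by one shadow byte."""
    first = ((shadow - SHADOW_OFFSET) & MASK64) << SHADOW_SCALE_SHIFT
    return first, first + (1 << SHADOW_SCALE_SHIFT) - 1


r13 = 0x0              # base pointer from the register dump (NULL)
r8 = r13 + 0xD0        # 'lea 0xd0(%r13),%r8' -> R08 = 0xd0
rcx = r8 >> SHADOW_SCALE_SHIFT

print(hex(rcx))                      # 0x1a, matches RCX in the dump
print(hex(shadow_address(r8)))       # 0xdffffc000000001a, the faulting address
print([hex(a) for a in covered_range(shadow_address(r8))])  # ['0xd0', '0xd7']
```

In other words, the general protection fault happens on the shadow lookup for a field at offset 0xd0 of a NULL base pointer (R13), which is why KASAN reports it as a null-ptr-deref rather than a wild access.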
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-28 16:39 ` F6BVP
@ 2025-08-30 23:37 ` F6BVP
2025-09-01 12:04 ` Eric Dumazet
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-08-30 23:37 UTC (permalink / raw)
To: Paolo Abeni
Cc: Eric Dumazet, Dan Carpenter, linux-hams, netdev, Dan Cross,
David Ranch, Folkert van Heusden, Florian Westphal
[-- Attachment #1: Type: text/plain, Size: 327 bytes --]
Here is the first bad commit reported by git bisect, with the corresponding
decoded stack trace of the kernel panic triggered when mkiss receives an
AX25 packet.
All kernels after 6.14.11, i.e. from 6.15.1 up to net-next, are affected
by the issue.
I would be pleased to test any patch correcting the issue.
Regards,
Bernard
[-- Attachment #2: Bad_commit --]
[-- Type: text/plain, Size: 1653 bytes --]
c353e8983e0dea5dbba7789033326e1ad34135b7 is the first bad commit
commit c353e8983e0dea5dbba7789033326e1ad34135b7
Author: Paolo Abeni <pabeni@redhat.com>
Date: Thu Mar 20 19:22:38 2025 +0100
net: introduce per netns packet chains
Currently network taps unbound to any interface are linked in the
global ptype_all list, affecting the performance in all the network
namespaces.
Add per netns ptypes chains, so that in the mentioned case only
the netns owning the packet socket(s) is affected.
While at that drop the global ptype_all list: no in kernel user
registers a tap on "any" type without specifying either the target
device or the target namespace (and IMHO doing that would not make
any sense).
Note that this adds a conditional in the fast path (to check for
per netns ptype_specific list) and increases the dataset size by
a cacheline (owing the per netns lists).
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumaze@google.com>
Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
include/linux/netdevice.h | 12 +++++++++-
include/net/hotdata.h | 1 -
include/net/net_namespace.h | 3 +++
net/core/dev.c | 53 +++++++++++++++++++++++++++++++++++----------
net/core/hotdata.c | 1 -
net/core/net-procfs.c | 28 ++++++++++++++++++------
net/core/net_namespace.c | 2 ++
7 files changed, 78 insertions(+), 22 deletions(-)
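The report above is the end result of a bisection. As a rough illustration of the search it performs, here is a toy model of a binary search for the first bad commit, assuming a linear history in which every commit after the culprit is also bad. The commit names, the culprit, and the is_bad callback are all hypothetical stand-ins for building and booting each candidate kernel; real git bisect also handles merges and skipped commits, which this sketch ignores.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid       # culprit is at mid or earlier
        else:
            good = mid      # culprit is after mid
    return commits[bad], tested


# Hypothetical linear history of 1024 commits between a good and a bad tag.
commits = [f"c{i:04d}" for i in range(1024)]
culprit = "c0700"

found, tested = first_bad(commits, lambda c: c >= culprit)
print(found, tested)
```

Each test halves the remaining range, so even a range of about a thousand commits needs only around ten build-and-boot cycles.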
[-- Attachment #3: netconsole_local-01458.txt --]
[-- Type: text/plain, Size: 11063 bytes --]
6,1124,85753801,-;NET: Registered PF_NETROM protocol family
6,1125,95585802,-;mkiss: ax0: Trying crc-smack
6,1126,95586356,-;mkiss: ax0: Trying crc-flexnet
4,1127,97604936,-;Oops: general protection fault, probably for non-canonical address 0xdffffc000000001a: 0000 [#1] PREEMPT SMP KASAN PTI
1,1128,97605079,-;KASAN: null-ptr-deref in range [0x00000000000000d0-0x00000000000000d7]
4,1129,97605182,-;CPU: 0 UID: 0 PID: 198 Comm: kworker/u16:4 Not tainted 6.14.0-rc7-local-01458-gc353e8983e0d #15
4,1130,97605294,-;Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
4,1131,97605394,-;Workqueue: events_unbound flush_to_ldisc
4,1132,97605521,-;RIP: 0010:__netif_receive_skb_core.constprop.0 (net/core/dev.c:2434 (discriminator 2) net/core/dev.c:5846 (discriminator 2))
4,1133,97605629,-;Code: e8 07 fb f8 ff e9 64 fe ff ff 45 31 f6 e9 7b ee ff ff 4d 8d 85 d0 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 c1 48 c1 e9 03 <80> 3c 01 00 0f 85 99 11 00 00 49 8b 85 d0 00 00 00 4c 8d 68 c8 49
All code
========
0: e8 07 fb f8 ff call 0xfffffffffff8fb0c
5: e9 64 fe ff ff jmp 0xfffffffffffffe6e
a: 45 31 f6 xor %r14d,%r14d
d: e9 7b ee ff ff jmp 0xffffffffffffee8d
12: 4d 8d 85 d0 00 00 00 lea 0xd0(%r13),%r8
19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
20: fc ff df
23: 4c 89 c1 mov %r8,%rcx
26: 48 c1 e9 03 shr $0x3,%rcx
2a:* 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1) <-- trapping instruction
2e: 0f 85 99 11 00 00 jne 0x11cd
34: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
3b: 4c 8d 68 c8 lea -0x38(%rax),%r13
3f: 49 rex.WB
Code starting with the faulting instruction
===========================================
0: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
4: 0f 85 99 11 00 00 jne 0x11a3
a: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
11: 4c 8d 68 c8 lea -0x38(%rax),%r13
15: 49 rex.WB
4,1134,97605731,-;RSP: 0018:ffff88820f009930 EFLAGS: 00010202
4,1135,97605837,-;RAX: dffffc0000000000 RBX: 0000000000000200 RCX: 000000000000001a
4,1136,97605929,-;RDX: ffff88810220bb90 RSI: ffff888125166000 RDI: ffff88812cae4578
4,1137,97605981,-;RBP: ffff88820f009b78 R08: 00000000000000d0 R09: 0000000000000000
4,1138,97606031,-;R10: ffff88810220bb80 R11: 0000000000000000 R12: 0000000000000000
4,1139,97606081,-;R13: 0000000000000000 R14: ffff88812cae4540 R15: ffff888125166098
4,1140,97606130,-;FS: 0000000000000000(0000) GS:ffff88820f000000(0000) knlGS:0000000000000000
4,1141,97606183,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1142,97606234,-;CR2: 000070b8b015400c CR3: 000000002129c004 CR4: 00000000001726f0
4,1143,97606285,-;Call Trace:
4,1144,97606331,-; <IRQ>
4,1145,97606380,-; ? show_regs (arch/x86/kernel/dumpstack.c:479)
4,1146,97606440,-; ? die_addr (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:460)
4,1147,97606493,-; ? exc_general_protection (arch/x86/kernel/traps.c:751 arch/x86/kernel/traps.c:693)
4,1148,97606597,-; ? asm_exc_general_protection (./arch/x86/include/asm/idtentry.h:574)
4,1149,97606711,-; ? __netif_receive_skb_core.constprop.0 (net/core/dev.c:2434 (discriminator 2) net/core/dev.c:5846 (discriminator 2))
4,1150,97606776,-; ? sched_balance_find_src_group (kernel/sched/fair.c:11290)
4,1151,97606834,-; ? __pfx_sched_balance_find_src_group (kernel/sched/fair.c:11277)
4,1152,97606894,-; ? __pfx___netif_receive_skb_core.constprop.0 (net/core/dev.c:5658)
4,1153,97606954,-; ? sched_balance_rq (kernel/sched/fair.c:11750)
4,1154,97607009,-; ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:292)
4,1155,97607066,-; ? cpuidle_enter (drivers/cpuidle/cpuidle.c:391 (discriminator 2))
4,1156,97607120,-; ? call_cpuidle (kernel/sched/idle.c:156)
4,1157,97607178,-; __netif_receive_skb_one_core (net/core/dev.c:5885)
4,1158,97607233,-; ? __pfx___netif_receive_skb_one_core (net/core/dev.c:5879)
4,1159,97607288,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1160,97607343,-; ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:120 (discriminator 4) kernel/locking/spinlock.c:170 (discriminator 4))
4,1161,97607401,-; __netif_receive_skb (net/core/dev.c:6002)
4,1162,97607454,-; process_backlog (./include/linux/rcupdate.h:882 net/core/dev.c:6352)
4,1163,97607506,-; ? __kasan_check_read (mm/kasan/shadow.c:32)
4,1164,97607561,-; __napi_poll (net/core/dev.c:7325)
4,1165,97607615,-; net_rx_action (net/core/dev.c:7391 net/core/dev.c:7511)
4,1166,97607674,-; ? __pfx_net_rx_action (net/core/dev.c:7473)
4,1167,97607736,-; handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:562)
4,1168,97607796,-; ? __pfx_handle_softirqs (kernel/softirq.c:519)
4,1169,97607856,-; __do_softirq (kernel/softirq.c:596)
4,1170,97607910,-; do_softirq.part.0 (kernel/softirq.c:462 (discriminator 20))
4,1171,97607966,-; </IRQ>
4,1172,97608012,-; <TASK>
4,1173,97608057,-; __local_bh_enable_ip (kernel/softirq.c:464 (discriminator 1) kernel/softirq.c:389 (discriminator 1))
4,1174,97608114,-; _raw_spin_unlock_bh (kernel/locking/spinlock.c:211)
4,1175,97608168,-; mkiss_receive_buf (./include/linux/spinlock.h:397 drivers/net/hamradio/mkiss.c:298 drivers/net/hamradio/mkiss.c:310 drivers/net/hamradio/mkiss.c:901) mkiss
4,1176,97608226,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1177,97608278,-; ? ldsem_down_read_trylock (./arch/x86/include/asm/atomic64_64.h:101 (discriminator 1) ./include/linux/atomic/atomic-arch-fallback.h:4256 (discriminator 1) ./include/linux/atomic/atomic-long.h:1458 (discriminator 1) ./include/linux/atomic/atomic-instrumented.h:4436 (discriminator 1) drivers/tty/tty_ldsem.c:351 (discriminator 1))
4,1178,97608336,-; tty_ldisc_receive_buf (drivers/tty/tty_buffer.c:391)
4,1179,97608395,-; tty_port_default_receive_buf (drivers/tty/tty_port.c:39)
4,1180,97608449,-; flush_to_ldisc (drivers/tty/tty_buffer.c:446 drivers/tty/tty_buffer.c:495)
4,1181,97608510,-; process_one_work (kernel/workqueue.c:3243)
4,1182,97608567,-; ? __kasan_check_write (mm/kasan/shadow.c:38)
4,1183,97608624,-; worker_thread (kernel/workqueue.c:3313 (discriminator 2) kernel/workqueue.c:3400 (discriminator 2))
4,1184,97608684,-; ? __pfx_worker_thread (kernel/workqueue.c:3346)
4,1185,97608739,-; kthread (kernel/kthread.c:464)
4,1186,97608794,-; ? __pfx_kthread (kernel/kthread.c:413)
4,1187,97608846,-; ? recalc_sigpending (./arch/x86/include/asm/bitops.h:75 ./include/asm-generic/bitops/instrumented-atomic.h:42 ./include/linux/thread_info.h:102 kernel/signal.c:180)
4,1188,97608902,-; ? calculate_sigpending (kernel/signal.c:194)
4,1189,97608957,-; ? __pfx_kthread (kernel/kthread.c:413)
4,1190,97609009,-; ret_from_fork (arch/x86/kernel/process.c:154)
4,1191,97609066,-; ? __pfx_kthread (kernel/kthread.c:413)
4,1192,97609117,-; ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
4,1193,97609177,-; </TASK>
4,1194,97609222,-;Modules linked in: netrom mkiss rose ax25 netconsole snd_seq_dummy snd_hrtimer cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils netfs cifs_md4 snd_hda_codec_hdmi qrtr x86_pkg_temp_thermal intel_powerclamp coretemp i915 spi_nor kvm_intel at24 mei_hdcp kvm mtd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component mei_pxp polyval_clmulni spi_intel_platform polyval_generic spi_intel intel_rapl_msr ghash_clmulni_intel snd_hda_intel aesni_intel snd_intel_dspcfg snd_hda_codec crypto_simd cryptd binfmt_misc rapl snd_hwdep snd_hda_core intel_cstate i2c_i801 processor_thermal_device_pci_legacy i2c_smbus intel_pch_thermal mei_me intel_soc_dts_iosf snd_pcm mei processor_thermal_device processor_thermal_wt_hint lpc_ich processor_thermal_rfim snd_seq processor_thermal_rapl i2c_algo_bit snd_seq_device intel_rapl_common snd_timer drm_buddy intel_pmc_core processor_thermal_wt_req ttm snd processor_thermal_power_floor pmt_telemetry drm_display_helper processor_thermal_mbox video pmt_class wmi int340x_thermal_zone
4,1195,97609826,c; intel_vsec acpi_pad soundcore nls_iso8859_1 input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs autofs4 uas r8169 usb_storage ahci libahci realtek hid_generic usbhid hid [last unloaded: mkiss]
4,1196,97610219,-;---[ end trace 0000000000000000 ]---
4,1197,98142688,-;RIP: 0010:__netif_receive_skb_core.constprop.0 (net/core/dev.c:2434 (discriminator 2) net/core/dev.c:5846 (discriminator 2))
4,1198,98142752,-;Code: e8 07 fb f8 ff e9 64 fe ff ff 45 31 f6 e9 7b ee ff ff 4d 8d 85 d0 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 c1 48 c1 e9 03 <80> 3c 01 00 0f 85 99 11 00 00 49 8b 85 d0 00 00 00 4c 8d 68 c8 49
All code
========
0: e8 07 fb f8 ff call 0xfffffffffff8fb0c
5: e9 64 fe ff ff jmp 0xfffffffffffffe6e
a: 45 31 f6 xor %r14d,%r14d
d: e9 7b ee ff ff jmp 0xffffffffffffee8d
12: 4d 8d 85 d0 00 00 00 lea 0xd0(%r13),%r8
19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
20: fc ff df
23: 4c 89 c1 mov %r8,%rcx
26: 48 c1 e9 03 shr $0x3,%rcx
2a:* 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1) <-- trapping instruction
2e: 0f 85 99 11 00 00 jne 0x11cd
34: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
3b: 4c 8d 68 c8 lea -0x38(%rax),%r13
3f: 49 rex.WB
Code starting with the faulting instruction
===========================================
0: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
4: 0f 85 99 11 00 00 jne 0x11a3
a: 49 8b 85 d0 00 00 00 mov 0xd0(%r13),%rax
11: 4c 8d 68 c8 lea -0x38(%rax),%r13
15: 49 rex.WB
4,1199,98142786,-;RSP: 0018:ffff88820f009930 EFLAGS: 00010202
4,1200,98142818,-;RAX: dffffc0000000000 RBX: 0000000000000200 RCX: 000000000000001a
4,1201,98142844,-;RDX: ffff88810220bb90 RSI: ffff888125166000 RDI: ffff88812cae4578
4,1202,98142880,-;RBP: ffff88820f009b78 R08: 00000000000000d0 R09: 0000000000000000
4,1203,98142903,-;R10: ffff88810220bb80 R11: 0000000000000000 R12: 0000000000000000
4,1204,98142930,-;R13: 0000000000000000 R14: ffff88812cae4540 R15: ffff888125166098
4,1205,98142958,-;FS: 0000000000000000(0000) GS:ffff88820f000000(0000) knlGS:0000000000000000
4,1206,98142986,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1207,98143014,-;CR2: 000070b8b015400c CR3: 00000001150b4003 CR4: 00000000001726f0
0,1208,98143043,-;Kernel panic - not syncing: Fatal exception in interrupt
0,1209,98143084,-;Kernel Offset: 0x2c200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
0,1210,98643775,-;Rebooting in 60 seconds..
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-08-30 23:37 ` F6BVP
@ 2025-09-01 12:04 ` Eric Dumazet
2025-09-01 12:05 ` Eric Dumazet
2025-09-02 7:54 ` F6BVP
0 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2025-09-01 12:04 UTC (permalink / raw)
To: F6BVP
Cc: Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
David Ranch, Folkert van Heusden, Florian Westphal
On Sat, Aug 30, 2025 at 4:37 PM F6BVP <f6bvp@free.fr> wrote:
>
> Here is a bad commit report by git bisect and the corresponding decoded
> stack trace of kernel panic triggered when mkiss receives AX25 packet.
>
> All kernels following 6.14.11, i.e. starting with 6.15.1 until net-next
> are affected by the issue.
>
> I would be pleased to check any patch correcting the issue.
>
Thanks for the report.
At some point we will have to remove ax25; it has been quite broken
for a long time.
Please try:
diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index 1cac25aca637..f2d66af86359 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -433,6 +433,10 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
 int ax25_kiss_rcv(struct sk_buff *skb, struct net_device *dev,
 		  struct packet_type *ptype, struct net_device *orig_dev)
 {
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb)
+		return NET_RX_DROP;
+
 	skb_orphan(skb);
 
 	if (!net_eq(dev_net(dev), &init_net)) {
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 12:04 ` Eric Dumazet
@ 2025-09-01 12:05 ` Eric Dumazet
[not found] ` <cd0461e0-8136-4f90-df7b-64f1e43e78d4@trinnet.net>
2025-09-01 19:04 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops David Ranch
2025-09-02 7:54 ` F6BVP
1 sibling, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2025-09-01 12:05 UTC (permalink / raw)
To: F6BVP
Cc: Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
David Ranch, Folkert van Heusden, Florian Westphal
On Mon, Sep 1, 2025 at 5:04 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Aug 30, 2025 at 4:37 PM F6BVP <f6bvp@free.fr> wrote:
> >
> > Here is a bad commit report by git bisect and the corresponding decoded
> > stack trace of kernel panic triggered when mkiss receives AX25 packet.
> >
> > All kernels following 6.14.11, i.e. starting with 6.15.1 until net-next
> > are affected by the issue.
> >
> > I would be pleased to check any patch correcting the issue.
> >
>
> Thanks for the report.
>
> At some point we will have to remove ax25, this has been quite broken
> for a long time.
>
> Please try :
>
> diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
> index 1cac25aca637..f2d66af86359 100644
> --- a/net/ax25/ax25_in.c
> +++ b/net/ax25/ax25_in.c
> @@ -433,6 +433,10 @@ static int ax25_rcv(struct sk_buff *skb, struct
> net_device *dev,
> int ax25_kiss_rcv(struct sk_buff *skb, struct net_device *dev,
> struct packet_type *ptype, struct net_device *orig_dev)
> {
> + skb = skb_share_check(skb, GFP_ATOMIC);
> + if (!skb)
> + return NET_RX_DROP;
> +
> skb_orphan(skb);
>
> if (!net_eq(dev_net(dev), &init_net)) {
We had a similar fix in 2016 for phonet:
commit 7aaed57c5c2890634cfadf725173c7c68ea4cb4f
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Jan 12 08:58:00 2016 -0800
phonet: properly unshare skbs in phonet_rcv()
Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
("dev: add per net_device packet type chains").
skb->dev becomes NULL and we crash in __netif_receive_skb_core().
Before above commit, different kind of bugs or corruptions could happen
without major crash.
But the root cause is that phonet_rcv() can queue skb without checking
if skb is shared or not.
Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
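To make the idea behind both fixes concrete, here is a toy user-space model of the "unshare before you modify or queue" pattern. It is not kernel code: Buf, share_check and protocol_rcv are hypothetical names, the reference counting is simplified, and setting dev to None merely stands in for whatever the receive handler changes on the buffer it takes over.

```python
class Buf:
    """Toy stand-in for a reference-counted packet buffer header."""

    def __init__(self, data, dev):
        self.data = data    # payload; a clone shares this object
        self.dev = dev      # per-header metadata; a clone gets its own copy
        self.users = 1

    def get(self):
        self.users += 1
        return self

    def put(self):
        self.users -= 1

    def clone(self):
        # New private header pointing at the same payload.
        return Buf(self.data, self.dev)


def share_check(buf):
    """If someone else also holds 'buf', hand back a private clone instead."""
    if buf.users > 1:
        private = buf.clone()
        buf.put()           # drop our reference to the shared header
        return private
    return buf


def protocol_rcv(buf):
    """Handler that takes ownership of the buffer and alters its metadata."""
    buf = share_check(buf)
    buf.dev = None          # safe: nobody else can see this header
    return buf


payload = bytearray(b"ax25 frame")
orig = Buf(payload, dev="ax0")
tap_ref = orig.get()        # a second holder, e.g. a packet tap

mine = protocol_rcv(orig)
print(mine is orig, tap_ref.dev, mine.dev, orig.users)
```

Without the share_check() call, the handler would clear dev on the header that the other holder is still using, which is the kind of corruption the phonet commit message describes (skb->dev becoming NULL for a later consumer).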
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
[not found] ` <cd0461e0-8136-4f90-df7b-64f1e43e78d4@trinnet.net>
@ 2025-09-01 15:59 ` F6BVP
2025-09-01 16:03 ` Eric Dumazet
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-09-01 15:59 UTC (permalink / raw)
To: David Ranch, Eric Dumazet
Cc: Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
Folkert van Heusden, Florian Westphal
Radio amateurs have contributed to Linux since the beginning.
If a protocol had to be removed from the Linux kernel as soon as a commit
broke it, Linux itself would have to be abandoned.
AX25 is not responsible for a kernel Oops caused by a commit in dev.c.
As David KI6ZHD mentioned, many hams are still experimenting with
packet radio.
Not to mention that a large number of pico satellites from universities
all around the world use AX25 for TM/TC!
Bernard Pidoux
F6BVP /AI7BG
Founder president AMSAT-France
President Dimension Parabole
http://radiotelescope-lavillette.fr
Le 01/09/2025 à 16:43, David Ranch a écrit :
>
> Hello Eric, Everyone,
>
>>> At some point we will have to remove ax25, this has been quite broken
>>> for a long time.
>
> I can appreciate that the code implementing AX.25 in the kernel is very
> old but say it needs to be removed will impact a lot of people. There
> is a very active community around AX.25 packet radio today and Linux's
> native implementation still offers features and functions that aren't
> implemented anywhere else. There are also some large / popular projects
> that are dependent on it for their connectivity via libax25, etc. I
> continue to hope someone will be willing to step forward and write a
> modernized version of this stack (and netrom and rose too) so we can
> continue to run things natively on Linux.
>
> --David
> KI6ZHD
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 15:59 ` F6BVP
@ 2025-09-01 16:03 ` Eric Dumazet
2025-09-01 19:10 ` David Ranch
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Eric Dumazet @ 2025-09-01 16:03 UTC (permalink / raw)
To: F6BVP
Cc: David Ranch, Paolo Abeni, Dan Carpenter, linux-hams, netdev,
Dan Cross, Folkert van Heusden, Florian Westphal
On Mon, Sep 1, 2025 at 8:59 AM F6BVP <f6bvp@free.fr> wrote:
>
> Radio amateurs have contributed to Linux since the beginning.
>
> If a protocol had to be removed from the Linux kernel as soon as a commit
> broke it, Linux itself would have to be abandoned.
>
> AX25 is not responsible for a kernel oops caused by a commit in dev.c.
>
> As David KI6ZHD mentioned, many hams are still experimenting with
> packet radio.
>
> Not to mention that a large number of pico-satellites from universities
> all around the world use AX25 for TM/TC!
>
Keep calm, I am just saying that the bisection pointed to a fine commit,
but it took a _lot_ of time to root-cause the issue.
And the bug is in ax25, not in Paolo's patch.
Please test the fix, and thank me for actually working on a fix, while
I have more urgent work on my plate.
> Bernard Pidoux
> F6BVP /AI7BG
> Founder president AMSAT-France
> President Dimension Parabole
> http://radiotelescope-lavillette.fr
>
>
> On 01/09/2025 at 16:43, David Ranch wrote:
> >
> > Hello Eric, Everyone,
> >
> >>> At some point we will have to remove ax25, this has been quite broken
> >>> for a long time.
> >
> > I can appreciate that the code implementing AX.25 in the kernel is very
> > old, but removing it would impact a lot of people. There
> > is a very active community around AX.25 packet radio today and Linux's
> > native implementation still offers features and functions that aren't
> > implemented anywhere else. There are also some large / popular projects
> > that are dependent on it for their connectivity via libax25, etc. I
> > continue to hope someone will be willing to step forward and write a
> > modernized version of this stack (and netrom and rose too) so we can
> > continue to run things natively on Linux.
> >
> > --David
> > KI6ZHD
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 12:05 ` Eric Dumazet
[not found] ` <cd0461e0-8136-4f90-df7b-64f1e43e78d4@trinnet.net>
@ 2025-09-01 19:04 ` David Ranch
1 sibling, 0 replies; 36+ messages in thread
From: David Ranch @ 2025-09-01 19:04 UTC (permalink / raw)
To: Eric Dumazet, F6BVP
Cc: Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
Folkert van Heusden, Florian Westphal
Hello Eric, Everyone,
>> At some point we will have to remove ax25, this has been quite broken
>> for a long time.
I can appreciate that the code implementing AX.25 in the kernel is very
old, but removing it would impact a lot of people. There
is a very active community around AX.25 packet radio today and Linux's
native implementation still offers features and functions that aren't
implemented anywhere else. There are also some large / popular projects
that are dependent on it for their connectivity via libax25, etc. I
continue to hope someone will be willing to step forward and write a
modernized version of this stack (and netrom and rose too) so we can
continue to run things natively on Linux.
--David
KI6ZHD
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 16:03 ` Eric Dumazet
@ 2025-09-01 19:10 ` David Ranch
2025-09-01 19:16 ` Eric Dumazet
2025-09-02 7:44 ` F6BVP
2025-09-03 9:51 ` [BUG] [ROSE] slab-use-after-free in lock_timer_base Bernard Pidoux
2 siblings, 1 reply; 36+ messages in thread
From: David Ranch @ 2025-09-01 19:10 UTC (permalink / raw)
To: Eric Dumazet, F6BVP
Cc: Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
Folkert van Heusden, Florian Westphal
> Keep calm, I am just saying that the bisection pointed to a fine commit,
> but it took a _lot_ of time to root-cause the issue.
>
> And the bug is in ax25, not in Paolo's patch.
>
> Please test the fix, and thank me for actually working on a fix, while
> I have more urgent work on my plate.
Your work on this patch is much appreciated, but I'm curious: is the
core issue in the other committer's patch, or is it just a weakness in
the original AX.25 stack code?
--David
KI6ZHD
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 19:10 ` David Ranch
@ 2025-09-01 19:16 ` Eric Dumazet
0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2025-09-01 19:16 UTC (permalink / raw)
To: David Ranch
Cc: F6BVP, Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
Folkert van Heusden, Florian Westphal
On Mon, Sep 1, 2025 at 12:10 PM David Ranch <linux-hams@trinnet.net> wrote:
>
>
> > Keep calm, I am just saying that the bisection pointed to a fine commit,
> > but it took a _lot_ of time to root-cause the issue.
> >
> > And the bug is in ax25, not in Paolo's patch.
> >
> > Please test the fix, and thank me for actually working on a fix, while
> > I have more urgent work on my plate.
>
> Your work on this patch is much appreciated, but I'm curious: is the
> core issue in the other committer's patch, or is it just a weakness in
> the original AX.25 stack code?
Plain day-0 bug in ax25 code.
It was probably not working well with packet capture (tcpdump), with
possibly silent corruptions.
The kind of bugs that can be exploited by malicious actors.
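Editorial sketch, not part of Eric's message: the fix under test in this thread adds an skb_share_check() call at the top of ax25_kiss_rcv(). The userspace model below illustrates why that matters when a packet tap such as tcpdump holds a second reference to the same buffer. Everything named pkt_* or kiss_rcv_model is hypothetical and only mimics the documented behaviour of skb_share_check() (return a private copy of the header when the buffer is shared, and drop the caller's reference on the original). It assumes, as far as I understand the receive path, that the handler strips one leading KISS byte, which changes the buffer header in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: a refcounted header that
 * points into payload bytes it does not own. */
struct pkt {
	int users;                 /* number of holders of this header */
	const unsigned char *data; /* current start of the payload */
	size_t len;                /* bytes remaining from data */
};

static struct pkt *pkt_alloc(const unsigned char *data, size_t len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->users = 1;
	p->data = data;
	p->len = len;
	return p;
}

/* Take an extra reference, as a packet tap would. */
static struct pkt *pkt_get(struct pkt *p)
{
	p->users++;
	return p;
}

/* Drop one reference; free the header when the last one goes away. */
static void pkt_put(struct pkt *p)
{
	if (--p->users == 0)
		free(p);
}

/* Model of skb_share_check(): if someone else also holds the header,
 * return a private copy and drop our reference on the original.
 * Returns NULL if the copy cannot be allocated. */
static struct pkt *pkt_share_check(struct pkt *p)
{
	struct pkt *copy;

	if (p->users == 1)
		return p;
	copy = pkt_alloc(p->data, p->len);
	pkt_put(p);
	return copy;
}

/* Strip n leading bytes. This mutates the header in place. */
static void pkt_pull(struct pkt *p, size_t n)
{
	assert(n <= p->len);
	p->data += n;
	p->len -= n;
}

/* Model of the receive handler: optionally unshare, then strip the
 * one-byte KISS header. */
static struct pkt *kiss_rcv_model(struct pkt *p, int use_share_check)
{
	if (use_share_check) {
		p = pkt_share_check(p);
		if (!p)
			return NULL; /* corresponds to NET_RX_DROP */
	}
	pkt_pull(p, 1);
	return p;
}
```

Without the share check, the tap and the protocol handler work on the same header, so the tap sees its packet shrink by one byte behind its back. With the check, the handler strips the byte from its own copy and the tap's view is untouched.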
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 16:03 ` Eric Dumazet
2025-09-01 19:10 ` David Ranch
@ 2025-09-02 7:44 ` F6BVP
2025-09-02 7:55 ` Eric Dumazet
2025-09-03 9:51 ` [BUG] [ROSE] slab-use-after-free in lock_timer_base Bernard Pidoux
2 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-09-02 7:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Ranch, Paolo Abeni, Dan Carpenter, linux-hams, netdev,
Dan Cross, Folkert van Heusden, Florian Westphal
I tested the fix and validated it on several kernel versions.
All are doing fine: 6.14.11, 6.15.11, 6.16.4.
Congratulations and many thanks to Eric Dumazet for spending his time on
repairing AX25 mkiss serial connections.
Ham radio fans will be able to continue experimenting with AX25 with
future Linux developments.
Bernard Pidoux
F6BVP / AI7BG
http://radiotelescope-lavillette.fr
On 01/09/2025 at 18:03, Eric Dumazet wrote:
> Keep calm, I am just saying that the bisection pointed to a fine commit,
> but it took a _lot_ of time to root-cause the issue.
>
> And the bug is in ax25, not in Paolo's patch.
>
> Please test the fix, and thank me for actually working on a fix, while
> I have more urgent work on my plate.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-01 12:04 ` Eric Dumazet
2025-09-01 12:05 ` Eric Dumazet
@ 2025-09-02 7:54 ` F6BVP
1 sibling, 0 replies; 36+ messages in thread
From: F6BVP @ 2025-09-02 7:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, Dan Carpenter, linux-hams, netdev, Dan Cross,
David Ranch, Folkert van Heusden, Florian Westphal
I tested the fix and validated it on several kernel versions.
All are doing fine: 6.14.11, 6.15.11, 6.16.4.
Congratulations and many thanks to Eric Dumazet for spending his time on
repairing AX25 mkiss serial connections.
Ham radio fans will be able to continue experimenting with AX25 with
future Linux developments.
Bernard Pidoux
F6BVP / AI7BG
http://radiotelescope-lavillette.fr
On 01/09/2025 at 14:04, Eric Dumazet wrote:
> On Sat, Aug 30, 2025 at 4:37 PM F6BVP <f6bvp@free.fr> wrote:
>>
>> Here is a bad commit report by git bisect and the corresponding decoded
>> stack trace of kernel panic triggered when mkiss receives AX25 packet.
>>
>> All kernels following 6.14.11, i.e. starting with 6.15.1 until net-next
>> are affected by the issue.
>>
>> I would be pleased to check any patch correcting the issue.
>>
>
> Thanks for the report.
>
> At some point we will have to remove ax25, this has been quite broken
> for a long time.
>
> Please try :
>
> diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
> index 1cac25aca637..f2d66af86359 100644
> --- a/net/ax25/ax25_in.c
> +++ b/net/ax25/ax25_in.c
> @@ -433,6 +433,10 @@ static int ax25_rcv(struct sk_buff *skb, struct net_device *dev,
>  int ax25_kiss_rcv(struct sk_buff *skb, struct net_device *dev,
>  		  struct packet_type *ptype, struct net_device *orig_dev)
>  {
> +	skb = skb_share_check(skb, GFP_ATOMIC);
> +	if (!skb)
> +		return NET_RX_DROP;
> +
>  	skb_orphan(skb);
>
>  	if (!net_eq(dev_net(dev), &init_net)) {
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ROSE] [AX25] 6.15.10 long term stable kernel oops
2025-09-02 7:44 ` F6BVP
@ 2025-09-02 7:55 ` Eric Dumazet
0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2025-09-02 7:55 UTC (permalink / raw)
To: F6BVP
Cc: David Ranch, Paolo Abeni, Dan Carpenter, linux-hams, netdev,
Dan Cross, Folkert van Heusden, Florian Westphal
On Tue, Sep 2, 2025 at 12:45 AM F6BVP <f6bvp@free.fr> wrote:
>
> I tested the fix and validated it on several kernel versions.
>
> All are doing fine: 6.14.11, 6.15.11, 6.16.4.
>
> Congratulations and many thanks to Eric Dumazet for spending his time on
> repairing AX25 mkiss serial connections.
>
> Ham radio fans will be able to continue experimenting with AX25 with
> future Linux developments.
>
Great, many thanks again for your report, bisection, and tests.
I will send the formal patch right away.
> Bernard Pidoux
> F6BVP / AI7BG
> http://radiotelescope-lavillette.fr
>
>
> On 01/09/2025 at 18:03, Eric Dumazet wrote:
>
> > Keep calm, I am just saying that the bisection pointed to a fine commit,
> > but it took a _lot_ of time to root-cause the issue.
> >
> > And the bug is in ax25, not in Paolo's patch.
> >
> > Please test the fix, and thank me for actually working on a fix, while
> > I have more urgent work on my plate.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [BUG] [ROSE] slab-use-after-free in lock_timer_base
2025-09-01 16:03 ` Eric Dumazet
2025-09-01 19:10 ` David Ranch
2025-09-02 7:44 ` F6BVP
@ 2025-09-03 9:51 ` Bernard Pidoux
2025-09-03 10:01 ` Eric Dumazet
2 siblings, 1 reply; 36+ messages in thread
From: Bernard Pidoux @ 2025-09-03 9:51 UTC (permalink / raw)
To: linux-hams, netdev; +Cc: Eric Dumazet
[-- Attachment #1: Type: text/plain, Size: 451 bytes --]
On a 6.16.4 kernel patched with the latest ROSE refcount commit,
rose_remove_node() triggers "refcount_t: underflow; use-after-free".
List: linux-stable-commits
Subject: Patch "net: rose: split remove and free operations in
rose_remove_neigh()" has been added to the 6.1
From: Sasha Levin <sashal () kernel ! org>
Date: 2025-08-30 20:20:24
Message-ID: 20250830202024.2485006-1-sashal () kernel ! org
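Editorial sketch, not a diagnosis of the ROSE code: the attached report shows an object freed when its reference count reached zero, followed by a further access and a further put on the same object. The userspace model below shows that pattern in isolation. The names neigh_model, neigh_hold, neigh_put and device_down_model are hypothetical, and the assumption that one put is issued per route entry pointing at the neighbour is mine, made only to produce an unbalanced put. The model flags the object as freed instead of releasing memory, so the extra put can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted neighbour entry. */
struct neigh_model {
	int use;    /* reference count */
	bool freed; /* set when the count reaches zero */
};

enum put_result { PUT_OK, PUT_FREED, PUT_UNDERFLOW };

/* Take a reference on a live object. */
static void neigh_hold(struct neigh_model *n)
{
	assert(!n->freed);
	n->use++;
}

/* Drop a reference. A put on an object whose count is already zero is
 * what the kernel reports as "refcount_t: underflow; use-after-free". */
static enum put_result neigh_put(struct neigh_model *n)
{
	if (n->freed || n->use <= 0)
		return PUT_UNDERFLOW;
	if (--n->use == 0) {
		n->freed = true;
		return PUT_FREED;
	}
	return PUT_OK;
}

/* Drop one reference per route entry pointing at the neighbour and
 * return how many of those puts hit an already-freed object. */
static int device_down_model(struct neigh_model *n, int routes)
{
	int bad = 0;

	for (int i = 0; i < routes; i++)
		if (neigh_put(n) == PUT_UNDERFLOW)
			bad++;
	return bad;
}
```

If two route entries point at a neighbour that holds only one reference, the first put frees it and the second underflows. When each entry took its own reference, the same teardown is balanced.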
Bernard Pidoux
F6BVP / AI7BG
[-- Attachment #2: slab-use-after-free --]
[-- Type: text/plain, Size: 23665 bytes --]
[50355.077325] Here I am: rose_remove_node:209
[50355.077396] ==================================================================
[50355.077411] BUG: KASAN: slab-use-after-free in lock_timer_base (kernel/time/timer.c:1000 (discriminator 2))
[50355.077447] Read of size 4 at addr ffff888133567998 by task ax25ipd/2247
[50355.077468] CPU: 1 UID: 0 PID: 2247 Comm: ax25ipd Not tainted 6.16.4-local-dirty #3 PREEMPT(voluntary)
[50355.077481] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[50355.077486] Call Trace:
[50355.077489] <TASK>
[50355.077494] dump_stack_lvl (lib/dump_stack.c:123)
[50355.077510] print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
[50355.077523] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
[50355.077535] ? unregister_netdevice_queue (net/core/dev.c:11998)
[50355.077547] ? kasan_complete_mode_report_info (mm/kasan/report_generic.c:179 (discriminator 14))
[50355.077561] kasan_report (mm/kasan/report.c:597)
[50355.077567] ? lock_timer_base (kernel/time/timer.c:1000 (discriminator 2))
[50355.077581] ? lock_timer_base (kernel/time/timer.c:1000 (discriminator 2))
[50355.077596] __asan_report_load4_noabort (mm/kasan/report_generic.c:380)
[50355.077607] lock_timer_base (kernel/time/timer.c:1000 (discriminator 2))
[50355.077618] __timer_delete_sync (kernel/time/timer.c:1461 kernel/time/timer.c:1620)
[50355.077628] ? __pfx___timer_delete_sync (kernel/time/timer.c:1591)
[50355.077635] ? __kasan_slab_free (mm/kasan/common.c:281)
[50355.077644] timer_delete_sync (kernel/time/timer.c:1676)
[50355.077653] rose_remove_neigh (net/rose/rose_route.c:237) rose
[50355.077683] rose_rt_device_down (./include/linux/instrumented.h:96 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/net/rose.h:162 net/rose/rose_route.c:520) rose
[50355.077700] ? _raw_spin_unlock_bh (kernel/locking/spinlock.c:211)
[50355.077714] ? rose_kill_by_neigh (net/rose/af_rose.c:178) rose
[50355.077735] rose_device_event (net/rose/af_rose.c:249) rose
[50355.077752] notifier_call_chain (kernel/notifier.c:87)
[50355.077766] ? nlmsg_notify (./include/net/netlink.h:1151 ./include/net/netlink.h:1170 net/netlink/af_netlink.c:2595)
[50355.077783] raw_notifier_call_chain (kernel/notifier.c:454)
[50355.077797] call_netdevice_notifiers_info (net/core/dev.c:2231)
[50355.077812] dev_close_many (net/core/dev.c:1786)
[50355.077819] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
[50355.077828] ? __pfx_dev_close_many (net/core/dev.c:1773)
[50355.077836] ? update_stack_state (./arch/x86/include/asm/unwind.h:111 (discriminator 1) ./arch/x86/include/asm/unwind.h:127 (discriminator 1) arch/x86/kernel/unwind_frame.c:253 (discriminator 1))
[50355.077850] unregister_netdevice_many_notify (net/core/dev.c:12061)
[50355.077862] ? __pfx_unregister_netdevice_many_notify (net/core/dev.c:12016)
[50355.077870] ? is_bpf_text_address (kernel/bpf/core.c:773 (discriminator 1))
[50355.077879] ? kernel_text_address (kernel/extable.c:125 (discriminator 1) kernel/extable.c:94 (discriminator 1))
[50355.077891] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1))
[50355.077899] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 (discriminator 1))
[50355.077909] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
[50355.077917] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26)
[50355.077931] unregister_netdevice_queue (net/core/dev.c:11998)
[50355.077941] ? __pfx_unregister_netdevice_queue (net/core/dev.c:11987)
[50355.077948] ? __kasan_check_write (mm/kasan/shadow.c:38)
[50355.077959] ? rtnl_lock (net/core/rtnetlink.c:81)
[50355.077972] unregister_netdev (./include/net/net_namespace.h:409 ./include/linux/netdevice.h:2713 net/core/dev.c:2158 net/core/dev.c:12171)
[50355.077979] mkiss_close (drivers/net/hamradio/mkiss.c:800) mkiss
[50355.077993] tty_ldisc_close (drivers/tty/tty_ldisc.c:457)
[50355.078003] tty_ldisc_hangup (drivers/tty/tty_ldisc.c:614 drivers/tty/tty_ldisc.c:729)
[50355.078012] __tty_hangup.part.0 (./include/linux/spinlock.h:376 drivers/tty/tty_io.c:623)
[50355.078022] ? mutex_unlock (./arch/x86/include/asm/atomic64_64.h:101 (discriminator 5) ./include/linux/atomic/atomic-arch-fallback.h:4329 (discriminator 5) ./include/linux/atomic/atomic-long.h:1506 (discriminator 5) ./include/linux/atomic/atomic-instrumented.h:4481 (discriminator 5) kernel/locking/mutex.c:167 (discriminator 5) kernel/locking/mutex.c:537 (discriminator 5))
[50355.078035] tty_vhangup (drivers/tty/tty_io.c:692)
[50355.078044] pty_close (drivers/tty/pty.c:81)
[50355.078056] tty_release (drivers/tty/tty_io.c:1748)
[50355.078063] ? __pfx_locks_remove_file (fs/locks.c:2686)
[50355.078074] __fput (fs/file_table.c:465)
[50355.078084] ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:120 (discriminator 4) kernel/locking/spinlock.c:170 (discriminator 4))
[50355.078092] ? __pfx__raw_spin_lock_irq (kernel/locking/spinlock.c:169)
[50355.078102] ____fput (fs/file_table.c:494)
[50355.078110] task_work_run (kernel/task_work.c:228)
[50355.078119] ? __pfx_task_work_run (kernel/task_work.c:195)
[50355.078128] do_exit (kernel/exit.c:965)
[50355.078140] ? do_wp_page (mm/memory.c:4017)
[50355.078153] ? __pfx_do_exit (kernel/exit.c:897)
[50355.078165] ? __pfx_zap_other_threads (kernel/signal.c:1338)
[50355.078172] ? __pfx_do_wp_page (mm/memory.c:3940)
[50355.078182] do_group_exit (kernel/exit.c:1086)
[50355.078192] __x64_sys_exit_group (kernel/exit.c:1114)
[50355.078200] x64_sys_call (arch/x86/entry/syscall_64.c:37)
[50355.078209] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[50355.078219] ? __handle_mm_fault (mm/memory.c:6085 mm/memory.c:6212)
[50355.078227] ? __pfx___handle_mm_fault (mm/memory.c:6121)
[50355.078235] ? __kasan_check_read (mm/kasan/shadow.c:32)
[50355.078243] ? count_memcg_events (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/atomic/atomic-instrumented.h:33 mm/memcontrol.c:560 mm/memcontrol.c:585 mm/memcontrol.c:564 mm/memcontrol.c:848)
[50355.078254] ? handle_mm_fault (mm/memory.c:6254 mm/memory.c:6407)
[50355.078265] ? __kasan_check_read (mm/kasan/shadow.c:32)
[50355.078274] ? fpregs_assert_state_consistent (./arch/x86/include/asm/bitops.h:206 (discriminator 1) ./arch/x86/include/asm/bitops.h:238 (discriminator 1) ./include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 1) ./include/linux/thread_info.h:126 (discriminator 1) arch/x86/kernel/fpu/core.c:862 (discriminator 1))
[50355.078286] ? irqentry_exit_to_user_mode (./arch/x86/include/asm/entry-common.h:65 (discriminator 1) ./include/linux/entry-common.h:332 (discriminator 1) kernel/entry/common.c:184 (discriminator 1))
[50355.078298] ? irqentry_exit (kernel/entry/common.c:320)
[50355.078308] ? exc_page_fault (arch/x86/mm/fault.c:1536)
[50355.078316] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[50355.078326] RIP: 0033:0x76e08b0ee21d
[50355.078335] Code: Unable to access opcode bytes at 0x76e08b0ee1f3.
Code starting with the faulting instruction
===========================================
[50355.078340] RSP: 002b:00007ffdcf08da78 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[50355.078352] RAX: ffffffffffffffda RBX: 000076e08b204fa8 RCX: 000076e08b0ee21d
[50355.078357] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001
[50355.078361] RBP: 00007ffdcf08dad0 R08: 00007ffdcf08da18 R09: 0000000000000000
[50355.078366] R10: 00007ffdcf08d98f R11: 0000000000000206 R12: 0000000000000001
[50355.078371] R13: 0000000000000000 R14: 0000000000000001 R15: 000076e08b204fc0
[50355.078380] </TASK>
[50355.078611] Allocated by task 2376 on cpu 1 at 80.019686s:
[50355.078625] kasan_save_stack (mm/kasan/common.c:48)
[50355.078648] kasan_save_track (mm/kasan/common.c:68)
[50355.078659] kasan_save_alloc_info (mm/kasan/generic.c:563)
[50355.078672] __kasan_kmalloc (mm/kasan/common.c:377 mm/kasan/common.c:394)
[50355.078684] __kmalloc_cache_noprof (mm/slub.c:4366)
[50355.078705] rose_rt_ioctl (./include/linux/slab.h:905 net/rose/rose_route.c:85 net/rose/rose_route.c:760) rose
[50355.078731] rose_ioctl (net/rose/af_rose.c:1387) rose
[50355.078747] sock_do_ioctl (net/socket.c:1198)
[50355.078764] sock_ioctl (net/socket.c:1316)
[50355.078772] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:907 fs/ioctl.c:893 fs/ioctl.c:893)
[50355.078786] x64_sys_call (arch/x86/entry/syscall_64.c:41)
[50355.078803] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[50355.078818] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[50355.078833] Freed by task 2247 on cpu 1 at 50355.077393s:
[50355.078842] kasan_save_stack (mm/kasan/common.c:48)
[50355.078856] kasan_save_track (mm/kasan/common.c:68)
[50355.078862] kasan_save_free_info (mm/kasan/generic.c:579 (discriminator 1))
[50355.078871] __kasan_slab_free (mm/kasan/common.c:271)
[50355.078879] kfree (mm/slub.c:4648 (discriminator 3) mm/slub.c:4847 (discriminator 3))
[50355.078893] rose_rt_device_down (./include/net/rose.h:166 ./include/net/rose.h:160 net/rose/rose_route.c:512) rose
[50355.078910] rose_device_event (net/rose/af_rose.c:249) rose
[50355.078923] notifier_call_chain (kernel/notifier.c:87)
[50355.078936] raw_notifier_call_chain (kernel/notifier.c:454)
[50355.078947] call_netdevice_notifiers_info (net/core/dev.c:2231)
[50355.078957] dev_close_many (net/core/dev.c:1786)
[50355.078965] unregister_netdevice_many_notify (net/core/dev.c:12061)
[50355.078973] unregister_netdevice_queue (net/core/dev.c:11998)
[50355.078982] unregister_netdev (./include/net/net_namespace.h:409 ./include/linux/netdevice.h:2713 net/core/dev.c:2158 net/core/dev.c:12171)
[50355.078989] mkiss_close (drivers/net/hamradio/mkiss.c:800) mkiss
[50355.078999] tty_ldisc_close (drivers/tty/tty_ldisc.c:457)
[50355.079008] tty_ldisc_hangup (drivers/tty/tty_ldisc.c:614 drivers/tty/tty_ldisc.c:729)
[50355.079014] __tty_hangup.part.0 (./include/linux/spinlock.h:376 drivers/tty/tty_io.c:623)
[50355.079026] tty_vhangup (drivers/tty/tty_io.c:692)
[50355.079034] pty_close (drivers/tty/pty.c:81)
[50355.079046] tty_release (drivers/tty/tty_io.c:1748)
[50355.079053] __fput (fs/file_table.c:465)
[50355.079066] ____fput (fs/file_table.c:494)
[50355.079076] task_work_run (kernel/task_work.c:228)
[50355.079087] do_exit (kernel/exit.c:965)
[50355.079096] do_group_exit (kernel/exit.c:1086)
[50355.079103] __x64_sys_exit_group (kernel/exit.c:1114)
[50355.079113] x64_sys_call (arch/x86/entry/syscall_64.c:37)
[50355.079127] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[50355.079140] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[50355.079156] The buggy address belongs to the object at ffff888133567900
which belongs to the cache kmalloc-rnd-15-192 of size 192
[50355.079165] The buggy address is located 152 bytes inside of
freed 192-byte region [ffff888133567900, ffff8881335679c0)
[50355.079178] The buggy address belongs to the physical page:
[50355.079189] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x133567
[50355.079201] ksm flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[50355.079213] page_type: f5(slab)
[50355.079225] raw: 0017ffffc0000000 ffff888100059cc0 ffffea0004c60380 dead000000000003
[50355.079234] raw: 0000000000000000 0000000000100010 00000000f5000000 0000000000000000
[50355.079240] page dumped because: kasan: bad access detected
[50355.079250] Memory state around the buggy address:
[50355.079257] ffff888133567880: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
[50355.079265] ffff888133567900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[50355.079272] >ffff888133567980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[50355.079278] ^
[50355.079285] ffff888133567a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[50355.079292] ffff888133567a80: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[50355.079298] ==================================================================
[50355.079335] Disabling lock debugging due to kernel taint
[50355.079345] ------------[ cut here ]------------
[50355.079351] refcount_t: underflow; use-after-free.
[50355.079389] WARNING: CPU: 1 PID: 2247 at lib/refcount.c:28 refcount_warn_saturate (lib/refcount.c:28 (discriminator 1))
[50355.079405] Modules linked in: netrom mkiss rose ax25 snd_seq_dummy snd_hrtimer cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils netfs cifs_md4 snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp i915 qrtr kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_hda_codec spi_nor snd_hwdep snd_hda_core mtd snd_pcm at24 mei_hdcp mei_pxp spi_intel_platform spi_intel intel_rapl_msr irqbypass snd_seq polyval_clmulni ghash_clmulni_intel aesni_intel binfmt_misc snd_seq_device rapl i2c_algo_bit processor_thermal_device_pci_legacy drm_buddy intel_soc_dts_iosf intel_cstate ttm processor_thermal_device processor_thermal_wt_hint snd_timer i2c_i801 platform_temperature_control i2c_smbus snd processor_thermal_rfim intel_pch_thermal mei_me processor_thermal_rapl lpc_ich intel_rapl_common soundcore mei drm_display_helper processor_thermal_wt_req intel_pmc_core processor_thermal_power_floor processor_thermal_mbox int340x_thermal_zone pmt_telemetry pmt_class
[50355.079675] intel_pmc_ssram_telemetry video acpi_pad wmi intel_vsec nls_iso8859_1 input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs autofs4 r8169 ahci realtek libahci hid_generic usbhid hid uas usb_storage [last unloaded: mkiss]
[50355.079783] CPU: 1 UID: 0 PID: 2247 Comm: ax25ipd Tainted: G B 6.16.4-local-dirty #3 PREEMPT(voluntary)
[50355.079798] Tainted: [B]=BAD_PAGE
[50355.079804] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[50355.079811] RIP: 0010:refcount_warn_saturate (lib/refcount.c:28 (discriminator 1))
[50355.079821] Code: eb 97 0f b6 1d 75 73 8d 03 80 fb 01 0f 87 fe 80 81 fe 83 e3 01 75 82 48 c7 c7 40 51 ac b3 c6 05 59 73 8d 03 01 e8 52 c0 aa fe <0f> 0b e9 68 ff ff ff 0f b6 1d 47 73 8d 03 80 fb 01 0f 87 bb 80 81
All code
========
0: eb 97 jmp 0xffffffffffffff99
2: 0f b6 1d 75 73 8d 03 movzbl 0x38d7375(%rip),%ebx # 0x38d737e
9: 80 fb 01 cmp $0x1,%bl
c: 0f 87 fe 80 81 fe ja 0xfffffffffe818110
12: 83 e3 01 and $0x1,%ebx
15: 75 82 jne 0xffffffffffffff99
17: 48 c7 c7 40 51 ac b3 mov $0xffffffffb3ac5140,%rdi
1e: c6 05 59 73 8d 03 01 movb $0x1,0x38d7359(%rip) # 0x38d737e
25: e8 52 c0 aa fe call 0xfffffffffeaac07c
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 68 ff ff ff jmp 0xffffffffffffff99
31: 0f b6 1d 47 73 8d 03 movzbl 0x38d7347(%rip),%ebx # 0x38d737f
38: 80 fb 01 cmp $0x1,%bl
3b: 0f .byte 0xf
3c: 87 .byte 0x87
3d: bb .byte 0xbb
3e: 80 .byte 0x80
3f: 81 .byte 0x81
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 68 ff ff ff jmp 0xffffffffffffff6f
7: 0f b6 1d 47 73 8d 03 movzbl 0x38d7347(%rip),%ebx # 0x38d7355
e: 80 fb 01 cmp $0x1,%bl
11: 0f .byte 0xf
12: 87 .byte 0x87
13: bb .byte 0xbb
14: 80 .byte 0x80
15: 81 .byte 0x81
[50355.079831] RSP: 0018:ffff8881140cf370 EFLAGS: 00010246
[50355.079842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[50355.079849] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[50355.079856] RBP: ffff8881140cf380 R08: 0000000000000000 R09: 0000000000000000
[50355.079864] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[50355.079871] R13: ffff888128d67a00 R14: ffff88812d7e0000 R15: ffff888133567900
[50355.079879] FS: 0000000000000000(0000) GS:ffff8882592fd000(0000) knlGS:0000000000000000
[50355.079888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50355.079896] CR2: 000076e08b265008 CR3: 000000011e8be006 CR4: 00000000001726f0
[50355.079904] Call Trace:
[50355.079910] <TASK>
[50355.079917] rose_rt_device_down (./include/linux/refcount.h:400 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/net/rose.h:162 net/rose/rose_route.c:520) rose
[50355.079937] ? _raw_spin_unlock_bh (kernel/locking/spinlock.c:211)
[50355.079948] ? rose_kill_by_neigh (net/rose/af_rose.c:178) rose
[50355.079969] rose_device_event (net/rose/af_rose.c:249) rose
[50355.079986] notifier_call_chain (kernel/notifier.c:87)
[50355.079998] ? nlmsg_notify (./include/net/netlink.h:1151 ./include/net/netlink.h:1170 net/netlink/af_netlink.c:2595)
[50355.080011] raw_notifier_call_chain (kernel/notifier.c:454)
[50355.080022] call_netdevice_notifiers_info (net/core/dev.c:2231)
[50355.080034] dev_close_many (net/core/dev.c:1786)
[50355.080044] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
[50355.080055] ? __pfx_dev_close_many (net/core/dev.c:1773)
[50355.080067] ? update_stack_state (./arch/x86/include/asm/unwind.h:111 (discriminator 1) ./arch/x86/include/asm/unwind.h:127 (discriminator 1) arch/x86/kernel/unwind_frame.c:253 (discriminator 1))
[50355.080082] unregister_netdevice_many_notify (net/core/dev.c:12061)
[50355.080096] ? __pfx_unregister_netdevice_many_notify (net/core/dev.c:12016)
[50355.080107] ? is_bpf_text_address (kernel/bpf/core.c:773 (discriminator 1))
[50355.080119] ? kernel_text_address (kernel/extable.c:125 (discriminator 1) kernel/extable.c:94 (discriminator 1))
[50355.080131] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1))
[50355.080141] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 (discriminator 1))
[50355.080152] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
[50355.080165] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26)
[50355.080184] unregister_netdevice_queue (net/core/dev.c:11998)
[50355.080197] ? __pfx_unregister_netdevice_queue (net/core/dev.c:11987)
[50355.080221] ? __kasan_check_write (mm/kasan/shadow.c:38)
[50355.080235] ? rtnl_lock (net/core/rtnetlink.c:81)
[50355.080251] unregister_netdev (./include/net/net_namespace.h:409 ./include/linux/netdevice.h:2713 net/core/dev.c:2158 net/core/dev.c:12171)
[50355.080262] mkiss_close (drivers/net/hamradio/mkiss.c:800) mkiss
[50355.080276] tty_ldisc_close (drivers/tty/tty_ldisc.c:457)
[50355.080288] tty_ldisc_hangup (drivers/tty/tty_ldisc.c:614 drivers/tty/tty_ldisc.c:729)
[50355.080300] __tty_hangup.part.0 (./include/linux/spinlock.h:376 drivers/tty/tty_io.c:623)
[50355.080310] ? mutex_unlock (./arch/x86/include/asm/atomic64_64.h:101 (discriminator 5) ./include/linux/atomic/atomic-arch-fallback.h:4329 (discriminator 5) ./include/linux/atomic/atomic-long.h:1506 (discriminator 5) ./include/linux/atomic/atomic-instrumented.h:4481 (discriminator 5) kernel/locking/mutex.c:167 (discriminator 5) kernel/locking/mutex.c:537 (discriminator 5))
[50355.080327] tty_vhangup (drivers/tty/tty_io.c:692)
[50355.080337] pty_close (drivers/tty/pty.c:81)
[50355.080350] tty_release (drivers/tty/tty_io.c:1748)
[50355.080361] ? __pfx_locks_remove_file (fs/locks.c:2686)
[50355.080376] __fput (fs/file_table.c:465)
[50355.080387] ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:120 (discriminator 4) kernel/locking/spinlock.c:170 (discriminator 4))
[50355.080400] ? __pfx__raw_spin_lock_irq (kernel/locking/spinlock.c:169)
[50355.080412] ____fput (fs/file_table.c:494)
[50355.080423] task_work_run (kernel/task_work.c:228)
[50355.080435] ? __pfx_task_work_run (kernel/task_work.c:195)
[50355.080449] do_exit (kernel/exit.c:965)
[50355.080462] ? do_wp_page (mm/memory.c:4017)
[50355.080478] ? __pfx_do_exit (kernel/exit.c:897)
[50355.080488] ? __pfx_zap_other_threads (kernel/signal.c:1338)
[50355.080499] ? __pfx_do_wp_page (mm/memory.c:3940)
[50355.080513] do_group_exit (kernel/exit.c:1086)
[50355.080525] __x64_sys_exit_group (kernel/exit.c:1114)
[50355.080536] x64_sys_call (arch/x86/entry/syscall_64.c:37)
[50355.080550] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[50355.080565] ? __handle_mm_fault (mm/memory.c:6085 mm/memory.c:6212)
[50355.080579] ? __pfx___handle_mm_fault (mm/memory.c:6121)
[50355.080593] ? __kasan_check_read (mm/kasan/shadow.c:32)
[50355.080605] ? count_memcg_events (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/atomic/atomic-instrumented.h:33 mm/memcontrol.c:560 mm/memcontrol.c:585 mm/memcontrol.c:564 mm/memcontrol.c:848)
[50355.080619] ? handle_mm_fault (mm/memory.c:6254 mm/memory.c:6407)
[50355.080632] ? __kasan_check_read (mm/kasan/shadow.c:32)
[50355.080642] ? fpregs_assert_state_consistent (./arch/x86/include/asm/bitops.h:206 (discriminator 1) ./arch/x86/include/asm/bitops.h:238 (discriminator 1) ./include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 1) ./include/linux/thread_info.h:126 (discriminator 1) arch/x86/kernel/fpu/core.c:862 (discriminator 1))
[50355.080654] ? irqentry_exit_to_user_mode (./arch/x86/include/asm/entry-common.h:65 (discriminator 1) ./include/linux/entry-common.h:332 (discriminator 1) kernel/entry/common.c:184 (discriminator 1))
[50355.080669] ? irqentry_exit (kernel/entry/common.c:320)
[50355.080680] ? exc_page_fault (arch/x86/mm/fault.c:1536)
[50355.080691] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[50355.080703] RIP: 0033:0x76e08b0ee21d
[50355.080713] Code: Unable to access opcode bytes at 0x76e08b0ee1f3.
Code starting with the faulting instruction
===========================================
[50355.080720] RSP: 002b:00007ffdcf08da78 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[50355.080734] RAX: ffffffffffffffda RBX: 000076e08b204fa8 RCX: 000076e08b0ee21d
[50355.080742] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001
[50355.080750] RBP: 00007ffdcf08dad0 R08: 00007ffdcf08da18 R09: 0000000000000000
[50355.080759] R10: 00007ffdcf08d98f R11: 0000000000000206 R12: 0000000000000001
[50355.080768] R13: 0000000000000000 R14: 0000000000000001 R15: 000076e08b204fc0
[50355.080782] </TASK>
[50355.080788] ---[ end trace 0000000000000000 ]---
[50355.080798] Here I am: rose_remove_node:209
[50355.080826] Here I am: rose_remove_node:209
[50357.352785] NET: Unregistered PF_NETROM protocol family
root@ubuntu-f6bvp:/media/udisk/home/bernard#
* Re: [BUG] [ROSE] slab-use-after-free in lock_timer_base
2025-09-03 9:51 ` [BUG] [ROSE] slab-use-after-free in lock_timer_base Bernard Pidoux
@ 2025-09-03 10:01 ` Eric Dumazet
2025-09-03 10:11 ` F6BVP
0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2025-09-03 10:01 UTC (permalink / raw)
To: Bernard Pidoux, Takamitsu Iwai; +Cc: linux-hams, netdev
On Wed, Sep 3, 2025 at 2:51 AM Bernard Pidoux <bernard.pidoux@free.fr> wrote:
>
> On 6.16.4 kernel patched with last ROSE commit for refcount use
> rose_remove_node() is causing refcount_t: underflow; use-after-free
>
> List: linux-stable-commits
> Subject: Patch "net: rose: split remove and free operations in
> rose_remove_neigh()" has been added to the 6.1
> From: Sasha Levin <sashal () kernel ! org>
> Date: 2025-08-30 20:20:24
> Message-ID: 20250830202024.2485006-1-sashal () kernel ! org
>
> Bernard Pidoux
> F6BVP / AI7BG
Any particular reason you do not CC the author ?
CC Takamitsu Iwai <takamitz@amazon.co.jp>
BTW, a syzbot report was already sent to the list.
https://syzkaller.appspot.com/bug?extid=7287222a6d88bdb559a7
* Re: [BUG] [ROSE] slab-use-after-free in lock_timer_base
2025-09-03 10:01 ` Eric Dumazet
@ 2025-09-03 10:11 ` F6BVP
2025-09-03 11:07 ` Takamitsu Iwai
0 siblings, 1 reply; 36+ messages in thread
From: F6BVP @ 2025-09-03 10:11 UTC (permalink / raw)
To: Eric Dumazet, Bernard Pidoux, Takamitsu Iwai; +Cc: linux-hams, netdev
I am sorry for not having CC'd Takamitsu Iwai.
I apologize for this novice error.
Considering the syzbot report, I just wanted to contribute a way to
easily reproduce the bug when running a ROSE network.
Le 03/09/2025 à 12:01, Eric Dumazet a écrit :
> On Wed, Sep 3, 2025 at 2:51 AM Bernard Pidoux <bernard.pidoux@free.fr> wrote:
>>
>> On 6.16.4 kernel patched with last ROSE commit for refcount use
>> rose_remove_node() is causing refcount_t: underflow; use-after-free
>>
>> List: linux-stable-commits
>> Subject: Patch "net: rose: split remove and free operations in
>> rose_remove_neigh()" has been added to the 6.1
>> From: Sasha Levin <sashal () kernel ! org>
>> Date: 2025-08-30 20:20:24
>> Message-ID: 20250830202024.2485006-1-sashal () kernel ! org
>>
>> Bernard Pidoux
>> F6BVP / AI7BG
>
> Any particular reason you do not CC the author ?
>
> CC Takamitsu Iwai <takamitz@amazon.co.jp>
>
> BTW, a syzbot report was already sent to the list.
>
> https://syzkaller.appspot.com/bug?extid=7287222a6d88bdb559a7
>
* Re: Re: [BUG] [ROSE] slab-use-after-free in lock_timer_base
2025-09-03 10:11 ` F6BVP
@ 2025-09-03 11:07 ` Takamitsu Iwai
0 siblings, 0 replies; 36+ messages in thread
From: Takamitsu Iwai @ 2025-09-03 11:07 UTC (permalink / raw)
To: f6bvp; +Cc: bernard.pidoux, edumazet, linux-hams, netdev, takamitz
Thank you for reaching out, and I'm sorry for the inconvenience
caused by my patch.
I have confirmed that the syzbot report shows the following error.
> ODEBUG: free active (active state 0) object: ffff88804fb25890 object
> type: timer_list hint: rose_t0timer_expiry+0x0/0x150
It seems neigh->t0timer is freed in rose_timer_expiry() when the refcount
of rose_neigh drops to 0, even if neigh->t0timer is still active.
> rose_neigh_put include/net/rose.h:166 [inline]
> rose_timer_expiry+0x53f/0x630 net/rose/rose_timer.c:183
I guess the error you showed in this thread is also related to this issue,
because the UAF occurs when deleting the timer in rose_remove_neigh().
> [50355.077644] timer_delete_sync (kernel/time/timer.c:1676)
> [50355.077653] rose_remove_neigh (net/rose/rose_route.c:237) rose
I'm not confident, but the fix I can think of for now is to increment the
refcount of rose_neigh before arming t0timer, or to stop t0timer before
freeing in rose_timer_expiry().
Currently, rose_t0timer_expiry() is set as the neigh->t0timer callback in
rose_start_t0timer(), which is first called from rose_transmit_link().
It seems that the refcount is not incremented on these paths.
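The first option above (the timer taking its own reference) can be
sketched as a small userspace model. This is only an illustration, not
the ROSE code: struct neigh_model and every function below are
hypothetical names, the "timer" is just a flag, and the object is marked
as freed instead of actually being released so its state can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct neigh_model {
	int  refcount;          /* stands in for the rose_neigh refcount */
	bool timer_armed;       /* stands in for a pending t0timer */
	bool freed;             /* set when the last reference is dropped */
	bool freed_while_armed; /* models "free active object" from ODEBUG */
};

static void neigh_hold(struct neigh_model *n)
{
	n->refcount++;
}

static void neigh_put(struct neigh_model *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0) {
		n->freed = true;
		/* Freeing with the timer still armed is the suspected bug. */
		n->freed_while_armed = n->timer_armed;
	}
}

/* Assumed model of the suspected current behaviour: no reference taken. */
static void timer_start_noref(struct neigh_model *n)
{
	n->timer_armed = true;
}

/* Proposed behaviour: an armed timer owns one reference. */
static void timer_start(struct neigh_model *n)
{
	if (!n->timer_armed) {
		neigh_hold(n);
		n->timer_armed = true;
	}
}

/* Timer expiry or cancellation: disarm first, then drop the timer's ref. */
static void timer_stop(struct neigh_model *n)
{
	if (n->timer_armed) {
		n->timer_armed = false;
		neigh_put(n);
	}
}
```

In this model, dropping the owner's reference while the timer is armed
via timer_start() leaves the object alive until timer_stop() runs,
whereas timer_start_noref() followed by the last neigh_put() marks the
object as freed with the timer still armed.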
I'm investigating exactly which code paths need to increment the refcount,
but I'm sorry, I'm struggling to trace the reference count around the
timer precisely.
If you have reproduction steps that can be run in a virtual
environment, I'll try them out too.
Sincerely,
Takamitsu
Thread overview: 36+ messages
[not found] <11c5701d-4bf9-4661-ad8a-06690bbe1c1c@free.fr>
[not found] ` <fff0b3eb-ea42-4475-970d-30622dc25dca@free.fr>
2025-08-16 18:45 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops Bernard Pidoux
2025-08-18 10:00 ` Bernard Pidoux
2025-08-18 10:04 ` Folkert van Heusden
2025-08-18 14:19 ` F6BVP
2025-08-18 16:30 ` Dan Cross
2025-08-18 18:28 ` F6BVP
2025-08-18 22:11 ` Dan Cross
2025-08-18 22:31 ` F6BVP
2025-08-20 8:50 ` F6BVP
2025-08-20 19:33 ` kworker/u16 Not tainted F6BVP
2025-08-21 11:28 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops F6BVP
2025-08-21 22:39 ` F6BVP
2025-08-22 3:10 ` Folkert van Heusden
2025-08-24 14:04 ` F6BVP
2025-08-25 12:40 ` Dan Carpenter
2025-08-26 13:31 ` F6BVP
2025-08-26 13:36 ` Eric Dumazet
2025-08-27 14:16 ` F6BVP
2025-08-27 17:30 ` Florian Westphal
2025-08-28 16:39 ` F6BVP
2025-08-30 23:37 ` F6BVP
2025-09-01 12:04 ` Eric Dumazet
2025-09-01 12:05 ` Eric Dumazet
[not found] ` <cd0461e0-8136-4f90-df7b-64f1e43e78d4@trinnet.net>
2025-09-01 15:59 ` F6BVP
2025-09-01 16:03 ` Eric Dumazet
2025-09-01 19:10 ` David Ranch
2025-09-01 19:16 ` Eric Dumazet
2025-09-02 7:44 ` F6BVP
2025-09-02 7:55 ` Eric Dumazet
2025-09-03 9:51 ` [BUG] [ROSE] slab-use-after-free in lock_timer_base Bernard Pidoux
2025-09-03 10:01 ` Eric Dumazet
2025-09-03 10:11 ` F6BVP
2025-09-03 11:07 ` Takamitsu Iwai
2025-09-01 19:04 ` [ROSE] [AX25] 6.15.10 long term stable kernel oops David Ranch
2025-09-02 7:54 ` F6BVP
2025-08-19 21:17 ` [OT] " Miroslav Skoric