From: Taehee Yoo <ap420073@gmail.com>
To: wangyufen <wangyufen@huawei.com>,
Eric Dumazet <edumazet@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
Jarod Wilson <jarod@redhat.com>
Cc: syzbot <syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com>,
"Jiří Pírko" <jiri@resnulli.us>,
davem@davemloft.net, kuba@kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
pabeni@redhat.com, syzkaller-bugs@googlegroups.com,
"Cong Wang" <xiyou.wangcong@gmail.com>
Subject: Re: [syzbot] kernel panic: kernel stack overflow
Date: Fri, 7 Apr 2023 17:17:37 +0900 [thread overview]
Message-ID: <b82de9da-da16-c0de-6e93-460b53179b0f@gmail.com> (raw)
In-Reply-To: <1ddf7fc8-bcb3-ab48-4894-24158e8a9d0f@huawei.com>
Hi wangyufen,
On 2023. 4. 7. 오후 4:22, wangyufen wrote:
>
>
> 在 2022/10/21 19:08, Taehee Yoo 写道:
>> Hi,
>>
>> 2022. 10. 14. 오전 12:00에 Taehee Yoo 이(가) 쓴 글:
>> > Hi,
>> >
>> > On 10/12/22 21:19, Eric Dumazet wrote:
>> > > On Wed, Oct 12, 2022 at 12:53 AM Dmitry Vyukov
<dvyukov@google.com>
>> > wrote:
>> > >>
>> > >> On Wed, 12 Oct 2022 at 09:48, syzbot
>> > >> <syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com> wrote:
>> > >>>
>> > >>> Hello,
>> > >>>
>> > >>> syzbot found the following issue on:
>> > >>>
>> > >>> HEAD commit: bbed346d5a96 Merge branch 'for-next/core' into
>> > for-kernelci
>> > >>> git tree:
>> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
>> for-kernelci
>> > >>> console output:
>> > https://syzkaller.appspot.com/x/log.txt?x=14a03a2a880000
>> > >>> kernel config:
>> > https://syzkaller.appspot.com/x/.config?x=aae2d21e7dd80684
>> > >>> dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=60748c96cf5c6df8e581
>> > >>> compiler: Debian clang version
>> > 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld
>> > (GNU Binutils for Debian) 2.35.2
>> > >>> userspace arch: arm64
>> > >>>
>> > >>> Unfortunately, I don't have any reproducer for this issue yet.
>> > >>>
>> > >>> Downloadable assets:
>> > >>> disk image:
>> >
>>
https://storage.googleapis.com/syzbot-assets/11078f50b80b/disk-bbed346d.raw.xz
>> >
>> > >>> vmlinux:
>> >
>>
https://storage.googleapis.com/syzbot-assets/398e5f1e6c84/vmlinux-bbed346d.xz
>> >
>> > >>>
>> > >>> IMPORTANT: if you fix the issue, please add the following tag to
>> > the commit:
>> > >>> Reported-by:
>> syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
>> > >>
>> > >> +Jiri
>> > >>
>> > >> It looks like the issue is with the team device. It seems to call
>> > >> itself infinitely.
>> > >> team_device_event was mentioned in stack overflow bugs in the
>> past:
>> > >>
>> >
>>
https://groups.google.com/g/syzkaller-bugs/search?q=%22team_device_event%22
>> > >>
>> > >
>> > >
>> > > Taehee Yoo, can you take a look ?
>> > >
>> > > Patch series of yours was supposed to limit max nest level to 8
>> > >
>> > >
>> >
>>
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=65921376425fc9c8b7ce647e1f7989f7cdf5dd70
>> >
>> > >
>> >
>> > I found a reproducer.
>> >
>> > #test_team.sh
>> > ip link add dummy0 type dummy
>> > ip link set dummy0 up
>> > for a1 in {0..1}
>> > do
>> > ip link add team$a1 type team
>> > for a2 in {0..1}
>> > do
>> > ip link add team$a1$a2 master team$a1 type team
>> > for a3 in {0..1}
>> > do
>> > ip link add team$a1$a2$a3 master team$a1$a2
>> > type team
>> > for a4 in {0..1}
>> > do
>> > ip link add team$a1$a2$a3$a4 master
>> > team$a1$a2$a3 type team
>> > for a5 in {0..1}
>> > do
>> > ip link add
>> team$a1$a2$a3$a4$a5
>> > master team$a1$a2$a3$a4 type team
>> > for a6 in {0..1}
>> > do
>> > ip link add
>> > team$a1$a2$a3$a4$a5$a6 master team$a1$a2$a3$a4$a5 type team
>> > ip link add
>> > macvlan$a1$a2$a3$a4$a5$a6 link dummy0 master team$a1$a2$a3$a4$a5$a6
>> type
>> > macvlan
>> > ip link set
>> > macvlan$a1$a2$a3$a4$a5$a6 up
>> > ip link set
>> > team$a1$a2$a3$a4$a5$a6 up
>> > done
>> > ip link set
>> team$a1$a2$a3$a4$a5 up
>> > done
>> > ip link set team$a1$a2$a3$a4 up
>> > done
>> > ip link set team$a1$a2$a3 up
>> > done
>> > ip link set team$a1$a2 up
>> > done
>> > ip link set team$a1 up
>> > done
>> >
>> > #test_ethtool.sh
>> > for a1 in {0..1}
>> > do
>> > ethtool -K team$a1 lro $1
>> > for a2 in {0..1}
>> > do
>> > ethtool -K team$a1$a2 lro $1
>> > for a3 in {0..1}
>> > do
>> > ethtool -K team$a1$a2$a3 lro $1
>> > for a4 in {0..1}
>> > do
>> > ethtool -K team$a1$a2$a3$a4 lro $1
>> > for a5 in {0..1}
>> > do
>> > ethtool -K
>> team$a1$a2$a3$a4$a5
>> > lro $1
>> > for a6 in {0..1}
>> > do
>> > ethtool -K
>> > team$a1$a2$a3$a4$a5$a6 lro $1
>> > ethtool -K
>> > macvlan$a1$a2$a3$a4$a5$a6 lro $1
>> > done
>> > done
>> > done
>> > done
>> > done
>> > done
>> >
>> > shell#1
>> > bash test_team.sh
>> > while :
>> > do
>> > bash test_ethtool.sh on
>> > done
>> > shell#2
>> > while :
>> > do
>> > bash test_ethtool.sh off
>> > done
>> >
>> > We can see a very similar call trace with the above reproducer.
>> > I think it is the same issue.
>> > Could you please test it?
>> >
>> > And, I found the fixed same issue too.
>> >
>>
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0&id=dd912306ff008891c82cd9f63e8181e47a9cb2fb
>> >
>> >
>> https://groups.google.com/g/syzkaller-bugs/c/-5OV1OW-dS4/m/o2Oq6AYSAwAJ
>> >
>>
>> I found the root cause of this issue.
>>
>> This is simpler reproducer.
>>
>> ip link add team0 type team
>> ethtool -K team0 lro on
>> for i in {1..100}
>> do
>> ip link add team$i master team0 type team
>> ethtool -K team$i lro on
>> done
>>
>> ethtool -K team0 lro off
>>
>> The above graph is like below:
>> team0
>> |
>> +------+------+-----+-----+
>> | | | | |
>> team1 team2 team3 ... team100
>>
>> int __netdev_update_features(struct net_device *dev)
>> {
>> struct net_device *upper, *lower;
>> netdev_features_t features;
>> struct list_head *iter;
>> int err = -1;
>> ...
>> sync_lower:
>> /* some features must be disabled on lower devices when
disabled
>> * on an upper device (think: bonding master or bridge)
>> */
>> netdev_for_each_lower_dev(dev, lower, iter)
>> netdev_sync_lower_features(dev, lower, features);
>> ...
>>
>>
>> static void netdev_sync_lower_features(struct net_device *upper,
>> struct net_device *lower, netdev_features_t features)
>> {
>> netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
>> netdev_features_t feature;
>> int feature_bit;
>>
>> for_each_netdev_feature(upper_disables, feature_bit) {
>> feature = __NETIF_F_BIT(feature_bit);
>> if (!(features & feature) && (lower->features &
>> feature)) {
>> netdev_dbg(upper, "Disabling feature %pNF on
>> lower dev %s.\n",
>> &feature, lower->name);
>> lower->wanted_features &= ~feature;
>> __netdev_update_features(lower);
>>
>> if (unlikely(lower->features & feature))
>> netdev_WARN(upper, "failed to disable
>> %pNF on %s!\n",
>> &feature, lower->name);
>> else
>>
netdev_features_change(lower);<-----HERE
>> }
>> }
>> }
>>
>> void netdev_features_change(struct net_device *dev)
>> {
>> call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
>> }
>>
>> The code looks like an iterator.
>> But it would work recursively because of notification.
>>
>> When team0's feature(LRO) is changed with <ethtool -K team0 lro off>",
>> __netdev_update_features(team0) is called.
>> __netdev_update_features(team0) internally sends NETDEV_FEAT_CHANGE
>> event to all lower interfaces(team1, team2, ... team100).
>> team1 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE
>> to the upper interface(team0).
>> team0 will receive NETDEV_FEAT_CHANGE again, and it sends
>> NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ...
>> team100).
>> (At this point, team1 flag was already set, so it will be skipped.)
>> team2 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE
>> to the upper interface(team0).
>> team0 will receive NETDEV_FEAT_CHANGE again again, and it sends
>> NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ...
>> team100).
>> (team1, team2 skipped.)
>> ...
>> So, if there are a few lower interfaces(roughly under 30 lower
>> interfaces), it anyway works even if internally works recursively.
>> But so many lower interfaces exist, stack overflow will occur.
>> This is the root cause of this issue.
>>
>> I think synchronization direction should be one way.
>> Up or Down.
>> It means that if the team0 interface can send the NETDEV_FEAT_CHANGE
>> notification event to the lower interface,
>> the lower interfaces should be disallowed to send NETDEV_FEAT_CHANGE
>> event to the upper interface.
>>
>> bonding has same issue.
>
> Excuse me, is there a fix for this issue? I had the same issue with the
> 5.10 version of the bonding.
It is not fixed, I will fix it.
I found the problem of this issue, but I couldn't find a good solution yet.
I think It would need relatively much time for fixing it.
Thanks!
Taehee Yoo
next prev parent reply other threads:[~2023-04-07 8:18 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-12 7:48 [syzbot] kernel panic: kernel stack overflow syzbot
2022-10-12 7:53 ` Dmitry Vyukov
2022-10-12 12:19 ` Eric Dumazet
2022-10-12 13:31 ` Taehee Yoo
2022-10-13 15:00 ` Taehee Yoo
2022-10-13 16:04 ` Eric Dumazet
2022-10-21 11:08 ` Taehee Yoo
2023-04-07 7:22 ` wangyufen
2023-04-07 8:17 ` Taehee Yoo [this message]
2022-10-12 13:11 ` Jiri Pirko
2022-10-12 13:54 ` Dmitry Vyukov
2022-10-12 15:08 ` Jiri Pirko
2022-10-12 16:42 ` Eric Dumazet
2022-10-13 7:11 ` Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b82de9da-da16-c0de-6e93-460b53179b0f@gmail.com \
--to=ap420073@gmail.com \
--cc=davem@davemloft.net \
--cc=dvyukov@google.com \
--cc=edumazet@google.com \
--cc=jarod@redhat.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=wangyufen@huawei.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox