public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Taehee Yoo <ap420073@gmail.com>
To: wangyufen <wangyufen@huawei.com>,
	Eric Dumazet <edumazet@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Jarod Wilson <jarod@redhat.com>
Cc: syzbot <syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com>,
	"Jiří Pírko" <jiri@resnulli.us>,
	davem@davemloft.net, kuba@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	pabeni@redhat.com, syzkaller-bugs@googlegroups.com,
	"Cong Wang" <xiyou.wangcong@gmail.com>
Subject: Re: [syzbot] kernel panic: kernel stack overflow
Date: Fri, 7 Apr 2023 17:17:37 +0900	[thread overview]
Message-ID: <b82de9da-da16-c0de-6e93-460b53179b0f@gmail.com> (raw)
In-Reply-To: <1ddf7fc8-bcb3-ab48-4894-24158e8a9d0f@huawei.com>

Hi wangyufen,

On 2023. 4. 7. 오후 4:22, wangyufen wrote:
 >
 >
 > 在 2022/10/21 19:08, Taehee Yoo 写道:
 >> Hi,
 >>
 >> 2022. 10. 14. 오전 12:00에 Taehee Yoo 이(가) 쓴 글:
 >>  > Hi,
 >>  >
 >>  > On 10/12/22 21:19, Eric Dumazet wrote:
 >>  >  > On Wed, Oct 12, 2022 at 12:53 AM Dmitry Vyukov 
<dvyukov@google.com>
 >>  > wrote:
 >>  >  >>
 >>  >  >> On Wed, 12 Oct 2022 at 09:48, syzbot
 >>  >  >> <syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com> wrote:
 >>  >  >>>
 >>  >  >>> Hello,
 >>  >  >>>
 >>  >  >>> syzbot found the following issue on:
 >>  >  >>>
 >>  >  >>> HEAD commit:    bbed346d5a96 Merge branch 'for-next/core' into
 >>  > for-kernelci
 >>  >  >>> git tree:
 >>  > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
 >> for-kernelci
 >>  >  >>> console output:
 >>  > https://syzkaller.appspot.com/x/log.txt?x=14a03a2a880000
 >>  >  >>> kernel config:
 >>  > https://syzkaller.appspot.com/x/.config?x=aae2d21e7dd80684
 >>  >  >>> dashboard link:
 >>  > https://syzkaller.appspot.com/bug?extid=60748c96cf5c6df8e581
 >>  >  >>> compiler:       Debian clang version
 >>  > 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld
 >>  > (GNU Binutils for Debian) 2.35.2
 >>  >  >>> userspace arch: arm64
 >>  >  >>>
 >>  >  >>> Unfortunately, I don't have any reproducer for this issue yet.
 >>  >  >>>
 >>  >  >>> Downloadable assets:
 >>  >  >>> disk image:
 >>  >
 >> 
https://storage.googleapis.com/syzbot-assets/11078f50b80b/disk-bbed346d.raw.xz
 >>  >
 >>  >  >>> vmlinux:
 >>  >
 >> 
https://storage.googleapis.com/syzbot-assets/398e5f1e6c84/vmlinux-bbed346d.xz
 >>  >
 >>  >  >>>
 >>  >  >>> IMPORTANT: if you fix the issue, please add the following tag to
 >>  > the commit:
 >>  >  >>> Reported-by:
 >> syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
 >>  >  >>
 >>  >  >> +Jiri
 >>  >  >>
 >>  >  >> It looks like the issue is with the team device. It seems to call
 >>  >  >> itself infinitely.
 >>  >  >> team_device_event was mentioned in stack overflow bugs in the
 >> past:
 >>  >  >>
 >>  >
 >> 
https://groups.google.com/g/syzkaller-bugs/search?q=%22team_device_event%22
 >>  >  >>
 >>  >  >
 >>  >  >
 >>  >  > Taehee Yoo, can you take a look ?
 >>  >  >
 >>  >  > Patch series of yours was supposed to limit max nest level to 8
 >>  >  >
 >>  >  >
 >>  >
 >> 
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=65921376425fc9c8b7ce647e1f7989f7cdf5dd70
 >>  >
 >>  >  >
 >>  >
 >>  > I found a reproducer.
 >>  >
 >>  > #test_team.sh
 >>  > ip link add dummy0 type dummy
 >>  > ip link set dummy0 up
 >>  > for a1 in {0..1}
 >>  > do
 >>  >          ip link add team$a1 type team
 >>  >          for a2 in {0..1}
 >>  >          do
 >>  >                  ip link add team$a1$a2 master team$a1 type team
 >>  >                  for a3 in {0..1}
 >>  >                  do
 >>  >                          ip link add team$a1$a2$a3 master team$a1$a2
 >>  > type team
 >>  >                          for a4 in {0..1}
 >>  >                          do
 >>  >                                  ip link add team$a1$a2$a3$a4 master
 >>  > team$a1$a2$a3 type team
 >>  >                                  for a5 in {0..1}
 >>  >                                  do
 >>  >                                          ip link add
 >> team$a1$a2$a3$a4$a5
 >>  > master team$a1$a2$a3$a4 type team
 >>  >                                          for a6 in {0..1}
 >>  >                                          do
 >>  >                                                  ip link add
 >>  > team$a1$a2$a3$a4$a5$a6 master team$a1$a2$a3$a4$a5 type team
 >>  >                                                  ip link add
 >>  > macvlan$a1$a2$a3$a4$a5$a6 link dummy0 master team$a1$a2$a3$a4$a5$a6
 >> type
 >>  > macvlan
 >>  >                                                  ip link set
 >>  > macvlan$a1$a2$a3$a4$a5$a6 up
 >>  >                                                  ip link set
 >>  > team$a1$a2$a3$a4$a5$a6 up
 >>  >                                          done
 >>  >                                          ip link set
 >> team$a1$a2$a3$a4$a5 up
 >>  >                                  done
 >>  >                                  ip link set team$a1$a2$a3$a4 up
 >>  >                          done
 >>  >                          ip link set team$a1$a2$a3 up
 >>  >                  done
 >>  >                  ip link set team$a1$a2 up
 >>  >          done
 >>  >          ip link set team$a1 up
 >>  > done
 >>  >
 >>  > #test_ethtool.sh
 >>  > for a1 in {0..1}
 >>  > do
 >>  >          ethtool -K team$a1 lro $1
 >>  >          for a2 in {0..1}
 >>  >          do
 >>  >                  ethtool -K team$a1$a2 lro $1
 >>  >                  for a3 in {0..1}
 >>  >                  do
 >>  >                          ethtool -K team$a1$a2$a3 lro $1
 >>  >                          for a4 in {0..1}
 >>  >                          do
 >>  >                                  ethtool -K team$a1$a2$a3$a4 lro $1
 >>  >                                  for a5 in {0..1}
 >>  >                                  do
 >>  >                                          ethtool -K
 >> team$a1$a2$a3$a4$a5
 >>  > lro $1
 >>  >                                          for a6 in {0..1}
 >>  >                                          do
 >>  >                                                  ethtool -K
 >>  > team$a1$a2$a3$a4$a5$a6 lro $1
 >>  >                                                  ethtool -K
 >>  > macvlan$a1$a2$a3$a4$a5$a6 lro $1
 >>  >                                          done
 >>  >                                  done
 >>  >                          done
 >>  >                  done
 >>  >          done
 >>  > done
 >>  >
 >>  > shell#1
 >>  > bash test_team.sh
 >>  > while :
 >>  > do
 >>  > bash test_ethtool.sh on
 >>  > done
 >>  > shell#2
 >>  > while :
 >>  > do
 >>  > bash test_ethtool.sh off
 >>  > done
 >>  >
 >>  > We can see a very similar call trace with the above reproducer.
 >>  > I think it is the same issue.
 >>  > Could you please test it?
 >>  >
 >>  > And, I found the fixed same issue too.
 >>  >
 >> 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0&id=dd912306ff008891c82cd9f63e8181e47a9cb2fb
 >>  >
 >>  >
 >> https://groups.google.com/g/syzkaller-bugs/c/-5OV1OW-dS4/m/o2Oq6AYSAwAJ
 >>  >
 >>
 >> I found the root cause of this issue.
 >>
 >> This is simpler reproducer.
 >>
 >> ip link add team0 type team
 >> ethtool -K team0 lro on
 >> for i in {1..100}
 >> do
 >>          ip link add team$i master team0 type team
 >>          ethtool -K team$i lro on
 >> done
 >>
 >> ethtool -K team0 lro off
 >>
 >> The above graph is like below:
 >>         team0
 >>           |
 >>    +------+------+-----+-----+
 >>    |      |      |     |     |
 >> team1  team2  team3  ...  team100
 >>
 >> int __netdev_update_features(struct net_device *dev)
 >> {
 >>          struct net_device *upper, *lower;
 >>          netdev_features_t features;
 >>          struct list_head *iter;
 >>          int err = -1;
 >> ...
 >> sync_lower:
 >>          /* some features must be disabled on lower devices when 
disabled
 >>           * on an upper device (think: bonding master or bridge)
 >>           */
 >>          netdev_for_each_lower_dev(dev, lower, iter)
 >>                  netdev_sync_lower_features(dev, lower, features);
 >> ...
 >>
 >>
 >> static void netdev_sync_lower_features(struct net_device *upper,
 >>          struct net_device *lower, netdev_features_t features)
 >> {
 >>          netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
 >>          netdev_features_t feature;
 >>          int feature_bit;
 >>
 >>          for_each_netdev_feature(upper_disables, feature_bit) {
 >>                  feature = __NETIF_F_BIT(feature_bit);
 >>                  if (!(features & feature) && (lower->features &
 >> feature)) {
 >>                          netdev_dbg(upper, "Disabling feature %pNF on
 >> lower dev %s.\n",
 >>                                     &feature, lower->name);
 >>                          lower->wanted_features &= ~feature;
 >>                          __netdev_update_features(lower);
 >>
 >>                          if (unlikely(lower->features & feature))
 >>                                  netdev_WARN(upper, "failed to disable
 >> %pNF on %s!\n",
 >>                                              &feature, lower->name);
 >>                          else
 >> 
netdev_features_change(lower);<-----HERE
 >>                  }
 >>          }
 >> }
 >>
 >> void netdev_features_change(struct net_device *dev)
 >> {
 >>          call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
 >> }
 >>
 >> The code looks like an iterator.
 >> But it would work recursively because of notification.
 >>
 >> When team0's feature(LRO) is changed with <ethtool -K team0 lro off>",
 >> __netdev_update_features(team0) is called.
 >> __netdev_update_features(team0) internally sends NETDEV_FEAT_CHANGE
 >> event to all lower interfaces(team1, team2, ... team100).
 >> team1 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE
 >> to the upper interface(team0).
 >> team0 will receive NETDEV_FEAT_CHANGE again, and it sends
 >> NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ...
 >> team100).
 >> (At this point, team1 flag was already set, so it will be skipped.)
 >> team2 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE
 >> to the upper interface(team0).
 >> team0 will receive NETDEV_FEAT_CHANGE again again, and it sends
 >> NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ...
 >> team100).
 >> (team1, team2 skipped.)
 >> ...
 >> So, if there are a few lower interfaces(roughly under 30 lower
 >> interfaces), it anyway works even if internally works recursively.
 >> But so many lower interfaces exist, stack overflow will occur.
 >> This is the root cause of this issue.
 >>
 >> I think synchronization direction should be one way.
 >> Up or Down.
 >> It means that if the team0 interface can send the NETDEV_FEAT_CHANGE
 >> notification event to the lower interface,
 >> the lower interfaces should be disallowed to send NETDEV_FEAT_CHANGE
 >> event to the upper interface.
 >>
 >> bonding has same issue.
 >
 > Excuse me, is there a fix for this issue? I had the same issue with the
 > 5.10 version of the bonding.

It is not fixed, I will fix it.
I found the problem of this issue, but I couldn't find a good solution yet.
I think It would need relatively much time for fixing it.

Thanks!
Taehee Yoo

  reply	other threads:[~2023-04-07  8:18 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12  7:48 [syzbot] kernel panic: kernel stack overflow syzbot
2022-10-12  7:53 ` Dmitry Vyukov
2022-10-12 12:19   ` Eric Dumazet
2022-10-12 13:31     ` Taehee Yoo
2022-10-13 15:00     ` Taehee Yoo
2022-10-13 16:04       ` Eric Dumazet
2022-10-21 11:08       ` Taehee Yoo
2023-04-07  7:22         ` wangyufen
2023-04-07  8:17           ` Taehee Yoo [this message]
2022-10-12 13:11   ` Jiri Pirko
2022-10-12 13:54     ` Dmitry Vyukov
2022-10-12 15:08       ` Jiri Pirko
2022-10-12 16:42         ` Eric Dumazet
2022-10-13  7:11           ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b82de9da-da16-c0de-6e93-460b53179b0f@gmail.com \
    --to=ap420073@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dvyukov@google.com \
    --cc=edumazet@google.com \
    --cc=jarod@redhat.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=wangyufen@huawei.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox