From: "Michael S. Tsirkin" <mst@redhat.com>
To: Ivan <ivan@prestigetransportation.com>
Cc: Willem de Bruijn <willemb@google.com>,
virtualization@lists.linux-foundation.org,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: PROBLEM: virtio_net LRO kernel panics
Date: Tue, 10 Aug 2021 11:31:16 -0400 [thread overview]
Message-ID: <20210810112147-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > >
> > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > >
> > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan@prestigetransportation.com> wrote:
> > > > > > >>
> > > > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >>>
> > > > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >>> > >
> > > > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >>> > > > > 在 2021/7/23 上午10:54, Ivan 写道:
> > > > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > > > >>> > > > > > Actual changes:
> > > > > > >>> > > > > > rx-lro: on [requested off]
> > > > > > >>> > > > > > Could not change any device features
> > > > > > >>> > > > >
> > > > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > > > >>> > > > >
> > > > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > >>> > > >
> > > > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > > > >>> > > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > >>> > >
> > > > > > >>> > > I would be useful to see the features your virtualbox instance provides
> > > > > > >>> > >
> > > > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > > > >>> >
> > > > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > >>>
> > > > > > >>> I was able to reproduce the warning but not the panic.
> > > > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > > > >>> is already a problem. Any chance you can bisect to
> > > > > > >>> find out which change introduced the panic?
> > > > > > >>
> > > > > > >>
> > > > > > >> Any kernels up to 4.19.198 don't panic.
> > > > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > > > >> I guess I can build a few kernels inbetween, and try pinpoint where it starts.
> > > > > > >> That may take a day or so. I'll get on with it now, and report my findings.
> > > > > > >
> > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > >
> > > > > > More narowly, the problem seems be coming from commit
> > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > and the panic went away. Hope that helps you guys figure out
> > > > > > what the problem might be.
> > > >
> > > > Well it disables LRO but we knew this :( I'd help if we knew
> > > > where does it panic, all we see it the warning which is
> > > > related for sure but not the immediate rootcause ...
> > > >
> > > > > >
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -2978,11 +2978,6 @@
> > > > > > }
> > > > > > if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > > dev->features |= NETIF_F_RXCSUM;
> > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > - virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > - dev->features |= NETIF_F_LRO;
> > > > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > - dev->hw_features |= NETIF_F_LRO;
> > > > > >
> > > > > > dev->vlan_features = dev->features;
> > > > >
> > > > > Just FYI, Google turned up two similar bug reposts...
> > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > Oct 09. 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > >
> > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > problem is sorted out?
> > > > > Or am I simply stuck to kernels 4.19 on these machines for now?
> > > >
> > > >
> > > > Something like this I guess:
> > > >
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8a58a2f013af..cc5982193a40 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > > __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > > }
> > > >
> > > > + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > + __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > > return 0;
> > > > }
> > >
> > > When I apply your patch, then I see drastic (more than half)
> > > reductions in speed. (confirmed with iperf).
> > >
> > > But if instead I just remove a few lines from commit
> > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > as in my earlier post, then I'm back to full speed
> > >
> > > I understand that this is just temporary workaround, until we figure this out.
> >
> >
> > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > features.
> >
> > How about this then? Just pretend to Linux that we disabled LRO.
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a58a2f013af..8e7e4cea176b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > ~GUEST_OFFLOAD_LRO_MASK;
> >
> > err = virtnet_set_guest_offloads(vi, offloads);
> > - if (err)
> > - return err;
> > + WARN_ON(err);
> > + //if (err)
> > + // return err;
> > vi->guest_offloads = offloads;
> > }
>
> No. With this applied, the problem persists:
>
> # echo "1" > /proc/sys/net/ipv4/ip_forward
>
> kernel: ------------[ cut here ]------------
> kernel: netdevice: eth0: failed to disable LRO!
> kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> dev_disable_lro+0x108/0x150
> kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> kernel: RIP: 0010:dev_disable_lro+0x108/0x150
Again the warning isn't a big deal. I agree we should address - Jason
any update? But the main issue is you lose connectivity. That still
persists with this? Can't you get a serial connection
out? I know qemu Did the kernel oops afterwards?
--
MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
next prev parent reply other threads:[~2021-08-10 15:31 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CACFia2dwacaVVYD+1uG=CDGaJqdCOSBvZ5FcXp04caecaWAY3w@mail.gmail.com>
2021-07-23 1:28 ` PROBLEM: virtio_net LRO kernel panics Tonghao Zhang
[not found] ` <CACFia2fDZHUZB5wJ7LK8M2sv_+W58rHw0NzzrwPPoX9=s7yPdQ@mail.gmail.com>
2021-07-23 2:37 ` Jason Wang
[not found] ` <CACFia2eLCJuy=w1r20691s_cSYkBkPaY-Dbd-9CkrcpSAe7z6g@mail.gmail.com>
2021-07-23 4:25 ` Jason Wang
[not found] ` <CACFia2eH3eCZxtt70LB5zoPbhLXRv=crPh5oOhR=6mY3auDdQA@mail.gmail.com>
2021-07-23 7:59 ` Michael S. Tsirkin
[not found] ` <CACFia2fYQG4Y3_ffym06C1HGrOiOS38YWxuoUu4HYorwS9qOjA@mail.gmail.com>
2021-07-23 8:59 ` Michael S. Tsirkin
[not found] ` <CACFia2fWhWKMGF3g8SfU++2-jQ1rCKtCJo3h08KmhGfMTuZaQQ@mail.gmail.com>
2021-07-23 8:13 ` Michael S. Tsirkin
2021-07-23 12:10 ` Michael S. Tsirkin
[not found] ` <CACFia2en0JJDFyz3Umk-JTnMT=kjvRogt4PudED4kiLeMjcHFg@mail.gmail.com>
[not found] ` <CACFia2fx7Lt-4o_uqDznvk-VgdsMtD64qv6RYkrCjKLu2yt8bg@mail.gmail.com>
[not found] ` <CACFia2eUi4KNRC7MYktzUS9Nq2WcBiesX04Tbn2pTuvuGkY4qA@mail.gmail.com>
[not found] ` <CACFia2dns1rTe5OQj4H-kpurVm2CTtGfAXz0aOUS0_cs0QUrsA@mail.gmail.com>
2021-07-27 9:11 ` Michael S. Tsirkin
[not found] ` <CACFia2dLp19pzJsScSvVYREpQm0n6XCWLieWXzA94=OVYVHTbw@mail.gmail.com>
2021-08-02 19:51 ` Michael S. Tsirkin
[not found] ` <CACFia2f8xmOwB69Cj+OUNobNSurVnrLrJFdrxnmurww9aSzJMw@mail.gmail.com>
2021-08-10 15:31 ` Michael S. Tsirkin [this message]
2021-08-11 3:38 ` Jason Wang
2021-08-11 7:39 ` Michael S. Tsirkin
2021-08-11 7:45 ` Jason Wang
2021-08-11 8:01 ` Michael S. Tsirkin
2021-08-11 8:17 ` Jason Wang
2021-07-30 11:42 ` Michael S. Tsirkin
2021-07-30 11:42 ` Michael S. Tsirkin
2021-07-30 17:04 ` Ivan
2021-07-31 20:53 ` Michael S. Tsirkin
2021-07-31 20:53 ` Michael S. Tsirkin
2021-07-31 23:52 ` Ivan
2021-08-02 4:35 ` Jason Wang
2021-08-02 4:35 ` Jason Wang
2021-08-02 18:16 ` Ivan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210810112147-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=ivan@prestigetransportation.com \
--cc=kuba@kernel.org \
--cc=virtualization@lists.linux-foundation.org \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.