From: Darren Kenny <darren.kenny@oracle.com>
To: "Linux regression tracking (Thorsten Leemhuis)"
	<regressions@leemhuis.info>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Jason Wang <jasowang@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	virtualization@lists.linux.dev,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Linux kernel regressions list <regressions@lists.linux.dev>
Subject: Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
Date: Thu, 15 Aug 2024 11:22:09 +0100
Message-ID: <m2r0aqrsq6.fsf@oracle.com>
In-Reply-To: <a6ec1c84-428f-41b7-9a57-183f2aeca289@leemhuis.info>


On Thursday, 2024-08-15 at 09:14:27 +02, Linux regression tracking (Thorsten Leemhuis) wrote:
> [side note: the message I have been replying to at least when downloaded
> from lore has two message-ids, one of them identical to an older
> message, which is why this looks odd in the lore archives:
> https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]
>

Yes, I saw that too, hence I responded to patch 1 in the series, rather
than the cover letter.

> On 14.08.24 08:59, Michael S. Tsirkin wrote:
>> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
>> patch.
>> 
>> Note2: untested, posting for Darren to help with testing.
>> 
>> Turns out unconditionally enabling premapped virtio-net leads to a
>> regression on VM with no ACCESS_PLATFORM, and with
>> sysctl net.core.high_order_alloc_disable=1
>> 
>> where crashes and scp failures were reported (scp a file 100M in size to VM):
>> [...]
>
> TWIMC, there is a regression report on lore and I wonder if this might
> be related or the same problem, as it also mentioned a "get_swap_device:
> Bad swap file entry" error:
> https://bugzilla.kernel.org/show_bug.cgi?id=219154
>

I took a look at the stack traces; they don't look similar to what I was
seeing, but I wasn't running with ASAN enabled in the kernel.

Most of the traces that I was seeing look like the ones in the e-mail
from Si-Wei:

  https://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com/

We could trigger it only when the sysctl value was set like:

- net.core.high_order_alloc_disable=1

It would then immediately panic on any relatively large download, e.g.
a wget of a few RPMs, or similar.
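
As a minimal sketch (not something from the original testing), the
sysctl precondition above could be checked on a guest before trying to
reproduce. It assumes the usual mapping of dotted sysctl names to files
under /proc/sys; the helper names sysctl_path, read_sysctl and
trigger_condition_set are hypothetical, chosen only for illustration:

```python
import os

# The sysctl named in this thread as the trigger condition.
SYSCTL_NAME = "net.core.high_order_alloc_disable"


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under the given root."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the sysctl value as a stripped string, or None if unreadable."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None


def trigger_condition_set(root="/proc/sys"):
    """True only when high_order_alloc_disable reads back as 1."""
    return read_sysctl(SYSCTL_NAME, root) == "1"


if __name__ == "__main__":
    value = read_sysctl(SYSCTL_NAME)
    if value is None:
        print(f"{SYSCTL_NAME}: not available on this system")
    else:
        print(f"{SYSCTL_NAME}={value}")
        print("trigger condition set:", value == "1")
```

This only reports whether the sysctl is set; it does not change it and
does not generate any network load.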

The best I can suggest is to try reverting the premapped-mode patches in
a custom kernel and see if that fixes this problem too.

Thanks,

Darren.

> To quote:
>
> """
> Hello,
>
> I've encountered repeated crashes or freezes when a KVM VM receives
> large amounts of data over the network while the system is under memory
> load and performing I/O operations. The crashes sometimes occur in the
> filesystem code (ext4 and btrfs, at least), but they also happen in
> other locations.
>
> This issue occurs on my custom builds using kernel versions v6.10 to
> v6.11-rc2, with virtio network and disk drivers, and either Ubuntu 22.04
> or Debian 12 user space.
>
> The same kernel build did not crash on an Azure VM, which does not use
> the virtio network driver. Since this issue only appears when receiving
> data, I suspect there could be an issue related to the virtio interface
> or receive buffer handling.
>
> This issue did not occur on the Debian backport kernel 6.9.7-1~bpo12+1
> amd64.
>
> Steps to Reproduce:
> 1. Set up a small VM on a KVM host.
>    I tested this on an x86_64 KVM VM with 1 CPU, 512 MB RAM, 2 GB SWAP
> (the smallest configuration from Vultr), using a Debian 12 user space,
> virtio disk, and virtio net.
> 2. Induce high memory and I/O load. Run the following command:
>    stress --vm 2 --hdd 1
>    (Adjust --vm to occupy all the RAM)
>    This slows down the system but does not cause a crash.
> 3. Send large data to the VM.
>    I used `iperf3 -s` on the VM and sent data using `iperf3 -c` from
> another host. The system crashes within a few seconds to a few minutes.
> (The reverse direction `iperf3 -c -R` did not cause a crash.)
>
>
> The OOPS messages are mostly general protection faults, but sometimes I
> see "Bad pagetable" or other errors, such as:
> Oops: general protection fault, probably for non-canonical address
> 0x2f9b7fa5e2bde696: 0000 [#1] PREEMPT SMP PTI
> Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> Oops: Bad pagetable: 000d [#1] PREEMPT SMP PTI
>
> In some cases, dmesg contains something like:
> UBSAN: shift-out-of-bounds in lib/xarray.c:158:34
>
> When the system freezes without crashing, I sometimes found BUGON
> messages, such as:
> get_swap_device: Bad swap file entry 3403b0f5b2584992
> BUG: Bad page map in process stress  pte:c42f93fac0299e1d pmd:0d9b2047
> BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_ANONPAGES val:2
> BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_SWAPENTS val:-1
>
> Thanks.
> """
>
> Ciao, Thorsten
