qemu-devel.nongnu.org archive mirror
From: Maxim Levitsky <mlevitsk@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Cc: "J. Bruce Fields" <bfields@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	netdev@vger.kernel.org, David Gilbert <dgilbert@redhat.com>,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: TCP/IP connections sometimes stop retransmitting packets (in nested virtualization case)
Date: Tue, 19 Oct 2021 01:12:32 +0300
Message-ID: <201eede7763cc364ca9c24b6b5810624e7db9de1.camel@redhat.com>
In-Reply-To: <20211018164839-mutt-send-email-mst@kernel.org>

On Mon, 2021-10-18 at 16:49 -0400, Michael S. Tsirkin wrote:
> On Mon, Oct 18, 2021 at 11:05:23AM -0700, Eric Dumazet wrote:
> > 
> > On 10/17/21 3:50 AM, Maxim Levitsky wrote:
> > > Hi!
> > >  
> > > This is a follow up mail to my mail about NFS client deadlock I was trying to debug last week:
> > > https://lore.kernel.org/all/e10b46b04fe4427fa50901dda71fb5f5a26af33e.camel@redhat.com/T/#u
> > >  
> > > I strongly believe now that this is not related to NFS, but rather to some issue in networking stack and maybe
> > > to somewhat non standard .config I was using for the kernels which has many advanced networking options disabled
> > > (to cut on compile time).
> > > This is why I chose to start a new thread about it.
> > >  
> > > Regarding the custom .config file, in particular I disabled CONFIG_NET_SCHED and CONFIG_TCP_CONG_ADVANCED. 
> > > Both host and the fedora32 VM run the same kernel with those options disabled.
> > > 
> > > 
> > > My setup is a VM (fedora32) which runs Win10 HyperV VM inside, nested, which in turn runs a fedora32 VM
> > > (but I was able to reproduce it with ordinary HyperV disabled VM running in the same fedora 32 VM)
> > >  
> > > The host is running a NFS server, and the fedora32 VM runs a NFS client which is used to read/write to a qcow2 file
> > > which contains the disk of the nested Win10 VM. The L3 VM, which the Windows VM optionally
> > > runs, is contained in the same qcow2 file.
> > > 
> > > 
> > > I managed to capture (using wireshark) packets around the failure in both L0 and L1.
> > > The trace shows a fair number of lost packets, a bit more than I would expect from communication that is running on the same host,
> > > but they are retransmitted and don't cause any issues until the moment of failure.
> > > 
> > > 
> > > The failure happens when one packet which is sent from host to the guest,
> > > is not received by the guest (as evidenced by the L1 trace, and by the subsequent SACKs from the guest, which exclude this packet),
> > > and then the host (on which the NFS server runs) never attempts to re-transmit it.
> > > 
> > > 
> > > The host keeps on sending further TCP packets with replies to previous RPC calls it received from the fedora32 VM,
> > > with an increasing sequence number, as evident from both traces, and the fedora32 VM keeps on SACK'ing those received packets, 
> > > patiently waiting for the retransmission.
> > >  
> > > After around 12 minutes (!), the host RSTs the connection.
> > > 
> > > It is worth mentioning that while all of this is happening, the fedora32 VM can become hung if one attempts to access the files 
> > > on the NFS share because effectively all NFS communication is blocked on TCP level.
> > > 
> > > I attached an extract from the two traces  (in L0 and L1) around the failure up to the RST packet.
> > > 
> > > In this trace the second packet with TCP sequence number 1736557331 (first one was empty without data) is not received by the guest
> > > and then never retransmitted by the host.
> > > 
> > > Also worth noting that to save storage I captured only 512 bytes of each packet, but wireshark
> > > notes how many bytes were in the actual packet.
> > >  
> > > Best regards,
> > > 	Maxim Levitsky
> > 
> > TCP has special logic not attempting a retransmit if it senses the prior
> > packet has not been consumed yet.
> > 
> > Usually, the consume part is done from NIC drivers at TX completion time,
> > when NIC signals packet has been sent to the wire.
> > 
> > It seems one skb is essentially leaked somewhere, and leaked (not freed)
> 
> Thanks Eric!
> 
> Maxim since the packets that leak are transmitted on the host,
> the question then is what kind of device do you use on the host
> to talk to the guest? tap?
> 
> 
Yes, tap with a bridge, similar to how libvirt does 'bridge' networking for VMs.
I use my own set of scripts to run qemu directly.

Usually vhost is used in both L0 and L1, and it 'seems' to help to reproduce it,
but I have also reproduced this with vhost disabled on both L0 and L1.

The capture was done on the bridge interface on L0, and on a virtual network card in L1.
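
For anyone reading the captures: the segment the guest is waiting for shows up as the gap between its cumulative ACK and the first SACK block. A minimal Python sketch of that calculation follows. The function name and the SACK block edges are made up for illustration; only the starting sequence number 1736557331 comes from the trace described above, and sequence-number wraparound is ignored.

```python
def find_holes(cum_ack, sack_blocks):
    """Return the [start, end) byte ranges the receiver is still missing.

    cum_ack     -- cumulative ACK number (next byte the receiver expects)
    sack_blocks -- iterable of (left_edge, right_edge) pairs from the SACK option
    Sequence-number wraparound is ignored to keep the sketch short.
    """
    holes = []
    expected = cum_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            # Bytes between what is acknowledged so far and this block are missing.
            holes.append((expected, left))
        expected = max(expected, right)
    return holes


if __name__ == "__main__":
    # Hypothetical SACK block; only the first number is taken from the report.
    for start, end in find_holes(1736557331, [(1736558779, 1736561675)]):
        print(f"missing bytes {start}..{end - 1}")
```

A hole that stays at the same starting sequence number while the SACK blocks keep growing matches the behaviour seen in both traces.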

It does seem that I am unable to make it fail again (maybe luck?) with CONFIG_NET_SCHED (and its suboptions)
and CONFIG_TCP_CONG_ADVANCED set back to their defaults (everything 'm').
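
To compare how the two options are set across kernels, something like the following Python sketch could be used. It only parses the text of a .config file; the helper names are hypothetical, and where the config text comes from (for example /boot/config-<release> or /proc/config.gz) depends on the distribution and is an assumption left to the caller.

```python
import re


def parse_kconfig(text):
    """Map option name -> value ('y', 'm', a literal, or 'n' if 'is not set')."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options


def report(text, names=("CONFIG_NET_SCHED", "CONFIG_TCP_CONG_ADVANCED")):
    """Return the state of the options of interest; 'absent' if not mentioned."""
    options = parse_kconfig(text)
    return {name: options.get(name, "absent") for name in names}


if __name__ == "__main__":
    # Sample text standing in for a real .config file.
    sample = "# CONFIG_NET_SCHED is not set\n# CONFIG_TCP_CONG_ADVANCED is not set\n"
    print(report(sample))
```

Running this against the config of the failing kernel and the one with the options restored makes the difference between the two builds explicit.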

Also, just to avoid going down the wrong path, note that I did once reproduce this with an e1000e virtual NIC,
thus virtio is likely not to blame here.
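
To make Eric's explanation quoted above concrete, here is a toy Python model of a sender that refuses to retransmit a segment while the copy from the previous transmission has not been consumed. The classes and method names are invented for illustration and are not the kernel's data structures or functions; the sketch only shows why a buffer that is never released would block retransmission indefinitely.

```python
class ToySkb:
    """Toy stand-in for a socket buffer; not the kernel's struct sk_buff."""

    def __init__(self, seq):
        self.seq = seq
        self.copy_in_flight = False  # True while a lower layer still holds the copy


class ToySender:
    def __init__(self):
        self.wire_log = []  # sequence numbers handed to the device, in order

    def transmit(self, skb):
        skb.copy_in_flight = True
        self.wire_log.append(skb.seq)

    def tx_complete(self, skb):
        # Normally signalled by the driver once the packet has been sent.
        skb.copy_in_flight = False

    def retransmit(self, skb):
        """Return True if the segment was resent, False if it was skipped."""
        if skb.copy_in_flight:
            return False  # previous copy not consumed yet, so skip this attempt
        self.transmit(skb)
        return True


if __name__ == "__main__":
    sender = ToySender()
    lost = ToySkb(seq=1736557331)
    sender.transmit(lost)
    # If the completion never arrives (the buffer is leaked), every attempt is skipped.
    print([sender.retransmit(lost) for _ in range(3)])
    sender.tx_complete(lost)
    print(sender.retransmit(lost))
```

In this model a single missing completion is enough to reproduce the symptom: later segments still go out, but the lost one is never resent.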


Thanks,
Best regards,
	Maxim Levitsky



