From: Andre Przywara
Subject: Re: Network hangs when communicating with host
Date: Mon, 19 Oct 2015 10:22:01 +0100
Message-ID: <5624B639.4030603@arm.com>
References: <56213324.3010901@oracle.com>
To: Dmitry Vyukov, Sasha Levin
Cc: Sasha Levin, Pekka Enberg, Asias He, penberg@cs.helsinki.fi, Cyrill Gorcunov, Will Deacon, matt@ozlabs.org, Michael Ellerman, Prasad Joshi, marc.zyngier@arm.com, "Aneesh Kumar K.V", mingo@elte.hu, gorcunov@openvz.org, kvm@vger.kernel.org, Kostya Serebryany, Evgenii Stepanov, Alexey Samsonov, Alexander Potapenko, syzkaller@googlegroups.com

Hi Dmitry,

On 19/10/15 10:05, Dmitry Vyukov wrote:
> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin wrote:
>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I am trying to run a program in an lkvm sandbox so that it communicates
>>> with a program on the host. I run lkvm as:
>>>
>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>
>>> /my_prog then connects to a program on the host over a TCP socket.
>>> I see that the host receives some data and sends some data back, but
>>> then my_prog hangs on a network read.
>>>
>>> To localize this I wrote 2 programs (attached). ping runs on the host
>>> and pong runs in the lkvm sandbox. They successfully establish a TCP
>>> connection, but after some iterations both hang on read.
>>>
>>> The networking code in the Go runtime has been there for more than
>>> 3 years, is widely used in production and has no known bugs. However,
>>> it uses epoll edge-triggered readiness notifications, which are known
>>> to be tricky. Is it possible that lkvm contains a networking bug? Could
>>> it be related to the data races in lkvm I reported earlier today?

Just to let you know: I think we have seen networking issues in the past;
root over NFS had problems, IIRC. Will spent some time debugging this, and
it looked like a race condition in kvmtool's virtio implementation. I think
pinning kvmtool's virtio threads to a single host core made the problem go
away. However, although he tried hard (even by Will's standards!), he
couldn't find the real root cause or a fix at the time, and we found other
ways to work around the issues (using virtio-blk or initrds).
So it's quite possible that there are bugs there. I haven't had time yet to
look at your sanitizer reports, but they look like a promising approach to
finding the root cause.

Cheers,
Andre.

>>>
>>> I am on commit 3695adeb227813d96d9c41850703fb53a23845eb.
>>
>> Hey Dmitry,
>>
>> How long does it take to reproduce? I've been running ping/pong as
>> you've described and it doesn't seem to get stuck (reads/writes keep
>> going on both sides).
>>
>> Can you share your guest kernel config maybe?
>
>
> Hmm... in my setup it happens within a minute or so.
>
> My kernel is not completely standard, but it works with qemu without
> any problems.
> It is not trivial to reproduce, but FWIW I am on commit
> f9fbf6b72ffaaca8612979116c872c9d5d9cc1f5 of the
> https://github.com/dvyukov/linux/commits/coverage branch. The config
> file is attached. I build it with a custom gcc: revision 228818 plus the
> https://codereview.appspot.com/267910043 patch. This is all per the
> https://github.com/google/syzkaller instructions.
>
> I run lkvm as:
> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
> /arch/x86/boot/bzImage --network mode=user -- /pong
>
> kvmtool is on 3695adeb227813d96d9c41850703fb53a23845eb.
>
> I just tried the same with qemu; it does not hang.
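Dmitry's remark that edge-triggered epoll is "tricky" comes down to one rule: with EPOLLET, readiness is reported only when new data arrives, so a reader must keep reading until the kernel returns EAGAIN. The sketch below is illustrative only. It uses Python's `select.epoll` on a pipe rather than Go's runtime or a TCP socket, and the helper names `drain` and `demo` are hypothetical. It follows the pipe scenario described in the epoll(7) man page and does not claim to explain the kvmtool hang.

```python
import os
import select


def drain(fd: int, chunk: int = 4096) -> bytes:
    """Read from a non-blocking fd until EAGAIN or EOF (hypothetical helper).

    With EPOLLET, readiness is reported only on a new arrival, so a reader
    that stops early may never be woken for the bytes it left behind.
    """
    buf = bytearray()
    while True:
        try:
            data = os.read(fd, chunk)
        except BlockingIOError:  # EAGAIN: buffer is empty, safe to wait again
            break
        if not data:  # EOF: the write end was closed
            break
        buf += data
    return bytes(buf)


def demo() -> tuple[int, int, bytes]:
    """Show that a partial read under EPOLLET produces no second event."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    ep = select.epoll()
    ep.register(r, select.EPOLLIN | select.EPOLLET)
    try:
        os.write(w, b"ping" * 3)   # 12 bytes arrive: one edge
        first = ep.poll(0.1)       # reports EPOLLIN for r
        part = os.read(r, 4)       # consume only part of the data...
        second = ep.poll(0.1)      # ...no new arrival, so no new event
        rest = drain(r)            # the remaining 8 bytes are still there
        return len(first), len(second), part + rest
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo())
```

After the partial read, the second `poll` returns nothing even though 8 bytes are still buffered; only the explicit drain recovers them. A reader that instead waited for another event would block indefinitely, which from the outside looks much like a lost wakeup.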