From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sasha Levin
Subject: Re: Network hangs when communicating with host
Date: Mon, 19 Oct 2015 10:20:41 -0400
Message-ID: <5624FC39.2060708@oracle.com>
References: <56213324.3010901@oracle.com> <5624B639.4030603@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Sasha Levin, Pekka Enberg, Asias He, penberg@cs.helsinki.fi,
 Cyrill Gorcunov, Will Deacon, matt@ozlabs.org, Michael Ellerman,
 Prasad Joshi, marc.zyngier@arm.com, "Aneesh Kumar K.V", mingo@elte.hu,
 gorcunov@openvz.org, kvm@vger.kernel.org, Kostya Serebryany,
 Evgenii Stepanov, Alexey Samsonov, Alexander Potapenko
To: Dmitry Vyukov, syzkaller@googlegroups.com
Return-path:
Received: from aserp1040.oracle.com ([141.146.126.69]:22514 "EHLO
 aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752691AbbJSOVP (ORCPT );
 Mon, 19 Oct 2015 10:21:15 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 10/19/2015 05:28 AM, Dmitry Vyukov wrote:
> On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara wrote:
>> Hi Dmitry,
>>
>> On 19/10/15 10:05, Dmitry Vyukov wrote:
>>> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin wrote:
>>>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>>>> Hello,
>>>>>
>>>>> I am trying to run a program in lkvm sandbox so that it communicates
>>>>> with a program on host. I run lkvm as:
>>>>>
>>>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>>>
>>>>> /my_prog then connects to a program on host over a tcp socket.
>>>>> I see that host receives some data, sends some data back, but then
>>>>> my_prog hangs on network read.
>>>>>
>>>>> To localize this I wrote 2 programs (attached). ping is run on host
>>>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>>>> connection, but after some iterations both hang on read.
>>>>>
>>>>> Networking code in Go runtime is there for more than 3 years, widely
>>>>> used in production and does not have any known bugs. However, it uses
>>>>> epoll edge-triggered readiness notifications that are known to be tricky.
>>>>> Is it possible that lkvm contains some networking bug? Can it be
>>>>> related to the data races in lkvm I reported earlier today?
>>
>> Just to let you know:
>> I think we have seen networking issues in the past - root over NFS had
>> issues IIRC. Will spent some time on debugging this and it looked like a
>> race condition in kvmtool's virtio implementation. I think pinning
>> kvmtool's virtio threads to one host core made this go away. However
>> although he tried hard (even by Will's standards!) he couldn't find
>> the real root cause or a fix at the time he looked at it and we found
>> other ways to work around the issues (using virtio-blk or initrd's).
>>
>> So it's quite possible that there are issues. I haven't had time yet to
>> look at your sanitizer reports, but it looks like a promising approach
>> to find the root cause.
>
>
> Thanks, Andre!
>
> ping/pong does not hang within at least 5 minutes when I run lkvm
> under taskset 1.
>
> And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
> can point you to the bug within a minute, so you just need to say
> "aha! it is here". Or maybe not. There are no guarantees. But if you
> already spent significant time on this, then checking the reports
> definitely looks like a good idea.

Okay, that's good to know. I have a few busy days, but I'll definitely try
to clear up these reports as they seem to be pointing to real issues.


Thanks,
Sasha