From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01505C2BA83 for ; Thu, 13 Feb 2020 11:22:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B66422073C for ; Thu, 13 Feb 2020 11:22:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Ig7klHiJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729820AbgBMLWI (ORCPT ); Thu, 13 Feb 2020 06:22:08 -0500 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:34231 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1729428AbgBMLWH (ORCPT ); Thu, 13 Feb 2020 06:22:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1581592926; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=2Sse7iKtOsTLbxsfOMaYwOFldT7RFxq2LUJ0r3ymtPM=; b=Ig7klHiJMAs8zW/4za0Moi2VY0GmCR9YDeOa+icKF/SiifB/QCV1WGwlVmFxSh05aiJP69 yAARRzpZZI57vFXN2WhMoYq0GQEsgnOn58i9WtUcJCxRyjtlRrYJ7leH7nvPBupS9oN8xD SVmY/Y2yTLlCNFrLQAMrIyhIxCXVu8I= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-142-nuWR8fmhOIuAj965etT_rw-1; Thu, 13 Feb 2020 06:22:05 -0500 X-MC-Unique: nuWR8fmhOIuAj965etT_rw-1 Received: by mail-wr1-f71.google.com with SMTP id a12so2192112wrn.19 for ; Thu, 13 Feb 2020 03:22:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=2Sse7iKtOsTLbxsfOMaYwOFldT7RFxq2LUJ0r3ymtPM=; b=hMxlb0KajoiCPr09YSID9m0OMrw5WW59pDc5S/KiM6HGdyunnePfTUZGBIK1Sy6PM9 VK2Edcm/8upEVEkhHVUPSztQXwRkArY2KV86uxwQL+oVQln5IcV2Ep5I7xAD+H4XXb6s aEDBeC+8P8mdUB1UWWktGSDBqz0Ayi3ELsug7f1KIny0YdRgEHs5mEhV5Hrvy/Tn4rQo idtdfbwURhkYzWAUpKLL1B966L5A+nKZ/QeX6PCELxe+r0VDi+kbuw9iMCfipYiVUv3s 0NBJ7qXFl/TUftVESHO24nWXCcgJ+PbvCxf875rh/rwYZE/zFgsk+8Eo1bPE2uiifpdV zUMw== X-Gm-Message-State: APjAAAUTRtFO6lMUUR8zsT7CB9EaLi61CMJzuDsznnwvC3MyXmq6/nQ3 TIXAdDj3GBbLxASV+cEb5xxwMm8MLZkSPfc+WgX/Rt3eYiMYdMjN8kBaOKVpvGjxMiKlTQ8GBcw WEm2YCYnBKKz0W92a X-Received: by 2002:adf:f5cb:: with SMTP id k11mr20944034wrp.63.1581592923774; Thu, 13 Feb 2020 03:22:03 -0800 (PST) X-Google-Smtp-Source: APXvYqxwZegzL4oNL0+cCxkqt2MR6p1S5jmwlAJp7qC0dpflOMm6Ti43kAWJ25piyx03GBh2NIQuxg== X-Received: by 2002:adf:f5cb:: with SMTP id k11mr20944007wrp.63.1581592923524; Thu, 13 Feb 2020 03:22:03 -0800 (PST) Received: from steredhat (host209-4-dynamic.27-79-r.retail.telecomitalia.it. [79.27.4.209]) by smtp.gmail.com with ESMTPSA id u8sm2648710wmm.15.2020.02.13.03.22.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Feb 2020 03:22:02 -0800 (PST) Date: Thu, 13 Feb 2020 12:22:00 +0100 From: Stefano Garzarella To: "Boeuf, Sebastien" Cc: "stefanha@redhat.com" , "netdev@vger.kernel.org" , "davem@davemloft.net" Subject: Re: [PATCH] net: virtio_vsock: Fix race condition between bind and listen Message-ID: <20200213112200.7qv2a7em64jpyjs3@steredhat> References: <668b0eda8823564cd604b1663dc53fbaece0cd4e.camel@intel.com> <20200213094130.vehzkr4a3pnoiogr@steredhat> <3448e588f11dad913e93dfce8031fbd60ba4c85b.camel@intel.com> <20200213102237.uyhfv5g2td5ayg2b@steredhat> <1d4c3958d8b75756341548e7d51ccf42397c2d27.camel@intel.com> <20200213110251.2unj3sykwpku6dd7@steredhat> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Thu, Feb 13, 2020 at 11:13:55AM +0000, Boeuf, Sebastien wrote: > On Thu, 2020-02-13 at 12:02 +0100, Stefano Garzarella wrote: > > On Thu, Feb 13, 2020 at 10:44:18AM +0000, Boeuf, Sebastien wrote: > > > On Thu, 2020-02-13 at 11:22 +0100, Stefano Garzarella wrote: > > > > On Thu, Feb 13, 2020 at 09:51:36AM +0000, Boeuf, Sebastien wrote: > > > > > Hi Stefano, > > > > > > > > > > On Thu, 2020-02-13 at 10:41 +0100, Stefano Garzarella wrote: > > > > > > Hi Sebastien, > > > > > > > > > > > > On Thu, Feb 13, 2020 at 09:16:11AM +0000, Boeuf, Sebastien > > > > > > wrote: > > > > > > > From 2f1276d02f5a12d85aec5adc11dfe1eab7e160d6 Mon Sep 17 > > > > > > > 00:00:00 > > > > > > > 2001 > > > > > > > From: Sebastien Boeuf > > > > > > > Date: Thu, 13 Feb 2020 08:50:38 +0100 > > > > > > > Subject: [PATCH] net: virtio_vsock: Fix race condition > > > > > > > between > > > > > > > bind > > > > > > > and listen > > > > > > > > > > > > > > Whenever the vsock backend on the host sends a packet > > > > > > > through > > > > > > > the > > > > > > > RX > > > > > > > queue, it expects an answer on the TX queue. Unfortunately, > > > > > > > there > > > > > > > is one > > > > > > > case where the host side will hang waiting for the answer > > > > > > > and > > > > > > > will > > > > > > > effectively never recover. > > > > > > > > > > > > Do you have a test case? > > > > > > > > > > Yes I do. This has been a bug we've been investigating on Kata > > > > > Containers for quite some time now. This was happening when > > > > > using > > > > > Kata > > > > > along with Cloud-Hypervisor (which rely on the hybrid vsock > > > > > implementation from Firecracker). The thing is, this bug is > > > > > very > > > > > hard > > > > > to reproduce and was happening for Kata because of the > > > > > connection > > > > > strategy. The kata-runtime tries to connect a million times > > > > > after > > > > > it > > > > > started the VM, just hoping the kata-agent will start to listen > > > > > from > > > > > the guest side at some point. > > > > > > > > Maybe is related to something else. I tried the following which > > > > should be > > > > your case simplified (IIUC): > > > > > > > > guest$ python > > > > import socket > > > > s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) > > > > s.bind((socket.VMADDR_CID_ANY, 1234)) > > > > > > > > host$ python > > > > import socket > > > > s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) > > > > s.connect((3, 1234)) > > > > > > > > Traceback (most recent call last): > > > > File "", line 1, in > > > > TimeoutError: [Errno 110] Connection timed out > > > > > > Yes this is exactly the simplified case. But that's the point, I > > > don't > > > think the timeout is the best way to go here. Because this means > > > that > > > when we run into this case, the host side will wait for quite some > > > time > > > before retrying, which can cause a very long delay before the > > > communication with the guest is established. By simply answering > > > the > > > host with a RST packet, we inform it that nobody's listening on the > > > guest side yet, therefore the host side will close and try again. > > > > Yes, make sense. > > > > I just wanted to point out that the host shouldn't get stuck if the > > guest doesn't respond. So it's weird that this happens and it might > > be > > related to some other problem. > > Yes that's because we're using the hybrid vsock implementation from > Firecracker. So basically they proxy a UNIX socket connection into a > VSOCK socket. And the timeout is not implemented. I'll point them out > that's something it'd be nice to have along with this fix in the guest. > Okay, now I get it. > > > > > > > > In the host, the af_vsock.c:vsock_stream_connect() set a > > > > > > timeout, > > > > > > so > > > > > > if > > > > > > the host try to connect before the guest starts listening, > > > > > > the > > > > > > connect() > > > > > > should return ETIMEDOUT if the guest does not answer > > > > > > anything. > > > > > > > > > > > > Anyway, maybe the patch make sense anyway, changing a bit the > > > > > > description > > > > > > (if the host connect() receive the ETIMEDOUT). > > > > > > I'm just concerned that this code is common between guest and > > > > > > host. > > > > > > If a > > > > > > malicious guest starts sending us wrong requests, we spend > > > > > > time > > > > > > sending > > > > > > a reset packet. But we already do that if we can't find the > > > > > > bound > > > > > > socket, > > > > > > so it might make sense. > > > > > > > > > > Yes I don't think this is gonna cause more trouble, but at > > > > > least we > > > > > cannot end up in this weird situation I described. > > > > > > > > Okay, but in the host, we can't trust the guest to answer, we > > > > should > > > > handle this case properly. > > > > > > Well I cannot agree more with the "we cannot trust the guest" > > > philosophy, but in this case the worst thing that can happen is the > > > guest shooting himself in the foot because it would simply prevent > > > the > > > connection from happening. > > > > > > And I agree setting up a timeout from the host side is still a good > > > idea for preventing from such DOS attack. > > > > > > But as I mentioned above, in the normal use case, this allows for > > > better responsiveness when it comes to establish the connection as > > > fast > > > as possible. > > > > Sure, maybe you can rewrite a bit the commit (title and body) to > > explain > > this. > > Of course, let me work on this. > How am I supposed to resend the patch? Should I send a new email with > v2 in the title? > Yes, v2 in the []. Something like this: [PATCH net v2] vsock/virtio: ... > > > > > > > I was just not sure if the function we should use to do the > > > > > reset > > > > > should be virtio_transport_reset_no_sock() or > > > > > virtio_transport_reset() > > > > > since at this point the socket is already bound. > > > > > > > > I think you can safely use virtio_transport_reset() in this case. > > > > > > I've just tried it and unfortunately it doesn't work. I think > > > that's > > > because the connection has not been properly established yet, so we > > > cannot consider being in this case. > > > Using virtio_transport_reset_no_sock() seems like the less > > > intrusive > > > function here. > > > > Oh sorry, I also put a comment on virtio_transport_reset() to say to > > use it > > only on connected sockets and not listeners. > > In this case it's a listener, sorry for the wrong suggestion. > > Hehe no worries, at least we're on the same page :) > :-) Thanks, Stefano