Date: Wed, 8 Sep 2021 22:57:06 +0100
From: Daniel P. Berrangé
To: Peter Xu
Subject: Re: [PATCH v1 2/3] io: Add zerocopy and errqueue
Cc: Elena Ufimtseva, John G Johnson, Jagannathan Raman, qemu-block@nongnu.org, Juan Quintela, "Dr. David Alan Gilbert", qemu-devel, Leonardo Bras Soares Passos, Paolo Bonzini, Marc-André Lureau, Fam Zheng

On Wed, Sep 08, 2021 at 05:09:33PM -0400, Peter Xu wrote:
> On Wed, Sep 08, 2021 at 05:25:50PM -0300, Leonardo Bras Soares Passos wrote:
> > On Tue, Sep 7, 2021 at 8:06 AM Dr. David Alan Gilbert wrote:
> > > > Possibly, yes. This really need David G's input since he understands
> > > > the code in way more detail than me.
> > >
> > > Hmm I'm not entirely sure why we have the sync after each iteration;
> > > the case I can think of is if we're doing async sending, we could have
> > > two versions of the same page in flight (one from each iteration) -
> > > you'd want those to get there in the right order.
> > >
> > > Dave
> >
> > Well, that's the thing: as we don't copy the buffer in MSG_ZEROCOPY, we
> > will in fact have the same page in flight twice, instead of two versions,
> > given the buffer is sent as it is during transmission.
>
> That's an interesting point, which looks even valid... :)
>
> There can still be two versions depending on when the page is read and feed to
> the NICs as the page can be changing during the window, but as long as the
> latter sent page always lands later than the former page on dest node then it
> looks ok.

The really strange scenario is around TCP re-transmits. The kernel may
transmit page version 1, then we have version 2 written. The kernel now
needs to retransmit a packet due to network loss. The re-transmission will
contain version 2 payload, which differs from the originally lost packet's
payload.
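
To make the re-transmit case concrete, here is a toy model in Python. It is
only a sketch under stated assumptions: ZeroCopySender and its methods are
made-up names for illustration, not QEMU or kernel code, and it assumes the
stack reads the payload from the caller's page at the moment each
(re)transmission is built rather than from a private copy.

```python
class ZeroCopySender:
    """Hypothetical model of a sender that references, not copies, user memory."""

    def __init__(self):
        self.unacked = {}  # sequence number -> view of the caller's buffer
        self.wire = []     # (sequence number, payload) as seen on the network

    def send(self, seq, buf):
        # Keep a reference to the caller's buffer instead of copying it.
        self.unacked[seq] = memoryview(buf)
        self._transmit(seq)

    def retransmit(self, seq):
        # Resend the same sequence range after a simulated packet loss.
        self._transmit(seq)

    def _transmit(self, seq):
        # The payload is read from the caller's memory at transmit time.
        self.wire.append((seq, bytes(self.unacked[seq])))


page = bytearray(b"version-1")
tx = ZeroCopySender()
tx.send(seq=1, buf=page)   # first transmission carries version 1
page[-1] = ord("2")        # the page is modified before the data is acked
tx.retransmit(seq=1)       # the retransmission now carries version 2

first, second = tx.wire
print(first, second)
```

Both entries in tx.wire have the same sequence number but different
payloads, which is exactly the situation a receiver of a TCP stream should
never be able to observe.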

IOW, the supposed "reliable" TCP stream is no longer reliable and actually
changes its data retrospectively because we've intentionally violated the
rules the kernel documented for use of MSG_ZEROCOPY.

We think we're probably ok with migration as we are going to rely on the
fact that we eventually pause the guest to stop page changes during the
final switchover. Nonetheless, I really strongly dislike the idea of
not honouring the kernel API contract, despite the potential performance
benefits it brings.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|