From: Wei Wang
Date: Wed, 10 May 2017 17:52:23 +0800
Subject: Re: [Qemu-devel] [virtio-dev] RE: virtio-net: configurable TX queue size
To: "Michael S. Tsirkin", Jason Wang
Cc: Stefan Hajnoczi, Marc-André Lureau, "pbonzini@redhat.com", "virtio-dev@lists.oasis-open.org", "qemu-devel@nongnu.org", Jan Scheurich

On 05/07/2017 12:39 PM, Wang, Wei W wrote:
> On 05/06/2017 04:37 AM, Michael S. Tsirkin wrote:
>> On Fri, May 05, 2017 at 10:27:13AM +0800, Jason Wang wrote:
>>>
>>> On 2017-05-04 18:58, Wang, Wei W wrote:
>>>> Hi,
>>>>
>>>> I want to re-open the discussion left off a long time ago:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg06194.html
>>>> and discuss the possibility of changing the hardcoded (256) TX
>>>> queue size to be configurable between 256 and 1024.
>>> Yes, I think we probably need this.
>>>
>>>> The reason for this request is that a severe packet-drop issue in
>>>> the TX direction was observed with the existing hardcoded queue
>>>> size of 256, which hurts performance for packet-drop-sensitive
>>>> guest applications that cannot use indirect descriptor tables.
>>>> The issue goes away with a 1K queue size.
>>> Do we need even more? What if we find 1K is not sufficient in the
>>> future? Modern NICs have queue sizes up to ~8192.
>>>
>>>> The concern mentioned in the previous discussion (please check the
>>>> link above) is that the number of chained descriptors would exceed
>>>> UIO_MAXIOV (1024), the limit supported by Linux.
>>> We could try to address this limitation, but we would probably need
>>> a new feature bit to allow more than UIO_MAXIOV sgs.
>> I'd say we should split the queue size and the sg size.
>>
I'm still investigating this. One question (or issue) I found in the
implementation is that the virtio-net device changes the message layout
when the vnet_hdr needs an endianness swap (i.e. when
virtio_needs_swap() is true). This change adds one more iov to the
iov[] array passed from the driver.
To be more precise, the message from the driver could be in one of the
two following layouts:

Layout1:
    iov[0]: vnet_hdr + data

Layout2:
    iov[0]: vnet_hdr
    iov[1]: data

If the driver passes the message in Layout1, the following code from
the device changes the message from Layout1 to Layout2:

    if (n->needs_vnet_hdr_swap) {
        virtio_net_hdr_swap(vdev, (void *) &mhdr);
        sg2[0].iov_base = &mhdr;
        sg2[0].iov_len = n->guest_hdr_len;
        out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
                           out_sg, out_num,
                           n->guest_hdr_len, -1);
        if (out_num == VIRTQUEUE_MAX_SIZE) {
            goto drop;
        }
        out_num += 1;
        out_sg = sg2;
    }

sg2[0] is the extra element, which potentially causes the off-by-one
issue. I didn't find any other possibility that could cause the issue.

Could we keep the original layout by just copying the swapped "mhdr"
back to the original out_sg[0].iov_base?

Best,
Wei
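
P.S. Here is a minimal sketch of the in-place alternative I have in
mind (untested, and assuming it is acceptable for the device to write
the byte-swapped header back into the driver's out buffer):

    if (n->needs_vnet_hdr_swap) {
        /* mhdr was already filled from out_sg by the earlier
         * iov_to_buf() call; swap it as before. */
        virtio_net_hdr_swap(vdev, (void *) &mhdr);
        /* Write the swapped header back over the original one, so the
         * driver's layout (and out_num) stays untouched and no extra
         * sg2[0] element, hence no off-by-one, is needed. */
        iov_from_buf(out_sg, out_num, 0, &mhdr, n->guest_hdr_len);
    }

Since iov_from_buf() copies across iov element boundaries, this would
work for both Layout1 and Layout2.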