From: Thomas Monjalon
Subject: Re: [PATCH v2 0/5] virtio support for container
Date: Wed, 13 Apr 2016 18:14:41 +0200
Message-ID: <1642018.IWC2Tt5SYA@xps13>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com> <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
In-Reply-To: <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
To: Jianfeng Tan
Cc: dev@dpdk.org, nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com, ann.zhuangyanying@huawei.com

Hi Jianfeng,

Thanks for raising the container issues and proposing some solutions.
General comments below.

2016-02-05 19:20, Jianfeng Tan:
> This patchset is to provide high performance networking interface (virtio)
> for container-based DPDK applications. The way of starting DPDK apps in
> containers with ownership of NIC devices exclusively is beyond the scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the change, we reuse already-existing virtio
> frontend driver code (driver/net/virtio/).
>
> Compared to QEMU/VM case, virtio device framework (translates I/O port r/w
> operations into unix socket/cuse protocol, which is originally provided in
> QEMU), is integrated in virtio frontend driver.
> So this converged driver actually plays the role of original frontend driver
> and the role of QEMU device framework.
>
> The major difference lies in how to calculate relative address for vhost.
> The principle of virtio is that: based on one or multiple shared memory
> segments, vhost maintains a reference system with the base addresses and
> length for each segment so that an address from VM comes (usually GPA,
> Guest Physical Address) can be translated into vhost-recognizable address
> (named VVA, Vhost Virtual Address). To decrease the overhead of address
> translation, we should maintain as few segments as possible. In VM's case,
> GPA is always locally continuous. In container's case, CVA (Container
> Virtual Address) can be used. Specifically:
> a. when set_base_addr, CVA address is used;
> b. when preparing RX's descriptors, CVA address is used;
> c. when transmitting packets, CVA is filled in TX's descriptors;
> d. in TX and CQ's header, CVA is used.
>
> How to share memory? In VM's case, qemu always shares all physical layout
> to backend. But it's not feasible for a container, as a process, to share
> all virtual memory regions to backend. So only specified virtual memory
> regions (with type of shared) are sent to backend. It's a limitation that
> only addresses in these areas can be used to transmit or receive packets.
>
> Known issues
>
> a. When used with vhost-net, root privilege is required to create tap
> device inside.
> b. Control queue and multi-queue are not supported yet.
> c. When --single-file option is used, socket_id of the memory may be
> wrong. (Use "numactl -N x -m x" to work around this for now)

There are 2 different topics in this patchset:
1/ How to provide networking in containers
2/ How to provide memory in containers

1/ You have decided to use the virtio spec to bridge the host with its
containers. But there is no virtio device in a container and no vhost
interface in the host (except the kernel one).
So you are extending virtio to work as a vdev inside the container.
Could you explain what the datapath between virtio and the host app is?
Does it need to use a fake device from QEMU, as Tetsuya has done?
Do you think there could be alternatives to vhost/virtio in containers?

2/ Memory management is already a mess and it's getting worse.
I think we need to define the requirements first and then write a proper
implementation that covers every identified need.
I have started a new thread to cover this part:
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/37445