From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Monjalon <thomas.monjalon@6wind.com>
Subject: Re: [PATCH v2 0/5] virtio support for container
Date: Wed, 13 Apr 2016 18:14:41 +0200
Message-ID: <1642018.IWC2Tt5SYA@xps13>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7Bit
Cc: dev@dpdk.org, nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com,
 ann.zhuangyanying@huawei.com
To: Jianfeng Tan <jianfeng.tan@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50])
 by dpdk.org (Postfix) with ESMTP id 199A02952
 for <dev@dpdk.org>; Wed, 13 Apr 2016 18:14:45 +0200 (CEST)
Received: by mail-wm0-f50.google.com with SMTP id v188so183439826wme.1
 for <dev@dpdk.org>; Wed, 13 Apr 2016 09:14:45 -0700 (PDT)
In-Reply-To: <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi Jianfeng,

Thanks for raising the container issues and proposing some solutions.
General comments below.

2016-02-05 19:20, Jianfeng Tan:
> This patchset is to provide high performance networking interface (virtio)
> for container-based DPDK applications. The way of starting DPDK apps in
> containers with ownership of NIC devices exclusively is beyond the scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the change, we reuse already-existing virtio
> frontend driver code (driver/net/virtio/).
>  
> Compared to QEMU/VM case, virtio device framework (translates I/O port r/w
> operations into unix socket/cuse protocol, which is originally provided in
> QEMU), is integrated in virtio frontend driver. So this converged driver
> actually plays the role of original frontend driver and the role of QEMU
> device framework.
>  
> The major difference lies in how to calculate relative address for vhost.
> The principle of virtio is that: based on one or multiple shared memory
> segments, vhost maintains a reference system with the base addresses and
> length for each segment so that an address from VM comes (usually GPA,
> Guest Physical Address) can be translated into vhost-recognizable address
> (named VVA, Vhost Virtual Address). To decrease the overhead of address
> translation, we should maintain as few segments as possible. In VM's case,
> GPA is always locally continuous. In container's case, CVA (Container
> Virtual Address) can be used. Specifically:
> a. when set_base_addr, CVA address is used;
> b. when preparing RX's descriptors, CVA address is used;
> c. when transmitting packets, CVA is filled in TX's descriptors;
> d. in TX and CQ's header, CVA is used.
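
For illustration, the translation described above can be sketched as a
lookup in a small region table. This is only a sketch under assumptions:
struct mem_region and front_to_vva() are hypothetical names, not taken from
the patchset or from the vhost library, and returning 0 for an unmapped
address is an arbitrary choice. The frontend address is a GPA in the VM
case and a CVA in the container case; the lookup is the same either way,
and its cost grows with the number of segments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor, for illustration only. */
struct mem_region {
	uint64_t front_base; /* region start as seen by the frontend (GPA or CVA) */
	uint64_t size;       /* region length in bytes */
	uint64_t vhost_base; /* where vhost mapped the same region (VVA) */
};

/*
 * Translate a frontend address into a vhost virtual address.
 * Returns 0 when no region covers the address (an assumption of this sketch).
 */
static uint64_t
front_to_vva(const struct mem_region *regions, size_t nb_regions, uint64_t addr)
{
	size_t i;

	/* Linear scan: fewer segments means fewer comparisons per address. */
	for (i = 0; i < nb_regions; i++) {
		const struct mem_region *r = &regions[i];

		if (addr >= r->front_base && addr - r->front_base < r->size)
			return r->vhost_base + (addr - r->front_base);
	}
	return 0;
}
```

With a single shared segment the loop degenerates to one range check and
one offset addition, which is why keeping the segment count low matters.
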
>  
> How to share memory? In VM's case, qemu always shares all physical layout
> to backend. But it's not feasible for a container, as a process, to share
> all virtual memory regions to backend. So only specified virtual memory
> regions (with type of shared) are sent to backend. It's a limitation that
> only addresses in these areas can be used to transmit or receive packets.
> 
> Known issues
> 
> a. When used with vhost-net, root privilege is required to create tap
> device inside.
> b. Control queue and multi-queue are not supported yet.
> c. When --single-file option is used, socket_id of the memory may be
> wrong. (Use "numactl -N x -m x" to work around this for now)

There are 2 different topics in this patchset:
1/ How to provide networking in containers
2/ How to provide memory in containers

1/ You have decided to use the virtio spec to bridge the host
with its containers. But there is no virtio device in a container
and no vhost interface in the host (except the kernel one).
So you are extending virtio to work as a vdev inside the container.
Could you explain what the datapath is between virtio and the host app?
Does it need a fake device from QEMU, as Tetsuya has done?

Do you think there can be some alternatives to vhost/virtio in containers?

2/ The memory management is already a mess and it is getting worse.
I think we need to think through the requirements first and then write
a proper implementation that covers every identified need.
I have started a new thread to cover this part:
	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/37445