Date: Mon, 19 Oct 2020 13:16:10 -0400
From: "Michael S. Tsirkin"
To: Xie Yongji
Cc: jasowang@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org
Subject: Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace
Message-ID: <20201019130815-mutt-send-email-mst@kernel.org>
References: <20201019145623.671-1-xieyongji@bytedance.com>
In-Reply-To: <20201019145623.671-1-xieyongji@bytedance.com>

On Mon, Oct 19, 2020 at 10:56:19PM +0800, Xie Yongji wrote:
> This series introduces a framework, which can be used to implement
> vDPA Devices in a userspace program. To implement it, the work
> consists of two parts: control path emulating and data path offloading.
>
> In the control path, the VDUSE driver will make use of a message
> mechanism to forward the actions (get/set features, get/set status,
> get/set config space and set virtqueue states) from virtio-vdpa
> driver to userspace. Userspace can use read()/write() to
> receive/reply to those control messages.
>
> In the data path, the VDUSE driver implements an MMU-based
> on-chip IOMMU driver which supports both direct mapping and
> indirect mapping with bounce buffer. Then userspace can access
> that IOVA space via mmap(). Besides, an eventfd mechanism is used to
> trigger interrupts and forward virtqueue kicks.
>
> The details and our use case are shown below:
>
> ------------------------     -----------------------------------------------------------
> |           APP        |     |                          QEMU                           |
> |       ---------      |     | --------------------    -------------------+<-->+------ |
> |       |dev/vdx|      |     | | device emulation |    | virtio dataplane |    | BDS | |
> ------------+-----------     -----------+-----------------------+-----------------+-----
>             |                           |                       |                 |
>             |                           | emulating             | offloading      |
> ------------+---------------------------+-----------------------+-----------------+------
> |    | block device |           |  vduse driver |        | vdpa device |     | TCP/IP | |
> |    -------+--------           --------+--------        +------+-------     -----+---- |
> |           |                           |                |      |                 |     |
> |           |                           |                |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> | | virtio-blk driver |       | virtio-vdpa driver |     |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> |           |                           |                |      |                 |     |
> |           |                           ------------------      |                 |     |
> |           -----------------------------------------------------              ---+---  |
> ------------------------------------------------------------------------------ | NIC |---
>                                                                                ---+---
>                                                                                   |
>                                                                          ---------+---------
>                                                                          | Remote Storages |
>                                                                          -------------------
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used in containers and
> bare metal.

What is not exactly clear is what the APP above is doing.
Taking virtio blk requests and sending them over the network
in some proprietary way?

> Compared with the qemu-nbd solution, this solution has
> higher performance, and we can have a unified technology stack
> in VM and containers for remote storages.
>
> To test it with a host disk (e.g. /dev/sdx):
>
>   $ qemu-storage-daemon \
>       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>       --monitor chardev=charmonitor \
>       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/sdx,node-name=disk0 \
>       --export vduse-blk,id=test,node-name=disk0,writable=on,vduse-id=1,num-queues=16,queue-size=128
>
> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
>
> Future work:
>   - Improve performance (e.g. zero copy implementation in datapath)
>   - Config interrupt support
>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)

How does this driver compare with vhost-user-blk (which doesn't need
kernel support)?

> Xie Yongji (4):
>   mm: export zap_page_range() for driver use
>   vduse: Introduce VDUSE - vDPA Device in Userspace
>   vduse: grab the module's references until there is no vduse device
>   vduse: Add memory shrinker to reclaim bounce pages
>
>  drivers/vdpa/Kconfig                 |    8 +
>  drivers/vdpa/Makefile                |    1 +
>  drivers/vdpa/vdpa_user/Makefile      |    5 +
>  drivers/vdpa/vdpa_user/eventfd.c     |  221 ++++++
>  drivers/vdpa/vdpa_user/eventfd.h     |   48 ++
>  drivers/vdpa/vdpa_user/iova_domain.c |  488 ++++++++++++
>  drivers/vdpa/vdpa_user/iova_domain.h |  104 +++
>  drivers/vdpa/vdpa_user/vduse.h       |   66 ++
>  drivers/vdpa/vdpa_user/vduse_dev.c   | 1081 ++++++++++++++++++++++++++
>  include/uapi/linux/vduse.h           |   85 ++
>  mm/memory.c                          |    1 +
>  11 files changed, 2108 insertions(+)
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
>
> --
> 2.25.1
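
For readers following the thread, the control path described in the cover
letter (the driver forwards get/set actions to userspace, which receives and
answers them with read()/write()) could look roughly like the sketch below.
It is only an illustration: the message layout, the message type numbers, the
Device class and the socketpair standing in for the VDUSE character device are
all assumptions made for this example, not the interface the series defines in
include/uapi/linux/vduse.h.

```python
import os
import socket
import struct

# Hypothetical wire format for illustration only; it is NOT the layout
# defined by the series in include/uapi/linux/vduse.h.
#   request:  message type (u32), request id (u32), argument (u64)
#   response: request id (u32), status (u32), result (u64)
REQUEST = struct.Struct("<IIQ")
RESPONSE = struct.Struct("<IIQ")

# Hypothetical message types, loosely mirroring the actions in the cover letter.
GET_FEATURES, SET_FEATURES, GET_STATUS, SET_STATUS = range(4)

OK, FAILED = 0, 1


class Device:
    """Toy device state kept by the userspace daemon."""

    def __init__(self, offered_features):
        self.offered_features = offered_features
        self.accepted_features = 0
        self.status = 0

    def handle(self, msg_type, arg):
        """Return (status, result) for one control message."""
        if msg_type == GET_FEATURES:
            return OK, self.offered_features
        if msg_type == SET_FEATURES:
            if arg & ~self.offered_features:
                return FAILED, 0  # a feature bit that was never offered
            self.accepted_features = arg
            return OK, 0
        if msg_type == GET_STATUS:
            return OK, self.status
        if msg_type == SET_STATUS:
            self.status = arg
            return OK, 0
        return FAILED, 0  # unknown message type


def serve_one(fd, device):
    """read() one request from fd, apply it, and write() the reply."""
    data = os.read(fd, REQUEST.size)
    if len(data) != REQUEST.size:
        return False
    msg_type, request_id, arg = REQUEST.unpack(data)
    status, result = device.handle(msg_type, arg)
    os.write(fd, RESPONSE.pack(request_id, status, result))
    return True


def demo():
    """Exercise the loop over a socketpair that stands in for the device fd."""
    kernel_side, daemon_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    device = Device(offered_features=0b1011)
    requests = [
        (GET_FEATURES, 0),
        (SET_FEATURES, 0b0011),
        (SET_FEATURES, 0b0100),  # not offered, expected to fail
        (SET_STATUS, 0xF),
        (GET_STATUS, 0),
    ]
    replies = []
    with kernel_side, daemon_side:
        for request_id, (msg_type, arg) in enumerate(requests):
            kernel_side.send(REQUEST.pack(msg_type, request_id, arg))
            serve_one(daemon_side.fileno(), device)
            replies.append(RESPONSE.unpack(kernel_side.recv(RESPONSE.size)))
    return replies
```

Calling demo() returns one (request id, status, result) tuple per message. The
third request is rejected because it asks for a feature bit that the toy
device never offered.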