From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 366AAC48BE5 for ; Mon, 21 Jun 2021 05:12:52 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BB64660C41 for ; Mon, 21 Jun 2021 05:12:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB64660C41 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=marvell.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:Message-ID:Date:Subject:CC :To:From:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References: List-Owner; bh=jNoHwimdQO5DYsFTYZiX9mcfcaTOSb6dU5Auh83fXDg=; b=4F0BmnM1wbmg6g 2r6JBUEtTiyMsPjAXfygVIafO7hAqyDJ/eAV+keMM9eMVQ1bzu/1MP8MetjPQ6Jbe0SniXmqponEg kzJYm4OLyRh8q3hi3naN8PG9w4UOEEaleJCpFoylWyPa+BhmweFBwGfMhbt5HUba2oe97WXLJJsIC le3xII5ZgkSB5pxCeQU+1kl8IfrmG0yfYsbtCAyaTuXFg4sEomaOsZC5htrvc7u0MnAkReAMYT8X+ AXw9QWCAmz7/tZI1tzq5GfUdiNj956X/BDycW58ob5oro1AIpmmvXU42ceHEpTh2/HETktO4RmtAW cesk/HddAViTujHBdamw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1lvCEG-0027Dy-T6; Mon, 21 Jun 2021 05:12:12 +0000 Received: from mx0b-0016f401.pphosted.com ([67.231.156.173]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1lvCEC-0027CE-2e for linux-nvme@lists.infradead.org; Mon, 21 Jun 2021 05:12:10 +0000 Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15L56QUK000311; Sun, 20 Jun 2021 22:11:51 -0700 Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0b-0016f401.pphosted.com with ESMTP id 399g3qn6hm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Sun, 20 Jun 2021 22:11:51 -0700 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Sun, 20 Jun 2021 22:11:47 -0700 Received: from lbtlvb-pcie154.il.qlogic.org (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Sun, 20 Jun 2021 22:11:44 -0700 From: Shai Malin To: , , , , CC: , , , , , , , Subject: [PATCH v2 0/8] NVMeTCP Offload ULP Date: Mon, 21 Jun 2021 08:10:38 +0300 Message-ID: <20210621051046.28783-1-smalin@marvell.com> X-Mailer: git-send-email 2.16.6 MIME-Version: 1.0 X-Proofpoint-GUID: Ivyt8PjUeSfX4j8QYXzCYqhpzEj0Rznj X-Proofpoint-ORIG-GUID: Ivyt8PjUeSfX4j8QYXzCYqhpzEj0Rznj X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-21_01:2021-06-20, 2021-06-21 signatures=0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210620_221208_287547_398DCEFC X-CRM114-Status: GOOD ( 20.68 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org With the goal of enabling a generic infrastructure that allows NVMe/TCP offload devices like NICs to seamlessly plug into the NVMe-oF stack, this patch series introduces the nvme-tcp-offload ULP host layer, which will be a new transport type called "tcp-offload" and will serve as an abstraction layer to work with vendor specific nvme-tcp offload drivers. NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes both the TCP level and the NVMeTCP level. The nvme-tcp-offload transport can co-exist with the existing tcp and other transports. The tcp offload was designed so that stack changes are kept to a bare minimum: only registering new transports. All other APIs, ops etc. are identical to the regular tcp transport. Representing the TCP offload as a new transport allows clear and manageable differentiation between the connections which should use the offload path and those that are not offloaded (even on the same device). The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma: * NVMe layer: * [ nvme/nvme-fabrics/blk-mq ] | (nvme API and blk-mq API) | | * Vendor agnostic transport layer: * [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ] | | | (Verbs) | | | | (Socket) | | | | | (nvme-tcp-offload API) | | | | | | * Vendor Specific Driver: * | | | [ RDMA driver ] | | [ Network driver ] | [ NVMeTCP Offload driver ] Usage: ====== The user will interact with the network-device in order to configure the ip/vlan - Logically similar to the RDMA model. The NVMeTCP configuration is populated as part of the nvme connect command. Example: Assign IP to the net-device (from any existing Linux tool): ip addr add 100.100.0.101/24 dev p1p1 This IP will be used by both net-device and offload-device. In order to connect from "sw" nvme-tcp through the net-device: nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn In order to connect from "offload" nvme-tcp through the offload-device: nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn An alternative approach, and as a future enhancement that will not impact this series will be to modify nvme-cli with a new flag that will determine if "-t tcp" should be the regular nvme-tcp (which will be the default) or nvme-tcp-offload. Exmaple: nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag] Queue Initialization Design: ============================ The nvme-tcp-offload ULP module shall register with the existing nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following ops: - claim_dev() - in order to resolve the route to the target according to the paired net_dev. - create_queue() - in order to create offloaded nvme-tcp queue. The nvme-tcp-offload ULP module shall manage all the controller level functionalities, call claim_dev and based on the return values shall call the relevant module create_queue in order to create the admin queue and the IO queues. IO-path Design: =============== The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor driver and later, the nvme-tcp-offload vendor driver returns the request completion (the IO completion). No additional handling is needed in between; this design will reduce the CPU utilization as we will describe below. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following IO-path ops: - send_req() - in order to pass the request to the handling of the offload driver that shall pass it to the vendor specific device. - poll_queue() Once the IO completes, the nvme-tcp-offload vendor driver shall call command.done() that will invoke the nvme-tcp-offload ULP layer to complete the request. Teardown and errors: ==================== In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall call the nvme_tcp_ofld_report_queue_err. The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP with the following teardown ops: - drain_queue() - destroy_queue() Changes since RFC v1: ===================== - nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values. - nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD. - nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation. - nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return values. Changes since RFC v2: ===================== - nvme-tcp-offload: Fixes in controller and queue level (patches 3-6). - qedn: Add the Marvell's NVMeTCP HW offload vendor driver init and probe (patches 8-11). Changes since RFC v3: ===================== - nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC and timeout). - nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments. - nvme-tcp-offload: layer design and optimization changes. Changes since RFC v4: ===================== (Many thanks to Hannes Reinecke for his feedback) - nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues. - nvme_tcp_offload: Add per device private_data. - nvme_tcp_offload: Fix header digest, data digest and tos initialization. Changes since RFC v5: ===================== (Many thanks to Sagi Grimberg for his feedback) - nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch). - nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING. - nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow. - nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow. - nvme_tcp_offload: Change rwsem to mutex. - nvme_tcp_offload: remove redundant fields. - nvme_tcp_offload: Remove the "new" from setup_ctrl(). - nvme_tcp_offload: Remove the init_req() and commit_rqs() ops. - nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() ansd nvme_tcp_ofld_free_queue(). - nvme_tcp_offload: Patch 8 (timeout and async) was squeashed into patch 7 (io level). Changes since RFC v6: ===================== - No changes in nvme_tcp_offload (only in qedn). Changes since v0: ===================== - nvme_tcp_offload: Add support for NVME_OPT_HOST_IFACE. - nvme_tcp_offload: Kconfig fix (thanks to Petr Mladek). - nvme_tcp_offload: return code fix (thanks to Dan Carpenter). Arie Gershberg (2): nvme-tcp-offload: Add controller level implementation nvme-tcp-offload: Add controller level error recovery implementation Dean Balandin (3): nvme-tcp-offload: Add device scan implementation nvme-tcp-offload: Add queue level implementation nvme-tcp-offload: Add IO level implementation Prabhakar Kushwaha (2): nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions nvme-fabrics: Expose nvmf_check_required_opts() globally Shai Malin (1): nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP MAINTAINERS | 8 + drivers/nvme/host/Kconfig | 15 + drivers/nvme/host/Makefile | 3 + drivers/nvme/host/fabrics.c | 12 +- drivers/nvme/host/fabrics.h | 9 + drivers/nvme/host/tcp-offload.c | 1333 +++++++++++++++++++++++++++++++ drivers/nvme/host/tcp-offload.h | 207 +++++ 7 files changed, 1578 insertions(+), 9 deletions(-) create mode 100644 drivers/nvme/host/tcp-offload.c create mode 100644 drivers/nvme/host/tcp-offload.h -- 2.22.0 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme