From: Shai Malin <smalin@marvell.com>
To: <linux-nvme@lists.infradead.org>, <sagi@grimberg.me>,
<hch@lst.de>, <axboe@fb.com>, <kbusch@kernel.org>
Cc: <davem@davemloft.net>, <aelior@marvell.com>,
<mkalderon@marvell.com>, <okulkarni@marvell.com>,
<pkushwaha@marvell.com>, <prabhakar.pkin@gmail.com>,
<malin1024@gmail.com>, <smalin@marvell.com>
Subject: [PATCH v2 0/8] NVMeTCP Offload ULP
Date: Mon, 21 Jun 2021 08:10:38 +0300
Message-ID: <20210621051046.28783-1-smalin@marvell.com>
With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer: a new
transport type called "tcp_offload" that serves as an abstraction layer
for vendor-specific nvme-tcp offload drivers.
NVMeTCP offload is a full offload of the NVMeTCP protocol; it covers
both the TCP level and the NVMeTCP level.
The nvme-tcp-offload transport can co-exist with the existing tcp and
other transports. The tcp offload was designed so that stack changes are
kept to a bare minimum: only a new transport is registered.
All other APIs, ops, etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows a clear and
manageable distinction between connections that use the offload path
and those that do not (even on the same device).
The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |
* Vendor agnostic transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
            |            |                |
         (Verbs)         |                |
            |            |                |
            |        (Socket)             |
            |            |                |
            |            |     (nvme-tcp-offload API)
            |            |                |
            |            |                |
* Vendor Specific Driver: *

            |            |                |
      [ RDMA driver ]    |                |
                         |                |
                 [ Network driver ]       |
                                          |
                             [ NVMeTCP Offload driver ]
Usage:
======
The user interacts with the network device in order to configure the
IP/VLAN, logically similar to the RDMA model.
The NVMeTCP configuration is populated as part of the
nvme connect command.
Example:
Assign an IP to the net-device (with any existing Linux tool):
    ip addr add 100.100.0.101/24 dev p1p1
This IP will be used by both the net-device and the offload-device.
In order to connect from the "sw" nvme-tcp through the net-device:
    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn
In order to connect from the "offload" nvme-tcp through the offload-device:
    nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn
An alternative approach, as a future enhancement that will not impact
this series, would be to modify nvme-cli with a new flag that determines
whether "-t tcp" should use the regular nvme-tcp (which will remain the
default) or nvme-tcp-offload.
Example:
    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]
Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload vendor driver shall register to the nvme-tcp-offload
ULP with the following ops:
 - claim_dev() - resolve the route to the target according to the
   paired net_dev.
 - create_queue() - create an offloaded nvme-tcp queue.
The nvme-tcp-offload ULP module shall manage all the controller-level
functionality: it calls claim_dev() and, based on the return value, calls
the relevant vendor driver's create_queue() in order to create the admin
queue and the IO queues, as sketched below.
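For illustration, a vendor driver registration might look like the
following minimal sketch. All "example_*" names are hypothetical; the
real ops struct and registration helper are defined in
drivers/nvme/host/tcp-offload.h and tcp-offload.c (patches 1 and 4),
and the exact signatures may differ:

    /* Hypothetical vendor driver registering with the nvme-tcp-offload
     * ULP.  The ops cover the setup, IO and teardown paths described
     * in this cover letter.
     */
    static struct nvme_tcp_ofld_ops example_ofld_ops = {
            .name          = "example_vendor",
            .module        = THIS_MODULE,
            .claim_dev     = example_claim_dev,     /* route resolution */
            .create_queue  = example_create_queue,  /* admin/IO queues */
            .send_req      = example_send_req,      /* IO path */
            .poll_queue    = example_poll_queue,    /* IO path */
            .drain_queue   = example_drain_queue,   /* teardown */
            .destroy_queue = example_destroy_queue, /* teardown */
    };

    static struct nvme_tcp_ofld_dev example_dev = {
            .ops = &example_ofld_ops,
    };

    static int __init example_init(void)
    {
            /* Make the device visible to the ULP; it will then be
             * considered whenever a tcp_offload controller is created.
             */
            return nvme_tcp_ofld_register_dev(&example_dev);
    }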
IO-path Design:
===============
The nvme-tcp-offload works at the IO level: the nvme-tcp-offload ULP
module passes the request (the IO) to the nvme-tcp-offload vendor
driver, and the vendor driver later returns the request completion
(the IO completion).
No additional handling is needed in between; this design reduces CPU
utilization.
The nvme-tcp-offload vendor driver shall register to the nvme-tcp-offload
ULP with the following IO-path ops:
 - send_req() - pass the request to the offload driver, which passes it
   on to the vendor-specific device.
 - poll_queue()
Once the IO completes, the nvme-tcp-offload vendor driver shall call
command.done(), which invokes the nvme-tcp-offload ULP layer to
complete the request (see the sketch below).
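To make the flow concrete, a hypothetical IO-path sketch, reusing the
"example_*" naming from the registration sketch above and assuming a
done() callback hanging off the request (the actual callback shape in
tcp-offload.h may differ):

    /* The ULP hands the request to the vendor driver... */
    static int example_send_req(struct nvme_tcp_ofld_req *req)
    {
            /* Post the command capsule and mapped SG list to the
             * vendor-specific device; the completion arrives later.
             */
            return example_hw_post(req);
    }

    /* ...and the vendor driver completes it from IRQ or poll context. */
    static void example_handle_cqe(struct nvme_tcp_ofld_req *req,
                                   union nvme_result *result,
                                   __le16 status)
    {
            /* Invokes the ULP to complete the blk-mq request; no
             * additional per-IO handling happens in between.
             */
            req->done(req, result, status);
    }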
Teardown and errors:
====================
In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver
shall call nvme_tcp_ofld_report_queue_err().
The nvme-tcp-offload vendor driver shall register to the nvme-tcp-offload
ULP with the following teardown ops (see the sketch below):
 - drain_queue()
 - destroy_queue()
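A hypothetical error-flow sketch, again with illustrative names:

    /* On a queue-level error, the vendor driver notifies the ULP,
     * which triggers controller-level error recovery; drain_queue()
     * and destroy_queue() are then invoked as part of the teardown.
     */
    static void example_queue_error(struct nvme_tcp_ofld_queue *queue)
    {
            nvme_tcp_ofld_report_queue_err(queue);
    }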
Changes since RFC v1:
=====================
- nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
- nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
- nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
- nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and
send_req() return values.
Changes since RFC v2:
=====================
- nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
- qedn: Add Marvell's NVMeTCP HW offload vendor driver init and probe
  (patches 8-11).
Changes since RFC v3:
=====================
- nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer
including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new
flows (ASYNC and timeout).
- nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
- nvme-tcp-offload: layer design and optimization changes.
Changes since RFC v4:
=====================
(Many thanks to Hannes Reinecke for his feedback)
- nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
- nvme_tcp_offload: Add per device private_data.
- nvme_tcp_offload: Fix header digest, data digest and tos initialization.
Changes since RFC v5:
=====================
(Many thanks to Sagi Grimberg for his feedback)
- nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
- nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
- nvme_tcp_offload: Change rwsem to mutex.
- nvme_tcp_offload: remove redundant fields.
- nvme_tcp_offload: Remove the "new" from setup_ctrl().
- nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
- nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() and
  nvme_tcp_ofld_free_queue().
- nvme_tcp_offload: Patch 8 (timeout and async) was squashed into
  patch 7 (io level).
Changes since RFC v6:
=====================
- No changes in nvme_tcp_offload (only in qedn).
Changes since v0:
=====================
- nvme_tcp_offload: Add support for NVME_OPT_HOST_IFACE.
- nvme_tcp_offload: Kconfig fix (thanks to Petr Mladek).
- nvme_tcp_offload: return code fix (thanks to Dan Carpenter).
Arie Gershberg (2):
nvme-tcp-offload: Add controller level implementation
nvme-tcp-offload: Add controller level error recovery implementation
Dean Balandin (3):
nvme-tcp-offload: Add device scan implementation
nvme-tcp-offload: Add queue level implementation
nvme-tcp-offload: Add IO level implementation
Prabhakar Kushwaha (2):
nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
definitions
nvme-fabrics: Expose nvmf_check_required_opts() globally
Shai Malin (1):
nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
MAINTAINERS | 8 +
drivers/nvme/host/Kconfig | 15 +
drivers/nvme/host/Makefile | 3 +
drivers/nvme/host/fabrics.c | 12 +-
drivers/nvme/host/fabrics.h | 9 +
drivers/nvme/host/tcp-offload.c | 1333 +++++++++++++++++++++++++++++++
drivers/nvme/host/tcp-offload.h | 207 +++++
7 files changed, 1578 insertions(+), 9 deletions(-)
create mode 100644 drivers/nvme/host/tcp-offload.c
create mode 100644 drivers/nvme/host/tcp-offload.h
--
2.22.0