From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] sharing of single NVMe device between spdk userspace and native kernel driver
Date: Thu, 14 Jul 2016 17:55:47 +0000
Message-ID: <1468518945.5999.384.camel@intel.com>
In-Reply-To: CABSNBDEeFdxDJbXsaWscz_i0+7nanc2j4eZb6YEJOLMt7wDMPg@mail.gmail.com
On Wed, 2016-07-13 at 12:59 -0700, txcy uio wrote:
> Hello Ben
>
> I have a use case where I want to attach one namespace of a nvme device to spdk driver and use the
> other namespace as a kernel block device to create a regular filesystem. Current implementation of
> spdk requires the device to be unbound completely from the native kernel driver. I was wondering
> if this is at all possible and if yes can this be accomplished with the current spdk
> implementation?
Your request is one we get every few days or so, and it is a perfectly reasonable thing to ask. I
haven't written down my standard response on the mailing list yet, so I'm going to take this
opportunity to lay out our position for all to see and discuss.
From a purely technical standpoint, it is impossible to load both the SPDK driver as it exists today
and the kernel driver against the same PCI device. The registers exposed by the PCI device contain
global state, so there can only be a single "owner". There is an established hardware mechanism,
SR-IOV, for creating multiple virtual PCI devices from a single physical device, each of which can
load its own driver. This is typically used by NICs today and I'm not aware of any NVMe SSDs
that support it currently. In the long term, though, SR-IOV is the right solution for sharing the
device the way you outline.
In the short term, it would be technically possible to create some kernel patches that add entries
to sysfs or provide ioctls that allow a user space process to claim an NVMe hardware queue for a
device that the kernel is managing. You could then run the SPDK driver's I/O path against that
queue. Unfortunately, there are two insurmountable issues with this strategy. First, NVMe hardware
queues can write to any namespace on the device. Therefore, you couldn't enforce that the queue
only writes to the namespace you intend. You couldn't even enforce that the queue is only used
for reads - you basically just have to trust the application to only do reasonable things. Second,
the device is owned by the kernel and therefore is not in an IOMMU protection domain with this
strategy. A process that owns a queue can directly program the device's DMA engine, and with a small amount of work, you could
hijack that DMA engine to copy data to wherever you wanted on the system. For these two reasons,
patches of this nature would never be accepted into the mainline kernel. The SPDK team can't be in
the business of supporting patches that have been rejected by the kernel community.
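
To illustrate the first issue, here is a minimal sketch in C, not taken from SPDK or the kernel. It
assumes a simplified 64-byte submission queue entry in which the namespace ID travels in each
command rather than being a property of the queue. The names sketch_nvme_cmd, sketch_sq and
sketch_submit_write are hypothetical, and the ring has no doorbell or full-queue handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_QUEUE_DEPTH 8

/* Simplified stand-in for a 64-byte NVMe submission queue entry.
 * Only the fields needed for the illustration are named. */
struct sketch_nvme_cmd {
    uint8_t  opc;      /* opcode; 0x01 is Write in the NVM command set */
    uint8_t  flags;
    uint16_t cid;      /* command identifier */
    uint32_t nsid;     /* namespace ID, chosen per command */
    uint8_t  rest[56]; /* remaining dwords, unused in this sketch */
};

/* Hypothetical submission queue: just a ring of entries and a tail index. */
struct sketch_sq {
    struct sketch_nvme_cmd entries[SKETCH_QUEUE_DEPTH];
    uint32_t tail;
};

static void sketch_sq_init(struct sketch_sq *sq)
{
    memset(sq, 0, sizeof(*sq));
}

/* Place a Write command for an arbitrary namespace on the queue.
 * Nothing in the queue itself restricts which nsid the caller picks. */
static uint32_t sketch_submit_write(struct sketch_sq *sq, uint32_t nsid, uint16_t cid)
{
    uint32_t slot = sq->tail;
    struct sketch_nvme_cmd *cmd = &sq->entries[slot];

    assert(sizeof(*cmd) == 64);
    memset(cmd, 0, sizeof(*cmd));
    cmd->opc = 0x01;
    cmd->cid = cid;
    cmd->nsid = nsid;
    sq->tail = (slot + 1) % SKETCH_QUEUE_DEPTH;
    return slot;
}
```

In this sketch, whoever holds the queue can call sketch_submit_write() with nsid 1 and then with
nsid 2; the only thing keeping it on "its" namespace is the caller's own discipline.
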
Clearly, lots of people have requested to share a device between the kernel and SPDK, so I've been
trying to uncover all of the reasons they may want to do that. So far, in every case, it boils down
to not having a filesystem for use with SPDK. I'm hoping to steer the community to solve the problem
of not having a filesystem rather than trying to share the device. I'm not advocating for writing a
(mostly) POSIX-compliant filesystem, but I do think there is a small core of functionality that most
databases and storage applications require. These are things like allocating blocks into some
unit (I've been calling it a blob) that has a name and is persistent and rediscoverable across
reboots. Writing this layer requires some serious thought - SPDK is fast in no small part because it
is purely asynchronous, polled, and lockless - so this layer would need to preserve those
characteristics.
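
As a rough illustration of the "blob" idea described above, the following C sketch groups blocks
into named units that can be looked up again by name. It is an assumption-laden toy, not an SPDK
API: every name (sketch_blobstore, sketch_blob_create, sketch_blob_lookup) is hypothetical, the
metadata lives only in memory, and it is synchronous, so it does not address persistence across
reboots or the asynchronous, polled, lockless requirements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_BLOCKS      64
#define SKETCH_MAX_BLOBS       8
#define SKETCH_MAX_BLOB_BLOCKS 16
#define SKETCH_NAME_LEN        32

/* A blob: a named group of block indices. */
struct sketch_blob {
    char     name[SKETCH_NAME_LEN];
    uint32_t num_blocks;
    uint32_t blocks[SKETCH_MAX_BLOB_BLOCKS];
    int      in_use;
};

/* In-memory metadata only; a real layer would also write this to the device. */
struct sketch_blobstore {
    uint8_t            block_used[SKETCH_NUM_BLOCKS];
    struct sketch_blob blobs[SKETCH_MAX_BLOBS];
};

static void sketch_bs_init(struct sketch_blobstore *bs)
{
    memset(bs, 0, sizeof(*bs));
}

/* Find an existing blob by name, or return NULL. */
static struct sketch_blob *sketch_blob_lookup(struct sketch_blobstore *bs, const char *name)
{
    for (int i = 0; i < SKETCH_MAX_BLOBS; i++) {
        if (bs->blobs[i].in_use && strcmp(bs->blobs[i].name, name) == 0) {
            return &bs->blobs[i];
        }
    }
    return NULL;
}

/* Create a named blob of num_blocks blocks using first-fit allocation. */
static struct sketch_blob *sketch_blob_create(struct sketch_blobstore *bs,
                                              const char *name, uint32_t num_blocks)
{
    struct sketch_blob *blob = NULL;
    uint32_t free_count = 0;

    assert(name != NULL);
    if (num_blocks > SKETCH_MAX_BLOB_BLOCKS || strlen(name) >= SKETCH_NAME_LEN ||
        sketch_blob_lookup(bs, name) != NULL) {
        return NULL;
    }
    for (int i = 0; i < SKETCH_MAX_BLOBS; i++) {
        if (!bs->blobs[i].in_use) {
            blob = &bs->blobs[i];
            break;
        }
    }
    /* Count free blocks first so a failed create leaves nothing half-allocated. */
    for (uint32_t b = 0; b < SKETCH_NUM_BLOCKS; b++) {
        free_count += bs->block_used[b] ? 0 : 1;
    }
    if (blob == NULL || free_count < num_blocks) {
        return NULL;
    }
    memset(blob, 0, sizeof(*blob));
    strcpy(blob->name, name);
    for (uint32_t b = 0; b < SKETCH_NUM_BLOCKS && blob->num_blocks < num_blocks; b++) {
        if (!bs->block_used[b]) {
            bs->block_used[b] = 1;
            blob->blocks[blob->num_blocks++] = b;
        }
    }
    blob->in_use = 1;
    return blob;
}
```

A caller would create a blob once with sketch_blob_create() and later find it again with
sketch_blob_lookup(); making that lookup survive a reboot is the part that needs on-disk metadata
and the serious thought mentioned above.
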
Sorry for the very long response, but I wanted to document my current thoughts on the mailing list
for all to see.
>
> --Tyc
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk