Re: [LSF/MM TOPIC/ATTEND] RDMA passive target

From: Sagi Grimberg <sagig@dev.mellanox.co.il>
To: Boaz Harrosh <boaz@plexistor.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	lsf-pc@lists.linux-foundation.org,
	Dan Williams <dan.j.williams@intel.com>,
	Yigal Korman <yigal@plexistor.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Linux RDMA Mailing List <linux-rdma@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Ric Wheeler <rwheeler@redhat.com>
Subject: Re: [LSF/MM TOPIC/ATTEND] RDMA passive target
Date: Wed, 27 Jan 2016 19:27:44 +0200
Message-ID: <56A8FE10.7000309@dev.mellanox.co.il>
In-Reply-To: <56A8F646.5020003@plexistor.com>

Hey Boaz,

> RDMA passive target
> ~~~~~~~~~~~~~~~~~~~
>
> The idea is to have a storage brick that exports a very
> low level pure RDMA API to access its memory based storage.
> The brick might be battery backed volatile based memory, or
> pmem based. In any case the brick might utilize a much higher
> capacity than memory by utilizing "tiering" to slower media,
> which is enabled by the API.
>
> The API is simple:
>
> 1. Alloc_2M_block_at_virtual_address (ADDR_64_BIT)
>     ADDR_64_BIT is any virtual address and defines the logical ID of the block.
>     If the ID is already allocated an error is returned.
>     If storage is exhausted return => ENOSPC
> 2. Free_2M_block_at_virtual_address (ADDR_64_BIT)
>     Space for logical ID is returned to free store and the ID becomes free for
>     a new allocation.
> 3. map_virtual_address(ADDR_64_BIT, flags) => RDMA handle
>     previously allocated virtual address is locked in memory and an RDMA handle
>     is returned.
>     Flags: read-only, read-write, shared and so on...
> 4. unmap_virtual_address(ADDR_64_BIT)
>     At this point the brick can write data to slower storage if memory space
>     is needed. The RDMA handle from [3] is revoked.
> 5. List_mapped_IDs
>     An extent based list of all allocated ranges. (This is usually used on
>     mount or after a crash)

My understanding is that you're describing a wire protocol, correct?
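
The five operations quoted above can be modelled in a few lines. The following is only a toy, in-memory sketch in Python and is not code from this thread: the class and method names, the `EEXIST`/`ENOENT`/`EBUSY` error choices, and the integer handles are all assumptions made for illustration. Only the 2M block size and the `ENOSPC` return come from the proposal. The listing follows the description of operation 5 (allocated ranges, extent based).

```python
import errno

BLOCK_SIZE = 2 * 1024 * 1024  # 2M blocks, as in the proposal


class DumbBrick:
    """Toy model of the five operations; every name here is hypothetical."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self.allocated = set()   # logical IDs (64-bit virtual addresses)
        self.mapped = {}         # logical ID -> opaque handle
        self._next_handle = 1

    def alloc_block(self, addr):
        # Operation 1: error if the ID is taken, ENOSPC if storage is exhausted.
        if addr in self.allocated:
            raise OSError(errno.EEXIST, "ID already allocated")  # assumed errno
        if len(self.allocated) >= self.capacity_blocks:
            raise OSError(errno.ENOSPC, "storage exhausted")
        self.allocated.add(addr)

    def free_block(self, addr):
        # Operation 2: the ID becomes free for a new allocation.
        if addr not in self.allocated:
            raise OSError(errno.ENOENT, "ID not allocated")      # assumed errno
        if addr in self.mapped:
            raise OSError(errno.EBUSY, "ID is still mapped")     # assumed policy
        self.allocated.remove(addr)

    def map_block(self, addr, flags="read-write"):
        # Operation 3: pin the block and hand back a stand-in for an RDMA handle.
        if addr not in self.allocated:
            raise OSError(errno.ENOENT, "ID not allocated")
        handle = self._next_handle
        self._next_handle += 1
        self.mapped[addr] = handle
        return handle

    def unmap_block(self, addr):
        # Operation 4: revoke the handle; the block may now be tiered out.
        if addr not in self.mapped:
            raise OSError(errno.ENOENT, "ID not mapped")
        del self.mapped[addr]

    def list_allocated(self):
        # Operation 5: extents as (start_address, number_of_blocks).
        extents = []
        for addr in sorted(self.allocated):
            if extents and extents[-1][0] + extents[-1][1] * BLOCK_SIZE == addr:
                extents[-1] = (extents[-1][0], extents[-1][1] + 1)
            else:
                extents.append((addr, 1))
        return extents
```

A wire protocol would carry the same five requests as messages. Persistence and crash recovery, which this model ignores entirely, are where the real design work lies.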

> The dumb brick is not the network allocator / storage manager at all, and it
> is not a smart target / server like an iSER target or pNFS DS. A SW-defined
> application can do that on top of the dumb brick. The motivation is a low-level,
> very low latency API+library, which can be built upon for higher protocols or
> used directly for a very low latency cluster.
> It does, however, manage a virtual allocation map of logical-to-physical mapping
> of the 2M blocks.

The challenge in my mind would be to have persistence semantics in
place.

>
> Currently both drivers, initiator and target, are in the kernel, but with
> the latest advancements by Dan Williams they can be implemented in user mode as well.
> Almost.
>
> The almost is because:
> 1. If the target is over a /dev/pmemX then all is fine we have 2M contiguous
>     memory blocks.
> 2. If the target is over an FS, we have a proposal pending for a falloc_2M_flag
>     to ask the FS for contiguous 2M allocations only. If any of the 2M allocations
>     fails then return ENOSPC from falloc. This way we guarantee that each 2M block can be
>     mapped by a single RDMA handle.

Umm, you don't need the 2M to be contiguous in order to represent them
as a single RDMA handle. If that were true, iSER would never have worked.
Or maybe I misunderstood what you meant...
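
To illustrate the point about contiguity: a memory registration associates one handle with a page list, so the remote side sees a single contiguous address range while the pages behind it can be scattered. The sketch below is a conceptual Python model only, not the verbs API; `ToyMemoryRegion`, `translate`, and the 4K page size are assumptions chosen for the example.

```python
PAGE_SIZE = 4096                 # assumed page size for the example
BLOCK_SIZE = 2 * 1024 * 1024     # one 2M block


class ToyMemoryRegion:
    """One handle covering a contiguous I/O range backed by scattered pages."""

    def __init__(self, base_va, pages):
        # pages: page-aligned physical addresses in logical order;
        # they do not have to be adjacent to each other.
        if any(p % PAGE_SIZE for p in pages):
            raise ValueError("page addresses must be page aligned")
        self.base_va = base_va
        self.pages = list(pages)
        self.length = len(self.pages) * PAGE_SIZE

    def translate(self, va):
        # Map an address inside the region to the physical address behind it.
        offset = va - self.base_va
        if not 0 <= offset < self.length:
            raise ValueError("address outside the registered range")
        index, in_page = divmod(offset, PAGE_SIZE)
        return self.pages[index] + in_page


# 512 pages, deliberately non-adjacent and in descending physical order.
pages = [0x10000000 + (511 - i) * 2 * PAGE_SIZE for i in range(512)]
mr = ToyMemoryRegion(base_va=0, pages=pages)
```

Here `mr.length` equals `BLOCK_SIZE`, so a full 2M block is addressable through one region even though no two of its pages are physically adjacent.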

