From: Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Oren Duer <oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>,
Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH 0/5] Indirect memory registration feature
Date: Tue, 09 Jun 2015 11:44:16 +0300
Message-ID: <5576A760.4090004@dev.mellanox.co.il>
In-Reply-To: <20150609062054.GA13011-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
On 6/9/2015 9:20 AM, Christoph Hellwig wrote:
> On Mon, Jun 08, 2015 at 05:42:15PM +0300, Sagi Grimberg wrote:
>> I wouldn't say this is about offloading bounce buffering to silicon.
>> The RDMA stack always imposed the alignment limitation as we can only
>> give a page lists to the devices. Other drivers (qlogic/emulex FC
>> drivers for example), use an _arbitrary_ SG lists where each element can
>> point to any {addr, len}.
>
> Those are drivers for protocols that support real SG lists. It seems
> only Infiniband and NVMe expose this silly limit.
I agree this is indeed a limitation, and that's why SG_GAPS was added
in the first place. I think the next generation of NVMe devices will
support real SG lists. This feature enables existing InfiniBand devices
that can handle SG lists to receive them via the RDMA stack (ib_core).

If the memory registration process weren't such a big fiasco in the
first place, wouldn't this approach make the most sense?
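
To make the alignment limitation concrete, here is a minimal user-space
sketch of the check a page-list based registration implies: every element
except the first has to start on a page boundary, and every element except
the last has to end on one. The struct and function names and the 4 KiB
page size are assumptions for illustration only, not code from ib_core or
any ULP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical SG element: an arbitrary {addr, len} pair. */
struct sg_elem {
	uint64_t addr;
	uint32_t len;
};

/*
 * Return true if the list can be described as a plain page list, i.e.
 * it has no gaps: only the first element may start at a page offset
 * and only the last element may end before a page boundary.
 */
bool sg_fits_page_list(const struct sg_elem *sg, size_t n)
{
	assert(sg != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t start = sg[i].addr;
		uint64_t end = sg[i].addr + sg[i].len;

		if (i > 0 && (start % SKETCH_PAGE_SIZE) != 0)
			return false;	/* gap before this element */
		if (i + 1 < n && (end % SKETCH_PAGE_SIZE) != 0)
			return false;	/* gap after this element */
	}
	return true;
}
```

A list that fails this check is the case where a ULP today has to bounce
or split the I/O, and where a device that accepts arbitrary SG lists
would not need to.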
>
>>> So please fix it in the proper layers
>>> first,
>>
>> I agree that we can take care of bounce buffering in the block layer
>> (or scsi for SG_IO) if the driver doesn't want to see any type of
>> unaligned SG lists.
>>
>> But do you think that it should come before the stack can support this?
>
> Yes, absolutely. The other thing that needs to come first is a proper
> abstraction for MRs instead of hacking another type into all drivers.
>
I'm very much open to the idea of consolidating the memory registration
code behind a general memory registration API instead of doing it in
every ULP (srp, iser, xprtrdma, svcrdma, rds, more to come...). The main
challenge is to abstract the different methods (and considerations) of
memory registration behind one API. Do we completely mask out the way we
are doing it? I'm worried that we might end up either compromising on
performance or making the API guess too much about what the caller is
trying to achieve.
For example:
- frwr requires a queue-pair for the post (and it must be the ULP
  queue-pair to ensure the registration is done before the data transfer
  begins), while fmrs do not need a queue-pair.
- The ULP would probably always initiate the data transfer right after
  the registration (send a request or do the rdma r/w), so it is useful
  to link the frwr post with the next wr in a single post_send call.
  I wonder how an API would allow such a thing (given that other
  registration methods don't use the work request interface).
- There is the fmr_pool API, which tries to tackle the disadvantage of
  fmrs (very slow unmap) by delaying the fmr unmap until some dirty
  watermark of remappings is met. I'm not sure how a general API would
  do this.
- How would the API choose the method to register memory?
- If there is an alignment issue, do we fail? Do we bounce?
- There is the whole T10-DIF support...
...
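
To illustrate the first two points, here is a rough user-space sketch of
what a common entry point would have to carry. Everything in it is
hypothetical (reg_mem, struct reg_request, the fake_* stand-ins); nothing
like it exists in ib_core, and it only models the control flow, not real
verbs. The point is that the caller still has to hand over a queue-pair
and the follow-up wr for frwr, while the fmr path ignores both, so the
method leaks through the API.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not ib_core symbols. */
enum reg_method { REG_FMR, REG_FRWR };

struct fake_qp {
	int posted_wrs;			/* counts work requests posted so far */
};

struct fake_wr {
	const char *what;
	struct fake_wr *next;
};

struct reg_request {
	enum reg_method method;
	struct fake_qp *qp;		/* needed for frwr, unused for fmr */
	struct fake_wr *next_wr;	/* optional data-transfer wr to chain */
};

/* Post a whole chain in one call; returns the number of wrs posted. */
int fake_post_send(struct fake_qp *qp, struct fake_wr *wr)
{
	int n = 0;

	for (; wr != NULL; wr = wr->next)
		n++;
	qp->posted_wrs += n;
	return n;
}

/* Returns 0 on success, -1 if the chosen method lacks what it needs. */
int reg_mem(struct reg_request *req, struct fake_wr *reg_wr)
{
	assert(req != NULL);

	switch (req->method) {
	case REG_FMR:
		/* modeled as a direct mapping: no queue-pair, no wr */
		return 0;
	case REG_FRWR:
		if (req->qp == NULL || reg_wr == NULL)
			return -1;
		reg_wr->what = "fast-reg";
		reg_wr->next = req->next_wr;	/* registration, then transfer */
		fake_post_send(req->qp, reg_wr);
		return 0;
	}
	return -1;
}
```

With method set to REG_FRWR and next_wr pointing at a send, one reg_mem()
call posts two chained wrs on the caller's queue-pair; with REG_FMR the
same call posts nothing. Choosing the method, pooling and alignment
handling are deliberately left out.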
CC'ing Bart & Chuck, who share the suffering of memory registration.