Re: strong ordering for data registered memory

public inbox for linux-rdma@vger.kernel.org

* Re: strong ordering for data registered memory
@ 2009-11-12 21:51 Caitlin Bestler
  0 siblings, 0 replies; 12+ messages in thread
From: Caitlin Bestler @ 2009-11-12 21:51 UTC (permalink / raw)
  To: David Brean, Jason Gunthorpe; +Cc: Richard Frank, Roland Dreier, linux-rdma







On Thu Nov 12 12:43, David Brean <David.Brean-UdXhSnd/wVw@public.gmane.org> wrote:
> See section 4 in the paper called "High Performance RDMA Based MPI
> Implementation over InfiniBand" on the MVAPICH web page for a
> description of one implementation that polls on data buffers.
> Specifically, look at the text around the statement "Although the
> approach uses the in-order implementation of hardware for RDMA write
> which is not specified in the InfiniBand standard, this feature is very
> likely to be kept by different hardware designers."  Although this
> paper describes a PCI-X implementation, the feature also exists on
> PCIe.
>
> It's assumed that the host memory interconnect complies with the
> statements in the "Update Ordering and Granularity Provided by a Write
> Transaction" section of the PCI spec.  This particular application
> depends on PCI WRITE behavior, not READ.
>
> Does this help?

A simplified way of looking at this requirement is that the last N bytes of the RDMA Write payload require the same ordering guarantees as a CQE would have had.

This is of course ironic since the technique was developed to avoid the overhead of the CQ.
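
The receive side of the tail-polling scheme referred to above can be sketched roughly as follows. This is a minimal illustration, not code from MVAPICH or Open MPI: the slot layout, SLOT_SIZE, TAIL_MAGIC, and all function names are assumptions invented for the example, and slot_simulate_write() only stands in for an adapter that delivers the payload before the tail byte. The acquire load orders only the CPU's own reads; the scheme still relies on the adapter and platform making the last byte visible last, which is exactly the guarantee under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE  64      /* hypothetical eager-buffer slot size in bytes */
#define TAIL_MAGIC 0xA5u   /* hypothetical "message complete" marker */

/* One receive slot: the payload followed by a one-byte tail flag.  The
 * sender places the flag in the very last byte of its RDMA Write. */
struct eager_slot {
    uint8_t payload[SLOT_SIZE - 1];
    _Atomic uint8_t tail;
};

/* Stand-in for an in-order adapter: payload first, tail byte last. */
void slot_simulate_write(struct eager_slot *slot, const uint8_t *data)
{
    assert(slot != NULL && data != NULL);
    memcpy(slot->payload, data, sizeof slot->payload);
    atomic_store_explicit(&slot->tail, TAIL_MAGIC, memory_order_release);
}

/* Polls the tail byte instead of a CQ.  Returns 1 and copies the payload
 * to out if the tail flag is set, otherwise returns 0. */
int slot_try_consume(struct eager_slot *slot, uint8_t *out)
{
    assert(slot != NULL && out != NULL);
    if (atomic_load_explicit(&slot->tail, memory_order_acquire) != TAIL_MAGIC)
        return 0;
    memcpy(out, slot->payload, sizeof slot->payload);
    /* Clear the flag so the slot can be reused for the next message. */
    atomic_store_explicit(&slot->tail, 0, memory_order_release);
    return 1;
}
```

If the platform may expose the tail byte before the rest of the payload, slot_try_consume() can return 1 while the payload is still stale, which is why the tail needs CQE-like ordering.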

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* strong ordering for data registered memory
@ 2009-11-10 20:19 David Brean
       [not found] ` <4AF9CACE.8070700-UdXhSnd/wVw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: David Brean @ 2009-11-10 20:19 UTC (permalink / raw)
  To: linux-rdma

Some time ago there was an email sent to this group with the subject
"weak ordering for data registered memory".  I don't recall any action
resulting from that thread.  So, I have a question.  If a bit were
defined to specify "strong ordering", perhaps as an access flag (see
ibv_access_flags) used with ibv_reg_mr(), would that be sufficient
for (1) client applications that need a HW "guarantee" that the last
byte of an RDMA write is written last and (2) platform implementations
that need to deliver that feature?

-David


* Re: strong ordering for data registered memory
       [not found] ` <4AF9CACE.8070700-UdXhSnd/wVw@public.gmane.org>
@ 2009-11-11 17:57   ` Roland Dreier
       [not found]     ` <adaskclvta4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Roland Dreier @ 2009-11-11 17:57 UTC (permalink / raw)
  To: David Brean; +Cc: linux-rdma


 > Some time ago there was an email sent to this group with the subject
 > "weak ordering for data registered memory".  I don't recall any action
 > resulting from that thread.  So, I have a question.  If a bit were
 > defined to specify "strong ordering", perhaps as an access flag (see
 > ibv_access_flags) used with ibv_reg_mr(), would that be sufficient
 > for (1) client applications that need a HW "guarantee" that the last
 > byte of an RDMA write is written last and (2) platform implementations
 > that need to deliver that feature?

What would happen if an application asked for strong ordering and the
adapter and/or platform is not capable of that?

Weak ordering is a bit easier to handle -- the app is saying "if you can
make things go faster, don't worry about ordering here" and a platform
where it doesn't matter can just ignore it.
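
The asymmetry between the two flags can be sketched as below. Everything here is hypothetical: neither flag exists in ibv_access_flags, and sketch_reg_mr() and struct sketch_platform are invented stand-ins rather than part of libibverbs. The sketch only shows that a "strong ordering" request needs a failure path when the platform cannot honour it, while a "weak ordering" hint can be dropped silently. It does not settle which behaviour should be the default when neither flag is given.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits; neither exists in ibv_access_flags. */
enum sketch_access_flags {
    SKETCH_ACCESS_WEAK_ORDERING   = 1 << 0, /* hint: may be ignored */
    SKETCH_ACCESS_STRONG_ORDERING = 1 << 1, /* requirement: must be met */
};

/* Hypothetical description of what the adapter and platform can do. */
struct sketch_platform {
    int can_order_strongly;  /* can make the last byte land last */
    int can_relax_ordering;  /* has a faster, weakly ordered path */
};

/* Returns 0 on success or a negative errno value.  On success,
 * *uses_relaxed_path reports whether the weakly ordered path was chosen. */
int sketch_reg_mr(const struct sketch_platform *p, unsigned flags,
                  int *uses_relaxed_path)
{
    assert(p != NULL && uses_relaxed_path != NULL);

    /* Asking for both at once is contradictory. */
    if ((flags & SKETCH_ACCESS_WEAK_ORDERING) &&
        (flags & SKETCH_ACCESS_STRONG_ORDERING))
        return -EINVAL;

    /* A requirement that cannot be met has to fail loudly. */
    if ((flags & SKETCH_ACCESS_STRONG_ORDERING) && !p->can_order_strongly)
        return -EOPNOTSUPP;

    /* A hint is honoured only where it helps; otherwise it is ignored. */
    *uses_relaxed_path =
        (flags & SKETCH_ACCESS_WEAK_ORDERING) && p->can_relax_ordering;
    return 0;
}
```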

 - R.

* Re: strong ordering for data registered memory
       [not found]     ` <adaskclvta4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2009-11-11 18:16       ` Richard Frank
       [not found]         ` <4AFAFF7A.4090602-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2009-11-11 21:37       ` David Brean
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Frank @ 2009-11-11 18:16 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Brean, linux-rdma

Today apps are forced to assume that no transport can provide strong
ordering, and hence must implement solutions to work around this.

There are specific optimizations an app might make if it knows the
underlying transport can make these guarantees.

It would be useful if "strong ordering" were exposed as an attribute of
a transport, along with the ability to provide a hint to enable "strong
ordering" at registration, per operation, or at the QP level.

Are there any HCAs that provide "strong ordering" today?


* Re: strong ordering for data registered memory
       [not found]         ` <4AFAFF7A.4090602-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2009-11-11 22:11           ` David Brean
       [not found]             ` <4AFB3677.6050603-UdXhSnd/wVw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: David Brean @ 2009-11-11 22:11 UTC (permalink / raw)
  To: Richard Frank; +Cc: Roland Dreier, linux-rdma

Yes, there are HCAs that provide strong ordering.  An application
such as Open MPI checks the HCA model and, if appropriate, enables a
mechanism called "eager RDMA" that depends on it.
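
A check of that kind might look roughly like the sketch below. It is an assumption-laden illustration, not Open MPI's actual logic: the allow-list and the function name are invented, and the single entry (0x02c9, Mellanox's vendor ID) is there only as an example. In a real program the vendor ID would come from the vendor_id field that ibv_query_device() fills in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative allow-list of vendor IDs assumed to write the last byte
 * of an RDMA Write last.  The contents are an example, not a claim about
 * which adapters actually qualify. */
static const uint32_t sketch_in_order_vendors[] = { 0x02c9 };

/* Returns 1 if tail-polling "eager RDMA" may be enabled for this vendor
 * ID, 0 if the application should fall back to completion-queue polling. */
int sketch_eager_rdma_allowed(uint32_t vendor_id)
{
    size_t n = sizeof sketch_in_order_vendors /
               sizeof sketch_in_order_vendors[0];

    for (size_t i = 0; i < n; i++) {
        if (sketch_in_order_vendors[i] == vendor_id)
            return 1;
    }
    return 0;
}
```

A hardcoded table like this says nothing about the host platform, so it cannot by itself rule out reordering introduced outside the adapter.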

-David


* Re: strong ordering for data registered memory
       [not found]             ` <4AFB3677.6050603-UdXhSnd/wVw@public.gmane.org>
@ 2009-11-11 22:44               ` Richard Frank
       [not found]                 ` <4AFB3E6B.3080606-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Frank @ 2009-11-11 22:44 UTC (permalink / raw)
  To: David Brean; +Cc: Roland Dreier, linux-rdma

Would anyone like to throw out the list of HCAs that do this?  I can
guess at a few, and can ask the vendors directly if not.

It would be much nicer not to hardcode names of adapters, but that won't
stop us. :)


* Re: strong ordering for data registered memory
       [not found]                 ` <4AFB3E6B.3080606-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2009-11-11 23:13                   ` Jason Gunthorpe
       [not found]                     ` <20091111231338.GZ1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2009-11-11 23:13 UTC (permalink / raw)
  To: Richard Frank; +Cc: David Brean, Roland Dreier, linux-rdma

On Wed, Nov 11, 2009 at 05:44:59PM -0500, Richard Frank wrote:

> Would anyone like to throw out the list of HCAs that do this?  I can
> guess at a few, and can ask the vendors directly if not.
>
> It would be much nicer not to hardcode names of adapters, but that won't
> stop us. :)

Isn't it more complex than this? AFAIK the PCI-E standard does not
specify the order in which data inside a single transfer becomes visible,
only how different transfers relate. To work on the most aggressive
PCI-E system the HCA would have to transfer the last XX bytes as a
separate PCI-E transaction without relaxed ordering.
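
The split described here can be sketched as a small planning function. It is only an illustration under stated assumptions: struct sketch_dma_txn and sketch_split_write() are invented names, a real adapter would do this in hardware or firmware, and the tail length (the "XX bytes") is left as a parameter because the thread does not specify it. The sketch relies on the PCIe rule that a posted write with the relaxed-ordering attribute clear does not pass earlier posted writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one DMA write issued by the adapter. */
struct sketch_dma_txn {
    size_t offset;         /* byte offset into the destination buffer */
    size_t length;         /* number of bytes written */
    int relaxed_ordering;  /* 1 = may be reordered, 0 = strictly ordered */
};

/* Plans a write of len bytes as a relaxed-ordering body followed by a
 * strictly ordered tail of up to tail_len bytes.  Fills out[] and returns
 * the number of transactions planned (0, 1 or 2). */
size_t sketch_split_write(size_t len, size_t tail_len,
                          struct sketch_dma_txn out[2])
{
    size_t n = 0;
    size_t body;

    assert(out != NULL);
    if (tail_len > len)
        tail_len = len;    /* a short message is all tail */
    body = len - tail_len;

    if (body > 0) {
        out[n].offset = 0;
        out[n].length = body;
        out[n].relaxed_ordering = 1;
        n++;
    }
    if (tail_len > 0) {
        out[n].offset = body;
        out[n].length = tail_len;
        out[n].relaxed_ordering = 0;  /* must not pass the body */
        n++;
    }
    return n;
}
```

For example, a 4096-byte write with an 8-byte tail is planned as a 4088-byte relaxed transaction followed by an 8-byte ordered one.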

This is the sort of thing that might start to matter on QPI and HT
memory-interleaved configurations. A multi-cache line transfer will be
split up and completed on different chips - it may not be fully
coherent 100% of the time.

Jason

* Re: strong ordering for data registered memory
       [not found]                     ` <20091111231338.GZ1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-11-12  5:41                       ` Dave Olson
  2009-11-12 20:43                       ` David Brean
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Olson @ 2009-11-12  5:41 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Richard Frank, David Brean, Roland Dreier, linux-rdma

On Wed, 11 Nov 2009, Jason Gunthorpe wrote:

| On Wed, Nov 11, 2009 at 05:44:59PM -0500, Richard Frank wrote:
| 
| > Would anyone like to throw out the list of HCAs that do this?  I can
| > guess at a few, and can ask the vendors directly if not.
| >
| > It would be much nicer not to hardcode names of adapters, but that won't
| > stop us. :)
| 
| Isn't it more complex than this? AFAIK the PCI-E standard does not
| specify the order in which data inside a single transfer becomes visible,
| only how different transfers relate. To work on the most aggressive
| PCI-E system the HCA would have to transfer the last XX bytes as a
| separate PCI-E transaction without relaxed ordering.

I can't speak to the specifics of this on PCIe, but yes, by default
PCIe transfers within a single tag can be unordered.

| This is the sort of thing that might start to matter on QPI and HT
| memory-interleaved configurations. A multi-cache line transfer will be
| split up and completed on different chips - it may not be fully
| coherent 100% of the time.

HT is fine, by design, at least on AMD processors (we probably don't
care too much about the older SiByte CPUs, since they weren't fully
HT-compliant).

I don't know about QPI.

Dave Olson
dave.olson-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org

* Re: strong ordering for data registered memory
       [not found]                     ` <20091111231338.GZ1966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-11-12  5:41                       ` Dave Olson
@ 2009-11-12 20:43                       ` David Brean
  1 sibling, 0 replies; 12+ messages in thread
From: David Brean @ 2009-11-12 20:43 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Richard Frank, Roland Dreier, linux-rdma

See section 4 in the paper called "High Performance RDMA Based MPI
Implementation over InfiniBand" on the MVAPICH web page for a
description of one implementation that polls on data buffers.
Specifically, look at the text around the statement "Although the
approach uses the in-order implementation of hardware for RDMA write
which is not specified in the InfiniBand standard, this feature is very
likely to be kept by different hardware designers."  Although this
paper describes a PCI-X implementation, the feature also exists on
PCIe.

It's assumed that the host memory interconnect complies with the
statements in the "Update Ordering and Granularity Provided by a Write
Transaction" section of the PCI spec.  This particular application
depends on PCI WRITE behavior, not READ.

Does this help?


* Re: strong ordering for data registered memory
       [not found]     ` <adaskclvta4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2009-11-11 18:16       ` Richard Frank
@ 2009-11-11 21:37       ` David Brean
       [not found]         ` <4AFB2EA5.4030804-UdXhSnd/wVw@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: David Brean @ 2009-11-11 21:37 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma

I decided to minimize the impact of an API change on the class of 
applications that use the current verbs interface because those 
applications can safely run on platforms that deliver optimal 
performance using weak ordering for data buffers.  New binaries aren't 
required for this class of application.

I thought it would be more appropriate to put the burden of added
complexity on the class of applications that bypass the verbs to access
special features in the hardware.  In fact, those applications are
selective about memory regions that need this special handling and would
register lots of memory without the "strong ordering" bit.  How
applications determine that the platform is capable of performing the
request would be beyond the scope of the verbs; however, I suppose that
the verbs framework could check and return an error.

If there are applications that expect the hardware to support "strong
ordering" and don't check the hardware, then these might be a problem.
Do any of these exist?

By the way, if I had proposed this bit several years ago, then I would 
have chosen a "weak ordering" flag.  Instead, I decided to try 
protecting the existing base of verbs-based software.

-David


* Re: strong ordering for data registered memory
       [not found]         ` <4AFB2EA5.4030804-UdXhSnd/wVw@public.gmane.org>
@ 2009-11-11 23:06           ` Roland Dreier
       [not found]             ` <ada639gvezo.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Roland Dreier @ 2009-11-11 23:06 UTC (permalink / raw)
  To: David Brean; +Cc: linux-rdma


 > I decided to minimize the impact of an API change on the class of
 > applications that use the current verbs interface because those
 > applications can safely run on platforms that deliver optimal
 > performance using weak ordering for data buffers.  New binaries aren't
 > required for this class of application.
 > 
 > I thought it would be more appropriate to put the burden of added
 > complexity on the class of applications that bypass the verbs to
 > access special features in the hardware.  In fact, those applications
 > are selective about memory regions that need this special handling and
 > would register lots of memory without the "strong ordering" bit.  How
 > applications determine that the platform is capable of performing the
 > request would be beyond the scope of the verbs; however, I suppose
 > that the verbs framework could check and return an error.
 > 
 > If there are applications that expect the hardware to support "strong
 > ordering" and don't check the hardware, then these might be a problem.
 > Do any of these exist?
 > 
 > By the way, if I had proposed this bit several years ago, then I would
 > have chosen a "weak ordering" flag.  Instead, I decided to try
 > protecting the existing base of verbs-based software.

I can't really follow this.  Right now Open MPI et al assume that if
they see a Mellanox adapter, they get the "last byte of RDMA becomes
visible last" behavior.  And there is not a way that I know of to turn
this off at all, let alone get any performance difference.  The
exception being the Cell processor system that started the previous
discussion, where weak ordering at the platform level helped things.

But given that current software does seem to rely on ordering, it seems
that opting into weak ordering would break fewer applications.

 - R.

* Re: strong ordering for data registered memory
       [not found]             ` <ada639gvezo.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2009-11-12 20:42               ` David Brean
  0 siblings, 0 replies; 12+ messages in thread
From: David Brean @ 2009-11-12 20:42 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma

Some of the SPARC servers from Sun support an IO memory mapping 
mechanism for marking pages for weak or strong ordering.  It has no 
dependency on the HCA.  Marking memory pages for data buffers as weak 
ordered delivers significantly better throughput on IB.

I assumed that there are many more applications using the user verbs 
that don't care about ordering for data buffers, but would need to be 
modified to set a "weak ordering" flag.

Nonetheless, I'm somewhat flexible and willing to go with whichever bit
definition the verbs consumers prefer.

-David


