* [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-12 16:02 Mike Heinz
From: Mike Heinz @ 2011-09-12 16:02 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; +Cc: Todd Rimmer
Consider an HPC cluster with 3000 compute nodes and a single SM, where each
compute node has 16 CPUs.
Now consider an HPC application running on all of those cores, with every
process starting at roughly the same time and each querying the SM for a
list of all nodes in the fabric. If the application shares the query results
locally on each node, this still leads to 3,000 simultaneous queries. (If it
is naive, it leads to 48,000 queries hitting the SM!)
Under the OFED model, the SM would be required to build 3,000 distinct buffers
containing 3,000 slightly different replies. At 128 bytes per Node Info record,
each reply would be roughly 384k long and each would consume 384k of kernel
memory until the response was completely sent to the destination. In the case
we just described, this could result in a bit over a gigabyte of kernel memory
being allocated. (In the naive case it would be much worse - roughly 18
gigabytes of kernel memory allocated to handle redundant data!)
But is this really needed? Consider that these 3,000 replies only differ in
their destination addresses - the actual data is identical in all of them.
Moreover, the data returned for a query like this changes only rarely in a
production fabric - which means that the response could be generated once
and then re-used to provide responses to multiple clients.
To allow this, however, the SM must be allowed to explicitly manage its own
RMPP transmissions instead of sending each response as a complete unit. If this
is allowed, then the kernel no longer needs to allocate large amounts of buffer
space, and the SM can build the results of certain queries in advance, updating
them only when the fabric changes, instead of recreating them each time it
receives an IB_MAD_METHOD_GET_TABLE.
Notes about this version of the patch:
This code incorporates feedback from spring 2010, when it was requested
that the patch provide an explicit pass-through RMPP version instead of
overriding version zero. Unlike the previous version of this patch, this
version does not affect how RMPP responses are received; those are still
handled as normal. All it does is permit RMPP packets sent by the SM to be
delivered without alteration. I've tested this work by writing a sample
program based on ibsysstat.c to demonstrate that large MAD responses
can be sent and received using both RMPP version 1 and RMPP pass-through.
Signed-off-by: Michael Heinz <michael.heinz-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/core/mad.c | 6 ++++++
drivers/infiniband/core/user_mad.c | 26 ++++++++++++++++++--------
include/rdma/ib_mad.h | 2 ++
3 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index b4d8672..d506bc0 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -207,12 +207,17 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
int ret2, qpn;
unsigned long flags;
u8 mgmt_class, vclass;
+ u8 rmpp_passthru;
/* Validate parameters */
qpn = get_spl_qp_index(qp_type);
if (qpn == -1)
goto error1;
+ rmpp_passthru = (rmpp_version == IB_MGMT_RMPP_PASSTHRU);
+ if (rmpp_passthru)
+ rmpp_version = 0;
+
if (rmpp_version && rmpp_version != IB_MGMT_RMPP_VERSION)
goto error1;
@@ -309,6 +314,7 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
mad_agent_priv->qp_info = &port_priv->qp_info[qpn];
mad_agent_priv->reg_req = reg_req;
mad_agent_priv->agent.rmpp_version = rmpp_version;
+ mad_agent_priv->agent.rmpp_passthru = rmpp_passthru;
mad_agent_priv->agent.device = device;
mad_agent_priv->agent.recv_handler = recv_handler;
mad_agent_priv->agent.send_handler = send_handler;
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 8d261b6..1993aad 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -501,7 +501,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;
hdr_len = ib_get_mad_data_offset(rmpp_mad->mad_hdr.mgmt_class);
- if (!ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class)) {
+ if (agent->rmpp_passthru ||
+ !ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class)) {
copy_offset = IB_MGMT_MAD_HDR;
rmpp_active = 0;
} else {
@@ -553,14 +554,23 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
rmpp_mad->mad_hdr.tid = *tid;
}
- spin_lock_irq(&file->send_lock);
- ret = is_duplicate(file, packet);
- if (!ret)
+ if (agent->rmpp_passthru &&
+ ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class) &&
+ (ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
+ IB_MGMT_RMPP_FLAG_ACTIVE)) {
+ spin_lock_irq(&file->send_lock);
list_add_tail(&packet->list, &file->send_list);
- spin_unlock_irq(&file->send_lock);
- if (ret) {
- ret = -EINVAL;
- goto err_msg;
+ spin_unlock_irq(&file->send_lock);
+ } else {
+ spin_lock_irq(&file->send_lock);
+ ret = is_duplicate(file, packet);
+ if (!ret)
+ list_add_tail(&packet->list, &file->send_list);
+ spin_unlock_irq(&file->send_lock);
+ if (ret) {
+ ret = -EINVAL;
+ goto err_msg;
+ }
}
ret = ib_post_send_mad(packet->msg, NULL);
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index d3b9401..ee40330 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -79,6 +79,7 @@
/* RMPP information */
#define IB_MGMT_RMPP_VERSION 1
+#define IB_MGMT_RMPP_PASSTHRU 255
#define IB_MGMT_RMPP_TYPE_DATA 1
#define IB_MGMT_RMPP_TYPE_ACK 2
@@ -360,6 +361,7 @@ struct ib_mad_agent {
u32 hi_tid;
u8 port_num;
u8 rmpp_version;
+ u8 rmpp_passthru;
};
/**
This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-12 17:15 Roland Dreier
From: Roland Dreier @ 2011-09-12 17:15 UTC (permalink / raw)
To: Mike Heinz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
> But is this really needed? Consider that these 3,000 replies only differ in
> their destination addresses - the actual data is identical in all of them.
> Moreover, the data returned for a query like this changes only rarely in a
> production fabric - which means that the response could be generated once
> and then re-used to provide responses to multiple clients.
>
> To allow this, however, the SM must be allowed to explicitly manage its own
> RMPP transmissions instead of sending each response as a complete unit. If
> this is allowed, then the kernel no longer needs to allocate large amounts
> of buffer space, and the SM can build the results of certain queries in
> advance, updating them only when the fabric changes, instead of recreating
> them each time it receives an IB_MAD_METHOD_GET_TABLE.
It seems at least the SM side could be handled using writev() to splice
together the data it wants to send.
* Re: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-12 17:23 Jason Gunthorpe
From: Jason Gunthorpe @ 2011-09-12 17:23 UTC (permalink / raw)
To: Roland Dreier; +Cc: Mike Heinz, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
On Mon, Sep 12, 2011 at 10:15:08AM -0700, Roland Dreier wrote:
> > But is this really needed? Consider that these 3,000 replies only
> > differ in their destination addresses - the actual data is
> > identical in all of them. Moreover, the data returned for a query
> > like this changes only rarely in a production fabric - which means
> > that the response could be generated once and then re-used to
> > provide responses to multiple clients.
> >
> > To allow this, however, the SM must be allowed to explicitly
> > manage its own RMPP transmissions instead of sending each response
> > as a complete unit. If this is allowed, then the kernel no longer
> > needs to allocate large amounts of buffer space, and the SM can
> > build the results of certain queries in advance, updating them
> > only when the fabric changes, instead of recreating them each time
> > it receives an IB_MAD_METHOD_GET_TABLE.
>
> It seems at least the SM side could be handled using writev() to
> splice together the data it wants to send.
How does that help avoid the kernel memory allocations?
I think having the option for RMPP in user space is a good idea, it
allows much more efficiency on the SM side..
Jason
* Re: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-12 18:29 Roland Dreier
From: Roland Dreier @ 2011-09-12 18:29 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Mike Heinz, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
On Mon, Sep 12, 2011 at 10:23 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> It seems at least the SM side could be handled using writev() to
>> splice together the data it wants to send.
> How does that help avoid the kernel memory allocations?
It doesn't, I was only responding to the idea that the SM has to
rebuild the table for every query. That issue seems independent
of where we do RMPP.
> I think having the option for RMPP in user space is a good idea, it
> allows much more efficiency on the SM side..
Maybe... but I'd like to explore if there's some way to avoid
the copying without having to do the full protocol in userspace.
Couldn't the kernel just copy data as needed, and say that
userspace needs to keep the buffer stable until the send
completes? (With an opt-in from userspace maybe?)
 - R.
* RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-12 18:53 Hefty, Sean
From: Hefty, Sean @ 2011-09-12 18:53 UTC (permalink / raw)
To: Roland Dreier, Jason Gunthorpe; +Cc: Mike Heinz, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
> Couldn't the kernel just copy data as needed, and say that
> userspace needs to keep the buffer stable until the send
> completes? (With an opt-in from userspace maybe?)
I agree, and I'll add that the IBTA could also take up the task of coming
up with a far more efficient way of transferring extremely large amounts
of data to/from the SA. When we have transport support in the hardware
and RDMA capabilities, are 256 byte UD packets with 20% overhead really
the best way to transfer gigabytes of data?
- Sean
* Re: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-12 19:06 Jason Gunthorpe
From: Jason Gunthorpe @ 2011-09-12 19:06 UTC (permalink / raw)
To: Roland Dreier; +Cc: Mike Heinz, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
On Mon, Sep 12, 2011 at 11:29:20AM -0700, Roland Dreier wrote:
> On Mon, Sep 12, 2011 at 10:23 AM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> >> It seems at least the SM side could be handled using writev() to
> >> splice together the data it wants to send.
>
> > How does that help avoid the kernel memory allocations?
>
> It doesn't, I was only responding to the idea that the SM has to
> rebuild the table for every query. That issue seems independent
> of where we do RMPP.
>
> > I think having the option for RMPP in user space is a good idea, it
> > allows much more efficiency on the SM side..
>
> Maybe... but I'd like to explore if there's some way to avoid
> the copying without having to do the full protocol in userspace.
It isn't just the copying. The RMPP protocol is designed to let an SA
generate the response on the fly on a per packet basis. Only a little
bit of RMPP state is required per active session, and no work is needed
before starting to supply response data. This capability cannot be
exposed at all via the kernel interface.
The whole notion of doing RMPP transactions with asynchronous
write/read operation is totally broken from a buffer management
perspective. The fact that it exists at all provides way too many
avenues for user space to OOM the kernel. I hope a remote OOM is not
also possible through this code :(
Jason
* RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-16 18:28 Mike Heinz
From: Mike Heinz @ 2011-09-16 18:28 UTC (permalink / raw)
To: Jason Gunthorpe, Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
There hasn't been any discussion of this patch since Monday; I'd like to
see if we can keep this conversation moving forward.
Right now, as I see it, we have a scalability issue with how RMPP packets
are currently handled. While the current method works great on small to
mid-sized fabrics - and vastly simplifies life for the application
programmer - as fabrics grow in size, it risks memory exhaustion when many
nodes make similar queries to a server at the same time. In addition, it
eliminates the flexibility provided in the RMPP protocol; if the SA/SM
handles RMPP packets individually, it can not only stash packets for
re-use, it can take advantage of the timing and windowing in the protocol
to throttle its responses during periods of heavy load.
Note that no one is arguing that the existing model isn't appropriate for
client applications - I doubt we will be seeing fabrics so large that a
single NodeInfo response would exhaust all RAM on the client.
Roland, you suggested developing a new interface that avoids the copying
without requiring the full protocol to be handled in user space, possibly
using writev(). But is it really worthwhile to add complexity to ib_mad,
ib_umad, libibumad and libibmad for something that's only needed on (for
example) one machine out of several thousand? In addition, such an
interface still would not allow a sophisticated SM implementation to do
load management of the queries. Trying to put load management of MAD
queries in the kernel itself would result in a constantly changing kernel
module as we try to adapt to emerging issues as fabrics become ever
larger.
The patch at hand avoids all that complexity by simply allowing the SM
application to handle RMPP itself in whatever manner is considered best.
If the SM wants to use the current RMPP implementation it can; turning
this ability on is completely optional.
* Re: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-17 3:42 Jason Gunthorpe
From: Jason Gunthorpe @ 2011-09-17 3:42 UTC (permalink / raw)
To: Mike Heinz; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
On Fri, Sep 16, 2011 at 01:28:16PM -0500, Mike Heinz wrote:
> The patch at hand avoids all that complexity by simply allowing the SM
> application handle RMPP itself in whatever manner is considered best. If
> the SM wants to use the current RMPP implementation it can, turning this
> ability on is completely optional.
I agree, this is a good idea. Allowing the SM to control the RMPP
entirely, and amortize the content generation across the entire query is
a very good way to manage many of the issues with a RMPP heavy work load.
Ultimately I think the scalable/compatible answer is to move these RMPP
work loads to a verbs QP and we need to have a user space RMPP
implementation for that anyhow.
Jason
* RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-19 0:35 Hefty, Sean
From: Hefty, Sean @ 2011-09-19 0:35 UTC (permalink / raw)
To: Jason Gunthorpe, Mike Heinz; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
> Ultimately I think the scalable/compatible answer is to move these
> RMPP work loads to a verbs QP and we need to have a user space RMPP
> implementation for that anyhow.
Many to one is never scalable. The applications simply cannot rely on
every node querying the SA at the same time, especially for large amounts
of data. And the overhead of MADs and RMPP is too high to be a useful
solution to scaling.
* RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-09-19 15:30 Mike Heinz
From: Mike Heinz @ 2011-09-19 15:30 UTC (permalink / raw)
To: Hefty, Sean, Jason Gunthorpe; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
The problem with creating a new protocol is that we would need to a)
design the protocol, b) get the IBTA to accept it, and c) get the
application developers to accept it. I simply cannot see getting this
process done in less than two years. Meanwhile, the problem with SA/SM
performance is an issue right now - and it's an issue that was created
not by the IB spec but by the implementation.
-----Original Message-----
From: Hefty, Sean [mailto:sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org]
Sent: Sunday, September 18, 2011 8:35 PM
To: Jason Gunthorpe; Mike Heinz
Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Todd Rimmer
Subject: RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
> Ultimately I think the scalable/compatible answer is to move these
> RMPP work loads to a verbs QP and we need to have a user space RMPP
> implementation for that anyhow.
Many to one is never scalable. The applications simply cannot rely on
every node querying the SA at the same time, especially for large amounts
of data. And the overhead of MADs and RMPP is too high to be a useful
solution to scaling.
* RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired
@ 2011-10-24 17:04 Mike Heinz
From: Mike Heinz @ 2011-10-24 17:04 UTC (permalink / raw)
To: Hefty, Sean, Jason Gunthorpe; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Todd Rimmer
I'm not sure I understand the resistance to this patch. The patch does
not break existing functionality and can demonstrably improve the
performance of tools that can take advantage of it. What's wrong with
implementing this patch now and working on an improved mechanism for the
future?