* When does IB Multicast drop?
@ 2015-11-24 22:10 Peter Chinetti
       [not found] ` <5654E056.5050608-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Chinetti @ 2015-11-24 22:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

I've been reading* through the IBA spec (Release 1.3 2015-03-03), trying 
to understand IB multicast and its pitfalls.

I understand that IB multicast only supports Unreliable Datagram sends 
(10.5.2.1), and that there are neither delivery guarantees nor 
acknowledgments for UD sends. Furthermore, flow control is only 
available for Reliable Connections. I thought I saw that there was an 
ordering guarantee within a multicast group for a specific sender, but 
now I can't find that section again.

How far is the send guaranteed to propagate? What would cause the data
to be dropped? Is it possible for an oversubscribed link (e.g. an 80 Gb/s
burst of multicast traffic trying to fit through a link with only 40
Gb/s of bandwidth) to slow down multicast data on a different
network path, or will the burst be "clipped" by dropping packets?

In testing done by some of my co-workers, we have found that when
multicast is run in parallel over IB and Ethernet, IB is occasionally
slower but does not seem to drop packets. This seems to suggest that
flow control /is/ working for multicast (which would be great, as long
as we know where the pathological cases are and avoid them).

Thank you,
Peter Chinetti

*Honestly, more like Control-F'ing for "multicast". Not super effective.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: When does IB Multicast drop?
       [not found] ` <5654E056.5050608-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-11-24 23:08   ` Anuj Kalia
       [not found]     ` <CADPSxAgifjbSP0qL5fvRt8G7-qgTY0YjdFuPnB9WKi7kkUw55Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Anuj Kalia @ 2015-11-24 23:08 UTC (permalink / raw)
  To: Peter Chinetti; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I don't have experience with multicast, but here's some info.

InfiniBand flow control is done at the link layer, so UD does not drop
packets due to congestion.

AFAIK, UD only drops packets due to irrecoverable bit errors and
network device failures. Mellanox's FDR physical layer has BER less
than 10^(-15), and forward error correction on top of that, so an
irrecoverable bit error is extremely rare.
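
As a rough illustration of what a BER of that size means, the sketch below estimates the mean time between raw bit errors on a single saturated link. The 56 Gb/s rate (FDR 4x signaling) and the use of exactly 10^(-15) as the BER are assumptions for the arithmetic; the message above only gives the BER as an upper bound, and the function name is made up for this example.

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumptions: 56 Gb/s link rate, BER taken as exactly 1e-15 (an upper bound).

def seconds_between_bit_errors(link_bits_per_second: float, bit_error_rate: float) -> float:
    """Mean time between raw bit errors on a fully loaded link."""
    errors_per_second = link_bits_per_second * bit_error_rate
    return 1.0 / errors_per_second


if __name__ == "__main__":
    interval = seconds_between_bit_errors(56e9, 1e-15)
    print(f"roughly one raw bit error every {interval / 3600:.1f} hours at full load")
```

Under these assumptions a saturated link sees a raw bit error only once every several hours, and forward error correction would be expected to repair most of those, so the rate of irrecoverable errors should be lower still.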

If the network topology does not have multipath, (Mellanox) UD will
not reorder packets to a particular destination sent from the same UD
QP. There is probably some guarantee in multipath topologies, too.

--Anuj (rdma_guy)



* Re: When does IB Multicast drop?
       [not found]     ` <CADPSxAgifjbSP0qL5fvRt8G7-qgTY0YjdFuPnB9WKi7kkUw55Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-11-25 16:10       ` Christoph Lameter
       [not found]         ` <alpine.DEB.2.20.1511251007020.31676-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2015-11-25 16:34       ` Peter Chinetti
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2015-11-25 16:10 UTC (permalink / raw)
  To: Anuj Kalia
  Cc: Peter Chinetti,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, 24 Nov 2015, Anuj Kalia wrote:

> InfiniBand flow control is done at the link layer, so UD does not drop
> packets due to congestion.

Correct. But multicast packets are dropped at the QP receive level if the
app does not provide enough buffers to accept the data stream. The
buffers can easily be overrun if one does not code carefully, given that
the maximum number of them is 16K or so. These drops occur silently.
Currently there is no accounting for these drops in the upstream kernel.
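
To get a feel for how quickly a receive queue of that size can be overrun, here is a minimal sketch that computes how long a fully posted queue lasts if packets arrive at line rate and the application posts no new buffers. The 56 Gb/s rate, the 4096-byte packet size, and the function name are assumptions for illustration; header overhead is ignored.

```python
# Rough headroom estimate for a UD receive queue, assuming line-rate arrival
# and no reposting of receive buffers. All numbers are illustrative assumptions.

def receive_queue_headroom_seconds(posted_buffers: int,
                                   link_bits_per_second: float,
                                   packet_bytes: int) -> float:
    """Time until every posted receive buffer has been consumed."""
    packets_per_second = link_bits_per_second / (packet_bytes * 8)
    return posted_buffers / packets_per_second


if __name__ == "__main__":
    headroom = receive_queue_headroom_seconds(16 * 1024, 56e9, 4096)
    print(f"16K buffers of 4096 bytes last about {headroom * 1e3:.2f} ms at 56 Gb/s")
```

With these inputs the queue is exhausted in under 10 ms, and smaller packets shorten that further, so a receiver that stalls for a scheduling quantum or two can plausibly start dropping.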

> AFAIK, UD only drops packets due to irrecoverable bit errors and
> network device failures. Mellanox's FDR physical layer has BER less
> than 10^(-15), and forward error correction on top of that, so an
> irrecoverable bit error is extremely rare.

Yep. These are extremely rare. We rely on reliable delivery of "unreliable
datagrams" here to avoid having messaging layers that request
retransmission on packet drops.

> If the network topology does not have multipath, (Mellanox) UD will
> not reorder packets to a particular destination sent from the same UD
> QP. There is probably some guarantee in multipath topologies, too.

Correct.


* Re: When does IB Multicast drop?
       [not found]     ` <CADPSxAgifjbSP0qL5fvRt8G7-qgTY0YjdFuPnB9WKi7kkUw55Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-11-25 16:10       ` Christoph Lameter
@ 2015-11-25 16:34       ` Peter Chinetti
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Chinetti @ 2015-11-25 16:34 UTC (permalink / raw)
  To: Anuj Kalia; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Thank you very much for the info. Is there a good InfiniBand reference
(other than the IBA spec, I mean) that I should read?

-Peter



* Re: When does IB Multicast drop?
       [not found]         ` <alpine.DEB.2.20.1511251007020.31676-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-11-25 17:10           ` Peter Chinetti
       [not found]             ` <5655EB96.2090600-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Chinetti @ 2015-11-25 17:10 UTC (permalink / raw)
  To: Christoph Lameter, Anuj Kalia
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org


> Correct. But multicast packets are dropped at the QP receive level if the
> app does not provide enough buffers to accept the data stream. The
> buffers can easily be overrun if one does not code carefully, given that
> the maximum number of them is 16K or so. These drops occur silently.
> Currently there is no accounting for these drops in the upstream kernel.
How about when one of the destinations for the multicast group has its
connection to the switch overloaded (because it is subscribed to many
multicast groups whose combined bandwidth is momentarily greater than
the bandwidth of its link to the switch)? Are the messages destined for
that endpoint dropped at the switch, or is traffic to the entire
multicast group delayed?


* Re: When does IB Multicast drop?
       [not found]             ` <5655EB96.2090600-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-11-25 17:23               ` Christoph Lameter
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Lameter @ 2015-11-25 17:23 UTC (permalink / raw)
  To: Peter Chinetti
  Cc: Anuj Kalia, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, 25 Nov 2015, Peter Chinetti wrote:

> > Correct. But multicast packets are dropped at the QP receive level if the
> > app does not provide enough buffers to accept the data stream. The
> > buffers can easily be overrun if one does not code carefully, given that
> > the maximum number of them is 16K or so. These drops occur silently.
> > Currently there is no accounting for these drops in the upstream kernel.

> How about when one of the destinations for the multicast group has its
> connection to the switch overloaded (because it is subscribed to many
> multicast groups whose combined bandwidth is momentarily greater than the
> bandwidth of its link to the switch)? Are the messages destined for that
> endpoint dropped at the switch, or is traffic to the entire multicast group
> delayed?

All traffic to the multicast group will be delayed.
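
A toy model of that behaviour, for the scenario in the question: one receiver subscribes to several groups whose combined rate exceeds its link, and the resulting backpressure slows those groups for every member. The group names, rates, and the proportional-sharing rule are assumptions made up for this sketch; real switch arbitration and buffering are more complicated, so treat the numbers as qualitative only.

```python
# Toy model: with lossless links, an overloaded receiver link throttles every
# group it subscribes to, and all other members of those groups see the
# throttled rate. Proportional sharing is an assumption of this sketch.

def delivered_group_rates(offered, subscriptions, link_capacity):
    """offered: {group: Gb/s}, subscriptions: {receiver: [groups]},
    link_capacity: {receiver: Gb/s}. Returns {group: delivered Gb/s}."""
    rates = dict(offered)
    for receiver, groups in subscriptions.items():
        demand = sum(offered[g] for g in groups)
        if demand > link_capacity[receiver]:
            scale = link_capacity[receiver] / demand
            for g in groups:
                # The slowest member's link sets the pace for the whole group.
                rates[g] = min(rates[g], offered[g] * scale)
    return rates


if __name__ == "__main__":
    offered = {"A": 30.0, "B": 30.0, "C": 20.0}       # 80 Gb/s in total
    subscriptions = {"r1": ["A", "B", "C"], "r2": ["A"]}
    link_capacity = {"r1": 40.0, "r2": 40.0}
    print(delivered_group_rates(offered, subscriptions, link_capacity))
```

In this example receiver r2 has an otherwise idle 40 Gb/s link, yet it receives group A at only half the offered rate, because r1's oversubscribed link holds the whole group back rather than dropping packets.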


