On 09/27/2015 10:28 PM, Christoph Lameter wrote:
> On Sun, 27 Sep 2015, Doug Ledford wrote:
>
>> Currently I'm testing your patch with a couple other patches. I dropped
>> the patch of mine that added a module option, and added two different
>> patches. However, I'm still waffling on this patch somewhat. In the
>> discussions that Jason and I had, I pretty much decided that I would
>> like to see all send-only multicast sends be sent immediately with no
>> backlog queue. That means that if we had to start a send-only join, or
>> if we started one and it hasn't completed yet, we would send the packet
>> immediately via the broadcast group versus queueing. Doing so might
>> trip this new code up.
>
> If we send immediately then we would need to check on each packet if the
> multicast creation has been completed?

We do that already anyway. Calling find_mcast and then checking
if (!mcast || !mcast->ah) is exactly that check.

> Also broadcast could cause an unnecessary reception event on the NICs of
> machines that have no interest in this traffic.

This is true. However, I'm trying to balance several competing
issues. You also stated that the revamped multicast code was adding
latency and dropped packets to the problem space. Sending over the
broadcast group would help with latency. However, I have an alternative
idea for that...

> We would like to keep
> irrelevant traffic off the fabric as much as possible. And a reception
> event that requires traffic to be thrown out will cause jitter in the
> processing of inbound traffic that we also would like to avoid.

That may not be optimal for your app, but we also need to try to
maintain proper emulation of typical IP/Ethernet behavior, since this is
IPoIB after all. That's why the app isn't required to join the group
before sending, and also why it should be able to expect that we will
fall back to sending via broadcast if needed.

However, the following algorithm might be suitable here:

On first packet:
    create mcast group
    queue packet to group
    schedule join

On subsequent packets:
    find mcast group
    check mcast state
        if already joined, send immediately
        if joining, queue packet to mcast queue
        if join is deferred, send via bcast

On join completion:
    successful join
        set mcast->ah
        send all queued packets via mcast
        if no queued packets, alloc neigh for default ipv4 ethertype
    on failed join
        mcast->ah remains NULL
        send all queued packets via bcast
        mcast->delay_until is set to future time (used to know join
          is deferred)
        schedule deferred join attempt

-- 
Doug Ledford
GPG KeyID: 0E572FDD
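
For illustration only, the algorithm above could be modelled in user space roughly as follows. This is a sketch, not the ipoib driver code: struct mcast_model, enum send_path, and the model_*() functions are hypothetical names invented here, packets are reduced to counters, and the "alloc neigh for default ipv4 ethertype" step is left out. Only the ah and delay_until ideas come from the description above.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the driver. */
enum send_path { PATH_QUEUED, PATH_MCAST, PATH_BCAST };

struct mcast_model {
    bool exists;       /* mcast group has been created */
    bool has_ah;       /* stands in for mcast->ah != NULL */
    bool joining;      /* a join is currently in flight */
    long delay_until;  /* nonzero: join is deferred until this time */
    int queued;        /* packets waiting on the mcast queue */
    int sent_mcast;    /* packets sent via the mcast group */
    int sent_bcast;    /* packets sent via the broadcast group */
};

/* Decide what happens to one outgoing send-only multicast packet. */
static enum send_path model_send(struct mcast_model *m)
{
    if (!m->exists) {
        /* First packet: create group, queue packet, schedule join. */
        m->exists = true;
        m->joining = true;
        m->queued = 1;
        return PATH_QUEUED;
    }
    if (m->has_ah) {
        /* Already joined: send immediately. */
        m->sent_mcast++;
        return PATH_MCAST;
    }
    if (m->joining) {
        /* Join in flight: queue to the mcast queue. */
        m->queued++;
        return PATH_QUEUED;
    }
    /* Join is deferred (delay_until set): fall back to broadcast. */
    m->sent_bcast++;
    return PATH_BCAST;
}

/* Join completion handler: flush the queue one way or the other. */
static void model_join_done(struct mcast_model *m, bool ok,
                            long now, long backoff)
{
    m->joining = false;
    if (ok) {
        m->has_ah = true;               /* set mcast->ah */
        m->delay_until = 0;
        m->sent_mcast += m->queued;     /* queued packets go via mcast */
    } else {
        m->has_ah = false;              /* mcast->ah remains NULL */
        m->sent_bcast += m->queued;     /* queued packets go via bcast */
        m->delay_until = now + backoff; /* marks the join as deferred */
    }
    m->queued = 0;
}

/* The deferred join attempt fires: a join is in flight again. */
static void model_retry_join(struct mcast_model *m)
{
    m->joining = true;
}
```

With this model, packets are only ever queued while a join is actually in flight; once a join has failed, traffic goes out via broadcast until the deferred retry starts.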