From: Stefano Brivio
Subject: Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order
Date: Fri, 3 Aug 2018 18:52:41 +0200
Message-ID: <20180803185241.4ac0d1e5@epycfail>
References: <20180704142342.21740-1-mcroce@redhat.com> <20180731220657.GC29662@ovn.org>
In-Reply-To: <20180731220657.GC29662@ovn.org>
To: Ben Pfaff
Cc: Matteo Croce, Pravin B Shelar, jpettit@vmware.com, gvrose8192@gmail.com,
    netdev, dev@openvswitch.org, Jiri Benc, Aaron Conole

Hi Ben,

On Tue, 31 Jul 2018 15:06:57 -0700
Ben Pfaff wrote:

> This is an awkward problem to try to solve with sockets because of the
> nature of sockets, which are strictly first-in first-out. What you
> really want is something closer to the algorithm that we use in
> ovs-vswitchd to send packets to an OpenFlow controller. When the
> channel becomes congested, then for each packet to be sent to the
> controller, OVS appends it to a queue associated with its input port.
> (This could be done on a more granular basis than just port.) If the
> maximum amount of queued packets is reached, then OVS discards a packet
> from the longest queue. When space becomes available in the channel,
> OVS round-robins through the queues to send a packet. This achieves
> pretty good fairness but it can't be done with sockets because you
> can't drop a packet that is already queued to one.

Thanks for your feedback. What you describe is, though, functionally
equivalent to what this patch does, minus the per-port queueing limit.

However, instead of keeping one explicit queue for each port and then
fetching packets from all the queues in a round-robin fashion, we
implement this with a single queue and choose the insertion point on
enqueue in such a way that the result is equivalent. This way, we avoid
the overhead of maintaining one queue per port (we can have up to 2^16
ports) and of cycling over all of them.

Let's say we have two ports, A and B, and three upcalls are sent for
each port. If we implement one queue for each port as you described, we
end up with this:

.---------------- - - -
| A1 | A2 | A3 |
'---------------- - - -

.---------------- - - -
| B1 | B2 | B3 |
'---------------- - - -

and then send upcalls in this order: A1, B1, A2, B2, A3, B3.

What we are doing here with a single queue is inserting the upcalls
directly in this order:

.------------------------------- - - -
| A1 | B1 | A2 | B2 | A3 | B3 |
'------------------------------- - - -

and dequeueing from the head (see the rough sketch further down).

About the per-port queueing limit: we currently have a global one
(UPCALL_QUEUE_MAX_LEN), while the per-port limit is simply given by
implementation constraints in our case:

	if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
		err = -ENOSPC;
		goto out_clear;
	}

but we can easily swap that U8_MAX - 1 with another macro or a
configurable value, if there's any value in doing that.
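To make the single-queue insertion a bit more concrete, here is a
rough, self-contained userspace model of the idea. This is just an
illustration, not the actual patch code: the struct, the helper names
(enqueue()), the "round" field and the fixed two-port counter array are
all made up for the sketch.

/*
 * Rough userspace model of the single-queue, fair-insertion idea
 * described above -- not the actual patch code.  Every queued upcall
 * carries the "round" it belongs to for its port (first, second, third
 * upcall from that port); a new upcall is inserted right before the
 * first queued upcall of a later round, so that dequeueing from the
 * head yields A1, B1, A2, B2, A3, B3 as in the diagram.
 */
#include <stdio.h>
#include <stdlib.h>

struct upcall {
	int port;		/* source port of the upcall         */
	unsigned int round;	/* per-port sequence: 1st, 2nd, ...  */
	struct upcall *next;
};

static struct upcall *head;
static unsigned int count[2];	/* per-port counters, two ports here */

static void enqueue(int port)
{
	struct upcall *u = malloc(sizeof(*u));
	struct upcall **p;

	/* A per-port limit (cf. the U8_MAX - 1 check above) would sit here. */
	u->port = port;
	u->round = ++count[port];

	/* Skip everything queued for this round or an earlier one... */
	for (p = &head; *p && (*p)->round <= u->round; p = &(*p)->next)
		;
	/* ...and insert before the first upcall of a later round. */
	u->next = *p;
	*p = u;
}

int main(void)
{
	struct upcall *u;

	/* Three upcalls from port A (0) arrive, then three from port B (1). */
	enqueue(0); enqueue(0); enqueue(0);
	enqueue(1); enqueue(1); enqueue(1);

	/* Dequeue from the head: ports come out interleaved, A B A B A B. */
	while ((u = head)) {
		head = u->next;
		count[u->port]--;
		printf("port %c, upcall #%u\n", 'A' + u->port, u->round);
		free(u);
	}
	return 0;
}

In the actual patch there is of course more to it (locking, the
per-port counters going down as upcalls are delivered, the limit
mentioned above), but this is the gist of how the insertion point is
chosen.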
> My current thought is that any fairness scheme we implement directly in
> the kernel is going to need to evolve over time. Maybe we could do
> something flexible with BPF and maps, instead of hard-coding it.

Honestly, I fail to see what else we might want to do here, other than
adding a simple mechanism for fairness to solve the specific issue at
hand. Flexibility would probably come at a higher cost. We could easily
make the limits configurable if needed. Do you have anything else in
mind?

--
Stefano