From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefano Brivio <sbrivio@redhat.com>
Subject: Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in
 per-port round-robin order
Date: Sat, 4 Aug 2018 02:43:24 +0200
Message-ID: <20180804024324.4d900b5e@epycfail>
References: <20180704142342.21740-1-mcroce@redhat.com>
        <CAOrHB_Byo=q2Om1_s+wM3X8m4dbez7WcjF3bXsm_Q6d8YWcAjw@mail.gmail.com>
        <CAGnkfhwS3J=7W7BQabFXSLLad3dV3YfocDf9qADaWqqgZKPB7g@mail.gmail.com>
        <CAGnkfhyxQSz=8OsgTsjR3NfZ2FPwv+FjPZNPEY5VHZRsEiQ68w@mail.gmail.com>
        <20180731220657.GC29662@ovn.org>
        <20180803185241.4ac0d1e5@epycfail>
        <20180803230108.GU29662@ovn.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Matteo Croce <mcroce@redhat.com>,
        Pravin B Shelar <pshelar@ovn.org>, jpettit@vmware.com,
        gvrose8192@gmail.com, netdev <netdev@vger.kernel.org>,
        dev@openvswitch.org, Jiri Benc <jbenc@redhat.com>,
        Aaron Conole <aconole@redhat.com>
To: Ben Pfaff <blp@ovn.org>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:48218 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1732001AbeHDCmJ (ORCPT <rfc822;netdev@vger.kernel.org>);
        Fri, 3 Aug 2018 22:42:09 -0400
In-Reply-To: <20180803230108.GU29662@ovn.org>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Fri, 3 Aug 2018 16:01:08 -0700
Ben Pfaff <blp@ovn.org> wrote:

> I think that a simple mechanism for fairness is fine.  The direction
> of extensibility that makes me anxious is how to decide what matters
> for fairness.  So far, we've talked about per-vport fairness.  That
> works pretty well for packets coming in from virtual interfaces where
> each vport represents a separate VM.

Yes, right, that's the case where we have significant issues currently.

> It does not work well if the traffic filling your queues all comes
> from a single physical port because some source of traffic is sending
> traffic at a high rate.  In that case, you'll do a lot better if you
> do fairness based on the source 5-tuple. But if you're doing network
> virtualization, then the outer source 5-tuples won't necessarily vary
> much and you'd be better off looking at the VNI and maybe some Geneve
> TLV options and maybe the inner 5-tuple...

Sure, I see what you mean now. That looks entirely doable if we
abstract the round-robin bucket selection out of the current patch.

> I would be very pleased if we could integrate a simple mechanism for
> fairness, based for now on some simple criteria like the source port,
> but thinking ahead to how we could later make it gracefully extensible
> to consider more general and possibly customizable criteria.

We could change the patch so that instead of just using the vport for
round-robin queue insertion, we generalise that and use "buckets"
instead of vports, and have a set of possible functions that are called
instead of using port_no directly in ovs_dp_upcall_queue_roundrobin(),
making this configurable via netlink, per datapath.

We could implement selection based on source port or a hash on the
source 5-tuple, and the relevant bits of
ovs_dp_upcall_queue_roundrobin() would look like this:

static int ovs_dp_upcall_queue_roundrobin(struct datapath *dp,
					  struct dp_upcall_info *upcall)
{

[...]

	list_for_each_entry(pos, head, list) {
		int bucket = dp->rr_select(pos);

		/* Count per-bucket upcalls. */
		if (dp->upcalls.count[bucket] == U8_MAX) {
			err = -ENOSPC;
			goto out_clear;
		}
		dp->upcalls.count[bucket]++;

		if (bucket == upcall->bucket) {
			/* Another upcall for the same bucket: move insertion
			 * point here, keep looking for insertion condition to
			 * be still met further on.
			 */
			find_next = true;
			here = pos;
			continue;
		}

		count = dp->upcalls.count[bucket];
		if (find_next && dp->upcalls.count[bucket] >= count) {
			/* Insertion condition met: no need to look further,
			 * unless another upcall for the same port occurs later.
			 */
			find_next = false;
			here = pos;
		}
	}

[...]

}

and implementations for dp->rr_select() would look like:

int rr_select_vport(struct dp_upcall_info *upcall)
{
	return upcall->port_no;
}

int rr_select_srcport(struct dp_upcall_info *upcall)
{
	/* look up source port from upcall->skb... */
}

And we could then easily extend this to use BPF with maps one day.

This is for clarity by the way, but I guess we should avoid indirect
calls in the final implementation. 

What do you think?

-- 
Stefano