All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Ben Pfaff <blp@ovn.org>
Cc: Matteo Croce <mcroce@redhat.com>,
	Pravin B Shelar <pshelar@ovn.org>,
	jpettit@vmware.com, gvrose8192@gmail.com,
	netdev <netdev@vger.kernel.org>,
	dev@openvswitch.org, Jiri Benc <jbenc@redhat.com>,
	Aaron Conole <aconole@redhat.com>
Subject: Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order
Date: Fri, 3 Aug 2018 18:52:41 +0200	[thread overview]
Message-ID: <20180803185241.4ac0d1e5@epycfail> (raw)
In-Reply-To: <20180731220657.GC29662@ovn.org>

Hi Ben,

On Tue, 31 Jul 2018 15:06:57 -0700
Ben Pfaff <blp@ovn.org> wrote:

> This is an awkward problem to try to solve with sockets because of the
> nature of sockets, which are strictly first-in first-out.  What you
> really want is something closer to the algorithm that we use in
> ovs-vswitchd to send packets to an OpenFlow controller.  When the
> channel becomes congested, then for each packet to be sent to the
> controller, OVS appends it to a queue associated with its input port.
> (This could be done on a more granular basis than just port.)  If the
> maximum amount of queued packets is reached, then OVS discards a packet
> from the longest queue.  When space becomes available in the channel,
> OVS round-robins through the queues to send a packet.  This achieves
> pretty good fairness but it can't be done with sockets because you can't
> drop a packet that is already queued to one.

Thanks for your feedback. What you describe is, though, functionally
equivalent to what this patch does, minus the per-port queueing limit.

However, instead of having one explicit queue for each port, and
then fetching packets in a round-robin fashion from all the queues, we
implemented this with a single queue and choose insertion points while
queueing in such a way that the result is equivalent. This way, we
avoid the massive overhead associated with having one queue per each
port (we can have up to 2^16 ports), and cycling over them.

Let's say we have two ports, A and B, and three upcalls are sent for
each port. If we implement one queue for each port as you described, we
end up with this:

.---------------- - - -
| A1 | A2 | A3 |
'---------------- - - -

.---------------- - - -
| B1 | B2 | B3 |
'---------------- - - -

and then send upcalls in this order: A1, B1, A2, B2, A3, B3.

What we are doing here with a single queue is inserting the upcalls
directly in this order:

.------------------------------- - - -
| A1 | B1 | A2 | B2 | A3 | B3 |
'------------------------------- - - -

and dequeueing from the head.

About the per-port queueing limit: we currently have a global one
(UPCALL_QUEUE_MAX_LEN), while the per-port limit is simply given by
implementation constraints in our case:

		if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
			err = -ENOSPC;
			goto out_clear;
		}

but we can easily swap that U8_MAX - 1 with another macro or a
configurable value, if there's any value in doing that.

> My current thought is that any fairness scheme we implement directly in
> the kernel is going to need to evolve over time.  Maybe we could do
> something flexible with BPF and maps, instead of hard-coding it.

Honestly, I fail to see what else we might want to do here, other than
adding a simple mechanism for fairness, to solve the specific issue at
hand. Flexibility would probably come at a higher cost. We could easily
make limits configurable if needed. Do you have anything else in mind?

-- 
Stefano

  reply	other threads:[~2018-08-03 18:49 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-04 14:23 [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order Matteo Croce
     [not found] ` <20180704142342.21740-1-mcroce-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-07-10 18:31   ` Pravin Shelar
2018-07-16 16:54     ` Matteo Croce
2018-07-31 19:43       ` Matteo Croce
     [not found]         ` <CAGnkfhyxQSz=8OsgTsjR3NfZ2FPwv+FjPZNPEY5VHZRsEiQ68w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-07-31 22:06           ` Ben Pfaff
2018-08-03 16:52             ` Stefano Brivio [this message]
2018-08-03 23:01               ` Ben Pfaff
2018-08-04  0:43                 ` Stefano Brivio
2018-08-04  0:54                   ` Ben Pfaff
2018-08-10 14:11                   ` William Tu
2018-08-14 15:25                     ` Stefano Brivio
2018-07-31 23:12         ` Pravin Shelar
2018-08-07 13:31           ` Stefano Brivio
2018-08-07 13:39             ` Stefano Brivio
2018-08-15  7:19             ` Pravin Shelar
     [not found]               ` <CAOrHB_DaA-+J=jzNOdQiUYrA7RJi30HmRESjsmGs7_z1ffpVOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-08-16 21:07                 ` Stefano Brivio
2018-09-26  9:58               ` Stefano Brivio
2018-09-28 17:15                 ` Pravin Shelar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180803185241.4ac0d1e5@epycfail \
    --to=sbrivio@redhat.com \
    --cc=aconole@redhat.com \
    --cc=blp@ovn.org \
    --cc=dev@openvswitch.org \
    --cc=gvrose8192@gmail.com \
    --cc=jbenc@redhat.com \
    --cc=jpettit@vmware.com \
    --cc=mcroce@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pshelar@ovn.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.