From: Stefano Brivio <sbrivio-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Pravin Shelar <pshelar-LZ6Gd1LRuIk@public.gmane.org>
Cc: ovs dev <dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org>,
Justin Pettit <jpettit-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>,
netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Jiri Benc <jbenc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order
Date: Thu, 16 Aug 2018 23:07:37 +0200
Message-ID: <20180816230737.34ea6ff9@epycfail>
In-Reply-To: <CAOrHB_DaA-+J=jzNOdQiUYrA7RJi30HmRESjsmGs7_z1ffpVOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Pravin,
On Wed, 15 Aug 2018 00:19:39 -0700
Pravin Shelar <pshelar-LZ6Gd1LRuIk@public.gmane.org> wrote:
> My argument is not about the proposed fairness algorithm. It is about
> the cost of the fairness, and I do not see it addressed in any of the
> follow-ups.
We are still working on it (especially on the points you already
mentioned); that's why there hasn't been any follow-up yet.
After all, we marked this patch as RFC because we thought we needed to
gather some feedback before we'd reach a solution that's good enough :)
> I revisited the original patch, here is what I see in terms of added
> cost to existing upcall processing:
Thanks for the review and the detailed summary, some answers follow:
> 1. one "kzalloc(sizeof(*upcall), GFP_ATOMIC);". This involves allocating
> and initializing memory.
We would use kmem_cache_alloc() here; the queue is rather small. As for
the initialisation, we don't really need it, so we can drop it.
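Something along these lines (just a sketch with placeholder names,
struct dp_upcall and upcall_cache are not from the patch):

/* Dedicated slab cache for queued upcalls, allocated without zeroing
 * because every field is written before use.
 */
static struct kmem_cache *upcall_cache;

static int dp_upcall_cache_init(void)
{
	upcall_cache = kmem_cache_create("ovs_upcall",
					 sizeof(struct dp_upcall), 0, 0, NULL);
	return upcall_cache ? 0 : -ENOMEM;
}

/* in the enqueue path, replacing kzalloc(sizeof(*upcall), GFP_ATOMIC): */
	upcall = kmem_cache_alloc(upcall_cache, GFP_ATOMIC);
	if (!upcall)
		goto err;
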
> 2. copy the flow key, which is more than 1 KB (upcall->key = *key)
The current idea here is to find a way to safely hold a pointer to the
flow key instead of copying it. Do you have any suggestions?
> 3. Acquire the spin_lock_bh dp->upcalls.lock, which would disable bottom
> half processing on the CPU while waiting for the global lock.
A double list, whose pointer is swapped when we start dequeuing packets
(as is done, e.g., for the flow table on rehashing), would avoid the need
for this spinlock. We're trying that out.
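Roughly like this (just a sketch, not actual code from the series:
producers would still take a short lock for the append and for the swap
itself, but it's no longer held while the detached list is walked, so
dequeuing doesn't keep bottom halves disabled for the whole iteration):

struct dp_upcall_queue {
	struct list_head lists[2];	/* producers append to lists[active] */
	int active;
	spinlock_t lock;		/* held only for the append and the swap */
};

/* Dequeuing side: detach the list being filled, let producers continue
 * on the other one, then walk the detached list without any shared lock.
 */
static struct list_head *upcall_queue_detach(struct dp_upcall_queue *q)
{
	struct list_head *done;

	spin_lock_bh(&q->lock);
	done = &q->lists[q->active];
	q->active ^= 1;
	spin_unlock_bh(&q->lock);

	return done;	/* now private to the dequeuing side */
}
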
> 4. Iterate the list of queued upcalls; one of the objectives is to avoid
> out-of-order packets. But I do not see the point of ordering packets
> from different streams.
Please note, though, that we also have packets from the same stream.
Actually, the whole point of this exercise is to allow reordering across
different streams, while maintaining order within any given stream.
> 5. Signal the upcall thread after a delay, ovs_dp_upcall_delay(). This
> adds further to the latency.
The idea behind ovs_dp_upcall_delay() is to schedule without delay if we
don't currently have a storm of upcalls.
But if we do, we're probably introducing less latency by doing this than
by letting ovs-vswitchd handle them. It's also a fundamental requirement
for fairness: we need to schedule upcalls, and to schedule we need some
(small, in the overall picture) delay. This is another point where we
need to show some detailed measurements, I guess.
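Just to illustrate the behaviour (made-up threshold and delay values,
and a hypothetical dp->upcalls_work item, not what the patch actually
does):

/* No delay while the queue is short, a small bounded delay once upcalls
 * start piling up. UPCALL_BURST_LEN and UPCALL_DELAY_US are made up for
 * this example.
 */
#define UPCALL_BURST_LEN	16
#define UPCALL_DELAY_US		100

static unsigned long upcall_delay(int queued)
{
	if (queued < UPCALL_BURST_LEN)
		return 0;			/* schedule right away */

	return usecs_to_jiffies(UPCALL_DELAY_US);
}

/* caller: */
	schedule_delayed_work(&dp->upcalls_work, upcall_delay(queued));
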
> 6. The upcall is then handed over to a different thread (context switch),
> likely on a different CPU.
> 8. The upcall object is freed on a remote CPU.
The solution could be to use cmwq instead and have per-CPU workers and
queues. But I wonder what the drawbacks of per-CPU fairness would be. I
think this depends a lot on how ovs-vswitchd handles the upcalls. We
could check how that performs. Any thoughts?
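Something like this, just to give an idea (hypothetical structures and
field names, not a proposal for the actual layout):

/* Per-CPU variant: one queue and one work item per CPU, so enqueueing,
 * dequeuing and freeing all stay on the same CPU, and no lock is shared
 * across cores. Fairness then becomes per-CPU.
 */
struct dp_upcalls_pcpu {
	struct list_head queue;
	spinlock_t lock;	/* the worker takes this with BH disabled */
	struct work_struct work;
};

static DEFINE_PER_CPU(struct dp_upcalls_pcpu, dp_upcalls);

/* enqueue path, on the CPU that received the packet: */
	struct dp_upcalls_pcpu *pu = this_cpu_ptr(&dp_upcalls);

	spin_lock(&pu->lock);
	list_add_tail(&upcall->node, &pu->queue);
	spin_unlock(&pu->lock);

	/* keep the worker on the CPU the upcall was queued on */
	queue_work_on(smp_processor_id(), system_wq, &pu->work);
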
> 9. A single lock essentially means OVS kernel datapath upcall processing
> is single-threaded no matter the number of cores in the system.
This should also be solved by keeping two queues.
> I would be interested in how we are going to address these issues.
>
> In the example you were talking about, the netlink fd issue on a server
> with 48 cores, how does this solution work when there are 5K ports each
> triggering upcalls? Can you benchmark your patch? Do you have
> performance numbers for TCP_CRR with and without this patch? Also
> publish latency numbers for this patch. Please turn off megaflows to
> exercise upcall handling.
We just ran some tests showing that fairness is maintained with a much
lower number of ports, but we have no performance numbers at the moment
-- other than the observation that, when flooding with upcalls,
ovs-vswitchd is the bottleneck. We'll run proper performance tests,
focusing especially on latency (which we have kind of ignored so far).
> I understand fairness has a cost, but we need to find the right balance
> between performance and fairness. The current fairness scheme is a
> lockless algorithm without much computational overhead; did you try to
> improve the current algorithm so that it uses a smaller number of ports?
We tried with one socket per thread; it just doesn't work. We can
definitely try a bit harder. The problem I see here is that the current
mechanism is not actually a fairness scheme. It kind of works for most
workloads, but if a port happens to flood with a given timing, I don't
see how fairness can be guaranteed.
--
Stefano