From: Willy Tarreau <w@1wt.eu>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tolga Ceylan <tolga.ceylan@gmail.com>,
Tom Herbert <tom@herbertland.com>,
cgallek@google.com, Josh Snyder <josh@code406.com>,
Aaron Conole <aconole@bytheb.org>,
"David S. Miller" <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode
Date: Thu, 24 Mar 2016 19:00:11 +0100
Message-ID: <20160324180011.GB7585@1wt.eu>
In-Reply-To: <1458838897.12033.10.camel@edumazet-glaptop3.roam.corp.google.com>
On Thu, Mar 24, 2016 at 10:01:37AM -0700, Eric Dumazet wrote:
> On Thu, 2016-03-24 at 17:50 +0100, Willy Tarreau wrote:
> > On Thu, Mar 24, 2016 at 09:33:11AM -0700, Eric Dumazet wrote:
> > > > --- a/net/ipv4/inet_hashtables.c
> > > > +++ b/net/ipv4/inet_hashtables.c
> > > > @@ -189,6 +189,8 @@ static inline int compute_score(struct sock *sk, struct net *net,
> > > > return -1;
> > > > score += 4;
> > > > }
> > > > + if (sk->sk_reuseport)
> > > > + score++;
> > >
> > > This wont work with BPF
> > >
> > > > if (sk->sk_incoming_cpu == raw_smp_processor_id())
> > > > score++;
> > >
> > > This one does not work either with BPF
> >
> > But this *is* in 4.5. Does this mean that this part doesn't work anymore or
> > just that it's not usable in conjunction with BPF ? In this case I'm less
> > worried, because it would mean that we have a solution for non-BPF aware
> > applications and that BPF-aware applications can simply use BPF.
> >
>
> BPF can implement the CPU choice/pref itself. It has everything needed.

Well, I don't need the CPU choice. It was already there, it's not my code.
I only need the ability for an independent process to stop receiving new
connections without affecting the other processes and without dropping any
of these connections.

In fact, initially I didn't even need anything related to incoming connection
load-balancing, just the ability to start a new process without stopping the
old one, as it used to work in 2.2 and for which I used to keep a patch in
2.4 and 2.6. When SO_REUSEPORT was reintroduced in 3.9, that solved the issue,
but some users started to complain that some connections were lost during the
switch from the old processes to the new ones. Hence the proposal above. Since
it's not about load distribution and the processes are totally independent, I
don't really see how to (ab)use BPF to achieve this.

The pattern is:

  t0 : unprivileged processes 1 and 2 are listening to the same port
       (sock1@pid1) (sock2@pid2)
       <------ listening ------>

  t1 : new processes are started to replace the old ones
       (sock1@pid1) (sock2@pid2) (sock3@pid3) (sock4@pid4)
       <------ listening ------> <------ listening ------>

  t2 : new processes signal the old ones they must stop
       (sock1@pid1) (sock2@pid2) (sock3@pid3) (sock4@pid4)
       <------- draining ------> <------ listening ------>

  t3 : pids 1 and 2 have finished, they go away
                                 (sock3@pid3) (sock4@pid4)
       <------ gone ----->       <------ listening ------>

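
To make the t2 state concrete, here is a toy userspace model of the idea
behind the quoted compute_score() hunk: a socket that still has reuseport set
scores one point higher than one that has cleared it, so new connections only
go to the sockets that are still listening. This is a sketch under stated
assumptions, not kernel code: struct toy_sock, toy_score() and toy_pick() are
hypothetical names, the base score of 1 is arbitrary, and the tie-break by
hash modulo is a simplification made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a listening socket; NOT the kernel's struct sock. */
struct toy_sock {
	int  id;        /* e.g. 1 for sock1@pid1 */
	bool reuseport; /* false once the owner has entered drain mode */
};

/* Simplified analogue of compute_score(): every candidate is assumed to
 * match the port (base score 1), and one extra point goes to sockets
 * that still have reuseport set, mirroring the quoted hunk. */
static int toy_score(const struct toy_sock *sk)
{
	int score = 1;

	if (sk->reuseport)
		score++;
	return score;
}

/* Pick one socket among those sharing the highest score. The tie-break
 * (hash modulo the number of best candidates) is an assumption of this
 * sketch. Returns the chosen socket id, or -1 if there are no sockets. */
static int toy_pick(const struct toy_sock *socks, size_t n, unsigned int hash)
{
	int best = -1;
	size_t matches = 0, i, want;

	/* First pass: find the best score and how many sockets have it. */
	for (i = 0; i < n; i++) {
		int score = toy_score(&socks[i]);

		if (score > best) {
			best = score;
			matches = 1;
		} else if (score == best) {
			matches++;
		}
	}
	if (matches == 0)
		return -1;

	/* Second pass: return the (hash % matches)-th best socket. */
	want = hash % matches;
	for (i = 0; i < n; i++) {
		if (toy_score(&socks[i]) != best)
			continue;
		if (want-- == 0)
			return socks[i].id;
	}
	return -1;
}
```

With sockets 1 and 2 marked as draining and sockets 3 and 4 still listening,
toy_pick() only ever returns 3 or 4, whatever the hash. Sockets 1 and 2 stay
open, so their owners can still accept what is already queued.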
> > - it seems to me that for BPF to be usable on process shutting down, we'd
> > need to have some form of central knowledge if the goal is to redefine
> > how to distribute the load. In my case there are multiple independant
> > processes forked on startup, so it's unclear to me how each of them could
> > reconfigure BPF when shutting down without risking to break the other ones.
> > - the doc makes me believe that BPF would require privileges to be unset, so
> > that would not be compatible with a process shutting down which has already
> > dropped its privileges after startup, but I could be wrong.
> >
> > Thanks for your help on this,
> > Willy
> >
>
> The point is : BPF is the way to go, because it is expandable.

OK, so this means we have to find a way to expand it to allow an individual
non-privileged process to change the distribution algorithm without impacting
other processes.

I need to get to know BPF better to see what can be done, but unfortunately,
at this point, the principle alone suggests a level of complexity that
doesn't look easy to solve at all :-/

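
For context on what "BPF can implement the CPU choice" looks like, here is a
minimal sketch of a classic BPF program attached with
SO_ATTACH_REUSEPORT_CBPF. Per socket(7), such a program returns the index
(0 to N-1) of the socket in the reuseport group that should receive the
packet, and an invalid index falls back to the plain SO_REUSEPORT selection.
The helper names build_cpu_select_prog() and attach_cpu_select() are
hypothetical, and the fallback value 51 for the option is an assumption taken
from asm-generic/socket.h that does not hold on every architecture.

```c
#include <linux/filter.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
/* Assumed fallback from asm-generic/socket.h; some architectures differ. */
#define SO_ATTACH_REUSEPORT_CBPF 51
#endif

/* Hypothetical helper: fill prog[0..2] with a classic BPF program that
 * returns (CPU handling the packet) % group_size as the socket index.
 * group_size must be non-zero. */
static void build_cpu_select_prog(struct sock_filter prog[3],
				  unsigned int group_size)
{
	/* A = number of the CPU processing the packet */
	prog[0] = (struct sock_filter)
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_CPU);
	/* A = A % group_size */
	prog[1] = (struct sock_filter)
		BPF_STMT(BPF_ALU | BPF_MOD | BPF_K, group_size);
	/* return A, interpreted as an index into the reuseport group */
	prog[2] = (struct sock_filter)
		BPF_STMT(BPF_RET | BPF_A, 0);
}

/* Hypothetical helper: attach the program to a socket that already has
 * SO_REUSEPORT set. Returns what setsockopt() returns (0 or -1). */
static int attach_cpu_select(int fd, unsigned int group_size)
{
	struct sock_filter code[3];
	struct sock_fprog fprog = { .len = 3, .filter = code };

	build_cpu_select_prog(code, group_size);
	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
			  &fprog, sizeof(fprog));
}
```

Since the program returns an index into the whole group, this sketch covers
the CPU preference only. It does not by itself give one independent process
a way to take just its own socket out of the selection, which is the open
question here.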
Regards,
Willy