* SO_REUSEPORT?
From: Tom Herbert @ 2008-08-07 16:57 UTC
To: netdev

Hello,

We are looking at ways to scale TCP listeners. I think what we would like is
the ability to listen on a port from multiple threads (sockets bound to the
same port, INADDR_ANY, and no interface binding), which is what SO_REUSEPORT
would seem to allow. Has this ever been implemented for Linux, or is there a
good reason not to have it?

Thanks,
Tom
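For context, the BSD-style usage Tom is asking about looks roughly like the
sketch below: each worker creates its own socket, sets SO_REUSEPORT before
bind(), and binds the same port. This is illustrative only; Linux kernels of
this era did not support it (SO_REUSEPORT-based listener load balancing only
arrived much later, in Linux 3.9), and the helper name is an assumption.

/* Sketch of the BSD-style SO_REUSEPORT pattern: every worker gets its own
 * listener bound to the same port. Illustrative; not working 2008 Linux. */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    /* SO_REUSEPORT must be set on every socket before bind(). */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        perror("setsockopt(SO_REUSEPORT)");
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* no interface binding */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

Each worker thread would call make_listener() with the same port and then
accept() only on its own socket, so the kernel, rather than the scheduler,
decides which worker gets a given connection.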
* Re: SO_REUSEPORT?
From: Rémi Denis-Courmont @ 2008-08-07 17:09 UTC
To: Tom Herbert; +Cc: netdev

On Thursday 7 August 2008 19:57:15, Tom Herbert wrote:
> Hello,
>
> We are looking at ways to scale TCP listeners. I think what we would like
> is the ability to listen on a port from multiple threads (sockets bound to
> the same port, INADDR_ANY, and no interface binding), which is what
> SO_REUSEPORT would seem to allow. Has this ever been implemented for
> Linux, or is there a good reason not to have it?

On Linux, SO_REUSEADDR provides most of what SO_REUSEPORT provides on BSD.

In any case, there is absolutely no point in creating multiple TCP listeners.
Multiple threads can accept() on the same listener - at the same time.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
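The shared-listener pattern Rémi describes is sketched below: one listening
socket, several threads blocked in accept() on it concurrently. The thread
count and the handle_connection() placeholder are assumptions made for
illustration.

/* Sketch of the shared-listener pattern: many threads accept() on one
 * listening socket. NTHREADS and handle_connection() are placeholders. */
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#define NTHREADS 8

static void handle_connection(int fd)
{
    /* ... application protocol ... */
    close(fd);
}

static void *acceptor(void *arg)
{
    int listen_fd = *(int *)arg;

    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);  /* safe from many threads */
        if (conn < 0)
            continue;
        handle_connection(conn);
    }
    return NULL;
}

void run_workers(int listen_fd)
{
    pthread_t tid[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, acceptor, &listen_fd);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
}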
* Re: SO_REUSEPORT?
From: Tom Herbert @ 2008-08-07 17:58 UTC
To: netdev

> > We are looking at ways to scale TCP listeners. I think what we would like
> > is the ability to listen on a port from multiple threads (sockets bound to
> > the same port, INADDR_ANY, and no interface binding), which is what
> > SO_REUSEPORT would seem to allow. Has this ever been implemented for
> > Linux, or is there a good reason not to have it?
>
> On Linux, SO_REUSEADDR provides most of what SO_REUSEPORT provides on BSD.
>
> In any case, there is absolutely no point in creating multiple TCP
> listeners. Multiple threads can accept() on the same listener - at the
> same time.

We've been doing that, but on wakeup we seem to be at the mercy of
scheduling -- whichever thread wakes up first gets to process the accept
queue first. This biases towards threads running on the same CPU where the
wakeup is issued, so this method doesn't give us the even distribution of
new connections across threads that we'd like.

Tom
* Re: SO_REUSEPORT?
From: Rick Jones @ 2008-08-07 18:17 UTC
To: Tom Herbert; +Cc: netdev

Tom Herbert wrote:
> > > We are looking at ways to scale TCP listeners. I think what we would
> > > like is the ability to listen on a port from multiple threads (sockets
> > > bound to the same port, INADDR_ANY, and no interface binding), which is
> > > what SO_REUSEPORT would seem to allow. Has this ever been implemented
> > > for Linux, or is there a good reason not to have it?
> >
> > On Linux, SO_REUSEADDR provides most of what SO_REUSEPORT provides on BSD.
> >
> > In any case, there is absolutely no point in creating multiple TCP
> > listeners. Multiple threads can accept() on the same listener - at the
> > same time.
>
> We've been doing that, but on wakeup we seem to be at the mercy of
> scheduling -- whichever thread wakes up first gets to process the accept
> queue first. This biases towards threads running on the same CPU where the
> wakeup is issued, so this method doesn't give us the even distribution of
> new connections across threads that we'd like.

How would the presence of multiple TCP LISTEN endpoints change that? You'd
then be at the mercy of whatever "scheduling" there was inside the stack.

If you want to balance the threads, perhaps use a dispatch thread, or a
virtual one: each thread knows how many connections it is servicing, the
threads share those counts with one another, and a thread that has N more
connections than the others skips accept() that time around. It might need
some tweaking to handle pathological starvation cases, like all the other
threads being hung, but the basic idea is there.

rick jones
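Rick only outlines the balancing idea; a rough sketch of one way to realize
it is below. The shared per-thread counters, the SKEW_LIMIT threshold, and
the loop structure are assumptions made for illustration, not anything
specified in the thread.

/* Sketch of the "virtual dispatch thread": each acceptor skips accept()
 * while it is more than SKEW_LIMIT connections ahead of the least-loaded
 * thread. Counter layout and SKEW_LIMIT are illustrative assumptions. */
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/socket.h>

#define NTHREADS   8
#define SKEW_LIMIT 4   /* "N more connections than the other threads" */

static atomic_int conns_per_thread[NTHREADS];

static int min_load(void)
{
    int lo = INT_MAX;
    for (int i = 0; i < NTHREADS; i++) {
        int c = atomic_load(&conns_per_thread[i]);
        if (c < lo)
            lo = c;
    }
    return lo;
}

void acceptor_loop(int self, int listen_fd)
{
    for (;;) {
        /* Back off if we are well ahead of the least-loaded thread. */
        if (atomic_load(&conns_per_thread[self]) > min_load() + SKEW_LIMIT) {
            sched_yield();
            continue;
        }
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        atomic_fetch_add(&conns_per_thread[self], 1);
        /* ... hand conn to this thread's event loop; decrement on close ... */
    }
}

As Rick notes, a real version would need starvation handling, for example a
bound on how long a thread stays backed off while its peers make no progress.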
* Re: SO_REUSEPORT?
From: Stephen Hemminger @ 2008-08-07 19:03 UTC
To: Rick Jones; +Cc: Tom Herbert, netdev

On Thu, 07 Aug 2008 11:17:55 -0700, Rick Jones <rick.jones2@hp.com> wrote:
> How would the presence of multiple TCP LISTEN endpoints change that? You'd
> then be at the mercy of whatever "scheduling" there was inside the stack.
>
> If you want to balance the threads, perhaps use a dispatch thread, or a
> virtual one: each thread knows how many connections it is servicing, the
> threads share those counts with one another, and a thread that has N more
> connections than the others skips accept() that time around. It might need
> some tweaking to handle pathological starvation cases, like all the other
> threads being hung, but the basic idea is there.
>
> rick jones

I suspect thread balancing would actually hurt performance! You would be
better off having a couple of "hot" threads that do all the work and stay in
cache. If you push the work around to all the threads, you get worst-case
cache behaviour.
* Re: SO_REUSEPORT?
From: Tom Herbert @ 2008-08-07 19:43 UTC
To: Stephen Hemminger; +Cc: Rick Jones, netdev

On Thu, Aug 7, 2008 at 12:03 PM, Stephen Hemminger
<stephen.hemminger@vyatta.com> wrote:
> I suspect thread balancing would actually hurt performance! You would be
> better off having a couple of "hot" threads that do all the work and stay
> in cache. If you push the work around to all the threads, you get
> worst-case cache behaviour.

I'm not sure that's applicable for us, since the server application and
networking will max out all the CPUs on the host anyway; one way or another
we need to dispatch the work of incoming connections to threads on different
CPUs. If we do all the accepts in one thread in user space, the CPU of that
thread becomes the bottleneck (we're accepting about 40,000 connections per
second). If we have multiple accept threads running on different CPUs, that
helps some, but the load is spread unevenly across the CPUs and we still
can't reach the highest connection rate. So we're looking for a method that
distributes the incoming connection load across CPUs fairly evenly.

Tom
* Re: SO_REUSEPORT?
From: Rick Jones @ 2008-08-07 20:14 UTC
To: Tom Herbert; +Cc: Stephen Hemminger, netdev

> I'm not sure that's applicable for us, since the server application and
> networking will max out all the CPUs on the host anyway; one way or another
> we need to dispatch the work of incoming connections to threads on
> different CPUs. If we do all the accepts in one thread in user space, the
> CPU of that thread becomes the bottleneck (we're accepting about 40,000
> connections per second). If we have multiple accept threads running on
> different CPUs, that helps some, but the load is spread unevenly across the
> CPUs and we still can't reach the highest connection rate. So we're looking
> for a method that distributes the incoming connection load across CPUs
> fairly evenly.

Well, if you _really_ want the load spread, you may need to use a multiqueue
interface (at least inbound, if not also outbound later), "know" how the NIC
will hash, and then have N distinct port numbers, each assigned to its own
LISTEN endpoint. The old song and dance about making an N-CPU system look as
much like N single-CPU systems as possible, and all that...

Unless there are NICs you can "tell" where to send the interrupts, which IMO
is preferable - I have a preference for the application/scheduler telling
"networking" where to work rather than networking (or the NIC) telling the
scheduler where to run a thread. The archives of either this list or netnews
will probably pull up stuff where I've talked about Inbound Packet Scheduling
(IPS) vs. Thread Optimized Packet Scheduling (TOPS) and the limitations of
simplistic address hashing to pick a queue/processor/whatnot :)

rick jones
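A minimal sketch of the "N ports, N listeners, N pinned threads" layout Rick
describes is below. BASE_PORT, NWORKERS, the 1:1 worker-to-CPU mapping, and
the reuse of the make_listener()/acceptor_loop() helpers sketched earlier are
all illustrative assumptions; steering each port's traffic to the matching
NIC queue and CPU is a separate, driver-level problem.

/* Sketch: N distinct ports, each with its own LISTEN endpoint, each serviced
 * by a thread pinned to its own CPU. Port range and CPU mapping are
 * illustrative assumptions. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define BASE_PORT 8000
#define NWORKERS  8

extern int  make_listener(unsigned short port);        /* socket+bind+listen */
extern void acceptor_loop(int self, int listen_fd);    /* per-worker loop */

struct worker {
    int cpu;
    int listen_fd;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    /* Pin the thread so accept() and connection processing stay on the CPU
     * where this port's RX queue interrupts are expected to land. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(w->cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    acceptor_loop(w->cpu, w->listen_fd);
    return NULL;
}

void start_workers(void)
{
    static struct worker workers[NWORKERS];
    pthread_t tid[NWORKERS];

    for (int i = 0; i < NWORKERS; i++) {
        workers[i].cpu = i;
        workers[i].listen_fd = make_listener(BASE_PORT + i);  /* distinct port */
        pthread_create(&tid[i], NULL, worker_main, &workers[i]);
    }
}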
* Re: SO_REUSEPORT?
From: Tom Herbert @ 2008-08-07 23:05 UTC
To: Rick Jones; +Cc: Stephen Hemminger, netdev

> Well, if you _really_ want the load spread, you may need to use a
> multiqueue interface (at least inbound, if not also outbound later),
> "know" how the NIC will hash, and then have N distinct port numbers, each
> assigned to its own LISTEN endpoint. The old song and dance about making
> an N-CPU system look as much like N single-CPU systems as possible, and
> all that...

Yep, that's what I really want, except for the fact that I can only use a
single port for the server -- all flows could be nicely distributed by NIC
multiqueue, but I still have the problem of how to ensure that the accepting
thread for a connection runs on the same CPU where the interrupt and SYN
processing happened.

> Unless there are NICs you can "tell" where to send the interrupts, which
> IMO is preferable - I have a preference for the application/scheduler
> telling "networking" where to work rather than networking (or the NIC)
> telling the scheduler where to run a thread. The archives of either this
> list or netnews will probably pull up stuff where I've talked about
> Inbound Packet Scheduling (IPS) vs. Thread Optimized Packet Scheduling
> (TOPS) and the limitations of simplistic address hashing to pick a
> queue/processor/whatnot :)

NICs are already doing steering based on a tuple hash (RSS), and I think
some will allow specifying the CPU for the interrupt based on the RX flow.
Maybe this would address the issues of Inbound Packet Scheduling?

Thanks for the pointers on IPS and TOPS. Out of curiosity, has there been an
effort to do TOPS on Linux? We are doing something very similar in software
RSS with a fair amount of success (I posted patches for this a while back).

Tom
* Re: SO_REUSEPORT?
From: Rick Jones @ 2008-08-07 23:28 UTC
To: Tom Herbert; +Cc: Stephen Hemminger, netdev

Tom Herbert wrote:
> Yep, that's what I really want, except for the fact that I can only use a
> single port for the server -- all flows could be nicely distributed by NIC
> multiqueue, but I still have the problem of how to ensure that the
> accepting thread for a connection runs on the same CPU where the interrupt
> and SYN processing happened.

That is where needing to know/control the NIC's hashing comes into play.

> NICs are already doing steering based on a tuple hash (RSS), and I think
> some will allow specifying the CPU for the interrupt based on the RX flow.
> Maybe this would address the issues of Inbound Packet Scheduling?

All IPS in HP-UX 10.20 did was hash the IP/port numbers and queue based on
that, at the handoff between driver and netisr. The problem was that if you
had a thread of execution servicing more than one connection, you would
start whipsawing across the processors based on the remote addressing.

There are, IIRC, indeed some NICs where you can give them a finite number of
tuples and say where each tuple should go. I'm sure those vendors, if
watching, can speak up :) That sort of functionality can be useful and would
address the limitations of IPS/plain NIC header address hashing -- at least
for long-lived connections, or perhaps even long-lived LISTEN endpoints :)

While you say you are constrained to a single port number, are you similarly
constrained to a single IP address?

> Thanks for the pointers on IPS and TOPS. Out of curiosity, has there been
> an effort to do TOPS on Linux? We are doing something very similar in
> software RSS with a fair amount of success (I posted patches for this a
> while back).

I'm not sure; anything is possible. The nice thing about TOPS in HP-UX 11.X
was/is that the lookup was essentially free and didn't involve anything
going across the I/O busses. Start having to update those tuple mappings on
the NIC with any frequency and that's the end of that.

rick
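Rick's closing question points at a variant that needs neither SO_REUSEPORT
nor extra port numbers: if the host answers on several local IP addresses,
each worker can bind its own listener to a distinct address on the same
port. The address list and port below are hypothetical placeholders, and the
per-worker accept loop is assumed to be the one sketched earlier.

/* Sketch of the "same port, distinct local IP addresses" variant: one
 * listener per worker, each bound to a different address the host answers
 * on. Addresses and port are hypothetical placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define SERVER_PORT 80

static const char *local_addrs[] = {   /* one per worker/CPU */
    "192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13",
};

int make_addr_listener(int worker)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(SERVER_PORT);
    /* Bind a specific local address instead of INADDR_ANY. */
    inet_pton(AF_INET, local_addrs[worker], &addr.sin_addr);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}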