All of lore.kernel.org
 help / color / mirror / Atom feed
* Least Connection Scheduler
@ 2008-04-01  4:36 Jason Stubbs
  2008-04-01  5:16 ` Simon Horman
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Stubbs @ 2008-04-01  4:36 UTC (permalink / raw)
  To: lvs-devel

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

Hi all,

This mail half belongs on -user, but there's a patch attached so I'm sending 
it here instead.

I'm wanting to use the LC scheduler with servers of different specs but the 
docs say that it doesn't perform well in this case due to TIME_WAIT 
connections. According to the HOWTO, everything that is not an ESTABLISHED 
connection is counted as inactive.  The current LC scheduler calculates each 
server with the formula of (activeconns<<8) + inactconns.

Now, the only reason I can see for activeconns to be offset by inactconns at 
all is so that round-robining happens when activeconns is equal among several 
servers. If that is in fact the only reason, how does the attached patch 
look? The resulting request distribution should match server resources fairly 
closely with sufficient load. The only downside that I can see is that slower 
servers would get priority when activeconns are equal, but is that really a 
problem?

-- 
Jason Stubbs <j.stubbs@linkthink.co.jp>
LINKTHINK INC.
東京都渋谷区桜ヶ丘町22-14 N.E.S S棟 3F
TEL 03-5728-4772  FAX 03-5728-4773

[-- Attachment #2: ip_vs_lc.patch --]
[-- Type: text/x-diff, Size: 2497 bytes --]

--- ip_vs_lc.c.orig	2008-04-01 11:35:37.518877853 +0900
+++ ip_vs_lc.c	2008-04-01 12:04:23.769147323 +0900
@@ -40,21 +40,6 @@
 }
 
 
-static inline unsigned int
-ip_vs_lc_dest_overhead(struct ip_vs_dest *dest)
-{
-	/*
-	 * We think the overhead of processing active connections is 256
-	 * times higher than that of inactive connections in average. (This
-	 * 256 times might not be accurate, we will change it later) We
-	 * use the following formula to estimate the overhead now:
-	 *		  dest->activeconns*256 + dest->inactconns
-	 */
-	return (atomic_read(&dest->activeconns) << 8) +
-		atomic_read(&dest->inactconns);
-}
-
-
 /*
  *	Least Connection scheduling
  */
@@ -62,14 +47,16 @@
 ip_vs_lc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 {
 	struct ip_vs_dest *dest, *least = NULL;
-	unsigned int loh = 0, doh;
+	unsigned int least_activeconns = 0, least_inactconns = 0;
+	unsigned int dest_activeconns = 0, dest_inactconns = 0;
 
 	IP_VS_DBG(6, "ip_vs_lc_schedule(): Scheduling...\n");
 
 	/*
-	 * Simply select the server with the least number of
-	 *        (activeconns<<5) + inactconns
-	 * Except whose weight is equal to zero.
+	 * Simply select the server with the least number of active
+	 * connections, or the least number of inactive connections
+	 * where the number of active connections is equal, excluding
+	 * servers whose weight is equal to zero.
 	 * If the weight is equal to zero, it means that the server is
 	 * quiesced, the existing connections to the server still get
 	 * served, but no new connection is assigned to the server.
@@ -79,18 +66,23 @@
 		if ((dest->flags & IP_VS_DEST_F_OVERLOAD) ||
 		    atomic_read(&dest->weight) == 0)
 			continue;
-		doh = ip_vs_lc_dest_overhead(dest);
-		if (!least || doh < loh) {
+		dest_activeconns = atomic_read(&dest->activeconns);
+		dest_inactconns = atomic_read(&dest->inactconns);
+		if (!least || dest_activeconns < least_activeconns) {
+			least = dest;
+			least_activeconns = dest_activeconns;
+			least_inactconns = dest_inactconns;
+		} else if (dest_activeconns == least_activeconns &&
+			dest_inactconns < least_inactconns) {
 			least = dest;
-			loh = doh;
+			least_inactconns = dest_inactconns;
 		}
 	}
 
 	if (least)
 	IP_VS_DBG(6, "LC: server %u.%u.%u.%u:%u activeconns %d inactconns %d\n",
 		  NIPQUAD(least->addr), ntohs(least->port),
-		  atomic_read(&least->activeconns),
-		  atomic_read(&least->inactconns));
+		  least_activeconns, least_inactconns);
 
 	return least;
 }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-01  4:36 Least Connection Scheduler Jason Stubbs
@ 2008-04-01  5:16 ` Simon Horman
  2008-04-01  5:55   ` Jason Stubbs
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2008-04-01  5:16 UTC (permalink / raw)
  To: Jason Stubbs; +Cc: lvs-devel

On Tue, Apr 01, 2008 at 01:36:08PM +0900, Jason Stubbs wrote:
> Hi all,
> 
> This mail half belongs on -user, but there's a patch attached so I'm sending 
> it here instead.
> 
> I'm wanting to use the LC scheduler with servers of different specs but the 
> docs say that it doesn't perform well in this case due to TIME_WAIT 
> connections. According to the HOWTO, everything that is not an ESTABLISHED 
> connection is counted as inactive.  The current LC scheduler calculates each 
> server with the formula of (activeconns<<8) + inactconns.
> 
> Now, the only reason I can see for activeconns to be offset by inactconns at 
> all is so that round-robining happens when activeconns is equal among several 
> servers. If that is in fact the only reason, how does the attached patch 
> look? The resulting request distribution should match server resources fairly 
> closely with sufficient load. The only downside that I can see is that slower 
> servers would get priority when activeconns are equal, but is that really a 
> problem?

Hi Jason,

I think that the reasoning is that there is some expense related to
inactive connections, though its probably only in terms of memory
or possibly scheduler (thus CPU) being taken up, and its probably
a lot less than 1/256th of the cost associated with a live connection.

I like your patch, but I wonder if it might be better to make this
configurable. Perhaps two values, multiplier for active and multiplier
for inactive, which would be 256 and 1 by default. Setting such
a configuration to 1 and 0 would achieve what you are after without
changing the default behaviour.

-- 
宝曼 西門 (ホウマン・サイモン) | Simon Horman (Horms)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-01  5:16 ` Simon Horman
@ 2008-04-01  5:55   ` Jason Stubbs
  2008-04-02  2:38     ` Jason Stubbs
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Stubbs @ 2008-04-01  5:55 UTC (permalink / raw)
  To: lvs-devel

On Tuesday 01 April 2008 14:16:41 Simon Horman wrote:
> On Tue, Apr 01, 2008 at 01:36:08PM +0900, Jason Stubbs wrote:
> > Hi all,
> >
> > This mail half belongs on -user, but there's a patch attached so I'm
> > sending it here instead.
> >
> > I'm wanting to use the LC scheduler with servers of different specs but
> > the docs say that it doesn't perform well in this case due to TIME_WAIT
> > connections. According to the HOWTO, everything that is not an
> > ESTABLISHED connection is counted as inactive.  The current LC scheduler
> > calculates each server with the formula of (activeconns<<8) + inactconns.
> >
> > Now, the only reason I can see for activeconns to be offset by inactconns
> > at all is so that round-robining happens when activeconns is equal among
> > several servers. If that is in fact the only reason, how does the
> > attached patch look? The resulting request distribution should match
> > server resources fairly closely with sufficient load. The only downside
> > that I can see is that slower servers would get priority when activeconns
> > are equal, but is that really a problem?
>
> I think that the reasoning is that there is some expense related to
> inactive connections, though its probably only in terms of memory
> or possibly scheduler (thus CPU) being taken up, and its probably
> a lot less than 1/256th of the cost associated with a live connection.

This is the main reason why I kept the inactconns check as a secondary 
decision. The number of inactive connections should still stay fairly well 
balanced. If the number of inactive connections on a more powerful server is 
high enough that it starts affecting performance, lesser servers should start 
getting more requests causing things to even out again.

> I like your patch, but I wonder if it might be better to make this
> configurable. Perhaps two values, multiplier for active and multiplier
> for inactive, which would be 256 and 1 by default. Setting such
> a configuration to 1 and 0 would achieve what you are after without
> changing the default behaviour.

Hmm.. How would configuration be done? sysctls? None of the schedulers 
currently have any configuration other than server weight as far as I know. 
Also, the round-robin effect of utilizing inactive connection counts is still 
needed. In the case where several real servers share the load of servicing 
several VIPs, the servers listed earlier would be hit more often possibly 
overloading them without any round-robining.

The request distribution should be nearly identical in the case of real 
servers of equal specs. I guess I should brush off my mathematics and 
calculate what the difference is in the various other cases. ;)

-- 
Jason Stubbs <j.stubbs@linkthink.co.jp>
LINKTHINK INC.
東京都渋谷区桜ヶ丘町22-14 N.E.S S棟 3F
TEL 03-5728-4772  FAX 03-5728-4773

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-01  5:55   ` Jason Stubbs
@ 2008-04-02  2:38     ` Jason Stubbs
  2008-04-02  8:04       ` Simon Horman
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Stubbs @ 2008-04-02  2:38 UTC (permalink / raw)
  To: lvs-devel

[-- Attachment #1: Type: text/plain, Size: 2812 bytes --]

On Tuesday 01 April 2008 14:55:58 Jason Stubbs wrote:
> On Tuesday 01 April 2008 14:16:41 Simon Horman wrote:
> > I think that the reasoning is that there is some expense related to
> > inactive connections, though its probably only in terms of memory
> > or possibly scheduler (thus CPU) being taken up, and its probably
> > a lot less than 1/256th of the cost associated with a live connection.
>
> This is the main reason why I kept the inactconns check as a secondary
> decision. The number of inactive connections should still stay fairly well
> balanced. If the number of inactive connections on a more powerful server
> is high enough that it starts affecting performance, lesser servers should
> start getting more requests causing things to even out again.
>
> > I like your patch, but I wonder if it might be better to make this
> > configurable. Perhaps two values, multiplier for active and multiplier
> > for inactive, which would be 256 and 1 by default. Setting such
> > a configuration to 1 and 0 would achieve what you are after without
> > changing the default behaviour.
>
> The request distribution should be nearly identical in the case of real
> servers of equal specs. I guess I should brush off my mathematics and
> calculate what the difference is in the various other cases. ;)

My mathematics was never really that good that I can just brush it off. ;)
Instead, I wrote a little simulation (attached) that compares behaviours.
The unbracketed figures below are values at the end of the run, the bracketed 
figures below are peak values during the run and T is the total number of 
connections sent to that server.

With 1000reqs/sec and two servers where #1 can handle 20% more requests:

Current LC
1:  A 21(23)  I 30567(30618)  T 153040
2:  A 24(26)  I 29388(29595)  T 146960

Patched LC
1:  A 22(22)  I 32978(32979)  T 164998
2:  A 23(23)  I 26977(26980)  T 135002

With 1000reqs/sec and two servers where #1 can handle 400% more requests:

Current LC
1:  A  5(11)  I 32352(32546)  T 162414
2:  A 24(26)  I 27619(28344)  T 137586

Patched LC
1:  A  9(10)  I 49191(49195)  T 245998
2:  A  9(10)  I 10791(10793)  T  54002

Looking at these figures, the only real problem would be the extra number of 
inactive connections on the faster server. However, after thinking about 
adding server weights to the equation, I'm wondering if this would not be 
better as yet-another-scheduler? I don't really like the idea of adding extra 
configuration as it steps away from LVS's current simplicity, but the 
difference in behaviour compared to the WLC scheduler is too great to be able 
to merge as is... Would yet-another-scheduler be accepted?

-- 
Jason Stubbs <j.stubbs@linkthink.co.jp>
LINKTHINK INC.
東京都渋谷区桜ヶ丘町22-14 N.E.S S棟 3F
TEL 03-5728-4772  FAX 03-5728-4773

[-- Attachment #2: lvslctest.cpp --]
[-- Type: text/x-c++src, Size: 4051 bytes --]

#include <iostream>
#include <queue>
#include <vector>

#define INACTIVE_TIME 60000
#define NEW_CONN_INTERVAL 1
#define SERVER1_TIME 40
#define SERVER2_TIME 50


using namespace std;

class Server
{
public:
	Server(int processingTime)
	: m_processingTime(processingTime)
	, m_maxActive(0)
	, m_maxInactive(0)
	, m_totalCount(0)
	{
	}

	void tick(int time)
	{
		if (m_active.size() > m_maxActive) {
			m_maxActive = m_active.size();
		}

		int expireTime = time - m_processingTime;
		while (m_active.size() && m_active.front() <= expireTime) {
			m_inactive.push(m_active.front());
			m_active.pop();
		}

		if (m_inactive.size() > m_maxInactive) {
			m_maxInactive = m_inactive.size();
		}

		expireTime = time - INACTIVE_TIME;
		while (m_inactive.size() && m_inactive.front() <= expireTime) {
			m_inactive.pop();
		}
	}

	void addConn(int time)
	{
		m_active.push(time);
		m_totalCount++;
	}

	int activeCount()
	{
		return m_active.size();
	}

	int maxActive()
	{
		return m_maxActive;
	}

	int inactiveCount()
	{
		return m_inactive.size();
	}

	int maxInactive()
	{
		return m_maxInactive;
	}

	int totalCount()
	{
		return m_totalCount;
	}

private:
	int m_processingTime;
	queue<int> m_active;
	queue<int> m_inactive;
	int m_maxActive;
	int m_maxInactive;
	int m_totalCount;
};


class Balancer
{
public:
	Balancer()
	: m_currentTime(0)
	{
	}

	~Balancer()
	{
		vector<Server*>::iterator i_server;
		for (i_server = m_servers.begin(); i_server != m_servers.end(); i_server++) {
			delete *i_server;
		}
	}

	void addServer(Server* server)
	{
		m_servers.push_back(server);
	}

	Server* regularLC()
	{
		Server* least = NULL;
		int loh = 0;

		vector<Server*>::iterator i_server;
		for (i_server = m_servers.begin(); i_server != m_servers.end(); i_server++) {
			Server* server = *i_server;
			int doh = (server->activeCount() << 8) + server->inactiveCount();
			if (!least || doh < loh) {
				least = server;
				loh = doh;
			}
		}

		return least;
	}

	Server* patchedLC()
	{
		Server* least = NULL;
		int least_active = 0;
		int least_inactive = 0;

		vector<Server*>::iterator i_server;
		for (i_server = m_servers.begin(); i_server != m_servers.end(); i_server++) {
			Server* server = *i_server;
			int server_active = server->activeCount();
			int server_inactive = server->inactiveCount();
			if (!least || server_active < least_active) {
				least = server;
				least_active = server_active;
				least_inactive = server_inactive;
			} else if (server_active == least_active && server_inactive < least_inactive) {
				least = server;
				least_inactive = server_inactive;
			}
		}

		return least;
	}

	void tick(int time)
	{
		vector<Server*>::iterator i_server;
		for (i_server = m_servers.begin(); i_server != m_servers.end(); i_server++) {
			Server* server = *i_server;
			server->tick(time);
		}
	}

	void debug()
	{
		vector<Server*>::iterator i_server;
		int i = 0;
		for (i_server = m_servers.begin(); i_server != m_servers.end(); i_server++) {
			Server* server = *i_server;
			cout << ++i << ":";
			cout << "  A " << server->activeCount() << "(" << server->maxActive() << ")";
			cout << "  I " << server->inactiveCount() << "(" << server->maxInactive() << ")";
			cout << "  T " << server->totalCount() << "\n";
		}
		cout << "\n";
	}

private:
	int m_currentTime;
	vector<Server*> m_servers;
};


int main(int, char**)
{
	Balancer *balancer;

	balancer = new Balancer();
	balancer->addServer(new Server(SERVER1_TIME));
	balancer->addServer(new Server(SERVER2_TIME));

	for (int time = 0; time < 5 * INACTIVE_TIME; time += NEW_CONN_INTERVAL) {
		Server* server = balancer->regularLC();
		server->addConn(time);
		balancer->tick(time);
	}
	balancer->debug();
	delete balancer;

	balancer = new Balancer();
	balancer->addServer(new Server(SERVER1_TIME));
	balancer->addServer(new Server(SERVER2_TIME));

	for (int time = 0; time < 5 * INACTIVE_TIME; time += NEW_CONN_INTERVAL) {
		balancer->tick(time);
		Server* server = balancer->patchedLC();
		server->addConn(time);
	}
	balancer->debug();
	delete balancer;

	return 0;
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-02  2:38     ` Jason Stubbs
@ 2008-04-02  8:04       ` Simon Horman
  2008-04-02  8:15         ` Graeme Fowler
  2008-04-03  0:58         ` Jason Stubbs
  0 siblings, 2 replies; 9+ messages in thread
From: Simon Horman @ 2008-04-02  8:04 UTC (permalink / raw)
  To: Jason Stubbs; +Cc: lvs-devel

On Wed, Apr 02, 2008 at 11:38:15AM +0900, Jason Stubbs wrote:
> On Tuesday 01 April 2008 14:55:58 Jason Stubbs wrote:
> > On Tuesday 01 April 2008 14:16:41 Simon Horman wrote:
> > > I think that the reasoning is that there is some expense related to
> > > inactive connections, though its probably only in terms of memory
> > > or possibly scheduler (thus CPU) being taken up, and its probably
> > > a lot less than 1/256th of the cost associated with a live connection.
> >
> > This is the main reason why I kept the inactconns check as a secondary
> > decision. The number of inactive connections should still stay fairly well
> > balanced. If the number of inactive connections on a more powerful server
> > is high enough that it starts affecting performance, lesser servers should
> > start getting more requests causing things to even out again.
> >
> > > I like your patch, but I wonder if it might be better to make this
> > > configurable. Perhaps two values, multiplier for active and multiplier
> > > for inactive, which would be 256 and 1 by default. Setting such
> > > a configuration to 1 and 0 would achieve what you are after without
> > > changing the default behaviour.
> >
> > The request distribution should be nearly identical in the case of real
> > servers of equal specs. I guess I should brush off my mathematics and
> > calculate what the difference is in the various other cases. ;)
> 
> My mathematics was never really that good that I can just brush it off. ;)
> Instead, I wrote a little simulation (attached) that compares behaviours.
> The unbracketed figures below are values at the end of the run, the bracketed 
> figures below are peak values during the run and T is the total number of 
> connections sent to that server.
> 
> With 1000reqs/sec and two servers where #1 can handle 20% more requests:
> 
> Current LC
> 1:  A 21(23)  I 30567(30618)  T 153040
> 2:  A 24(26)  I 29388(29595)  T 146960
> 
> Patched LC
> 1:  A 22(22)  I 32978(32979)  T 164998
> 2:  A 23(23)  I 26977(26980)  T 135002
> 
> With 1000reqs/sec and two servers where #1 can handle 400% more requests:
> 
> Current LC
> 1:  A  5(11)  I 32352(32546)  T 162414
> 2:  A 24(26)  I 27619(28344)  T 137586
> 
> Patched LC
> 1:  A  9(10)  I 49191(49195)  T 245998
> 2:  A  9(10)  I 10791(10793)  T  54002
> 
> Looking at these figures, the only real problem would be the extra number of 
> inactive connections on the faster server. However, after thinking about 
> adding server weights to the equation, I'm wondering if this would not be 
> better as yet-another-scheduler? I don't really like the idea of adding extra 
> configuration as it steps away from LVS's current simplicity, but the 
> difference in behaviour compared to the WLC scheduler is too great to be able 
> to merge as is... Would yet-another-scheduler be accepted?

Nice numbers :-)

LVS does already have a number of /proc values that can be twiddled
so I personally don't see a problem with adding one more - then again
I'm bound to say that as it was my idea.

If you want to code it up as an additional scheduler that is fine.
But looking at your numbers, I am kind of leaning towards just
using your existing patch.

I agree that the only real problem would be the extra number of
inactive connections on the faster server. But the overhead of such
things is really quite small - ~128 bytes of memory and a bit of
extra time to go through the hash table (maybe).

-- 
宝曼 西門 (ホウマン・サイモン) | Simon Horman (Horms)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-02  8:04       ` Simon Horman
@ 2008-04-02  8:15         ` Graeme Fowler
  2008-04-02  8:31           ` Simon Horman
  2008-04-03 11:14           ` Jason Stubbs
  2008-04-03  0:58         ` Jason Stubbs
  1 sibling, 2 replies; 9+ messages in thread
From: Graeme Fowler @ 2008-04-02  8:15 UTC (permalink / raw)
  To: Simon Horman; +Cc: Jason Stubbs, lvs-devel

On Wed, 2008-04-02 at 17:04 +0900, Simon Horman wrote:
> LVS does already have a number of /proc values that can be twiddled
> so I personally don't see a problem with adding one more - then again
> I'm bound to say that as it was my idea.

Personally speaking, the more stuff that's controllable with /proc the
better - it makes it easier to code up additional control environments
without the execution overhead of calling ipvsadm multiple times.

> If you want to code it up as an additional scheduler that is fine.
> But looking at your numbers, I am kind of leaning towards just
> using your existing patch.

Pardon me for speaking out of turn, but an idea just crossed my mind -
wouldn't it be better to "merge" (not in code terms) the lc and wlc
schedulers so they either base on, or use exactly, the same code? After
all, in concept the lc scheduler is simply wlc with equal weights of 1.
Isn't it?
That shrinks the codebase a bit.

> I agree that the only real problem would be the extra number of
> inactive connections on the faster server. But the overhead of such
> things is really quite small - ~128 bytes of memory and a bit of
> extra time to go through the hash table (maybe).

This would only be a problem in really large throughput situations, and
if you have one of them you probably already have some $$$ to buy a
faster processor!

Regards

Graeme


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-02  8:15         ` Graeme Fowler
@ 2008-04-02  8:31           ` Simon Horman
  2008-04-03 11:14           ` Jason Stubbs
  1 sibling, 0 replies; 9+ messages in thread
From: Simon Horman @ 2008-04-02  8:31 UTC (permalink / raw)
  To: Graeme Fowler; +Cc: Jason Stubbs, lvs-devel

On Wed, Apr 02, 2008 at 09:15:57AM +0100, Graeme Fowler wrote:
> On Wed, 2008-04-02 at 17:04 +0900, Simon Horman wrote:
> > LVS does already have a number of /proc values that can be twiddled
> > so I personally don't see a problem with adding one more - then again
> > I'm bound to say that as it was my idea.
> 
> Personally speaking, the more stuff that's controllable with /proc the
> better - it makes it easier to code up additional control environments
> without the execution overhead of calling ipvsadm multiple times.
> 
> > If you want to code it up as an additional scheduler that is fine.
> > But looking at your numbers, I am kind of leaning towards just
> > using your existing patch.
> 
> Pardon me for speaking out of turn, but an idea just crossed my mind -
> wouldn't it be better to "merge" (not in code terms) the lc and wlc
> schedulers so they either base on, or use exactly, the same code? After
> all, in concept the lc scheduler is simply wlc with equal weights of 1.
> Isn't it?
> That shrinks the codebase a bit.

Yes, I think that would be an excellent idea.
We could still have the different schedulers (for compatibility), but
have them share common code - which would basically be all of it.

I imagine that the same thing could be done for rr and wrr.

I guess that the original motivation for the separation was
either performance or to demonstrate it was possible to have more
than one scheduler at a time when there was only one.

> > I agree that the only real problem would be the extra number of
> > inactive connections on the faster server. But the overhead of such
> > things is really quite small - ~128 bytes of memory and a bit of
> > extra time to go through the hash table (maybe).
> 
> This would only be a problem in really large throughput situations, and
> if you have one of them you probably already have some $$$ to buy a
> faster processor!

Yes, I was struggling to think of a situation where it would be
a problem in practice.

-- 
Horms


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-02  8:04       ` Simon Horman
  2008-04-02  8:15         ` Graeme Fowler
@ 2008-04-03  0:58         ` Jason Stubbs
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Stubbs @ 2008-04-03  0:58 UTC (permalink / raw)
  To: lvs-devel

[-- Attachment #1: Type: text/plain, Size: 3748 bytes --]

On Wednesday 02 April 2008 17:04:14 Simon Horman wrote:
> On Wed, Apr 02, 2008 at 11:38:15AM +0900, Jason Stubbs wrote:
> > On Tuesday 01 April 2008 14:55:58 Jason Stubbs wrote:
> > > The request distribution should be nearly identical in the case of real
> > > servers of equal specs. I guess I should brush off my mathematics and
> > > calculate what the difference is in the various other cases. ;)
> >
> > My mathematics was never really that good that I can just brush it off.
> > ;) Instead, I wrote a little simulation (attached) that compares
> > behaviours. The unbracketed figures below are values at the end of the
> > run, the bracketed figures below are peak values during the run and T is
> > the total number of connections sent to that server.
> >
> > With 1000reqs/sec and two servers where #1 can handle 20% more requests:
> >
> > Current LC
> > 1:  A 21(23)  I 30567(30618)  T 153040
> > 2:  A 24(26)  I 29388(29595)  T 146960
> >
> > Patched LC
> > 1:  A 22(22)  I 32978(32979)  T 164998
> > 2:  A 23(23)  I 26977(26980)  T 135002
> >
> > With 1000reqs/sec and two servers where #1 can handle 400% more requests:
> >
> > Current LC
> > 1:  A  5(11)  I 32352(32546)  T 162414
> > 2:  A 24(26)  I 27619(28344)  T 137586
> >
> > Patched LC
> > 1:  A  9(10)  I 49191(49195)  T 245998
> > 2:  A  9(10)  I 10791(10793)  T  54002
> >
> > Looking at these figures, the only real problem would be the extra number
> > of inactive connections on the faster server. However, after thinking
> > about adding server weights to the equation, I'm wondering if this would
> > not be better as yet-another-scheduler? I don't really like the idea of
> > adding extra configuration as it steps away from LVS's current
> > simplicity, but the difference in behaviour compared to the WLC scheduler
> > is too great to be able to merge as is... Would yet-another-scheduler be
> > accepted?
>
> Nice numbers :-)

Remember that these come from the simulation, but should be fairly accurate. I 
haven't actually done more than compile tested the scheduler patches. I was 
going to/will do that once the patches are generally okayed. I don't have a 
test bed at the moment. :(

> LVS does already have a number of /proc values that can be twiddled
> so I personally don't see a problem with adding one more - then again
> I'm bound to say that as it was my idea.

I coded up param twiddling before receiving this mail but didn't get a chance 
to send it out. Instead of proc values, I put the config into Kconfig. To get 
round-robining happening when the inactive weight is 0, I essentially merged 
the RR and (W)LC schedulers.

Slightly off-topic but is the compiler able to remove the 
atomic_read(&dest->inactconns) * CONFIG_IP_VS_LC_INACTIVE_WEIGHT calculation 
when CONFIG_IP_VS_LC_INACTIVE_WEIGHT is 0? If these values were changed to be 
configuration via proc, what's the overhead for retrieving the values?

> If you want to code it up as an additional scheduler that is fine.
> But looking at your numbers, I am kind of leaning towards just
> using your existing patch.
>
> I agree that the only real problem would be the extra number of
> inactive connections on the faster server. But the overhead of such
> things is really quite small - ~128 bytes of memory and a bit of
> extra time to go through the hash table (maybe).

It's not such of a big deal with the LC scheduler, but users may have finely 
tuned weights with the WLC scheduler. Without user intervention to change the 
values, these users will find that higher weighted servers are suddenly 
getting a whole lot more connections...

-- 
Jason Stubbs <j.stubbs@linkthink.co.jp>
LINKTHINK INC.
東京都渋谷区桜ヶ丘町22-14 N.E.S S棟 3F
TEL 03-5728-4772  FAX 03-5728-4773

[-- Attachment #2: lvs-lc-rr.patch --]
[-- Type: text/x-diff, Size: 5802 bytes --]

diff -uNr linux-2.6.24-orig/net/ipv4/ipvs/Kconfig linux-2.6.24/net/ipv4/ipvs/Kconfig
--- linux-2.6.24-orig/net/ipv4/ipvs/Kconfig	2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.24/net/ipv4/ipvs/Kconfig	2008-04-02 14:32:47.415956873 +0900
@@ -127,6 +127,18 @@
 	  If you want to compile it in kernel, say Y. To compile it as a
 	  module, choose M here. If unsure, say N.

+if IP_VS_LC || IP_VS_WLC
+
+config	IP_VS_LC_ACTIVE_WEIGHT
+	int "LC active connection weight"
+	default "255"
+
+config	IP_VS_LC_INACTIVE_WEIGHT
+	int "LC inactive connection weight"
+	default "1"
+
+endif # IP_VS_LC || IP_VS_WLC
+
 config	IP_VS_LBLC
 	tristate "locality-based least-connection scheduling"
 	---help---
diff -uNr linux-2.6.24-orig/net/ipv4/ipvs/ip_vs_lc.c linux-2.6.24/net/ipv4/ipvs/ip_vs_lc.c
--- linux-2.6.24-orig/net/ipv4/ipvs/ip_vs_lc.c	2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.24/net/ipv4/ipvs/ip_vs_lc.c	2008-04-02 15:13:54.121724765 +0900
@@ -24,6 +24,7 @@

 static int ip_vs_lc_init_svc(struct ip_vs_service *svc)
 {
+	svc->sched_data = &svc->destinations;
 	return 0;
 }

@@ -36,6 +37,7 @@

 static int ip_vs_lc_update_svc(struct ip_vs_service *svc)
 {
+	svc->sched_data = &svc->destinations;
 	return 0;
 }

@@ -43,15 +45,8 @@
 static inline unsigned int
 ip_vs_lc_dest_overhead(struct ip_vs_dest *dest)
 {
-	/*
-	 * We think the overhead of processing active connections is 256
-	 * times higher than that of inactive connections in average. (This
-	 * 256 times might not be accurate, we will change it later) We
-	 * use the following formula to estimate the overhead now:
-	 *		  dest->activeconns*256 + dest->inactconns
-	 */
-	return (atomic_read(&dest->activeconns) << 8) +
-		atomic_read(&dest->inactconns);
+	return atomic_read(&dest->activeconns) * CONFIG_IP_VS_LC_ACTIVE_WEIGHT +
+		atomic_read(&dest->inactconns) * CONFIG_IP_VS_LC_INACTIVE_WEIGHT;
 }


@@ -61,6 +56,7 @@
 static struct ip_vs_dest *
 ip_vs_lc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 {
+	struct list_head *p;
 	struct ip_vs_dest *dest, *least = NULL;
 	unsigned int loh = 0, doh;

@@ -75,16 +71,31 @@
 	 * served, but no new connection is assigned to the server.
 	 */

-	list_for_each_entry(dest, &svc->destinations, n_list) {
+	write_lock(&svc->sched_lock);
+	p = (struct list_head *)svc->sched_data;
+	do {
+		/* skip list head */
+		if (p == &svc->destinations)
+			goto next;
+
+		dest = list_entry(p, struct ip_vs_dest, n_list);
 		if ((dest->flags & IP_VS_DEST_F_OVERLOAD) ||
 		    atomic_read(&dest->weight) == 0)
-			continue;
+			goto next;
+
 		doh = ip_vs_lc_dest_overhead(dest);
 		if (!least || doh < loh) {
 			least = dest;
 			loh = doh;
 		}
-	}
+
+	next:
+		p = p->next;
+	} while (p != (struct list_head *)svc->sched_data);
+
+	p = p->next;
+	svc->sched_data = p;
+	write_unlock(&svc->sched_lock);

 	if (least)
 	IP_VS_DBG(6, "LC: server %u.%u.%u.%u:%u activeconns %d inactconns %d\n",
diff -uNr linux-2.6.24-orig/net/ipv4/ipvs/ip_vs_wlc.c linux-2.6.24/net/ipv4/ipvs/ip_vs_wlc.c
--- linux-2.6.24-orig/net/ipv4/ipvs/ip_vs_wlc.c	2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.24/net/ipv4/ipvs/ip_vs_wlc.c	2008-04-02 15:14:11.731531485 +0900
@@ -30,6 +30,7 @@
 static int
 ip_vs_wlc_init_svc(struct ip_vs_service *svc)
 {
+	svc->sched_data = &svc->destinations;
 	return 0;
 }

@@ -44,6 +45,7 @@
 static int
 ip_vs_wlc_update_svc(struct ip_vs_service *svc)
 {
+	svc->sched_data = &svc->destinations;
 	return 0;
 }

@@ -51,15 +53,8 @@
 static inline unsigned int
 ip_vs_wlc_dest_overhead(struct ip_vs_dest *dest)
 {
-	/*
-	 * We think the overhead of processing active connections is 256
-	 * times higher than that of inactive connections in average. (This
-	 * 256 times might not be accurate, we will change it later) We
-	 * use the following formula to estimate the overhead now:
-	 *		  dest->activeconns*256 + dest->inactconns
-	 */
-	return (atomic_read(&dest->activeconns) << 8) +
-		atomic_read(&dest->inactconns);
+	return atomic_read(&dest->activeconns) * CONFIG_IP_VS_LC_ACTIVE_WEIGHT +
+		atomic_read(&dest->inactconns) * CONFIG_IP_VS_LC_INACTIVE_WEIGHT;
 }


@@ -69,8 +64,9 @@
 static struct ip_vs_dest *
 ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 {
-	struct ip_vs_dest *dest, *least;
-	unsigned int loh, doh;
+	struct list_head *p;
+	struct ip_vs_dest *dest, *least = NULL;
+	unsigned int loh = 0, doh;

 	IP_VS_DBG(6, "ip_vs_wlc_schedule(): Scheduling...\n");

@@ -87,31 +83,36 @@
 	 * new connections.
 	 */

-	list_for_each_entry(dest, &svc->destinations, n_list) {
-		if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) &&
-		    atomic_read(&dest->weight) > 0) {
-			least = dest;
-			loh = ip_vs_wlc_dest_overhead(least);
-			goto nextstage;
+	write_lock(&svc->sched_lock);
+	p = (struct list_head *)svc->sched_data;
+	do {
+		/* skip list head */
+		if (p == &svc->destinations) {
+			goto next;
+		}
+
+		dest = list_entry(p, struct ip_vs_dest, n_list);
+		if ((dest->flags & IP_VS_DEST_F_OVERLOAD) ||
+		    atomic_read(&dest->weight) == 0) {
+			goto next;
 		}
-	}
-	return NULL;

-	/*
-	 *    Find the destination with the least load.
-	 */
-  nextstage:
-	list_for_each_entry_continue(dest, &svc->destinations, n_list) {
-		if (dest->flags & IP_VS_DEST_F_OVERLOAD)
-			continue;
 		doh = ip_vs_wlc_dest_overhead(dest);
-		if (loh * atomic_read(&dest->weight) >
+		if (!least || loh * atomic_read(&dest->weight) >
 		    doh * atomic_read(&least->weight)) {
 			least = dest;
 			loh = doh;
 		}
-	}

+	next:
+		p = p->next;
+	} while (p != (struct list_head *)svc->sched_data);
+
+	p = p->next;
+	svc->sched_data = p;
+	write_unlock(&svc->sched_lock);
+
+	if (least)
 	IP_VS_DBG(6, "WLC: server %u.%u.%u.%u:%u "
 		  "activeconns %d refcnt %d weight %d overhead %d\n",
 		  NIPQUAD(least->addr), ntohs(least->port),

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Least Connection Scheduler
  2008-04-02  8:15         ` Graeme Fowler
  2008-04-02  8:31           ` Simon Horman
@ 2008-04-03 11:14           ` Jason Stubbs
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Stubbs @ 2008-04-03 11:14 UTC (permalink / raw)
  To: Graeme Fowler; +Cc: Simon Horman, lvs-devel

(not sure whether I should have been CC'ing in the past, but leaving them all 
in this time)

On Wednesday 02 April 2008 17:15:57 Graeme Fowler wrote:
> On Wed, 2008-04-02 at 17:04 +0900, Simon Horman wrote:
> > LVS does already have a number of /proc values that can be twiddled
> > so I personally don't see a problem with adding one more - then again
> > I'm bound to say that as it was my idea.
>
> Personally speaking, the more stuff that's controllable with /proc the
> better - it makes it easier to code up additional control environments
> without the execution overhead of calling ipvsadm multiple times.

I'm not sure what you mean here exactly. Are you saying that you'd like 
active/inactive weights be settable per virtual IP? If you have any pointers 
to existing code, I'd be grateful as I don't really know my way around the 
kernel at all.

> > If you want to code it up as an additional scheduler that is fine.
> > But looking at your numbers, I am kind of leaning towards just
> > using your existing patch.
>
> Pardon me for speaking out of turn, but an idea just crossed my mind -
> wouldn't it be better to "merge" (not in code terms) the lc and wlc
> schedulers so they either base on, or use exactly, the same code? After
> all, in concept the lc scheduler is simply wlc with equal weights of 1.
> Isn't it?
> That shrinks the codebase a bit.

I was thinking the same thing. The only reason I can see not to do this is the 
performance "hit" of doing extra multiplies. I guess if the code was merged 
with that in mind, the compiler should optimize them out anyway though.

> > I agree that the only real problem would be the extra number of
> > inactive connections on the faster server. But the overhead of such
> > things is really quite small - ~128 bytes of memory and a bit of
> > extra time to go through the hash table (maybe).
>
> This would only be a problem in really large throughput situations, and
> if you have one of them you probably already have some $$$ to buy a
> faster processor!

Hmm.. I was going to say again that it could be a problem with WLC, but after 
simulating it I found that it isn't:


Server 1 with weight=2 and twice as fast as Server 2

WLC
1:  A 17(23)  I 41159(41260)  T 206277
2:  A 16(17)  I 18808(19139)  T 93723

patched WLC
1:  A 20(23)  I 47980(47982)  T 239997
2:  A 10(11)  I 11990(11993)  T 60003


Server 1 with weight=2 and five times as fast as Server 2

WLC
1:  A 6(11)  I 41990(42120)  T 210497
2:  A 16(17)  I 17988(18629)  T 89503

patched WLC
1:  A 10(10)  I 53990(53995)  T 270000
2:  A 5(5)  I 5995(5996)  T 30000


Server 1 with weight=5 and five times as fast as Server 2

WLC
1:  A 8(11)  I 51097(51310)  T 255921
2:  A 8(9)  I 8887(9133)  T 44079

patched WLC
1:  A 10(10)  I 57590(57593)  T 288000
2:  A 2(2)  I 2398(2399)  T 12000


The number of connections thrown to the faster server is greater in each case 
but the number of simulatenous connections to the faster server is either the 
same or less. Even in this case, the only negative is the extra number of 
inactive connections.

So, is a configurable still needed? What about the round-robining? The 
round-robining has the benefit that the slower servers don't get an 
artificial priority boost, but adds a fair amount of processing overhead.

Let me know what to do and I'll at least look into doing it. :)

-- 
Jason Stubbs <j.stubbs@linkthink.co.jp>
LINKTHINK INC.
東京都渋谷区桜ヶ丘町22-14 N.E.S S棟 3F
TEL 03-5728-4772  FAX 03-5728-4773

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-04-03 11:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-01  4:36 Least Connection Scheduler Jason Stubbs
2008-04-01  5:16 ` Simon Horman
2008-04-01  5:55   ` Jason Stubbs
2008-04-02  2:38     ` Jason Stubbs
2008-04-02  8:04       ` Simon Horman
2008-04-02  8:15         ` Graeme Fowler
2008-04-02  8:31           ` Simon Horman
2008-04-03 11:14           ` Jason Stubbs
2008-04-03  0:58         ` Jason Stubbs

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.