ClusterIP network slowdown

* ClusterIP network slowdown
From: Michele Codutti @ 2010-11-30  9:00 UTC
  To: netfilter

Hello, these days I have been experimenting with the CLUSTERIP target in
front of a web server. Everything works well except for two issues:
- the message "CLUSTERIP: no conntrack!"
- a general slowdown of the other network services (such as ssh) on the
two nodes of the cluster.
To work around both problems I inserted this iptables rule:
iptables -I INPUT 1 -m state --state INVALID -j DROP
This workaround is not good enough, because I manage apache2 and the
clustered IP with heartbeat2.
For example, if I put a node in standby (for maintenance) and resume it
after a while, heartbeat inserts the CLUSTERIP rule above all the others,
so the DROP rule becomes the second rule and the workaround no longer
has any effect.
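
One possible way to keep the DROP rule first is to re-apply it after
heartbeat has resumed a node. The sketch below assumes some hook can run
at that point, which the thread does not confirm. The function name
pin_invalid_drop and the IPT override variable are hypothetical; only the
rule itself comes from the message above. It deletes one existing copy of
the rule (if any) and inserts it again at position 1:

```shell
#!/bin/sh
# Hypothetical helper: put the INVALID-drop rule back at the top of INPUT.
# IPT may be overridden (e.g. IPT="echo iptables") for a dry run.
IPT="${IPT:-iptables}"

pin_invalid_drop() {
    # Delete one existing copy of the rule, if present. A failure here
    # just means the rule was not there, so it is ignored.
    $IPT -D INPUT -m state --state INVALID -j DROP 2>/dev/null || true
    # Insert the rule at position 1, ahead of the CLUSTERIP rule.
    $IPT -I INPUT 1 -m state --state INVALID -j DROP
}
```

Running it with IPT="echo iptables" prints the two commands instead of
executing them, which is a cheap way to check the rule text before
touching a live node.
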
Why does CLUSTERIP have such a heavy impact on networking? Before
CLUSTERIP my cluster was active-standby and I had no problems at all.
Now that the load per node is halved, the nodes feel more loaded than
before.
The strangest thing is that, according to top, this load does not seem
to exist and the nodes are not loaded at all:
load average: 0.50, 0.36, 0.37
How can I fix this without the DROP rule above?
Is there a way to see how loaded the networking is?
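
On the question of seeing network load: the load average only counts
tasks, so packet-processing pressure may not show up there. One place to
look, as a rough sketch, is /proc/net/softnet_stat, whose first three
hexadecimal columns are packets processed, packets dropped because the
input backlog was full, and time_squeeze (the receive loop ran out of
budget). The function name softnet_summary is made up for illustration,
and the thread does not say these counters are where the problem lies.

```shell
#!/bin/sh
# Print the first three softnet counters of each row in decimal.
# The file has one row per CPU; all values are hexadecimal.
softnet_summary() {
    file="${1:-/proc/net/softnet_stat}"
    row=0
    while read -r processed dropped squeezed _rest; do
        # $((0x...)) converts a hex string to a decimal number.
        echo "row$row processed=$((0x$processed)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        row=$((row + 1))
    done < "$file"
}
```

Calling softnet_summary twice a few seconds apart and comparing the
dropped and time_squeeze values shows whether they are still growing.
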

Thanks in advance.

Michele


* Re: ClusterIP network slowdown
From: Edison Figueira @ 2010-11-30 12:59 UTC
  To: Michele Codutti; +Cc: netfilter

Hi Michele,

Both issues arise because CLUSTERIP uses broadcast addresses to
work. In the first case, the message appears because each packet is
delivered to both machines and one of them always drops it. To get rid
of the message, just disable netfilter debugging.

The second case is probably because all the packets sent to the
CLUSTERIP address are being copied to all ports on your switch. To
confirm this, run tcpdump on any workstation.

The solution in this case is to enable "IGMP snooping" on your switch.
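
For background, the iptables documentation for the CLUSTERIP target says
the --clustermac address has to be a link-layer multicast address, and a
switch that has not learned which ports want a multicast address will
typically flood such frames to every port. The sketch below checks
whether a MAC address is multicast by testing the lowest bit of its first
octet. The function name is_multicast_mac and the sample addresses in the
usage note are illustrative, not taken from the thread.

```shell
#!/bin/sh
# Return 0 if the MAC address is a link-layer multicast (or broadcast)
# address, i.e. the least significant bit of its first octet is 1.
is_multicast_mac() {
    first_octet="${1%%:*}"    # text before the first ':'
    [ $(( 0x$first_octet & 1 )) -eq 1 ]
}
```

For example, is_multicast_mac 01:00:5e:00:01:01 succeeds, while
is_multicast_mac 00:16:3e:12:34:56 fails because that address is unicast.
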

Regards,

Edison Figueira Junior

> --
> To unsubscribe from this list: send the line "unsubscribe netfilter" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: ClusterIP network slowdown
From: Michele Codutti @ 2010-11-30 16:00 UTC
  To: Edison Figueira; +Cc: netfilter

Hi Edison, you're right: all the hosts on the same switch can see the
packets directed to the cluster IP. But this is not a problem, because
those other hosts are not affected by the slowdown. The only affected
nodes are the ones that use CLUSTERIP.
I cannot modify the configuration of any switch on my network without a
long approval process, so I cannot try enabling IGMP snooping without a
strong argument. How can IGMP snooping relieve load on the clustered
hosts?
Are there any kernel parameters that I can tune to make CLUSTERIP behave
better?
I'm sorry for being so pedantic, but I need precise technical
information before I can change anything in my network.

Thanks.




* Re: ClusterIP network slowdown
From: Michele Codutti @ 2010-12-01  9:11 UTC
  To: Edison Figueira; +Cc: netfilter

I forgot to tell you that heartbeat is losing many of the UDP packets
that carry the heartbeat signal (on all interfaces). The slowdown is
also perceptible on the private network interfaces (the ones that are
not used by CLUSTERIP and are not connected to the same switch).
It seems that the invalid packets make CLUSTERIP slow down the entire
networking subsystem, not only the network interface where I configured
it.
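
One way to put a number on the invalid packets, assuming the conntrack
statistics file /proc/net/stat/nf_conntrack is available on the nodes, is
to sum its per-CPU "invalid" column. The sketch below finds the column by
name from the header line rather than by position. The function name
conntrack_invalid_total is made up for illustration, and whether this
counter grows in step with the slowdown is something to check, not
something the thread confirms.

```shell
#!/bin/sh
# Sum the per-CPU "invalid" counters from the conntrack statistics file.
# Line 1 names the columns; every following line holds hex values.
conntrack_invalid_total() {
    file="${1:-/proc/net/stat/nf_conntrack}"
    awk '
        # Convert a hexadecimal string to a number (portable awk).
        function hex(s,    n, i) {
            n = 0
            s = tolower(s)
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
            return n
        }
        # Header line: remember which column is called "invalid".
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "invalid") col = i; next }
        # Data lines: add up that column across all CPUs.
        col     { total += hex($col) }
        END     { print total + 0 }
    ' "$file"
}
```

Sampling the total before and after a short interval gives a rate of
invalid packets seen by connection tracking on that node.
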
Any hints? I desperately need them! ;)




* Re: ClusterIP network slowdown
From: Pablo Neira Ayuso @ 2010-12-02 12:10 UTC
  To: Michele Codutti; +Cc: netfilter

A suggestion: you would be better off using the 'cluster' match.
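
For readers unfamiliar with the 'cluster' match, here is a rough sketch
modeled on the example in the iptables-extensions documentation: each
node marks the packets that hash to it and drops the rest. The function
cluster_match_rules is hypothetical and only prints the commands; the
interface name, node numbers and seed in the usage note are placeholders.
The same documentation also describes link-layer setup (a shared
multicast MAC address and ARP handling), which is omitted here.

```shell
#!/bin/sh
# Print (do not run) cluster-match rules for one node.
# Arguments: interface, total nodes, this node's number, hash seed.
cluster_match_rules() {
    iface="$1"; total="$2"; node="$3"; seed="$4"
    # Mark the packets that hash to this node...
    echo "iptables -t mangle -A PREROUTING -i $iface -m cluster" \
         "--cluster-total-nodes $total --cluster-local-node $node" \
         "--cluster-hash-seed $seed -j MARK --set-mark 0xffff"
    # ...and drop everything on that interface that was not marked.
    echo "iptables -t mangle -A PREROUTING -i $iface" \
         "-m mark ! --mark 0xffff -j DROP"
}
```

For example, cluster_match_rules eth1 2 1 0xdeadbeef prints the two rules
for node 1 of a two-node cluster; the second node would use the same seed
with node number 2.
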

* Re: ClusterIP network slowdown
From: Michele Codutti @ 2010-12-02 14:01 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter

I know that CLUSTERIP is deprecated, but I cannot use the cluster
match because I'm working on Debian Lenny 5.0, which uses version
2.6.26 of the Linux kernel.
I cannot change the kernel version until the next release of Debian;
this is a constraint.
CLUSTERIP works pretty well. If only I could find a persistent
workaround for this packet loss issue it would be perfect ... until the
next release of Debian, of course! ;)

-- 
Michele Codutti
Centro Servizi Informatici e Telematici (CSIT)
Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE
tel +39 0432 558928
fax +39 0432 558911
e-mail: michele.codutti at uniud.it
