* [bug] FWMARKs and persistence in IPVS: The Use of Unions
@ 2009-04-28 8:15 Simon Horman
2009-04-28 9:07 ` Jan Engelhardt
2009-04-28 10:57 ` Julius Volz
0 siblings, 2 replies; 13+ messages in thread
From: Simon Horman @ 2009-04-28 8:15 UTC (permalink / raw)
To: netfilter-devel, lvs-devel
Cc: Fabien Duchêne, Joseph Mack NA3T, Julius Volz
[ Moving to netfilter-devel / lvs-devel for discussion on how to resolve this.
Added Julius Volz to Cc, he wrote most of the IPv6 portion of LVS.
Remove lvs-users from Cc, it is not an open list. ]
Hi,
The mail below details what appears to be a bug in IPVS.
Thanks to Fabien Duchêne for finding it and Joseph Mack for bringing
it to my attention.
The bug seems to be caused by the use of union nf_inet_addr in
the fwmark case. To start, this is what the union looks like:
union nf_inet_addr {
__u32 all[4];
__be32 ip;
__be32 ip6[4];
struct in_addr in;
struct in6_addr in6;
};
So it has some 32bit values, and some 128 bit values. And is
usually used to store either an IPv6 address or an IPv6 address.
Usually we know which and the functions ip_vs_addr_copy() and
ip_vs_addr_equal(), which are shown below, work quite well.
The problem arrives when a fwmark is used to identify packets.
These could be IPv4 packets or IPv6 packets. Regardless, what
the code currently does is to store the 32bit fwmark in .all[3]
and 0 in the other 3 octets of .all.
In the IPv6 case, this should be fine. When .ip6 is compared
with another .ip6, a 128 bit compare will occur. And this should accurately
compare fwmark values stored in .all as described above.
However, this scheme appears to break down in the IPv4 case. This is
because only .ip is used in comparison and it maps to the first octet of
all, which is always set to 0 in the current scheme.
A simple fix that comes to mind is to just store the fwmark in
the first octet of .all, and set the other octets to zero.
But is .ip always going to be the same as .all[0]?
Is a different approach required? For example, one where we know to compare
.all or perhaps a single octet of .all in the case where fmarks are used.
This particular change should be easy enough. I believe that fwmarks are
only used in this way twice, both inside ip_vs_schedule(). But
ip_vs_addr_equal() is more generic, so I'd prefer only to mangle it if
needed.
----- Forwarded message from Simon Horman <horms@verge.net.au> -----
Date: Tue, 28 Apr 2009 17:08:44 +1000
From: Simon Horman <horms@verge.net.au>
To: "LinuxVirtualServer.org users mailing list."
<lvs-users@linuxvirtualserver.org>
Subject: Re: [lvs-users] FWMARKs and persistence
On Thu, Apr 23, 2009 at 06:23:32PM +0200, Fabien Duchêne wrote:
> Hello Joe,
>
> Thanks for your reply!
>
> Okay for the -SH scheduler, but we have a script that dynamically change
> the weight of the servers (because the are very different -hardware
> speaking there-) and i think it wouldn't solve the problem (you'll see why).
> It was very simple to test, a openned the LDAP port on a webserver with
> a perl script (a server that just write "hello"), and then connected to
> this port with a telnet client just after connecting to the web service
> with my browser.
> Result: i'm connected to the webserver (via the LDAP mark...).
>
> In fact, i looked into the code, and i think that LVS can't handle
> multiple fwmark + persistence services (maybe we found a bug?).
>
> If you look in ip_vs.h (in the headers):
>
> static inline void ip_vs_addr_copy(int af, union nf_inet_addr *dst,
> const union nf_inet_addr *src)
> {
> #ifdef CONFIG_IP_VS_IPV6
> if (af == AF_INET6)
> ipv6_addr_copy(&dst->in6, &src->in6);
> else
> #endif
> dst->ip = src->ip;
> }
>
> static inline int ip_vs_addr_equal(int af, const union nf_inet_addr *a,
> const union nf_inet_addr *b)
> {
> #ifdef CONFIG_IP_VS_IPV6
> if (af == AF_INET6)
> return ipv6_addr_equal(&a->in6, &b->in6);
> #endif
>
> return a->ip == b->ip;
> }
>
> These functions are used the check if a template already exist.
> In the fwmarked template, ->ip is always 0.0.0.0 and the ->all[3] (where
> the fwmark is written) isn't tested (and not copied as you can see!).
> So, the first template created by a "fwmark persistent service" will
> match every fwmark persistent service (ip = 0.0.0.0, it's the same for
> all!).
>
> Correct me if i'm wrong?
>
> If it's a bug, I hope a Dev' could fix this..
Hi Fabien,
that looks like a bug in the IPv4 to me too. If so, it would have been
introduced when the IPv6 code was added to LVS, which was fairly recently.
It seems to me that it should be easy enough to fix by changing
fwmark in ip_vs_sched_persist() from:
union nf_inet_addr fwmark = {
.all = { 0, 0, 0, htonl(svc->fwmark) }
};
to:
union nf_inet_addr fwmark = {
.all = { htonl(svc->fwmark), 0, 0, 0 }
};
Assuming that this would result in fwmark->ip being set to
htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
for IPv4. IPv6 should work fine in both cases, as all 4 octets of
all will be tested - which would have presumably been the focus
of the IPv6 work that changed this code.
This assums that the ip element of union nf_inet_addr always corresponds
to the first octet of all.
An alternate idea would be to change the af value used for fwmarks,
but this seems to be even less clean than the current (slightly broken)
technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
----- End forwarded message -----
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 8:15 Simon Horman
@ 2009-04-28 9:07 ` Jan Engelhardt
2009-04-28 9:23 ` Simon Horman
2009-04-28 10:57 ` Julius Volz
1 sibling, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2009-04-28 9:07 UTC (permalink / raw)
To: Simon Horman
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tuesday 2009-04-28 10:15, Simon Horman wrote:
>
>It seems to me that it should be easy enough to fix by changing
>fwmark in ip_vs_sched_persist() from:
>
>union nf_inet_addr fwmark = {
> .all = { 0, 0, 0, htonl(svc->fwmark) }
>};
>
>to:
>
>union nf_inet_addr fwmark = {
> .all = { htonl(svc->fwmark), 0, 0, 0 }
>};
>
>Assuming that this would result in fwmark->ip being set to
>htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
>for IPv4.[...]
>An alternate idea would be to change the af value used for fwmarks,
>but this seems to be even less clean than the current (slightly broken)
>technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
If you use ->all, then using NFPROTO_UNSPEC as af
seems to me like a good match.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 9:07 ` Jan Engelhardt
@ 2009-04-28 9:23 ` Simon Horman
2009-04-28 10:59 ` Simon Horman
0 siblings, 1 reply; 13+ messages in thread
From: Simon Horman @ 2009-04-28 9:23 UTC (permalink / raw)
To: Jan Engelhardt
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tue, Apr 28, 2009 at 11:07:40AM +0200, Jan Engelhardt wrote:
>
> On Tuesday 2009-04-28 10:15, Simon Horman wrote:
> >
> >It seems to me that it should be easy enough to fix by changing
> >fwmark in ip_vs_sched_persist() from:
> >
> >union nf_inet_addr fwmark = {
> > .all = { 0, 0, 0, htonl(svc->fwmark) }
> >};
> >
> >to:
> >
> >union nf_inet_addr fwmark = {
> > .all = { htonl(svc->fwmark), 0, 0, 0 }
> >};
> >
> >Assuming that this would result in fwmark->ip being set to
> >htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
> >for IPv4.[...]
> >An alternate idea would be to change the af value used for fwmarks,
> >but this seems to be even less clean than the current (slightly broken)
> >technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
>
> If you use ->all, then using NFPROTO_UNSPEC as af
> seems to me like a good match.
That seems reasonable, though ip_vs_ct_in_get() would still
need to use the real af for the cp->af == af and
ip_vs_addr_equal(af, s_addr, &cp->caddr) portinos of the check.
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 8:15 Simon Horman
2009-04-28 9:07 ` Jan Engelhardt
@ 2009-04-28 10:57 ` Julius Volz
1 sibling, 0 replies; 13+ messages in thread
From: Julius Volz @ 2009-04-28 10:57 UTC (permalink / raw)
To: Simon Horman
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T
Hi,
On Tue, Apr 28, 2009 at 10:15 AM, Simon Horman <horms@verge.net.au> wrote:
> [ Moving to netfilter-devel / lvs-devel for discussion on how to resolve this.
> Added Julius Volz to Cc, he wrote most of the IPv6 portion of LVS.
> Remove lvs-users from Cc, it is not an open list. ]
Thanks for the notice and sorry for the v4 breakage.
> A simple fix that comes to mind is to just store the fwmark in
> the first octet of .all, and set the other octets to zero.
> But is .ip always going to be the same as .all[0]?
I prefer this and would have assumed that it is ok, but maybe someone
else can answer that more confidently.
> Is a different approach required? For example, one where we know to compare
> .all or perhaps a single octet of .all in the case where fmarks are used.
I guess that could be done by only specializing the lookup comparisons
in __ip_vs_conn_in_get() and ip_vs_ct_in_get() when the ports are 0.
Would blow up those if-statements even more though...
> This particular change should be easy enough. I believe that fwmarks are
> only used in this way twice, both inside ip_vs_schedule(). But
> ip_vs_addr_equal() is more generic, so I'd prefer only to mangle it if
> needed.
I agree.
Julius
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 9:23 ` Simon Horman
@ 2009-04-28 10:59 ` Simon Horman
2009-04-28 11:30 ` Fabien Duchêne
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Simon Horman @ 2009-04-28 10:59 UTC (permalink / raw)
To: Jan Engelhardt
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tue, Apr 28, 2009 at 07:23:55PM +1000, Simon Horman wrote:
> On Tue, Apr 28, 2009 at 11:07:40AM +0200, Jan Engelhardt wrote:
> >
> > On Tuesday 2009-04-28 10:15, Simon Horman wrote:
> > >
> > >It seems to me that it should be easy enough to fix by changing
> > >fwmark in ip_vs_sched_persist() from:
> > >
> > >union nf_inet_addr fwmark = {
> > > .all = { 0, 0, 0, htonl(svc->fwmark) }
> > >};
> > >
> > >to:
> > >
> > >union nf_inet_addr fwmark = {
> > > .all = { htonl(svc->fwmark), 0, 0, 0 }
> > >};
> > >
> > >Assuming that this would result in fwmark->ip being set to
> > >htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
> > >for IPv4.[...]
> > >An alternate idea would be to change the af value used for fwmarks,
> > >but this seems to be even less clean than the current (slightly broken)
> > >technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
> >
> > If you use ->all, then using NFPROTO_UNSPEC as af
> > seems to me like a good match.
I am guessing that AF_UNSPEC is more appropriate than NFPROTO_UNSPEC.
Please correct me if I am wrong.
> That seems reasonable, though ip_vs_ct_in_get() would still
> need to use the real af for the cp->af == af and
> ip_vs_addr_equal(af, s_addr, &cp->caddr) portinos of the check.
It looks like checking for proto == IPPROTO_IP can tell us if
the destination is a fwmark. This is based on the assumption that
iph.protocol can never be IPPROTO_IP in ip_vs_sched_persist().
The following patch expresses these ideas as they crrently stand.
Fabien, is it possible for you to test this?
Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
===================================================================
--- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
+++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
@@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
if (cp->af == af &&
ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
+ /* protocol should only be IPPROTO_IP if
+ * d_addr is a fwmark */
+ ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
+ d_addr, &cp->vaddr) &&
s_port == cp->cport && d_port == cp->vport &&
cp->flags & IP_VS_CONN_F_TEMPLATE &&
protocol == cp->protocol) {
@@ -698,7 +701,9 @@ ip_vs_conn_new(int af, int proto, const
cp->cport = cport;
ip_vs_addr_copy(af, &cp->vaddr, vaddr);
cp->vport = vport;
- ip_vs_addr_copy(af, &cp->daddr, daddr);
+ /* proto should only be IPPROTO_IP if d_addr is a fwmark */
+ ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
+ &cp->daddr, daddr);
cp->dport = dport;
cp->flags = flags;
spin_lock_init(&cp->lock);
Index: net-next-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:48.000000000 +1000
+++ net-next-2.6/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:51.000000000 +1000
@@ -278,7 +278,7 @@ ip_vs_sched_persist(struct ip_vs_service
*/
if (svc->fwmark) {
union nf_inet_addr fwmark = {
- .all = { 0, 0, 0, htonl(svc->fwmark) }
+ .ip = htonl(svc->fwmark)
};
ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
@@ -306,7 +306,7 @@ ip_vs_sched_persist(struct ip_vs_service
*/
if (svc->fwmark) {
union nf_inet_addr fwmark = {
- .all = { 0, 0, 0, htonl(svc->fwmark) }
+ .ip = htonl(svc->fwmark)
};
ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 10:59 ` Simon Horman
@ 2009-04-28 11:30 ` Fabien Duchêne
2009-05-01 6:40 ` Simon Horman
2009-04-28 11:59 ` Julius Volz
2009-04-28 12:27 ` Jan Engelhardt
2 siblings, 1 reply; 13+ messages in thread
From: Fabien Duchêne @ 2009-04-28 11:30 UTC (permalink / raw)
To: Simon Horman
Cc: Jan Engelhardt, netfilter-devel, lvs-devel, Joseph Mack NA3T,
Julius Volz
Simon Horman a écrit :
> On Tue, Apr 28, 2009 at 07:23:55PM +1000, Simon Horman wrote:
>> On Tue, Apr 28, 2009 at 11:07:40AM +0200, Jan Engelhardt wrote:
>>> On Tuesday 2009-04-28 10:15, Simon Horman wrote:
>>>> It seems to me that it should be easy enough to fix by changing
>>>> fwmark in ip_vs_sched_persist() from:
>>>>
>>>> union nf_inet_addr fwmark = {
>>>> .all = { 0, 0, 0, htonl(svc->fwmark) }
>>>> };
>>>>
>>>> to:
>>>>
>>>> union nf_inet_addr fwmark = {
>>>> .all = { htonl(svc->fwmark), 0, 0, 0 }
>>>> };
>>>>
>>>> Assuming that this would result in fwmark->ip being set to
>>>> htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
>>>> for IPv4.[...]
>>>> An alternate idea would be to change the af value used for fwmarks,
>>>> but this seems to be even less clean than the current (slightly broken)
>>>> technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
>>> If you use ->all, then using NFPROTO_UNSPEC as af
>>> seems to me like a good match.
>
> I am guessing that AF_UNSPEC is more appropriate than NFPROTO_UNSPEC.
> Please correct me if I am wrong.
>
>> That seems reasonable, though ip_vs_ct_in_get() would still
>> need to use the real af for the cp->af == af and
>> ip_vs_addr_equal(af, s_addr, &cp->caddr) portinos of the check.
>
> It looks like checking for proto == IPPROTO_IP can tell us if
> the destination is a fwmark. This is based on the assumption that
> iph.protocol can never be IPPROTO_IP in ip_vs_sched_persist().
>
> The following patch expresses these ideas as they crrently stand.
> Fabien, is it possible for you to test this?
>
Yep!
I'll do it right now.
> Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
> ===================================================================
> --- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
> +++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
> @@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
> list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> if (cp->af == af &&
> ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> - ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> + /* protocol should only be IPPROTO_IP if
> + * d_addr is a fwmark */
> + ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
> + d_addr, &cp->vaddr) &&
> s_port == cp->cport && d_port == cp->vport &&
> cp->flags & IP_VS_CONN_F_TEMPLATE &&
> protocol == cp->protocol) {
> @@ -698,7 +701,9 @@ ip_vs_conn_new(int af, int proto, const
> cp->cport = cport;
> ip_vs_addr_copy(af, &cp->vaddr, vaddr);
> cp->vport = vport;
> - ip_vs_addr_copy(af, &cp->daddr, daddr);
> + /* proto should only be IPPROTO_IP if d_addr is a fwmark */
> + ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
> + &cp->daddr, daddr);
> cp->dport = dport;
> cp->flags = flags;
> spin_lock_init(&cp->lock);
> Index: net-next-2.6/net/netfilter/ipvs/ip_vs_core.c
> ===================================================================
> --- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:48.000000000 +1000
> +++ net-next-2.6/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:51.000000000 +1000
> @@ -278,7 +278,7 @@ ip_vs_sched_persist(struct ip_vs_service
> */
> if (svc->fwmark) {
> union nf_inet_addr fwmark = {
> - .all = { 0, 0, 0, htonl(svc->fwmark) }
> + .ip = htonl(svc->fwmark)
> };
>
> ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
> @@ -306,7 +306,7 @@ ip_vs_sched_persist(struct ip_vs_service
> */
> if (svc->fwmark) {
> union nf_inet_addr fwmark = {
> - .all = { 0, 0, 0, htonl(svc->fwmark) }
> + .ip = htonl(svc->fwmark)
> };
>
> ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
>
>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 10:59 ` Simon Horman
2009-04-28 11:30 ` Fabien Duchêne
@ 2009-04-28 11:59 ` Julius Volz
2009-04-28 12:27 ` Jan Engelhardt
2 siblings, 0 replies; 13+ messages in thread
From: Julius Volz @ 2009-04-28 11:59 UTC (permalink / raw)
To: Simon Horman
Cc: Jan Engelhardt, netfilter-devel, lvs-devel, Fabien Duchêne,
Joseph Mack NA3T
On Tue, Apr 28, 2009 at 12:59 PM, Simon Horman <horms@verge.net.au> wrote:
> On Tue, Apr 28, 2009 at 07:23:55PM +1000, Simon Horman wrote:
>> On Tue, Apr 28, 2009 at 11:07:40AM +0200, Jan Engelhardt wrote:
>> >
>> > On Tuesday 2009-04-28 10:15, Simon Horman wrote:
>> > >
>> > >It seems to me that it should be easy enough to fix by changing
>> > >fwmark in ip_vs_sched_persist() from:
>> > >
>> > >union nf_inet_addr fwmark = {
>> > > .all = { 0, 0, 0, htonl(svc->fwmark) }
>> > >};
>> > >
>> > >to:
>> > >
>> > >union nf_inet_addr fwmark = {
>> > > .all = { htonl(svc->fwmark), 0, 0, 0 }
>> > >};
>> > >
>> > >Assuming that this would result in fwmark->ip being set to
>> > >htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
>> > >for IPv4.[...]
>> > >An alternate idea would be to change the af value used for fwmarks,
>> > >but this seems to be even less clean than the current (slightly broken)
>> > >technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
>> >
>> > If you use ->all, then using NFPROTO_UNSPEC as af
>> > seems to me like a good match.
>
> I am guessing that AF_UNSPEC is more appropriate than NFPROTO_UNSPEC.
> Please correct me if I am wrong.
>
>> That seems reasonable, though ip_vs_ct_in_get() would still
>> need to use the real af for the cp->af == af and
>> ip_vs_addr_equal(af, s_addr, &cp->caddr) portinos of the check.
>
> It looks like checking for proto == IPPROTO_IP can tell us if
> the destination is a fwmark. This is based on the assumption that
> iph.protocol can never be IPPROTO_IP in ip_vs_sched_persist().
>
> The following patch expresses these ideas as they crrently stand.
> Fabien, is it possible for you to test this?
>
> Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
> ===================================================================
> --- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
> +++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
> @@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
> list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> if (cp->af == af &&
> ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> - ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> + /* protocol should only be IPPROTO_IP if
> + * d_addr is a fwmark */
> + ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
> + d_addr, &cp->vaddr) &&
> s_port == cp->cport && d_port == cp->vport &&
> cp->flags & IP_VS_CONN_F_TEMPLATE &&
> protocol == cp->protocol) {
> @@ -698,7 +701,9 @@ ip_vs_conn_new(int af, int proto, const
> cp->cport = cport;
> ip_vs_addr_copy(af, &cp->vaddr, vaddr);
> cp->vport = vport;
> - ip_vs_addr_copy(af, &cp->daddr, daddr);
> + /* proto should only be IPPROTO_IP if d_addr is a fwmark */
> + ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
> + &cp->daddr, daddr);
> cp->dport = dport;
> cp->flags = flags;
> spin_lock_init(&cp->lock);
> Index: net-next-2.6/net/netfilter/ipvs/ip_vs_core.c
> ===================================================================
> --- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:48.000000000 +1000
> +++ net-next-2.6/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:51.000000000 +1000
> @@ -278,7 +278,7 @@ ip_vs_sched_persist(struct ip_vs_service
> */
> if (svc->fwmark) {
> union nf_inet_addr fwmark = {
> - .all = { 0, 0, 0, htonl(svc->fwmark) }
> + .ip = htonl(svc->fwmark)
> };
>
> ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
> @@ -306,7 +306,7 @@ ip_vs_sched_persist(struct ip_vs_service
> */
> if (svc->fwmark) {
> union nf_inet_addr fwmark = {
> - .all = { 0, 0, 0, htonl(svc->fwmark) }
> + .ip = htonl(svc->fwmark)
> };
>
> ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
Looks good to me, without being able to test it now.
I earlier mentioned also changing __ip_vs_conn_in_get(), but now
realized that the problem exists only during connection template
lookup, not for regular connections.
Julius
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 10:59 ` Simon Horman
2009-04-28 11:30 ` Fabien Duchêne
2009-04-28 11:59 ` Julius Volz
@ 2009-04-28 12:27 ` Jan Engelhardt
2009-04-28 15:00 ` Simon Horman
2 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2009-04-28 12:27 UTC (permalink / raw)
To: Simon Horman
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tuesday 2009-04-28 12:59, Simon Horman wrote:
>>>
>>>union nf_inet_addr fwmark = {
>>> .all = { 0, 0, 0, htonl(svc->fwmark) }
>>>};
>[said something about cp->af...]
It does not make sense to use AF_INE with some address as unreal
as {0,0,0,fwmark}, just BTW.
>> > If you use ->all, then using NFPROTO_UNSPEC as af
>> > seems to me like a good match.
>
>I am guessing that AF_UNSPEC is more appropriate than NFPROTO_UNSPEC.
>Please correct me if I am wrong.
Whatever. You could even use AF_INET6 to mean "take the ipv4 part
of nf_inet_addr", and AF_INET to "take the ipv6 part". The mapping
is on you, so to speak.
Since you are dealing with an *nf*_inet_addr, using *NF*PROTO_ seemed
the closest thing.
>The following patch expresses these ideas as they crrently stand.
>Fabien, is it possible for you to test this?
>
>Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
>===================================================================
>--- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
>+++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
>@@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
> list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> if (cp->af == af &&
> ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
>- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
>+ /* protocol should only be IPPROTO_IP if
>+ * d_addr is a fwmark */
>+ ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
>+ d_addr, &cp->vaddr) &&
What about IPPROTO_IPV6?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 12:27 ` Jan Engelhardt
@ 2009-04-28 15:00 ` Simon Horman
2009-04-28 15:28 ` Jan Engelhardt
0 siblings, 1 reply; 13+ messages in thread
From: Simon Horman @ 2009-04-28 15:00 UTC (permalink / raw)
To: Jan Engelhardt
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tue, Apr 28, 2009 at 02:27:34PM +0200, Jan Engelhardt wrote:
>
> On Tuesday 2009-04-28 12:59, Simon Horman wrote:
> >>>
> >>>union nf_inet_addr fwmark = {
> >>> .all = { 0, 0, 0, htonl(svc->fwmark) }
> >>>};
> >[said something about cp->af...]
>
> It does not make sense to use AF_INE with some address as unreal
> as {0,0,0,fwmark}, just BTW.
Agreed.
> >> > If you use ->all, then using NFPROTO_UNSPEC as af
> >> > seems to me like a good match.
> >
> >I am guessing that AF_UNSPEC is more appropriate than NFPROTO_UNSPEC.
> >Please correct me if I am wrong.
>
> Whatever. You could even use AF_INET6 to mean "take the ipv4 part
> of nf_inet_addr", and AF_INET to "take the ipv6 part". The mapping
> is on you, so to speak.
Ok, though I would hope that we can come up with something a little
more intuative than that.
> Since you are dealing with an *nf*_inet_addr, using *NF*PROTO_ seemed
> the closest thing.
Ok, as it happens the code that looked like it should be modified was
using AF_ values, even though it deals with nf_inet_addr. As you mention
this, it looks like it should be possible to change all the relevant
ip_vs code over to use NFPROTO_ instead, but lets leave looking into
that for later.
> >The following patch expresses these ideas as they crrently stand.
> >Fabien, is it possible for you to test this?
> >
> >Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
> >===================================================================
> >--- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
> >+++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
> >@@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
> > list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> > if (cp->af == af &&
> > ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> >- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> >+ /* protocol should only be IPPROTO_IP if
> >+ * d_addr is a fwmark */
> >+ ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
> >+ d_addr, &cp->vaddr) &&
>
> What about IPPROTO_IPV6?
I believe that the value IPPROTO_IP is only used in the case of fwmark.
Here is a explanation of why.
The value of protocol in ip_vs_ct_in_get() and ip_vs_conn_new()
can come from serveral sources:
1) If a fwmark in use, then it is set to IPPROTO_IP when dealing
with templates for persistance.
2) If the entry is created by the FTP helper, IPPROTO_TCP is used.
3) If the entry is created by syncrhonisation of the table of another
machine, then the protocol used in the foreign entry is used -
which would have been set by one of these 4 cases.
4) Otherwise the value of iph.protocol is used. iph is set set as
ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
The key is that functions that provide cases 1), 2) and 4) are all called
from ip_vs_in() which filters packets so that, amongst other things, the
protocol is accepted by ip_vs_proto_get(). At this time, that means it must
be one of:
IPPROTO_TCP (as per ip_vs_protocol_tcp)
IPPROTO_UDP (as per ip_vs_protocol_udp)
IPPROTO_AH (as per ip_vs_protocol_ah)
IPPROTO_ESP (as per ip_vs_protocol_esp)
I had thought about adding an explicit is_fwmark field to ip_vs_ct_in_get()
and ip_vs_conn_new(), but it would also need to be added to the syncronisation
protocol, changes to which seem best avoided for compatibility reasons.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 15:00 ` Simon Horman
@ 2009-04-28 15:28 ` Jan Engelhardt
2009-04-29 0:06 ` Simon Horman
0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2009-04-28 15:28 UTC (permalink / raw)
To: Simon Horman
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tuesday 2009-04-28 17:00, Simon Horman wrote:
>> >Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
>> >===================================================================
>> >--- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
>> >+++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
>> >@@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
>> > list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
>> > if (cp->af == af &&
>> > ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
>> >- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
>> >+ /* protocol should only be IPPROTO_IP if
>> >+ * d_addr is a fwmark */
>> >+ ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
>> >+ d_addr, &cp->vaddr) &&
>>
>> What about IPPROTO_IPV6?
>
>I believe that the value IPPROTO_IP is only used in the case of fwmark.
>Here is a explanation of why.
>
>1) If a fwmark in use, then it is set to IPPROTO_IP when dealing
> with templates for persistance.
>2) If the entry is created by the FTP helper, IPPROTO_TCP is used.
>3) If the entry is created by syncrhonisation of the table of another
> machine, then the protocol used in the foreign entry is used -
> which would have been set by one of these 4 cases.
>4) Otherwise the value of iph.protocol is used.
That sounds a bit like whenever you get an IPIP packet,
IPVS will erroneously think it is operating on an fwmark-based address.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 15:28 ` Jan Engelhardt
@ 2009-04-29 0:06 ` Simon Horman
0 siblings, 0 replies; 13+ messages in thread
From: Simon Horman @ 2009-04-29 0:06 UTC (permalink / raw)
To: Jan Engelhardt
Cc: netfilter-devel, lvs-devel, Fabien Duchêne, Joseph Mack NA3T,
Julius Volz
On Tue, Apr 28, 2009 at 05:28:06PM +0200, Jan Engelhardt wrote:
>
> On Tuesday 2009-04-28 17:00, Simon Horman wrote:
> >> >Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
> >> >===================================================================
> >> >--- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
> >> >+++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
> >> >@@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
> >> > list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> >> > if (cp->af == af &&
> >> > ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> >> >- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> >> >+ /* protocol should only be IPPROTO_IP if
> >> >+ * d_addr is a fwmark */
> >> >+ ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
> >> >+ d_addr, &cp->vaddr) &&
> >>
> >> What about IPPROTO_IPV6?
> >
> >I believe that the value IPPROTO_IP is only used in the case of fwmark.
> >Here is a explanation of why.
> >
> >1) If a fwmark in use, then it is set to IPPROTO_IP when dealing
> > with templates for persistance.
> >2) If the entry is created by the FTP helper, IPPROTO_TCP is used.
> >3) If the entry is created by syncrhonisation of the table of another
> > machine, then the protocol used in the foreign entry is used -
> > which would have been set by one of these 4 cases.
> >4) Otherwise the value of iph.protocol is used.
>
> That sounds a bit like whenever you get an IPIP packet,
> IPVS will erroneously think it is operating on an fwmark-based address.
Unless I am missing something, if the protcol of the skb is IPPROTO_IPIP
then ip_vs_in() will return NF_ACCEPT and it won't go through
any code paths that use the logic above.
iph.protocol needs to be matched by ip_vs_proto_get(),
which means it needs to be one of:
IPPROTO_TCP
IPPROTO_UDP
IPPROTO_AH
IPPROTO_ESP
IPPROTO_ICMP is also handled, basically by extracting the embeded
header and then checking its iph.protocol with ip_vs_proto_get()
In other words IPVS only knows how to load balance TCP, UDP, AH and ESP,
and handle related ICMP traffic.
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
2009-04-28 11:30 ` Fabien Duchêne
@ 2009-05-01 6:40 ` Simon Horman
0 siblings, 0 replies; 13+ messages in thread
From: Simon Horman @ 2009-05-01 6:40 UTC (permalink / raw)
To: Fabien Duchêne
Cc: Jan Engelhardt, netfilter-devel, lvs-devel, Joseph Mack NA3T,
Julius Volz
On Tue, Apr 28, 2009 at 01:30:26PM +0200, Fabien Duchêne wrote:
> Simon Horman a écrit :
> > On Tue, Apr 28, 2009 at 07:23:55PM +1000, Simon Horman wrote:
> >> On Tue, Apr 28, 2009 at 11:07:40AM +0200, Jan Engelhardt wrote:
> >>> On Tuesday 2009-04-28 10:15, Simon Horman wrote:
> >>>> It seems to me that it should be easy enough to fix by changing
> >>>> fwmark in ip_vs_sched_persist() from:
> >>>>
> >>>> union nf_inet_addr fwmark = {
> >>>> .all = { 0, 0, 0, htonl(svc->fwmark) }
> >>>> };
> >>>>
> >>>> to:
> >>>>
> >>>> union nf_inet_addr fwmark = {
> >>>> .all = { htonl(svc->fwmark), 0, 0, 0 }
> >>>> };
> >>>>
> >>>> Assuming that this would result in fwmark->ip being set to
> >>>> htonl(svc->fwmark), which is relevant if svc->af is AF_INET - that is,
> >>>> for IPv4.[...]
> >>>> An alternate idea would be to change the af value used for fwmarks,
> >>>> but this seems to be even less clean than the current (slightly broken)
> >>>> technique of using nf_inet_addr for IPv4 or IPv6 addresses, or fwmarks.
> >>> If you use ->all, then using NFPROTO_UNSPEC as af
> >>> seems to me like a good match.
> >
> > I am guessing that AF_UNSPEC is more appropriate than NFPROTO_UNSPEC.
> > Please correct me if I am wrong.
> >
> >> That seems reasonable, though ip_vs_ct_in_get() would still
> >> need to use the real af for the cp->af == af and
> >> ip_vs_addr_equal(af, s_addr, &cp->caddr) portinos of the check.
> >
> > It looks like checking for proto == IPPROTO_IP can tell us if
> > the destination is a fwmark. This is based on the assumption that
> > iph.protocol can never be IPPROTO_IP in ip_vs_sched_persist().
> >
> > The following patch expresses these ideas as they crrently stand.
> > Fabien, is it possible for you to test this?
> >
> Yep!
>
> I'll do it right now.
Hi Fabien,
did you get a chance to test this?
> > Index: net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c
> > ===================================================================
> > --- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:48.000000000 +1000
> > +++ net-next-2.6/net/netfilter/ipvs/ip_vs_conn.c 2009-04-28 20:37:51.000000000 +1000
> > @@ -260,7 +260,10 @@ struct ip_vs_conn *ip_vs_ct_in_get
> > list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> > if (cp->af == af &&
> > ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> > - ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> > + /* protocol should only be IPPROTO_IP if
> > + * d_addr is a fwmark */
> > + ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
> > + d_addr, &cp->vaddr) &&
> > s_port == cp->cport && d_port == cp->vport &&
> > cp->flags & IP_VS_CONN_F_TEMPLATE &&
> > protocol == cp->protocol) {
> > @@ -698,7 +701,9 @@ ip_vs_conn_new(int af, int proto, const
> > cp->cport = cport;
> > ip_vs_addr_copy(af, &cp->vaddr, vaddr);
> > cp->vport = vport;
> > - ip_vs_addr_copy(af, &cp->daddr, daddr);
> > + /* proto should only be IPPROTO_IP if d_addr is a fwmark */
> > + ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
> > + &cp->daddr, daddr);
> > cp->dport = dport;
> > cp->flags = flags;
> > spin_lock_init(&cp->lock);
> > Index: net-next-2.6/net/netfilter/ipvs/ip_vs_core.c
> > ===================================================================
> > --- net-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:48.000000000 +1000
> > +++ net-next-2.6/net/netfilter/ipvs/ip_vs_core.c 2009-04-28 20:37:51.000000000 +1000
> > @@ -278,7 +278,7 @@ ip_vs_sched_persist(struct ip_vs_service
> > */
> > if (svc->fwmark) {
> > union nf_inet_addr fwmark = {
> > - .all = { 0, 0, 0, htonl(svc->fwmark) }
> > + .ip = htonl(svc->fwmark)
> > };
> >
> > ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
> > @@ -306,7 +306,7 @@ ip_vs_sched_persist(struct ip_vs_service
> > */
> > if (svc->fwmark) {
> > union nf_inet_addr fwmark = {
> > - .all = { 0, 0, 0, htonl(svc->fwmark) }
> > + .ip = htonl(svc->fwmark)
> > };
> >
> > ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
> >
> >
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bug] FWMARKs and persistence in IPVS: The Use of Unions
@ 2009-05-07 0:43 Simon Horman
0 siblings, 0 replies; 13+ messages in thread
From: Simon Horman @ 2009-05-07 0:43 UTC (permalink / raw)
To: Fabien Duchêne
Cc: 'Jan Engelhardt', 'netfilter-devel', lvs-devel,
'Joseph Mack NA3T', 'Julius Volz'
On Fri, May 01, 2009 at 02:49:07PM +0200, Fabien Duchêne wrote:
> Hello,
>
> Yes, it's working perfectly, no any problems and no difference in the
> performances.
> Tested with a lot (100 000+) of connections for 3 days now.. :)
Great, thanks.
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-05-07 0:43 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-07 0:43 [bug] FWMARKs and persistence in IPVS: The Use of Unions Simon Horman
-- strict thread matches above, loose matches on Subject: below --
2009-04-28 8:15 Simon Horman
2009-04-28 9:07 ` Jan Engelhardt
2009-04-28 9:23 ` Simon Horman
2009-04-28 10:59 ` Simon Horman
2009-04-28 11:30 ` Fabien Duchêne
2009-05-01 6:40 ` Simon Horman
2009-04-28 11:59 ` Julius Volz
2009-04-28 12:27 ` Jan Engelhardt
2009-04-28 15:00 ` Simon Horman
2009-04-28 15:28 ` Jan Engelhardt
2009-04-29 0:06 ` Simon Horman
2009-04-28 10:57 ` Julius Volz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).