* [RFC net-next] ipv6: Use destination address determined by IPVS
@ 2013-10-16 0:02 Simon Horman
2013-10-16 0:13 ` Eric Dumazet
` (3 more replies)
0 siblings, 4 replies; 28+ messages in thread
From: Simon Horman @ 2013-10-16 0:02 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / 吉藤英明
Cc: lvs-devel, netdev, Julian Anastasov, Mark Brooks, Simon Horman
In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
such that it creates and uses a neigh entry if none is found.
Subsequently the 'n' field was removed from struct rt6_info.
Unfortunately my analysis is that in the case of IPVS direct routing this
change leads to incorrect behaviour as in this case packets may be output
to a destination other than where they would be output according to the
route table. In particular, the destination address may actually be a local
address and empirically a neighbour lookup seems to result in it becoming
unreachable.
This patch resolves the problem by providing the destination address
determined by IPVS to ip6_finish_output2() in the skb callback. Although
this seems to work I can see several problems with this approach:
* It is rather ugly, stuffing an IPVS exception right in
the middle of IPv6 code. The overhead could be eliminated for many users
by using a staic key. But none the less it is not attractive.
* The use of the skb callback is may not be valid
as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
alternative is to add a new field to struct sk_buff.
* This covers all IPv6 packets output by IPVS but actually
only those output using IPVS Direct-Routing need this. One way to
resolve this would be to add a more fine-grained ipvs_property to
struct sk_buff.
Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 6 ++++++
net/ipv6/ip6_output.c | 9 +++++++--
net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..11d90a6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
atomic_read(&dest->inactconns);
}
+struct ipvs_skb_cb {
+ struct in6_addr *daddr;
+};
+
+#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
+
#endif /* _NET_IP_VS_H */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..a340180 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -52,6 +52,7 @@
#include <net/addrconf.h>
#include <net/rawv6.h>
#include <net/icmp.h>
+#include <net/ip_vs.h>
#include <net/xfrm.h>
#include <net/checksum.h>
#include <linux/mroute6.h>
@@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
struct dst_entry *dst = skb_dst(skb);
struct net_device *dev = dst->dev;
struct neighbour *neigh;
- struct in6_addr *nexthop;
+ struct in6_addr *nexthop, *daddr;
int ret;
skb->protocol = htons(ETH_P_IPV6);
@@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
}
rcu_read_lock_bh();
- nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+ if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
+ daddr = IP_VS_SKB_CB(skb)->daddr;
+ else
+ daddr = &ipv6_hdr(skb)->daddr;
+ nexthop = rt6_nexthop((struct rt6_info *)dst, daddr);
neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
if (unlikely(!neigh))
neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c47444e..054b679 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
rt = (struct rt6_info *) dst;
}
+ IP_VS_SKB_CB(skb)->daddr = daddr;
+
local = __ip_vs_is_local_route6(rt);
if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
rt_mode)) {
--
1.8.4
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:02 [RFC net-next] ipv6: Use destination address determined by IPVS Simon Horman
@ 2013-10-16 0:13 ` Eric Dumazet
2013-10-16 0:28 ` Simon Horman
2013-10-16 0:53 ` Hannes Frederic Sowa
` (2 subsequent siblings)
3 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2013-10-16 0:13 UTC (permalink / raw)
To: Simon Horman
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
>
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
>
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback. Although
> this seems to work I can see several problems with this approach:
>
> * It is rather ugly, stuffing an IPVS exception right in
> the middle of IPv6 code. The overhead could be eliminated for many users
> by using a staic key. But none the less it is not attractive.
>
> * The use of the skb callback is may not be valid
> as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> alternative is to add a new field to struct sk_buff.
Please no ;)
>
> * This covers all IPv6 packets output by IPVS but actually
> only those output using IPVS Direct-Routing need this. One way to
> resolve this would be to add a more fine-grained ipvs_property to
> struct sk_buff.
>
> Reported-by: Mark Brooks <mark@loadbalancer.org>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
> include/net/ip_vs.h | 6 ++++++
> net/ipv6/ip6_output.c | 9 +++++++--
> net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
> 3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 1c2e1b9..11d90a6 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
> atomic_read(&dest->inactconns);
> }
>
> +struct ipvs_skb_cb {
> + struct in6_addr *daddr;
> +};
So we pass a reference.
> +
> +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> +
> #endif /* _NET_IP_VS_H */
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a54c45c..a340180 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -52,6 +52,7 @@
> #include <net/addrconf.h>
> #include <net/rawv6.h>
> #include <net/icmp.h>
> +#include <net/ip_vs.h>
> #include <net/xfrm.h>
> #include <net/checksum.h>
> #include <linux/mroute6.h>
> @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
> struct dst_entry *dst = skb_dst(skb);
> struct net_device *dev = dst->dev;
> struct neighbour *neigh;
> - struct in6_addr *nexthop;
> + struct in6_addr *nexthop, *daddr;
> int ret;
>
> skb->protocol = htons(ETH_P_IPV6);
> @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
> }
>
> rcu_read_lock_bh();
> - nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> + if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> + daddr = IP_VS_SKB_CB(skb)->daddr;
What guarantee do we have daddr points to valid memory (not already
freed/reused) ?
I guess things like NFQUEUE could happen ?
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:13 ` Eric Dumazet
@ 2013-10-16 0:28 ` Simon Horman
2013-10-16 0:39 ` Eric Dumazet
0 siblings, 1 reply; 28+ messages in thread
From: Simon Horman @ 2013-10-16 0:28 UTC (permalink / raw)
To: Eric Dumazet
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Tue, Oct 15, 2013 at 05:13:46PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> >
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> >
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback. Although
> > this seems to work I can see several problems with this approach:
> >
> > * It is rather ugly, stuffing an IPVS exception right in
> > the middle of IPv6 code. The overhead could be eliminated for many users
> > by using a staic key. But none the less it is not attractive.
> >
> > * The use of the skb callback is may not be valid
> > as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> > alternative is to add a new field to struct sk_buff.
>
> Please no ;)
I thought of you when I wrote that comment :)
> > * This covers all IPv6 packets output by IPVS but actually
> > only those output using IPVS Direct-Routing need this. One way to
> > resolve this would be to add a more fine-grained ipvs_property to
> > struct sk_buff.
> >
> > Reported-by: Mark Brooks <mark@loadbalancer.org>
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> > ---
> > include/net/ip_vs.h | 6 ++++++
> > net/ipv6/ip6_output.c | 9 +++++++--
> > net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
> > 3 files changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > index 1c2e1b9..11d90a6 100644
> > --- a/include/net/ip_vs.h
> > +++ b/include/net/ip_vs.h
> > @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
> > atomic_read(&dest->inactconns);
> > }
> >
> > +struct ipvs_skb_cb {
> > + struct in6_addr *daddr;
> > +};
>
> So we pass a reference.
>
> > +
> > +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> > +
> > #endif /* _NET_IP_VS_H */
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index a54c45c..a340180 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -52,6 +52,7 @@
> > #include <net/addrconf.h>
> > #include <net/rawv6.h>
> > #include <net/icmp.h>
> > +#include <net/ip_vs.h>
> > #include <net/xfrm.h>
> > #include <net/checksum.h>
> > #include <linux/mroute6.h>
> > @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
> > struct dst_entry *dst = skb_dst(skb);
> > struct net_device *dev = dst->dev;
> > struct neighbour *neigh;
> > - struct in6_addr *nexthop;
> > + struct in6_addr *nexthop, *daddr;
> > int ret;
> >
> > skb->protocol = htons(ETH_P_IPV6);
> > @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
> > }
> >
> > rcu_read_lock_bh();
> > - nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> > + if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> > + daddr = IP_VS_SKB_CB(skb)->daddr;
>
> What guarantee do we have daddr points to valid memory (not already
> freed/reused) ?
I can change that to passing a value if there is a risk
the reference could become invalid. To be honest I am more
worried that skb->cb might be clobbered entirely.
> I guess things like NFQUEUE could happen ?
Could you expand a little?
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:28 ` Simon Horman
@ 2013-10-16 0:39 ` Eric Dumazet
2013-10-16 2:13 ` Simon Horman
0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2013-10-16 0:39 UTC (permalink / raw)
To: Simon Horman
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:
> > I guess things like NFQUEUE could happen ?
>
> Could you expand a little?
This was to point that between IPVS and ipv6 stack we might have a
delay, and daddr was maybe pointed to a freed memory.
IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:39 ` Eric Dumazet
@ 2013-10-16 2:13 ` Simon Horman
2013-10-16 2:40 ` Eric Dumazet
0 siblings, 1 reply; 28+ messages in thread
From: Simon Horman @ 2013-10-16 2:13 UTC (permalink / raw)
To: Eric Dumazet
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Tue, Oct 15, 2013 at 05:39:09PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:
>
> > > I guess things like NFQUEUE could happen ?
> >
> > Could you expand a little?
>
> This was to point that between IPVS and ipv6 stack we might have a
> delay, and daddr was maybe pointed to a freed memory.
>
> IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.
That does seem very promising but while implementing it
I hit a problem.
struct tcp_skb_cb includes a field of type struct inet6_skb_parm. And
expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
now larger than 48 bytes and no longer fits in skb->cb.
Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
suggests?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 2:13 ` Simon Horman
@ 2013-10-16 2:40 ` Eric Dumazet
2013-10-16 4:26 ` Simon Horman
0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2013-10-16 2:40 UTC (permalink / raw)
To: Simon Horman
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Wed, 2013-10-16 at 11:13 +0900, Simon Horman wrote:
> That does seem very promising but while implementing it
> I hit a problem.
>
> struct tcp_skb_cb includes a field of type struct inet6_skb_parm. And
> expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
> now larger than 48 bytes and no longer fits in skb->cb.
>
> Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
> suggests?
Ah well...
No, its not appropriate to grow sk_buff by 16 bytes, sorry.
You could not change struct inet6_skb_parm, but define IP6CB as
a compound
struct ip6cb {
struct inet6_skb_parm foo;
struct in6_addr bar;
};
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 2:40 ` Eric Dumazet
@ 2013-10-16 4:26 ` Simon Horman
0 siblings, 0 replies; 28+ messages in thread
From: Simon Horman @ 2013-10-16 4:26 UTC (permalink / raw)
To: Eric Dumazet
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Tue, Oct 15, 2013 at 07:40:59PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 11:13 +0900, Simon Horman wrote:
>
> > That does seem very promising but while implementing it
> > I hit a problem.
> >
> > struct tcp_skb_cb includes a field of type struct inet6_skb_parm. And
> > expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
> > now larger than 48 bytes and no longer fits in skb->cb.
> >
> > Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
> > suggests?
>
> Ah well...
>
> No, its not appropriate to grow sk_buff by 16 bytes, sorry.
>
> You could not change struct inet6_skb_parm, but define IP6CB as
> a compound
>
> struct ip6cb {
> struct inet6_skb_parm foo;
> struct in6_addr bar;
> };
Thanks for your guidance, that possibility had occured to me too.
I'll rework the patch accordingly.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:02 [RFC net-next] ipv6: Use destination address determined by IPVS Simon Horman
2013-10-16 0:13 ` Eric Dumazet
@ 2013-10-16 0:53 ` Hannes Frederic Sowa
2013-10-16 2:14 ` Simon Horman
2013-10-16 7:27 ` Julian Anastasov
2013-10-17 23:03 ` Julian Anastasov
3 siblings, 1 reply; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-16 0:53 UTC (permalink / raw)
To: Simon Horman
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Wed, Oct 16, 2013 at 09:02:31AM +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
>
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
>
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback. Although
> this seems to work I can see several problems with this approach:
>
> * It is rather ugly, stuffing an IPVS exception right in
> the middle of IPv6 code. The overhead could be eliminated for many users
> by using a staic key. But none the less it is not attractive.
>
> * The use of the skb callback is may not be valid
> as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> alternative is to add a new field to struct sk_buff.
>
> * This covers all IPv6 packets output by IPVS but actually
> only those output using IPVS Direct-Routing need this. One way to
> resolve this would be to add a more fine-grained ipvs_property to
> struct sk_buff.
Hmm, that reminds me on the following bug report which would be nice we could
solve in one go, too: http://www.spinics.net/lists/netdev/msg250785.html
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:53 ` Hannes Frederic Sowa
@ 2013-10-16 2:14 ` Simon Horman
0 siblings, 0 replies; 28+ messages in thread
From: Simon Horman @ 2013-10-16 2:14 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Julian Anastasov, Mark Brooks
On Wed, Oct 16, 2013 at 02:53:22AM +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 16, 2013 at 09:02:31AM +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> >
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> >
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback. Although
> > this seems to work I can see several problems with this approach:
> >
> > * It is rather ugly, stuffing an IPVS exception right in
> > the middle of IPv6 code. The overhead could be eliminated for many users
> > by using a staic key. But none the less it is not attractive.
> >
> > * The use of the skb callback is may not be valid
> > as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> > alternative is to add a new field to struct sk_buff.
> >
> > * This covers all IPv6 packets output by IPVS but actually
> > only those output using IPVS Direct-Routing need this. One way to
> > resolve this would be to add a more fine-grained ipvs_property to
> > struct sk_buff.
>
> Hmm, that reminds me on the following bug report which would be nice we could
> solve in one go, too: http://www.spinics.net/lists/netdev/msg250785.html
I think it should be possible to solve that using the
IP6CB() approach that Eric suggested. Hopefully we can make
that approach fly.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:02 [RFC net-next] ipv6: Use destination address determined by IPVS Simon Horman
2013-10-16 0:13 ` Eric Dumazet
2013-10-16 0:53 ` Hannes Frederic Sowa
@ 2013-10-16 7:27 ` Julian Anastasov
2013-10-16 14:32 ` Hannes Frederic Sowa
2013-10-17 23:03 ` Julian Anastasov
3 siblings, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-16 7:27 UTC (permalink / raw)
To: Simon Horman
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks
Hello,
On Wed, 16 Oct 2013, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
Similar change in IPv4 opened the Pandora box:
IPVS, xt_TEE, raw.c (IP_HDRINCL). May be the corrsponding
places in IPv6 have the same problem.
I don't know the IPv6 routing but if we find a way
to keep the desired nexthop in rt6i_gateway and to add
RTF_GATEWAY checks here and there such solution would be more
general. FLOWI_FLAG_KNOWN_NH flag can help, if needed.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 7:27 ` Julian Anastasov
@ 2013-10-16 14:32 ` Hannes Frederic Sowa
2013-10-16 20:22 ` Julian Anastasov
0 siblings, 1 reply; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-16 14:32 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks
On Wed, Oct 16, 2013 at 10:27:47AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Wed, 16 Oct 2013, Simon Horman wrote:
>
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
>
> Similar change in IPv4 opened the Pandora box:
> IPVS, xt_TEE, raw.c (IP_HDRINCL). May be the corrsponding
> places in IPv6 have the same problem.
>
> I don't know the IPv6 routing but if we find a way
> to keep the desired nexthop in rt6i_gateway and to add
> RTF_GATEWAY checks here and there such solution would be more
> general. FLOWI_FLAG_KNOWN_NH flag can help, if needed.
I thought about this yesterday but did not see an easy way. How does the IPv4
implementation accomplish this?
ipvs caches the dst in its own infrastructure, so we need to be sure we don't
disconnect this dst from the ipv6 routing table, otherwise ip6_dst_check won't
recognize when relookups should be done. Playing games with RTF_GATEWAY seems
dangerous then.
I am currently thinking about using a new flag to replace the nexthop
information with rt6i_dst in these circumstances. The flag would have
to be included in the skb somewhere.
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 14:32 ` Hannes Frederic Sowa
@ 2013-10-16 20:22 ` Julian Anastasov
2013-10-16 21:59 ` Hannes Frederic Sowa
0 siblings, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-16 20:22 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks
Hello,
On Wed, 16 Oct 2013, Hannes Frederic Sowa wrote:
> On Wed, Oct 16, 2013 at 10:27:47AM +0300, Julian Anastasov wrote:
> >
> > I don't know the IPv6 routing but if we find a way
> > to keep the desired nexthop in rt6i_gateway and to add
> > RTF_GATEWAY checks here and there such solution would be more
> > general. FLOWI_FLAG_KNOWN_NH flag can help, if needed.
>
> I thought about this yesterday but did not see an easy way. How does the IPv4
> implementation accomplish this?
In IPv4 rt->rt_flags has no bit to indicate if the route
is via gateway (like RTF_GATEWAY in IPv6). We added rt_uses_gateway
for this purpose.
In the default case, rt_gateway may contain 0 if we return
cached result, eg. when target is part of a local subnet.
Then IPVS/TEE/RAW can request valid rt_gateway, even with the price
of a cloned result, so that rt_gateway can remember the requested
nexthop which may differ from daddr.
> ipvs caches the dst in its own infrastructure, so we need to be sure we don't
> disconnect this dst from the ipv6 routing table, otherwise ip6_dst_check won't
> recognize when relookups should be done. Playing games with RTF_GATEWAY seems
> dangerous then.
dst_check works for IPVS. There is a problem only
with the recent changes that moved the indication for PMTU
change from dst_check to dst_mtu() calls. But this is safe
for IPVS, it handles FRAG_NEEDED for the tunneling mode itself.
Initially, I thought IPv6 stores zeroes in rt6i_gateway.
But now I see rt6_alloc_cow() to be called for the case I assumed
to fail - when no gateway is used.
So, I'll try to test the IPVS case in the following 1-2 days
and will report after adding some printks. If xt_TEE has
the same problem then it should not be IPVS-specific. RAW not
tested yet.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 20:22 ` Julian Anastasov
@ 2013-10-16 21:59 ` Hannes Frederic Sowa
2013-10-16 22:50 ` Julian Anastasov
0 siblings, 1 reply; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-16 21:59 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks
On Wed, Oct 16, 2013 at 11:22:40PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Wed, 16 Oct 2013, Hannes Frederic Sowa wrote:
>
> > On Wed, Oct 16, 2013 at 10:27:47AM +0300, Julian Anastasov wrote:
> > >
> > > I don't know the IPv6 routing but if we find a way
> > > to keep the desired nexthop in rt6i_gateway and to add
> > > RTF_GATEWAY checks here and there such solution would be more
> > > general. FLOWI_FLAG_KNOWN_NH flag can help, if needed.
> >
> > I thought about this yesterday but did not see an easy way. How does the IPv4
> > implementation accomplish this?
>
> In IPv4 rt->rt_flags has no bit to indicate if the route
> is via gateway (like RTF_GATEWAY in IPv6). We added rt_uses_gateway
> for this purpose.
>
> In the default case, rt_gateway may contain 0 if we return
> cached result, eg. when target is part of a local subnet.
> Then IPVS/TEE/RAW can request valid rt_gateway, even with the price
> of a cloned result, so that rt_gateway can remember the requested
> nexthop which may differ from daddr.
To have ip6_dst_check working, there must to be a valid link from
the rt6_info to the fib6_node. Otherwise we cannot check the serial
number. As I currently see we also need a link from the fib6_node down
to the dst entry for resource management. Thus we would have to insert
the special dst-entry with RTF_GATEWAY and non-null rt6i_gateway back
into the fib and have it globally visible. This could have unforseen
side effects. We still cache all dst entries in the fib. One think I
foresee as a possible problem is the automatic aggregation of ECMP routes,
too.
IPv4 does not seem to need this link at all.
> > ipvs caches the dst in its own infrastructure, so we need to be sure we don't
> > disconnect this dst from the ipv6 routing table, otherwise ip6_dst_check won't
> > recognize when relookups should be done. Playing games with RTF_GATEWAY seems
> > dangerous then.
>
> dst_check works for IPVS. There is a problem only
> with the recent changes that moved the indication for PMTU
> change from dst_check to dst_mtu() calls. But this is safe
> for IPVS, it handles FRAG_NEEDED for the tunneling mode itself.
Ok, I see.
> Initially, I thought IPv6 stores zeroes in rt6i_gateway.
> But now I see rt6_alloc_cow() to be called for the case I assumed
> to fail - when no gateway is used.
>
> So, I'll try to test the IPVS case in the following 1-2 days
> and will report after adding some printks. If xt_TEE has
> the same problem then it should not be IPVS-specific. RAW not
> tested yet.
We should provide something similar to what IPv4 does with the
KNOWN_NH flag. I guess my idea with exchanging rt6i_dst as nexthop would
solve this without too much hassle but this would have to be checked by
implementing it.
I don't think that storing a changed nexthop in the ipv6 cb is that nice and
maintainable.
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 21:59 ` Hannes Frederic Sowa
@ 2013-10-16 22:50 ` Julian Anastasov
0 siblings, 0 replies; 28+ messages in thread
From: Julian Anastasov @ 2013-10-16 22:50 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks
Hello,
On Wed, 16 Oct 2013, Hannes Frederic Sowa wrote:
> To have ip6_dst_check working, there must to be a valid link from
> the rt6_info to the fib6_node. Otherwise we cannot check the serial
> number. As I currently see we also need a link from the fib6_node down
> to the dst entry for resource management. Thus we would have to insert
> the special dst-entry with RTF_GATEWAY and non-null rt6i_gateway back
> into the fib and have it globally visible. This could have unforseen
> side effects. We still cache all dst entries in the fib. One think I
> foresee as a possible problem is the automatic aggregation of ECMP routes,
> too.
I'm still not sure what is needed. Looking at
ip6_pol_route(), I see that everything should just work: for
routes without RTF_GATEWAY flag we return cloned route
from rt6_alloc_cow() with valid rt6i_gateway. I still
didn't tested the problem myself, so I'll stop with the
comments before that.
> We should provide something similar to what IPv4 does with the
> KNOWN_NH flag. I guess my idea with exchanging rt6i_dst as nexthop would
> solve this without too much hassle but this would have to be checked by
> implementing it.
>
> I don't think that storing a changed nexthop in the ipv6 cb is that nice and
> maintainable.
Agreed. I'm not going to test that change. I'll test
latest net-next without any changes to see where exactly is
the failure because the IPv6 routing seems capable for what
we need. I have problems with my setup, so I'll need some
days. As Simon is also testing the problem, he can find the
reason before me.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-16 0:02 [RFC net-next] ipv6: Use destination address determined by IPVS Simon Horman
` (2 preceding siblings ...)
2013-10-16 7:27 ` Julian Anastasov
@ 2013-10-17 23:03 ` Julian Anastasov
2013-10-18 2:10 ` Hannes Frederic Sowa
3 siblings, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-17 23:03 UTC (permalink / raw)
To: Simon Horman
Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
Here is a solution that should work not only for IPVS.
If the change looks correct I'll send it in a separate message.
[PATCH net] ipv6: always prefer rt6i_gateway if present
From: Julian Anastasov <ja@ssi.bg>
In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()." changed the behaviour of ip6_finish_output2()
such that the recently introduced rt6_nexthop() is used
instead of an assigned neighbor.
As rt6_nexthop() prefers rt6i_gateway only for gatewayed
routes this causes a problem for users like IPVS, xt_TEE and
RAW(hdrincl) if they want to use different address for routing
compared to the destination address.
Fix it by considering the rt6i_gateway address in all
cases, so that traffic routed to address on local subnet is
not wrongly diverted to the destination address.
Thanks to Simon Horman and Phil Oester for spotting the
problematic commit.
Reported-by: Phil Oester <kernel@linuxace.com>
Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
Please review for possible side effects when using
rt6i_gateway without RTF_GATEWAY!
include/net/ip6_route.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index f525e70..481404a 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -196,7 +196,7 @@ static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr *dest)
{
- if (rt->rt6i_flags & RTF_GATEWAY)
+ if (rt->rt6i_flags & RTF_GATEWAY || !ipv6_addr_any(&rt->rt6i_gateway))
return &rt->rt6i_gateway;
return dest;
}
--
1.8.3.1
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-17 23:03 ` Julian Anastasov
@ 2013-10-18 2:10 ` Hannes Frederic Sowa
2013-10-18 6:33 ` Julian Anastasov
2013-10-19 16:37 ` Julian Anastasov
0 siblings, 2 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-18 2:10 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Fri, Oct 18, 2013 at 02:03:27AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> Here is a solution that should work not only for IPVS.
> If the change looks correct I'll send it in a separate message.
>
> [PATCH net] ipv6: always prefer rt6i_gateway if present
>
> From: Julian Anastasov <ja@ssi.bg>
>
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()." changed the behaviour of ip6_finish_output2()
> such that the recently introduced rt6_nexthop() is used
> instead of an assigned neighbor.
>
> As rt6_nexthop() prefers rt6i_gateway only for gatewayed
> routes this causes a problem for users like IPVS, xt_TEE and
> RAW(hdrincl) if they want to use different address for routing
> compared to the destination address.
>
> Fix it by considering the rt6i_gateway address in all
> cases, so that traffic routed to address on local subnet is
> not wrongly diverted to the destination address.
>
> Thanks to Simon Horman and Phil Oester for spotting the
> problematic commit.
I played around with your patch and tested xt_TEE. I added a TEE rule to
mangle/OUTPUT and pinged. This happend, I have not yet analyzed it:
[ 101.126649] ------------[ cut here ]------------
[ 101.128436] BUG: unable to handle kernel paging request at fffffffb8a2fda88
[ 101.129421] IP: [<ffffffff810c9737>] cpuacct_charge+0x97/0x200
[ 101.129421] PGD 1c0f067 PUD 0
[ 101.129421] Thread overran stack, or stack corrupted
[ 101.129421] Oops: 0000 [#1] SMP
[ 101.129421] Modules linked in: xt_TEE nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables joydev serio_raw virtio_balloon i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd sunrpc virtio_blk virtio_net ata_generic pata_acpi
[ 101.129421] CPU: 1 PID: 1190 Comm: ping6 Not tainted 3.12.0-rc3+ #2
[ 101.129421] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 101.129421] task: ffff8800c30a4590 ti: ffff8801193ce000 task.ti: ffff8801193ce000
[ 101.129421] RIP: 0010:[<ffffffff810c9737>] [<ffffffff810c9737>] cpuacct_charge+0x97/0x200
[ 101.129421] RSP: 0018:ffff88011b403d60 EFLAGS: 00010002
[ 101.129421] RAX: 000000000000e5d8 RBX: 0000000000244314 RCX: ffffffff810b87bd
[ 101.129421] RDX: ffffffff81c4dce0 RSI: ffffffff81c47e00 RDI: 0000000000000046
[ 101.129421] RBP: ffff88011b403d88 R08: ffff8800c30a5308 R09: 0000000000000001
[ 101.129421] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8800c30a4590
[ 101.129421] R13: 00000000810b87bd R14: ffff880117d5b450 R15: 0000000000000000
[ 101.129421] FS: 00007f79260bc740(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[ 101.129421] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 101.129421] CR2: fffffffb8a2fda88 CR3: 0000000117e40000 CR4: 00000000000006e0
[ 101.129421] Stack:
[ 101.129421] ffffffff810c96a5 ffff8800c30a45f8 ffff8800c8d1da40 ffff8800c30a4590
[ 101.129421] 0000000000244314 ffff88011b403dc8 ffffffff810bce9c 000000003264e993
[ 101.129421] ffff8800c30a45f8 0000000000000001 ffff8800c8d1da40 ffff88011b5d5240
[ 101.129421] Call Trace:
[ 101.129421] <IRQ>
[ 101.129421] [<ffffffff810c96a5>] ? cpuacct_charge+0x5/0x200
[ 101.129421] [<ffffffff810bce9c>] update_curr+0xcc/0x210
[ 101.129421] [<ffffffff810beb19>] task_tick_fair+0x2b9/0x680
[ 101.129421] [<ffffffff810b8948>] ? sched_clock_cpu+0xa8/0x100
[ 101.129421] [<ffffffff810b3564>] scheduler_tick+0x64/0xe0
[ 101.129421] [<ffffffff810866a6>] update_process_times+0x66/0x80
[ 101.129421] [<ffffffff810e7a15>] tick_sched_handle.isra.15+0x25/0x60
[ 101.129421] [<ffffffff810e7a91>] tick_sched_timer+0x41/0x60
[ 101.129421] [<ffffffff810a5146>] __run_hrtimer+0x86/0x490
[ 101.129421] [<ffffffff810e7a50>] ? tick_sched_handle.isra.15+0x60/0x60
[ 101.129421] [<ffffffff810a5fc7>] hrtimer_interrupt+0xf7/0x240
[ 101.129421] [<ffffffff8104a877>] local_apic_timer_interrupt+0x37/0x60
[ 101.129421] [<ffffffff8173dcbf>] smp_apic_timer_interrupt+0x3f/0x60
[ 101.129421] [<ffffffff8173c6b2>] apic_timer_interrupt+0x72/0x80
[ 101.129421] <EOI>
[ 101.129421] [<ffffffff810d5bdd>] ? vprintk_emit+0x1dd/0x5e0
[ 101.129421] [<ffffffff8107afe4>] ? __local_bh_disable+0x94/0xa0
[ 101.129421] [<ffffffff81723dbf>] printk+0x67/0x69
[ 101.129421] [<ffffffff810ed07c>] ? trace_hardirqs_on_caller+0xac/0x1c0
[ 101.129421] [<ffffffff8107afe4>] ? __local_bh_disable+0x94/0xa0
[ 101.129421] [<ffffffff81075394>] warn_slowpath_common+0x34/0xa0
[ 101.129421] [<ffffffff81600309>] ? dst_alloc+0x139/0x170
[ 101.129421] [<ffffffff810754ba>] warn_slowpath_null+0x1a/0x20
[ 101.129421] [<ffffffff8107afe4>] __local_bh_disable+0x94/0xa0
[ 101.129421] [<ffffffff8107b007>] local_bh_disable+0x17/0x20
[ 101.129421] [<ffffffff81600309>] dst_alloc+0x139/0x170
[ 101.129421] [<ffffffff816d4e47>] icmp6_dst_alloc+0xf7/0x430
[ 101.129421] [<ffffffff816d4d55>] ? icmp6_dst_alloc+0x5/0x430
[ 101.129421] [<ffffffff816dde26>] ndisc_send_skb+0x3e6/0x540
[ 101.129421] [<ffffffff816df81c>] ndisc_send_ns+0xdc/0x140
[ 101.129421] [<ffffffff816d394b>] rt6_probe+0x31b/0x380
[ 101.129421] [<ffffffff816d3690>] ? rt6_probe+0x60/0x380
[ 101.129421] [<ffffffff816d1b93>] ? rt6_score_route+0xa3/0x380
[ 101.129421] [<ffffffff816d471c>] ip6_pol_route.isra.47+0x2fc/0x4f0
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d496a>] ip6_pol_route_output+0x2a/0x30
[ 101.129421] [<ffffffff8170516e>] fib6_rule_action+0xbe/0x200
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816142e5>] fib_rules_lookup+0x145/0x390
[ 101.129421] [<ffffffff816141a5>] ? fib_rules_lookup+0x5/0x390
[ 101.129421] [<ffffffff81705464>] fib6_rule_lookup+0x44/0x80
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d1263>] ip6_route_output+0x73/0xb0
[ 101.129421] [<ffffffffa01df54f>] tee_tg6+0x15f/0x228 [xt_TEE]
[ 101.129421] [<ffffffffa012413f>] ip6t_do_table+0x26f/0x435 [ip6_tables]
[ 101.129421] [<ffffffff8107b09a>] ? local_bh_enable+0x8a/0xf0
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffffa01c10e8>] ip6table_mangle_hook+0xe8/0x170 [ip6table_mangle]
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162ab5a>] nf_iterate+0xca/0x180
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162acf4>] nf_hook_slow+0xe4/0x290
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff816dddb2>] ndisc_send_skb+0x372/0x540
[ 101.129421] [<ffffffff816df81c>] ndisc_send_ns+0xdc/0x140
[ 101.129421] [<ffffffff816d394b>] rt6_probe+0x31b/0x380
[ 101.129421] [<ffffffff816d3690>] ? rt6_probe+0x60/0x380
[ 101.129421] [<ffffffff816d1b93>] ? rt6_score_route+0xa3/0x380
[ 101.129421] [<ffffffff816d471c>] ip6_pol_route.isra.47+0x2fc/0x4f0
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d496a>] ip6_pol_route_output+0x2a/0x30
[ 101.129421] [<ffffffff8170516e>] fib6_rule_action+0xbe/0x200
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816142e5>] fib_rules_lookup+0x145/0x390
[ 101.129421] [<ffffffff816141a5>] ? fib_rules_lookup+0x5/0x390
[ 101.129421] [<ffffffff81705464>] fib6_rule_lookup+0x44/0x80
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d1263>] ip6_route_output+0x73/0xb0
[ 101.129421] [<ffffffffa01df54f>] tee_tg6+0x15f/0x228 [xt_TEE]
[ 101.129421] [<ffffffffa012413f>] ip6t_do_table+0x26f/0x435 [ip6_tables]
[ 101.129421] [<ffffffff8107b09a>] ? local_bh_enable+0x8a/0xf0
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffffa01c10e8>] ip6table_mangle_hook+0xe8/0x170 [ip6table_mangle]
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162ab5a>] nf_iterate+0xca/0x180
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162acf4>] nf_hook_slow+0xe4/0x290
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff816dddb2>] ndisc_send_skb+0x372/0x540
[ 101.129421] [<ffffffff816df81c>] ndisc_send_ns+0xdc/0x140
[ 101.129421] [<ffffffff816d394b>] rt6_probe+0x31b/0x380
[ 101.129421] [<ffffffff816d3690>] ? rt6_probe+0x60/0x380
[ 101.129421] [<ffffffff816d1b93>] ? rt6_score_route+0xa3/0x380
[ 101.129421] [<ffffffff816d471c>] ip6_pol_route.isra.47+0x2fc/0x4f0
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d496a>] ip6_pol_route_output+0x2a/0x30
[ 101.129421] [<ffffffff8170516e>] fib6_rule_action+0xbe/0x200
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816142e5>] fib_rules_lookup+0x145/0x390
[ 101.129421] [<ffffffff816141a5>] ? fib_rules_lookup+0x5/0x390
[ 101.129421] [<ffffffff81705464>] fib6_rule_lookup+0x44/0x80
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d1263>] ip6_route_output+0x73/0xb0
[ 101.129421] [<ffffffffa01df54f>] tee_tg6+0x15f/0x228 [xt_TEE]
[ 101.129421] [<ffffffffa012413f>] ip6t_do_table+0x26f/0x435 [ip6_tables]
[ 101.129421] [<ffffffff8107b09a>] ? local_bh_enable+0x8a/0xf0
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffffa01c10e8>] ip6table_mangle_hook+0xe8/0x170 [ip6table_mangle]
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162ab5a>] nf_iterate+0xca/0x180
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162acf4>] nf_hook_slow+0xe4/0x290
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff816dddb2>] ndisc_send_skb+0x372/0x540
[ 101.129421] [<ffffffff816df81c>] ndisc_send_ns+0xdc/0x140
[ 101.129421] [<ffffffff816d394b>] rt6_probe+0x31b/0x380
[ 101.129421] [<ffffffff816d3690>] ? rt6_probe+0x60/0x380
[ 101.129421] [<ffffffff816d1b93>] ? rt6_score_route+0xa3/0x380
[ 101.129421] [<ffffffff816d471c>] ip6_pol_route.isra.47+0x2fc/0x4f0
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d496a>] ip6_pol_route_output+0x2a/0x30
[ 101.129421] [<ffffffff8170516e>] fib6_rule_action+0xbe/0x200
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816142e5>] fib_rules_lookup+0x145/0x390
[ 101.129421] [<ffffffff816141a5>] ? fib_rules_lookup+0x5/0x390
[ 101.129421] [<ffffffff81705464>] fib6_rule_lookup+0x44/0x80
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d1263>] ip6_route_output+0x73/0xb0
[ 101.129421] [<ffffffffa01df54f>] tee_tg6+0x15f/0x228 [xt_TEE]
[ 101.129421] [<ffffffffa012413f>] ip6t_do_table+0x26f/0x435 [ip6_tables]
[ 101.129421] [<ffffffff8107b09a>] ? local_bh_enable+0x8a/0xf0
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffffa01c10e8>] ip6table_mangle_hook+0xe8/0x170 [ip6table_mangle]
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162ab5a>] nf_iterate+0xca/0x180
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff8162acf4>] nf_hook_slow+0xe4/0x290
[ 101.129421] [<ffffffff816dd6e0>] ? ndisc_alloc_skb+0x110/0x110
[ 101.129421] [<ffffffff816dddb2>] ndisc_send_skb+0x372/0x540
[ 101.129421] [<ffffffff816df81c>] ndisc_send_ns+0xdc/0x140
[ 101.129421] [<ffffffff816d394b>] rt6_probe+0x31b/0x380
[ 101.129421] [<ffffffff816d3690>] ? rt6_probe+0x60/0x380
[ 101.129421] [<ffffffff816d1b93>] ? rt6_score_route+0xa3/0x380
[ 101.129421] [<ffffffff816d471c>] ip6_pol_route.isra.47+0x2fc/0x4f0
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d496a>] ip6_pol_route_output+0x2a/0x30
[ 101.129421] [<ffffffff8170516e>] fib6_rule_action+0xbe/0x200
[ 101.129421] [<ffffffff810ea057>] ? __lock_is_held+0x57/0x80
[ 101.129421] [<ffffffff816142e5>] fib_rules_lookup+0x145/0x390
[ 101.129421] [<ffffffff816141a5>] ? fib_rules_lookup+0x5/0x390
[ 101.129421] [<ffffffff81705464>] fib6_rule_lookup+0x44/0x80
[ 101.129421] [<ffffffff816d4940>] ? ip6_pol_route_input+0x30/0x30
[ 101.129421] [<ffffffff816d1263>] ip6_route_output+0x73/0xb0
[ 101.129421] [<ffffffff816bfd13>] ip6_dst_lookup_tail+0x373/0x4a0
[ 101.129421] [<ffffffff8105768f>] ? kvm_clock_read+0x2f/0x50
[ 101.129421] [<ffffffff81021b89>] ? sched_clock+0x9/0x10
[ 101.129421] [<ffffffff810b87bd>] ? sched_clock_local+0x1d/0x80
[ 101.129421] [<ffffffff816bfe93>] ip6_dst_lookup_flow+0x33/0x90
[ 101.129421] [<ffffffff816f94bd>] ip6_datagram_connect+0x16d/0x510
[ 101.129421] [<ffffffff810ed0cd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 101.129421] [<ffffffff810ed19d>] ? trace_hardirqs_on+0xd/0x10
[ 101.129421] [<ffffffff81678dde>] inet_dgram_connect+0x2e/0x80
[ 101.129421] [<ffffffff815da25a>] SYSC_connect+0xea/0x130
[ 101.129421] [<ffffffff81732818>] ? retint_swapgs+0x13/0x1b
[ 101.129421] [<ffffffff810ed0cd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 101.129421] [<ffffffff8137b67e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 101.129421] [<ffffffff815db75e>] SyS_connect+0xe/0x10
[ 101.129421] [<ffffffff8173ba29>] system_call_fastpath+0x16/0x1b
[ 101.129421] Code: 00 00 4d 8b b4 24 e0 13 00 00 e8 e5 3d fd ff 85 c0 74 09 80 3d d0 4a c5 00 00 74 78 49 8b 56 48 49 63 cd 90 48 8b 82 b8 00 00 00 <48> 03 04 cd a0 9c d3 81 48 01 18 48 8b 52 40 48 85 d2 75 e5 e8
[ 101.129421] RIP [<ffffffff810c9737>] cpuacct_charge+0x97/0x200
[ 101.129421] RSP <ffff88011b403d60>
[ 101.129421] CR2: fffffffb8a2fda88
[ 101.129421] ---[ end trace 68ce790414632b74 ]---
[ 101.129421] Kernel panic - not syncing: Fatal exception in interrupt
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-18 2:10 ` Hannes Frederic Sowa
@ 2013-10-18 6:33 ` Julian Anastasov
2013-10-18 16:33 ` Hannes Frederic Sowa
2013-10-19 16:37 ` Julian Anastasov
1 sibling, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-18 6:33 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
On Fri, 18 Oct 2013, Hannes Frederic Sowa wrote:
> I played around with your patch and tested xt_TEE. I added a TEE rule to
> mangle/OUTPUT and pinged. This happend, I have not yet analyzed it:
May be a side effect of using some multicast address?
Can you test it with such address check?:
if (rt->rt6i_flags & RTF_GATEWAY ||
ipv6_addr_type(&rt->rt6i_gateway) & IPV6_ADDR_UNICAST)
> [ 101.129421] [<ffffffff816d4e47>] icmp6_dst_alloc+0xf7/0x430
> [ 101.129421] [<ffffffff816d4d55>] ? icmp6_dst_alloc+0x5/0x430
> [ 101.129421] [<ffffffff816dde26>] ndisc_send_skb+0x3e6/0x540
> [ 101.129421] [<ffffffff816df81c>] ndisc_send_ns+0xdc/0x140
> [ 101.129421] [<ffffffff816d394b>] rt6_probe+0x31b/0x380
> [ 101.129421] [<ffffffff816d3690>] ? rt6_probe+0x60/0x380
> [ 101.129421] [<ffffffff816d1b93>] ? rt6_score_route+0xa3/0x380
> [ 101.129421] [<ffffffff816d471c>] ip6_pol_route.isra.47+0x2fc/0x4f0
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-18 6:33 ` Julian Anastasov
@ 2013-10-18 16:33 ` Hannes Frederic Sowa
0 siblings, 0 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-18 16:33 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Fri, Oct 18, 2013 at 09:33:13AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 18 Oct 2013, Hannes Frederic Sowa wrote:
>
> > I played around with your patch and tested xt_TEE. I added a TEE rule to
> > mangle/OUTPUT and pinged. This happend, I have not yet analyzed it:
>
> May be a side effect of using some multicast address?
> Can you test it with such address check?:
I don't think that is the reason.
>
> if (rt->rt6i_flags & RTF_GATEWAY ||
> ipv6_addr_type(&rt->rt6i_gateway) & IPV6_ADDR_UNICAST)
Just checked, I had the same panic.
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-18 2:10 ` Hannes Frederic Sowa
2013-10-18 6:33 ` Julian Anastasov
@ 2013-10-19 16:37 ` Julian Anastasov
2013-10-19 18:34 ` Hannes Frederic Sowa
1 sibling, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-19 16:37 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
On Fri, 18 Oct 2013, Hannes Frederic Sowa wrote:
> I played around with your patch and tested xt_TEE. I added a TEE rule to
> mangle/OUTPUT and pinged. This happend, I have not yet analyzed it:
>
> [ 101.126649] ------------[ cut here ]------------
> [ 101.128436] BUG: unable to handle kernel paging request at fffffffb8a2fda88
> [ 101.129421] IP: [<ffffffff810c9737>] cpuacct_charge+0x97/0x200
> [ 101.129421] PGD 1c0f067 PUD 0
> [ 101.129421] Thread overran stack, or stack corrupted
Problem with process stack? May be some packet loop
happens? Because I can not reproduce such problem in my
virtual setup, I tested TEE too, with careful packet
matching and 1 CPU. Should I assume that you don't have such
oops when the patch is not applied, with the same TEE rule?
> [ 101.129421] Oops: 0000 [#1] SMP
You don't appear to have PREEMPT in above line.
I'm not sure when preemption is enabled if tee_tg6() does
not have a problem with its anti-loop measures (tee_active).
Is preemption possible in OUTPUT hook, i.e. can we change
the CPU while playing with tee_active and as result change
different flag?
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-19 16:37 ` Julian Anastasov
@ 2013-10-19 18:34 ` Hannes Frederic Sowa
2013-10-19 22:36 ` Hannes Frederic Sowa
2013-10-20 6:39 ` Julian Anastasov
0 siblings, 2 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-19 18:34 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Sat, Oct 19, 2013 at 07:37:10PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 18 Oct 2013, Hannes Frederic Sowa wrote:
>
> > I played around with your patch and tested xt_TEE. I added a TEE rule to
> > mangle/OUTPUT and pinged. This happend, I have not yet analyzed it:
> >
> > [ 101.126649] ------------[ cut here ]------------
> > [ 101.128436] BUG: unable to handle kernel paging request at fffffffb8a2fda88
> > [ 101.129421] IP: [<ffffffff810c9737>] cpuacct_charge+0x97/0x200
> > [ 101.129421] PGD 1c0f067 PUD 0
> > [ 101.129421] Thread overran stack, or stack corrupted
>
> Problem with process stack? May be some packet loop
> happens? Because I can not reproduce such problem in my
> virtual setup, I tested TEE too, with careful packet
> matching and 1 CPU. Should I assume that you don't have such
> oops when the patch is not applied, with the same TEE rule?
Oh, sorry, you are right. It happens with an unpatched net-next kernel, too.
I inserted the TEE rule in mangel/OUTGOING and had only one route, ip -6 r a
default via fe80::1 dev eth0 which at the time of the panic was actually not
reachable.
> > [ 101.129421] Oops: 0000 [#1] SMP
>
> You don't appear to have PREEMPT in above line.
> I'm not sure when preemption is enabled if tee_tg6() does
> not have a problem with its anti-loop measures (tee_active).
> Is preemption possible in OUTPUT hook, i.e. can we change
> the CPU while playing with tee_active and as result change
> different flag?
Hm, maybe. I don't have too much insight into netfilter stack and
what are the differences between OUTPUT and FORWARD path but plan to
investigate. ;)
Anyways just wanted to let you know that unpatched kernels are affected, too.
Will have a closer look later.
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-19 18:34 ` Hannes Frederic Sowa
@ 2013-10-19 22:36 ` Hannes Frederic Sowa
2013-10-19 22:39 ` Hannes Frederic Sowa
2013-10-20 7:11 ` Julian Anastasov
2013-10-20 6:39 ` Julian Anastasov
1 sibling, 2 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-19 22:36 UTC (permalink / raw)
To: Julian Anastasov, Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Sat, Oct 19, 2013 at 08:34:33PM +0200, Hannes Frederic Sowa wrote:
> On Sat, Oct 19, 2013 at 07:37:10PM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Fri, 18 Oct 2013, Hannes Frederic Sowa wrote:
> >
> > > I played around with your patch and tested xt_TEE. I added a TEE rule to
> > > mangle/OUTPUT and pinged. This happend, I have not yet analyzed it:
> > >
> > > [ 101.126649] ------------[ cut here ]------------
> > > [ 101.128436] BUG: unable to handle kernel paging request at fffffffb8a2fda88
> > > [ 101.129421] IP: [<ffffffff810c9737>] cpuacct_charge+0x97/0x200
> > > [ 101.129421] PGD 1c0f067 PUD 0
> > > [ 101.129421] Thread overran stack, or stack corrupted
> >
> > Problem with process stack? May be some packet loop
> > happens? Because I can not reproduce such problem in my
> > virtual setup, I tested TEE too, with careful packet
> > matching and 1 CPU. Should I assume that you don't have such
> > oops when the patch is not applied, with the same TEE rule?
>
> Oh, sorry, you are right. It happens with an unpatched net-next kernel, too.
>
> I inserted the TEE rule in mangel/OUTGOING and had only one route, ip -6 r a
> default via fe80::1 dev eth0 which at the time of the panic was actually not
> reachable.
>
> > > [ 101.129421] Oops: 0000 [#1] SMP
> >
> > You don't appear to have PREEMPT in above line.
> > I'm not sure when preemption is enabled if tee_tg6() does
> > not have a problem with its anti-loop measures (tee_active).
> > Is preemption possible in OUTPUT hook, i.e. can we change
> > the CPU while playing with tee_active and as result change
> > different flag?
>
> Hm, maybe. I don't have too much insight into netfilter stack and
> what are the differences between OUTPUT and FORWARD path but plan to
> investigate. ;)
It seems tables are processed with bh disabled, so no preemption while
recursing. So I guess the use of tee_active is safe for breaking the
tie here.
The reason I exhaust stack space is that we can actually send out packets
while looking up routes (rt6_probe). The nonreachability of the default
gateway and the to-teed-to box does the rest.
We need to change the route lookup of the duplicated packet in xt_tee to not
cause ndisc probes to be generated.
The more I review the patch the more I think it is ok. But we could actually
try to just always return rt6i_gateway, as we should always be handed a cloned
rt6_info where the gateway is already filled in, no?
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-19 22:36 ` Hannes Frederic Sowa
@ 2013-10-19 22:39 ` Hannes Frederic Sowa
2013-10-20 7:11 ` Julian Anastasov
1 sibling, 0 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-19 22:39 UTC (permalink / raw)
To: Julian Anastasov, Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Sun, Oct 20, 2013 at 12:36:57AM +0200, Hannes Frederic Sowa wrote:
> The more I review the patch the more I think it is ok. But we could actually
> try to just always return rt6i_gateway, as we should always be handed a cloned
> rt6_info where the gateway is already filled in, no?
Ergh, this seems to break loopback then.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-19 22:36 ` Hannes Frederic Sowa
2013-10-19 22:39 ` Hannes Frederic Sowa
@ 2013-10-20 7:11 ` Julian Anastasov
2013-10-20 7:33 ` Hannes Frederic Sowa
1 sibling, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-20 7:11 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
On Sun, 20 Oct 2013, Hannes Frederic Sowa wrote:
> > Hm, maybe. I don't have too much insight into netfilter stack and
> > what are the differences between OUTPUT and FORWARD path but plan to
> > investigate. ;)
>
> It seems tables are processed with bh disabled, so no preemption while
> recursing. So I guess the use of tee_active is safe for breaking the
> tie here.
May be, I'll check it again, for now I see only
rcu_read_lock() in nf_hook_slow() which is preemptable.
Looking at rcu_preempt_note_context_switch, many levels of
RCU locks are preemptable too.
> The reason I exhaust stack space is that we can actually send out packets
> while looking up routes (rt6_probe). The nonreachability of the default
> gateway and the to-teed-to box does the rest.
In my test I used link route to local subnet,
--gateway to IP that is not present. I'll try other
variants.
> We need to change the route lookup of the duplicated packet in xt_tee to not
> cause ndisc probes to be generated.
>
> The more I review the patch the more I think it is ok. But we could actually
> try to just always return rt6i_gateway, as we should always be handed a cloned
> rt6_info where the gateway is already filled in, no?
Yes, this patch is ok and after spending the whole
saturday I'm preparing a new patch that will convert
rt6_nexthop() to return just rt6i_gateway, without daddr.
This can happen after filling rt6i_gateway in all places.
For your concern for loopback, I don't see problem,
local/anycast route will have rt6i_gateway=IP, they are
simple DST_HOST routes. I'm preparing now the patches and
will post them in following hours.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-20 7:11 ` Julian Anastasov
@ 2013-10-20 7:33 ` Hannes Frederic Sowa
2013-10-20 8:33 ` Julian Anastasov
2013-10-20 12:27 ` Julian Anastasov
0 siblings, 2 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-20 7:33 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Sun, Oct 20, 2013 at 10:11:16AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Sun, 20 Oct 2013, Hannes Frederic Sowa wrote:
>
> > > Hm, maybe. I don't have too much insight into netfilter stack and
> > > what are the differences between OUTPUT and FORWARD path but plan to
> > > investigate. ;)
> >
> > It seems tables are processed with bh disabled, so no preemption while
> > recursing. So I guess the use of tee_active is safe for breaking the
> > tie here.
>
> May be, I'll check it again, for now I see only
> rcu_read_lock() in nf_hook_slow() which is preemptable.
> Looking at rcu_preempt_note_context_switch, many levels of
> RCU locks are preemptable too.
The caller I found was ip6t_do_table which does deactivate bottom halves.
Maybe there are others I did not see, so double checking is better.
> In my test I used link route to local subnet, --gateway to IP
> that is not present. I'll try other variants.
Is your kernel compiled with CONFIG_IPV6_ROUTER_PREF?
> > The more I review the patch the more I think it is ok. But we could actually
> > try to just always return rt6i_gateway, as we should always be handed a cloned
> > rt6_info where the gateway is already filled in, no?
>
> Yes, this patch is ok and after spending the whole
> saturday I'm preparing a new patch that will convert
> rt6_nexthop() to return just rt6i_gateway, without daddr.
> This can happen after filling rt6i_gateway in all places.
>
> For your concern for loopback, I don't see problem,
> local/anycast route will have rt6i_gateway=IP, they are
> simple DST_HOST routes. I'm preparing now the patches and
> will post them in following hours.
Ok, that's a nice simplification. I'll have a look tomorrow.
I cannot test my patch today any more, so I just leave it here. It is only
compile tested. Maybe you can make use of it:
Btw: I cannot put a reference to the rt6_info into __rt6_probe_work because we
are not supposed to use rt6_info reference counters outside of ip6_fib
because the deletion from the fib will break otherwise.
Maybe we should also create a seperate ipv6 workqueue. Will check later.
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c3130ff..6c539bc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -476,6 +476,40 @@ out:
}
#ifdef CONFIG_IPV6_ROUTER_PREF
+struct __rt6_probe_work {
+ struct work_struct work;
+ struct in6_addr target;
+ struct net_device *dev;
+};
+
+static void rt6_probe_deferred(struct work_struct *w)
+{
+ struct in6_addr mcaddr;
+ struct __rt6_probe_work *work =
+ container_of(w, struct __rt6_probe_work, work);
+
+ addrconf_addr_solict_mult(&work->target, &mcaddr);
+ ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL);
+ dev_put(work->dev);
+ kfree(w);
+}
+
+static bool rt6_probe_later(struct rt6_info *rt)
+{
+ struct __rt6_probe_work *work;
+
+ work = kmalloc(sizeof(*work), GFP_ATOMIC);
+ if (!work)
+ return false;
+
+ INIT_WORK(&work->work, rt6_probe_deferred);
+ work->target = rt->rt6i_gateway;
+ dev_hold(rt->dst.dev);
+ work->dev = rt->dst.dev;
+ schedule_work(&work->work);
+ return true;
+}
+
static void rt6_probe(struct rt6_info *rt)
{
struct neighbour *neigh;
@@ -499,17 +533,10 @@ static void rt6_probe(struct rt6_info *rt)
if (!neigh ||
time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
- struct in6_addr mcaddr;
- struct in6_addr *target;
-
- if (neigh) {
- neigh->updated = jiffies;
+ if (neigh)
write_unlock(&neigh->lock);
- }
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-20 7:33 ` Hannes Frederic Sowa
@ 2013-10-20 8:33 ` Julian Anastasov
2013-10-20 12:27 ` Julian Anastasov
1 sibling, 0 replies; 28+ messages in thread
From: Julian Anastasov @ 2013-10-20 8:33 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
On Sun, 20 Oct 2013, Hannes Frederic Sowa wrote:
> The caller I found was ip6t_do_table which does deactivate bottom halves.
> Maybe there are others I did not see, so double checking is better.
Aha, I see.
> > In my test I used link route to local subnet, --gateway to IP
> > that is not present. I'll try other variants.
>
> Is your kernel compiled with CONFIG_IPV6_ROUTER_PREF?
Hm, no. I'll enable it. But it works for
RTF_GATEWAY, something I didn't tested before now.
> I cannot test my patch today any more, so I just leave it here. It is only
> compile tested. Maybe you can make use of it:
I'll try it. But note that my test setup has
no any routers around.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-20 7:33 ` Hannes Frederic Sowa
2013-10-20 8:33 ` Julian Anastasov
@ 2013-10-20 12:27 ` Julian Anastasov
1 sibling, 0 replies; 28+ messages in thread
From: Julian Anastasov @ 2013-10-20 12:27 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
On Sun, 20 Oct 2013, Hannes Frederic Sowa wrote:
> Is your kernel compiled with CONFIG_IPV6_ROUTER_PREF?
I tested with this flag, even with your patch,
added printk in rt6_probe_deferred() to be sure. But
in my setup without router I can not test it fully.
I'm on 32-bit platform, so may be the shorter structs
fit in stack and I don't see oops.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-19 18:34 ` Hannes Frederic Sowa
2013-10-19 22:36 ` Hannes Frederic Sowa
@ 2013-10-20 6:39 ` Julian Anastasov
2013-10-20 6:47 ` Hannes Frederic Sowa
1 sibling, 1 reply; 28+ messages in thread
From: Julian Anastasov @ 2013-10-20 6:39 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
Hello,
On Sat, 19 Oct 2013, Hannes Frederic Sowa wrote:
> On Sat, Oct 19, 2013 at 07:37:10PM +0300, Julian Anastasov wrote:
> >
> > Problem with process stack? May be some packet loop
> > happens? Because I can not reproduce such problem in my
> > virtual setup, I tested TEE too, with careful packet
> > matching and 1 CPU. Should I assume that you don't have such
> > oops when the patch is not applied, with the same TEE rule?
>
> Oh, sorry, you are right. It happens with an unpatched net-next kernel, too.
>
> I inserted the TEE rule in mangel/OUTGOING and had only one route, ip -6 r a
> default via fe80::1 dev eth0 which at the time of the panic was actually not
> reachable.
Thanks for the confirmation! I'll try later
to reproduce such problem with TEE, it is interesting
to know the real reason for this loop.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
2013-10-20 6:39 ` Julian Anastasov
@ 2013-10-20 6:47 ` Hannes Frederic Sowa
0 siblings, 0 replies; 28+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-20 6:47 UTC (permalink / raw)
To: Julian Anastasov
Cc: Simon Horman,
YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
netdev, Mark Brooks, Phil Oester
On Sun, Oct 20, 2013 at 09:39:26AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Sat, 19 Oct 2013, Hannes Frederic Sowa wrote:
>
> > On Sat, Oct 19, 2013 at 07:37:10PM +0300, Julian Anastasov wrote:
> > >
> > > Problem with process stack? May be some packet loop
> > > happens? Because I can not reproduce such problem in my
> > > virtual setup, I tested TEE too, with careful packet
> > > matching and 1 CPU. Should I assume that you don't have such
> > > oops when the patch is not applied, with the same TEE rule?
> >
> > Oh, sorry, you are right. It happens with an unpatched net-next kernel, too.
> >
> > I inserted the TEE rule in mangel/OUTGOING and had only one route, ip -6 r a
> > default via fe80::1 dev eth0 which at the time of the panic was actually not
> > reachable.
>
> Thanks for the confirmation! I'll try later
> to reproduce such problem with TEE, it is interesting
> to know the real reason for this loop.
Yup, I have a patch in works wich defers the ndisc_send_ns call in
rt6_probe into a workqueue out of the call stack. This should solve
the problem.
Still wonder if there is a nicer api as to create a new structure and pass the
arguments via container_of dereferencing to the deferred function (and I
don't think async calls are useable there).
Greetings,
Hannes
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2013-10-20 12:27 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-16 0:02 [RFC net-next] ipv6: Use destination address determined by IPVS Simon Horman
2013-10-16 0:13 ` Eric Dumazet
2013-10-16 0:28 ` Simon Horman
2013-10-16 0:39 ` Eric Dumazet
2013-10-16 2:13 ` Simon Horman
2013-10-16 2:40 ` Eric Dumazet
2013-10-16 4:26 ` Simon Horman
2013-10-16 0:53 ` Hannes Frederic Sowa
2013-10-16 2:14 ` Simon Horman
2013-10-16 7:27 ` Julian Anastasov
2013-10-16 14:32 ` Hannes Frederic Sowa
2013-10-16 20:22 ` Julian Anastasov
2013-10-16 21:59 ` Hannes Frederic Sowa
2013-10-16 22:50 ` Julian Anastasov
2013-10-17 23:03 ` Julian Anastasov
2013-10-18 2:10 ` Hannes Frederic Sowa
2013-10-18 6:33 ` Julian Anastasov
2013-10-18 16:33 ` Hannes Frederic Sowa
2013-10-19 16:37 ` Julian Anastasov
2013-10-19 18:34 ` Hannes Frederic Sowa
2013-10-19 22:36 ` Hannes Frederic Sowa
2013-10-19 22:39 ` Hannes Frederic Sowa
2013-10-20 7:11 ` Julian Anastasov
2013-10-20 7:33 ` Hannes Frederic Sowa
2013-10-20 8:33 ` Julian Anastasov
2013-10-20 12:27 ` Julian Anastasov
2013-10-20 6:39 ` Julian Anastasov
2013-10-20 6:47 ` Hannes Frederic Sowa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).