* [patch v3 00/12] IPVS: SIP Persistence Engine
From: Simon Horman @ 2010-10-02 2:30 UTC
To: lvs-devel, netdev, netfilter, netfilter-devel
Cc: Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
Julian Anastasov, Patrick McHardy
This patch series adds load-balancing of UDP SIP based on Call-ID to
IPVS, as well as a framework for extending IPVS to handle alternate
persistence requirements.
REVISIONS
This is v3 of the patch series, which addresses several problems
raised by Julian Anastasov on the lvs-devel mailing list.
v2 of the patch series fixed several problems
including non-atomic allocations while running atomic, and a memory leak.
v1 of this series addressed a few minor problems.
Internally there were 4 rfc versions, 0.1, 0.2, 0.3 and 0.4.
All changes are noted on a per-patch basis.
OVERVIEW
The approach that I have taken is what I call persistence engines.
The basic idea is that you can provide a module to LVS that alters
the way that it handles connection templates, which are at the core
of persistence. In particular, an additional key can be added, and
any of the normal IP address, port and protocol information can either
be used or ignored.
In the case of the SIP persistence engine, currently the only persistence
engine, all the keys used by the default persistence behaviour are used and
the callid is added as an extra key. I originally intended to ignore the
client IP address (cip); instead, this can optionally be done by setting the
persistence mask (-M) to 0.0.0.0, which keeps the flexibility of other mask
values.
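As a rough illustration of the idea above (this is not code from the patch
series), template matching with an optional extra key could be sketched in
userspace C as follows. The names struct tmpl_key, mask_caddr() and
tmpl_key_match() are hypothetical, and addresses are plain host-order
integers rather than the kernel's union nf_inet_addr.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified template key; NOT the kernel's struct ip_vs_conn. */
struct tmpl_key {
	uint8_t protocol;
	uint32_t caddr;		/* client address after masking */
	uint32_t vaddr;		/* virtual service address */
	uint16_t vport;		/* virtual service port */
	const char *pe_data;	/* extra key from a persistence engine, or NULL */
	size_t pe_data_len;
};

/* Apply the persistence mask; a mask of 0.0.0.0 makes every client equal. */
static uint32_t mask_caddr(uint32_t caddr, uint32_t netmask)
{
	return caddr & netmask;
}

/* Two keys match if the usual fields match and the pe data, if any, agrees. */
static bool tmpl_key_match(const struct tmpl_key *a, const struct tmpl_key *b)
{
	if (a->protocol != b->protocol || a->caddr != b->caddr ||
	    a->vaddr != b->vaddr || a->vport != b->vport)
		return false;
	if (!a->pe_data && !b->pe_data)
		return true;	/* plain persistence, no extra key */
	if (!a->pe_data || !b->pe_data)
		return false;	/* only one side carries pe data */
	return a->pe_data_len == b->pe_data_len &&
	       memcmp(a->pe_data, b->pe_data, a->pe_data_len) == 0;
}
```

With a mask of 0.0.0.0, two packets from different clients produce the same
masked caddr, so only the callid distinguishes their templates.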
It is envisaged that the SIP persistence engine will be used in conjunction
with one-packet scheduling. I'm interested to hear if that doesn't fit your
needs.
CONFIGURATION
A persistence engine is associated with a virtual service
(as are schedulers). I have added the --pe option to the
ipvsadm -A and -E commands to allow the persistence engine
of a virtual service to be added, changed, or deleted.
e.g. ipvsadm -A -u 10.4.3.192:5060 -p 60 -M 0.0.0.0 -o --pe sip
There are no other configuration parameters at this time.
RUNNING
When a connection template is created, if its virtual service
has a persistence engine, then the persistence engine can add
an extra key to the connection template. For the SIP module this
is the callid. More generically, it is known as "pe data". Both
the name of the persistence engine, "pe name", and the "pe data"
can be viewed in /proc/net/ip_vs_conn and by passing the
--persistent-conn option to ipvsadm -Lc.
e.g.
# ipvsadm -Lcn --persistent-conn
UDP 00:38 UDP 10.4.3.0:0 10.4.3.192:5060 127.0.0.1:5060 sip 193373839
Here we see a single persistence template (cport is 0), which has been
handled by the sip persistence engine. The pe data (callid) is 193373839.
In the case where the persistence engine can't match a packet for some
reason, the connection will fall back to the normal persistence handling.
This seems reasonable: if such packets ought to be dropped instead,
iptables can be used.
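To make the match-or-fall-back behaviour concrete, here is a minimal
userspace sketch of locating a Call-ID header in a SIP message. It is an
assumption-laden illustration, not the parser used by the patch series: the
function name find_callid() is hypothetical, the compact header form ("i:")
and header folding are not handled, and trailing whitespace is not trimmed.
A false return corresponds to the case where the persistence engine cannot
match the packet and normal persistence handling applies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/*
 * Hypothetical helper: scan the header lines of a SIP message for
 * "Call-ID:" (header names are case-insensitive) and report where the
 * value starts and how long it is. Scanning stops at the blank line
 * that ends the headers.
 */
static bool find_callid(const char *msg, size_t len,
			const char **val, size_t *val_len)
{
	static const char hdr[] = "Call-ID:";
	const size_t hlen = sizeof(hdr) - 1;
	size_t i = 0;

	while (i < len) {
		size_t eol = i;

		while (eol < len && msg[eol] != '\r' && msg[eol] != '\n')
			eol++;
		if (eol == i)
			break;		/* blank line: end of headers */

		if (eol - i >= hlen && strncasecmp(msg + i, hdr, hlen) == 0) {
			size_t s = i + hlen;

			while (s < eol && (msg[s] == ' ' || msg[s] == '\t'))
				s++;
			if (s == eol)
				return false;	/* header present but empty */
			*val = msg + s;
			*val_len = eol - s;
			return true;
		}

		/* Step over one line terminator (CRLF, CR or LF). */
		i = eol;
		if (i < len && msg[i] == '\r')
			i++;
		if (i < len && msg[i] == '\n')
			i++;
	}
	return false;	/* no Call-ID: caller falls back to normal persistence */
}
```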
A limited amount of debugging information has been added which
can be enabled using a value of 9 or greater in
/proc/sys/net/ipv4/vs/debug_level
CODE AVAILABILITY
The kernel patches (12) are available in git as the pe-3 branch of
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git
The ipvsadm patches (2) are available in git as the pe-3 branch of
git://github.com/horms/ipvsadm-test.git
There is no change to the ipvsadm patches since the v2 series
so I will not repost them.
* Re: [patch v2 03/12] [PATCH 03/12] IPVS: compact ip_vs_sched_persist()
From: Simon Horman @ 2010-10-02 2:20 UTC
To: Julian Anastasov
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <alpine.LFD.2.00.1010020129400.2462@ja.ssi.bg>
On Sat, Oct 02, 2010 at 01:35:28AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 1 Oct 2010, Simon Horman wrote:
>
> >Compact ip_vs_sched_persist() by setting up parameters
> >and calling functions once.
> >
> >Signed-off-by: Simon Horman <horms@verge.net.au>
> >---
> >
> >v2
> >* Make "union nf_inet_addr fwmark" const
> >* Don't remove the comment next to the declaration of dport
> >* Add a comment to the declaration of vport
> >
> >Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
> >===================================================================
> >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 21:56:39.000000000 +0900
> >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:02:41.000000000 +0900
> >@@ -193,10 +193,14 @@ ip_vs_sched_persist(struct ip_vs_service
> > struct ip_vs_iphdr iph;
> > struct ip_vs_dest *dest;
> > struct ip_vs_conn *ct;
> >- __be16 dport; /* destination port to forward */
> >+ int protocol = iph.protocol;
> >+ __be16 dport = 0; /* destination port to forward */
> >+ __be16 vport = 0; /* virtual service port */
> > unsigned int flags;
> > union nf_inet_addr snet; /* source network of the client,
> > after masking */
> >+ const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> >+ const union nf_inet_addr *vaddr = &iph.daddr;
> >
> > ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
> >
> >@@ -227,119 +231,58 @@ ip_vs_sched_persist(struct ip_vs_service
> > * service, and a template like <caddr, 0, vaddr, vport, daddr, dport>
> > * is created for other persistent services.
> > */
> >- if (ports[1] == svc->port) {
> >- /* Check if a template already exists */
> >- if (svc->port != FTPPORT)
> >- ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
> >- &iph.daddr, ports[1]);
> >- else
> >- ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
> >- &iph.daddr, 0);
> >-
> >- if (!ct || !ip_vs_check_template(ct)) {
> >- /*
> >- * No template found or the dest of the connection
> >- * template is not available.
> >- */
> >- dest = svc->scheduler->schedule(svc, skb);
> >- if (dest == NULL) {
> >- IP_VS_DBG(1, "p-schedule: no dest found.\n");
> >- return NULL;
> >- }
> >-
> >- /*
> >- * Create a template like <protocol,caddr,0,
> >- * vaddr,vport,daddr,dport> for non-ftp service,
> >- * and <protocol,caddr,0,vaddr,0,daddr,0>
> >- * for ftp service.
> >+ {
> >+ if (ports[1] == svc->port) {
> >+ /* non-FTP template:
> >+ * <protocol, caddr, 0, vaddr, vport, daddr, dport>
> >+ * FTP template:
> >+ * <protocol, caddr, 0, vaddr, 0, daddr, 0>
> > */
> > if (svc->port != FTPPORT)
> >- ct = ip_vs_conn_new(svc->af, iph.protocol,
> >- &snet, 0,
> >- &iph.daddr,
> >- ports[1],
> >- &dest->addr, dest->port,
> >- IP_VS_CONN_F_TEMPLATE,
> >- dest);
> >- else
> >- ct = ip_vs_conn_new(svc->af, iph.protocol,
> >- &snet, 0,
> >- &iph.daddr, 0,
> >- &dest->addr, 0,
> >- IP_VS_CONN_F_TEMPLATE,
> >- dest);
> >- if (ct == NULL)
> >- return NULL;
> >-
> >- ct->timeout = svc->timeout;
> >+ vport = ports[1];
> > } else {
> >- /* set destination with the found template */
> >- dest = ct->dest;
> >- }
> >- dport = dest->port;
> >- } else {
> >- /*
> >- * Note: persistent fwmark-based services and persistent
> >- * port zero service are handled here.
> >- * fwmark template: <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
> >- * port zero template: <protocol,caddr,0,vaddr,0,daddr,0>
> >- */
> >- if (svc->fwmark) {
> >- union nf_inet_addr fwmark = {
> >- .ip = htonl(svc->fwmark)
> >- };
> >-
> >- ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
> >- &fwmark, 0);
> >- } else
> >- ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
> >- &iph.daddr, 0);
> >-
> >- if (!ct || !ip_vs_check_template(ct)) {
> >- /*
> >- * If it is not persistent port zero, return NULL,
> >- * otherwise create a connection template.
> >+ /* Note: persistent fwmark-based services and
> >+ * persistent port zero service are handled here.
> >+ * fwmark template:
> >+ * <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
> >+ * port zero template:
> >+ * <protocol,caddr,0,vaddr,0,daddr,0>
> > */
> >- if (svc->port)
> >- return NULL;
> >-
> >- dest = svc->scheduler->schedule(svc, skb);
> >- if (dest == NULL) {
> >- IP_VS_DBG(1, "p-schedule: no dest found.\n");
> >- return NULL;
> >+ if (svc->fwmark) {
> >+ protocol = IPPROTO_IP;
> >+ vaddr = &fwmark;
> > }
> >+ }
> >+ }
> >
> >- /*
> >- * Create a template according to the service
> >- */
> >- if (svc->fwmark) {
> >- union nf_inet_addr fwmark = {
> >- .ip = htonl(svc->fwmark)
> >- };
> >-
> >- ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
> >- &snet, 0,
> >- &fwmark, 0,
> >- &dest->addr, 0,
> >- IP_VS_CONN_F_TEMPLATE,
> >- dest);
> >- } else
> >- ct = ip_vs_conn_new(svc->af, iph.protocol,
> >- &snet, 0,
> >- &iph.daddr, 0,
> >- &dest->addr, 0,
> >- IP_VS_CONN_F_TEMPLATE,
> >- dest);
> >- if (ct == NULL)
> >- return NULL;
> >+ /* Check if a template already exists */
> >+ ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
> >
> >- ct->timeout = svc->timeout;
> >- } else {
> >- /* set destination with the found template */
> >- dest = ct->dest;
> >+ if (!ct || !ip_vs_check_template(ct)) {
> >+ /* No template found or the dest of the connection
> >+ * template is not available.
> >+ */
> >+ dest = svc->scheduler->schedule(svc, skb);
> >+ if (!dest) {
> >+ IP_VS_DBG(1, "p-schedule: no dest found.\n");
> >+ return NULL;
> > }
> >- dport = ports[1];
> >- }
> >+
> >+ if (ports[1] == svc->port && svc->port != FTPPORT)
> >+ dport = dest->port;
> >+
> >+ /* Create a template */
> >+ ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
> >+ &dest->addr, dport,
> >+ IP_VS_CONN_F_TEMPLATE, dest);
> >+ if (ct == NULL)
> >+ return NULL;
> >+
> >+ ct->timeout = svc->timeout;
> >+ } else
> >+ /* set destination with the found template */
> >+ dest = ct->dest;
>
> Here dport:
>
> >+ dport = dest->port;
>
> should be:
>
> dport = ports[1];
> if (dport == svc->port && dest->port)
> dport = dest->port;
Thanks, fixed.
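For readers following the thread, the corrected logic can be restated as a
small standalone function. This is only a sketch: select_dport() is a
hypothetical name, and ports are in host byte order for readability, whereas
the kernel code uses __be16 values from ports[1], svc->port and dest->port.

```c
#include <stdint.h>

/*
 * Start from the port the packet was sent to, and switch to the real
 * server's port only when the packet hit the service port and the
 * destination has a non-zero port configured.
 */
static uint16_t select_dport(uint16_t pkt_dport, uint16_t svc_port,
			     uint16_t dest_port)
{
	uint16_t dport = pkt_dport;

	if (dport == svc_port && dest_port)
		dport = dest_port;
	return dport;
}
```

A packet to service port 5060 with a real server on port 5080 is forwarded
to 5080; with a fwmark or port zero service (svc_port of 0) the packet's own
destination port is kept.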
> > flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
> > && iph.protocol == IPPROTO_UDP)?
I will repost the entire series a little later.
For reference, here is the updated version of this patch.
From a6310d1a8f21bdf15fa797ed748651679c0197e2 Mon Sep 17 00:00:00 2001
From: Simon Horman <horms@verge.net.au>
Date: Sun, 22 Aug 2010 21:37:51 +0900
Subject: [PATCH 03/12] IPVS: compact ip_vs_sched_persist()
Compact ip_vs_sched_persist() by setting up parameters
and calling functions once.
Signed-off-by: Simon Horman <horms@verge.net.au>
---
v2
* Make "union nf_inet_addr fwmark" const
* Don't remove the comment next to the declaration of dport
* Add a comment to the declaration of vport
v3
* As suggested by Julian Anastasov
- Correct dport logic
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-02 10:54:57.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-02 11:04:04.000000000 +0900
@@ -193,10 +193,14 @@ ip_vs_sched_persist(struct ip_vs_service
struct ip_vs_iphdr iph;
struct ip_vs_dest *dest;
struct ip_vs_conn *ct;
- __be16 dport; /* destination port to forward */
+ int protocol = iph.protocol;
+ __be16 dport = 0; /* destination port to forward */
+ __be16 vport = 0; /* virtual service port */
unsigned int flags;
union nf_inet_addr snet; /* source network of the client,
after masking */
+ const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
+ const union nf_inet_addr *vaddr = &iph.daddr;
ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
@@ -227,119 +231,61 @@ ip_vs_sched_persist(struct ip_vs_service
* service, and a template like <caddr, 0, vaddr, vport, daddr, dport>
* is created for other persistent services.
*/
- if (ports[1] == svc->port) {
- /* Check if a template already exists */
- if (svc->port != FTPPORT)
- ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
- &iph.daddr, ports[1]);
- else
- ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
- &iph.daddr, 0);
-
- if (!ct || !ip_vs_check_template(ct)) {
- /*
- * No template found or the dest of the connection
- * template is not available.
- */
- dest = svc->scheduler->schedule(svc, skb);
- if (dest == NULL) {
- IP_VS_DBG(1, "p-schedule: no dest found.\n");
- return NULL;
- }
-
- /*
- * Create a template like <protocol,caddr,0,
- * vaddr,vport,daddr,dport> for non-ftp service,
- * and <protocol,caddr,0,vaddr,0,daddr,0>
- * for ftp service.
+ {
+ if (ports[1] == svc->port) {
+ /* non-FTP template:
+ * <protocol, caddr, 0, vaddr, vport, daddr, dport>
+ * FTP template:
+ * <protocol, caddr, 0, vaddr, 0, daddr, 0>
*/
if (svc->port != FTPPORT)
- ct = ip_vs_conn_new(svc->af, iph.protocol,
- &snet, 0,
- &iph.daddr,
- ports[1],
- &dest->addr, dest->port,
- IP_VS_CONN_F_TEMPLATE,
- dest);
- else
- ct = ip_vs_conn_new(svc->af, iph.protocol,
- &snet, 0,
- &iph.daddr, 0,
- &dest->addr, 0,
- IP_VS_CONN_F_TEMPLATE,
- dest);
- if (ct == NULL)
- return NULL;
-
- ct->timeout = svc->timeout;
+ vport = ports[1];
} else {
- /* set destination with the found template */
- dest = ct->dest;
- }
- dport = dest->port;
- } else {
- /*
- * Note: persistent fwmark-based services and persistent
- * port zero service are handled here.
- * fwmark template: <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
- * port zero template: <protocol,caddr,0,vaddr,0,daddr,0>
- */
- if (svc->fwmark) {
- union nf_inet_addr fwmark = {
- .ip = htonl(svc->fwmark)
- };
-
- ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
- &fwmark, 0);
- } else
- ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
- &iph.daddr, 0);
-
- if (!ct || !ip_vs_check_template(ct)) {
- /*
- * If it is not persistent port zero, return NULL,
- * otherwise create a connection template.
+ /* Note: persistent fwmark-based services and
+ * persistent port zero service are handled here.
+ * fwmark template:
+ * <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
+ * port zero template:
+ * <protocol,caddr,0,vaddr,0,daddr,0>
*/
- if (svc->port)
- return NULL;
-
- dest = svc->scheduler->schedule(svc, skb);
- if (dest == NULL) {
- IP_VS_DBG(1, "p-schedule: no dest found.\n");
- return NULL;
+ if (svc->fwmark) {
+ protocol = IPPROTO_IP;
+ vaddr = &fwmark;
}
+ }
+ }
- /*
- * Create a template according to the service
- */
- if (svc->fwmark) {
- union nf_inet_addr fwmark = {
- .ip = htonl(svc->fwmark)
- };
-
- ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
- &snet, 0,
- &fwmark, 0,
- &dest->addr, 0,
- IP_VS_CONN_F_TEMPLATE,
- dest);
- } else
- ct = ip_vs_conn_new(svc->af, iph.protocol,
- &snet, 0,
- &iph.daddr, 0,
- &dest->addr, 0,
- IP_VS_CONN_F_TEMPLATE,
- dest);
- if (ct == NULL)
- return NULL;
+ /* Check if a template already exists */
+ ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
- ct->timeout = svc->timeout;
- } else {
- /* set destination with the found template */
- dest = ct->dest;
+ if (!ct || !ip_vs_check_template(ct)) {
+ /* No template found or the dest of the connection
+ * template is not available.
+ */
+ dest = svc->scheduler->schedule(svc, skb);
+ if (!dest) {
+ IP_VS_DBG(1, "p-schedule: no dest found.\n");
+ return NULL;
}
- dport = ports[1];
- }
+
+ if (ports[1] == svc->port && svc->port != FTPPORT)
+ dport = dest->port;
+
+ /* Create a template */
+ ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
+ &dest->addr, dport,
+ IP_VS_CONN_F_TEMPLATE, dest);
+ if (ct == NULL)
+ return NULL;
+
+ ct->timeout = svc->timeout;
+ } else
+ /* set destination with the found template */
+ dest = ct->dest;
+
+ dport = ports[1];
+ if (dport == svc->port && dest->port)
+ dport = dest->port;
flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
&& iph.protocol == IPPROTO_UDP)?
* [PATCH net-next 4/4] ipmr: cleanups
From: Eric Dumazet @ 2010-10-02 2:15 UTC
To: David Miller; +Cc: netdev
Various code style cleanups
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/ipmr.c | 238 +++++++++++++++++++++++-----------------------
1 file changed, 124 insertions(+), 114 deletions(-)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index cbb6dab..86dd569 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -98,7 +98,7 @@ struct ipmr_result {
};
/* Big lock, protecting vif table, mrt cache and mroute socket state.
- Note that the changes are semaphored via rtnl_lock.
+ * Note that the changes are semaphored via rtnl_lock.
*/
static DEFINE_RWLOCK(mrt_lock);
@@ -113,11 +113,11 @@ static DEFINE_RWLOCK(mrt_lock);
static DEFINE_SPINLOCK(mfc_unres_lock);
/* We return to original Alan's scheme. Hash table of resolved
- entries is changed only in process context and protected
- with weak lock mrt_lock. Queue of unresolved entries is protected
- with strong spinlock mfc_unres_lock.
-
- In this case data path is free of exclusive locks at all.
+ * entries is changed only in process context and protected
+ * with weak lock mrt_lock. Queue of unresolved entries is protected
+ * with strong spinlock mfc_unres_lock.
+ *
+ * In this case data path is free of exclusive locks at all.
*/
static struct kmem_cache *mrt_cachep __read_mostly;
@@ -396,9 +396,9 @@ struct net_device *ipmr_new_tunnel(struct net *net, struct vifctl *v)
set_fs(KERNEL_DS);
err = ops->ndo_do_ioctl(dev, &ifr, SIOCADDTUNNEL);
set_fs(oldfs);
- } else
+ } else {
err = -EOPNOTSUPP;
-
+ }
dev = NULL;
if (err == 0 &&
@@ -495,7 +495,8 @@ static struct net_device *ipmr_reg_vif(struct net *net, struct mr_table *mrt)
dev->iflink = 0;
rcu_read_lock();
- if ((in_dev = __in_dev_get_rcu(dev)) == NULL) {
+ in_dev = __in_dev_get_rcu(dev);
+ if (!in_dev) {
rcu_read_unlock();
goto failure;
}
@@ -552,9 +553,10 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
mrt->mroute_reg_vif_num = -1;
#endif
- if (vifi+1 == mrt->maxvif) {
+ if (vifi + 1 == mrt->maxvif) {
int tmp;
- for (tmp=vifi-1; tmp>=0; tmp--) {
+
+ for (tmp = vifi - 1; tmp >= 0; tmp--) {
if (VIF_EXISTS(mrt, tmp))
break;
}
@@ -565,12 +567,13 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
dev_set_allmulti(dev, -1);
- if ((in_dev = __in_dev_get_rtnl(dev)) != NULL) {
+ in_dev = __in_dev_get_rtnl(dev);
+ if (in_dev) {
IPV4_DEVCONF(in_dev->cnf, MC_FORWARDING)--;
ip_rt_multicast_event(in_dev);
}
- if (v->flags&(VIFF_TUNNEL|VIFF_REGISTER) && !notify)
+ if (v->flags & (VIFF_TUNNEL | VIFF_REGISTER) && !notify)
unregister_netdevice_queue(dev, head);
dev_put(dev);
@@ -590,7 +593,7 @@ static inline void ipmr_cache_free(struct mfc_cache *c)
}
/* Destroy an unresolved cache entry, killing queued skbs
- and reporting error to netlink readers.
+ * and reporting error to netlink readers.
*/
static void ipmr_destroy_unres(struct mr_table *mrt, struct mfc_cache *c)
@@ -612,8 +615,9 @@ static void ipmr_destroy_unres(struct mr_table *mrt, struct mfc_cache *c)
memset(&e->msg, 0, sizeof(e->msg));
rtnl_unicast(skb, net, NETLINK_CB(skb).pid);
- } else
+ } else {
kfree_skb(skb);
+ }
}
ipmr_cache_free(c);
@@ -735,9 +739,9 @@ static int vif_add(struct net *net, struct mr_table *mrt,
dev_put(dev);
return -EADDRNOTAVAIL;
}
- } else
+ } else {
dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
-
+ }
if (!dev)
return -EADDRNOTAVAIL;
err = dev_set_allmulti(dev, 1);
@@ -750,16 +754,16 @@ static int vif_add(struct net *net, struct mr_table *mrt,
return -EINVAL;
}
- if ((in_dev = __in_dev_get_rtnl(dev)) == NULL) {
+ in_dev = __in_dev_get_rtnl(dev);
+ if (!in_dev) {
dev_put(dev);
return -EADDRNOTAVAIL;
}
IPV4_DEVCONF(in_dev->cnf, MC_FORWARDING)++;
ip_rt_multicast_event(in_dev);
- /*
- * Fill in the VIF structures
- */
+ /* Fill in the VIF structures */
+
v->rate_limit = vifc->vifc_rate_limit;
v->local = vifc->vifc_lcl_addr.s_addr;
v->remote = vifc->vifc_rmt_addr.s_addr;
@@ -772,14 +776,14 @@ static int vif_add(struct net *net, struct mr_table *mrt,
v->pkt_in = 0;
v->pkt_out = 0;
v->link = dev->ifindex;
- if (v->flags&(VIFF_TUNNEL|VIFF_REGISTER))
+ if (v->flags & (VIFF_TUNNEL | VIFF_REGISTER))
v->link = dev->iflink;
/* And finish update writing critical data */
write_lock_bh(&mrt_lock);
v->dev = dev;
#ifdef CONFIG_IP_PIMSM
- if (v->flags&VIFF_REGISTER)
+ if (v->flags & VIFF_REGISTER)
mrt->mroute_reg_vif_num = vifi;
#endif
if (vifi+1 > mrt->maxvif)
@@ -836,17 +840,15 @@ static void ipmr_cache_resolve(struct net *net, struct mr_table *mrt,
struct sk_buff *skb;
struct nlmsgerr *e;
- /*
- * Play the pending entries through our router
- */
+ /* Play the pending entries through our router */
while ((skb = __skb_dequeue(&uc->mfc_un.unres.unresolved))) {
if (ip_hdr(skb)->version == 0) {
struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct iphdr));
if (__ipmr_fill_mroute(mrt, skb, c, NLMSG_DATA(nlh)) > 0) {
- nlh->nlmsg_len = (skb_tail_pointer(skb) -
- (u8 *)nlh);
+ nlh->nlmsg_len = skb_tail_pointer(skb) -
+ (u8 *)nlh;
} else {
nlh->nlmsg_type = NLMSG_ERROR;
nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct nlmsgerr));
@@ -857,8 +859,9 @@ static void ipmr_cache_resolve(struct net *net, struct mr_table *mrt,
}
rtnl_unicast(skb, net, NETLINK_CB(skb).pid);
- } else
+ } else {
ip_mr_forward(net, mrt, skb, c, 0);
+ }
}
}
@@ -892,9 +895,9 @@ static int ipmr_cache_report(struct mr_table *mrt,
#ifdef CONFIG_IP_PIMSM
if (assert == IGMPMSG_WHOLEPKT) {
/* Ugly, but we have no choice with this interface.
- Duplicate old header, fix ihl, length etc.
- And all this only to mangle msg->im_msgtype and
- to set msg->im_mbz to "mbz" :-)
+ * Duplicate old header, fix ihl, length etc.
+ * And all this only to mangle msg->im_msgtype and
+ * to set msg->im_mbz to "mbz" :-)
*/
skb_push(skb, sizeof(struct iphdr));
skb_reset_network_header(skb);
@@ -911,27 +914,23 @@ static int ipmr_cache_report(struct mr_table *mrt,
#endif
{
- /*
- * Copy the IP header
- */
+ /* Copy the IP header */
skb->network_header = skb->tail;
skb_put(skb, ihl);
skb_copy_to_linear_data(skb, pkt->data, ihl);
- ip_hdr(skb)->protocol = 0; /* Flag to the kernel this is a route add */
+ ip_hdr(skb)->protocol = 0; /* Flag to the kernel this is a route add */
msg = (struct igmpmsg *)skb_network_header(skb);
msg->im_vif = vifi;
skb_dst_set(skb, dst_clone(skb_dst(pkt)));
- /*
- * Add our header
- */
+ /* Add our header */
- igmp=(struct igmphdr *)skb_put(skb, sizeof(struct igmphdr));
+ igmp = (struct igmphdr *)skb_put(skb, sizeof(struct igmphdr));
igmp->type =
msg->im_msgtype = assert;
- igmp->code = 0;
- ip_hdr(skb)->tot_len = htons(skb->len); /* Fix the length */
+ igmp->code = 0;
+ ip_hdr(skb)->tot_len = htons(skb->len); /* Fix the length */
skb->transport_header = skb->network_header;
}
@@ -943,9 +942,8 @@ static int ipmr_cache_report(struct mr_table *mrt,
return -EINVAL;
}
- /*
- * Deliver to mrouted
- */
+ /* Deliver to mrouted */
+
ret = sock_queue_rcv_skb(mroute_sk, skb);
rcu_read_unlock();
if (ret < 0) {
@@ -979,9 +977,7 @@ ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi, struct sk_buff *skb)
}
if (!found) {
- /*
- * Create a new entry if allowable
- */
+ /* Create a new entry if allowable */
if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
(c = ipmr_cache_alloc_unres()) == NULL) {
@@ -991,16 +987,14 @@ ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi, struct sk_buff *skb)
return -ENOBUFS;
}
- /*
- * Fill in the new cache entry
- */
+ /* Fill in the new cache entry */
+
c->mfc_parent = -1;
c->mfc_origin = iph->saddr;
c->mfc_mcastgrp = iph->daddr;
- /*
- * Reflect first query at mrouted.
- */
+ /* Reflect first query at mrouted. */
+
err = ipmr_cache_report(mrt, skb, vifi, IGMPMSG_NOCACHE);
if (err < 0) {
/* If the report failed throw the cache entry
@@ -1020,10 +1014,9 @@ ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi, struct sk_buff *skb)
mod_timer(&mrt->ipmr_expire_timer, c->mfc_un.unres.expires);
}
- /*
- * See if we can append the packet
- */
- if (c->mfc_un.unres.unresolved.qlen>3) {
+ /* See if we can append the packet */
+
+ if (c->mfc_un.unres.unresolved.qlen > 3) {
kfree_skb(skb);
err = -ENOBUFS;
} else {
@@ -1140,18 +1133,16 @@ static void mroute_clean_tables(struct mr_table *mrt)
LIST_HEAD(list);
struct mfc_cache *c, *next;
- /*
- * Shut down all active vif entries
- */
+ /* Shut down all active vif entries */
+
for (i = 0; i < mrt->maxvif; i++) {
- if (!(mrt->vif_table[i].flags&VIFF_STATIC))
+ if (!(mrt->vif_table[i].flags & VIFF_STATIC))
vif_delete(mrt, i, 0, &list);
}
unregister_netdevice_many(&list);
- /*
- * Wipe the cache
- */
+ /* Wipe the cache */
+
for (i = 0; i < MFC_LINES; i++) {
list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
if (c->mfc_flags & MFC_STATIC)
@@ -1282,7 +1273,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
case MRT_ASSERT:
{
int v;
- if (get_user(v,(int __user *)optval))
+ if (get_user(v, (int __user *)optval))
return -EFAULT;
mrt->mroute_do_assert = (v) ? 1 : 0;
return 0;
@@ -1292,7 +1283,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
{
int v;
- if (get_user(v,(int __user *)optval))
+ if (get_user(v, (int __user *)optval))
return -EFAULT;
v = (v) ? 1 : 0;
@@ -1355,9 +1346,9 @@ int ip_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int
if (optname != MRT_VERSION &&
#ifdef CONFIG_IP_PIMSM
- optname!=MRT_PIM &&
+ optname != MRT_PIM &&
#endif
- optname!=MRT_ASSERT)
+ optname != MRT_ASSERT)
return -ENOPROTOOPT;
if (get_user(olr, optlen))
@@ -1473,7 +1464,7 @@ static struct notifier_block ip_mr_notifier = {
};
/*
- * Encapsulate a packet by attaching a valid IPIP header to it.
+ * Encapsulate a packet by attaching a valid IPIP header to it.
* This avoids tunnel drivers and other mess and gives us the speed so
* important for multicast video.
*/
@@ -1488,7 +1479,7 @@ static void ip_encap(struct sk_buff *skb, __be32 saddr, __be32 daddr)
skb_reset_network_header(skb);
iph = ip_hdr(skb);
- iph->version = 4;
+ iph->version = 4;
iph->tos = old_iph->tos;
iph->ttl = old_iph->ttl;
iph->frag_off = 0;
@@ -1506,7 +1497,7 @@ static void ip_encap(struct sk_buff *skb, __be32 saddr, __be32 daddr)
static inline int ipmr_forward_finish(struct sk_buff *skb)
{
- struct ip_options * opt = &(IPCB(skb)->opt);
+ struct ip_options *opt = &(IPCB(skb)->opt);
IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
@@ -1543,22 +1534,34 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
}
#endif
- if (vif->flags&VIFF_TUNNEL) {
- struct flowi fl = { .oif = vif->link,
- .nl_u = { .ip4_u =
- { .daddr = vif->remote,
- .saddr = vif->local,
- .tos = RT_TOS(iph->tos) } },
- .proto = IPPROTO_IPIP };
+ if (vif->flags & VIFF_TUNNEL) {
+ struct flowi fl = {
+ .oif = vif->link,
+ .nl_u = {
+ .ip4_u = {
+ .daddr = vif->remote,
+ .saddr = vif->local,
+ .tos = RT_TOS(iph->tos)
+ }
+ },
+ .proto = IPPROTO_IPIP
+ };
+
if (ip_route_output_key(net, &rt, &fl))
goto out_free;
encap = sizeof(struct iphdr);
} else {
- struct flowi fl = { .oif = vif->link,
- .nl_u = { .ip4_u =
- { .daddr = iph->daddr,
- .tos = RT_TOS(iph->tos) } },
- .proto = IPPROTO_IPIP };
+ struct flowi fl = {
+ .oif = vif->link,
+ .nl_u = {
+ .ip4_u = {
+ .daddr = iph->daddr,
+ .tos = RT_TOS(iph->tos)
+ }
+ },
+ .proto = IPPROTO_IPIP
+ };
+
if (ip_route_output_key(net, &rt, &fl))
goto out_free;
}
@@ -1567,8 +1570,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
if (skb->len+encap > dst_mtu(&rt->dst) && (ntohs(iph->frag_off) & IP_DF)) {
/* Do not fragment multicasts. Alas, IPv4 does not
- allow to send ICMP, so that packets will disappear
- to blackhole.
+ * allow to send ICMP, so that packets will disappear
+ * to blackhole.
*/
IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
@@ -1591,7 +1594,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
ip_decrease_ttl(ip_hdr(skb));
/* FIXME: forward and output firewalls used to be called here.
- * What do we do with netfilter? -- RR */
+ * What do we do with netfilter? -- RR
+ */
if (vif->flags & VIFF_TUNNEL) {
ip_encap(skb, vif->local, vif->remote);
/* FIXME: extra output firewall step used to be here. --RR */
@@ -1652,15 +1656,15 @@ static int ip_mr_forward(struct net *net, struct mr_table *mrt,
if (skb_rtable(skb)->fl.iif == 0) {
/* It is our own packet, looped back.
- Very complicated situation...
-
- The best workaround until routing daemons will be
- fixed is not to redistribute packet, if it was
- send through wrong interface. It means, that
- multicast applications WILL NOT work for
- (S,G), which have default multicast route pointing
- to wrong oif. In any case, it is not a good
- idea to use multicasting applications on router.
+ * Very complicated situation...
+ *
+ * The best workaround until routing daemons will be
+ * fixed is not to redistribute packet, if it was
+ * send through wrong interface. It means, that
+ * multicast applications WILL NOT work for
+ * (S,G), which have default multicast route pointing
+ * to wrong oif. In any case, it is not a good
+ * idea to use multicasting applications on router.
*/
goto dont_forward;
}
@@ -1670,9 +1674,9 @@ static int ip_mr_forward(struct net *net, struct mr_table *mrt,
if (true_vifi >= 0 && mrt->mroute_do_assert &&
/* pimsm uses asserts, when switching from RPT to SPT,
- so that we cannot check that packet arrived on an oif.
- It is bad, but otherwise we would need to move pretty
- large chunk of pimd to kernel. Ough... --ANK
+ * so that we cannot check that packet arrived on an oif.
+ * It is bad, but otherwise we would need to move pretty
+ * large chunk of pimd to kernel. Ough... --ANK
*/
(mrt->mroute_do_pim ||
cache->mfc_un.res.ttls[true_vifi] < 255) &&
@@ -1690,10 +1694,12 @@ static int ip_mr_forward(struct net *net, struct mr_table *mrt,
/*
* Forward the frame
*/
- for (ct = cache->mfc_un.res.maxvif-1; ct >= cache->mfc_un.res.minvif; ct--) {
+ for (ct = cache->mfc_un.res.maxvif - 1;
+ ct >= cache->mfc_un.res.minvif; ct--) {
if (ip_hdr(skb)->ttl > cache->mfc_un.res.ttls[ct]) {
if (psend != -1) {
struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
+
if (skb2)
ipmr_queue_xmit(net, mrt, skb2, cache,
psend);
@@ -1704,6 +1710,7 @@ static int ip_mr_forward(struct net *net, struct mr_table *mrt,
if (psend != -1) {
if (local) {
struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
+
if (skb2)
ipmr_queue_xmit(net, mrt, skb2, cache, psend);
} else {
@@ -1733,7 +1740,7 @@ int ip_mr_input(struct sk_buff *skb)
int err;
/* Packet is looped back after forward, it should not be
- forwarded second time, but still can be delivered locally.
+ * forwarded second time, but still can be delivered locally.
*/
if (IPCB(skb)->flags & IPSKB_FORWARDED)
goto dont_forward;
@@ -1822,10 +1829,10 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
encap = (struct iphdr *)(skb_transport_header(skb) + pimlen);
/*
- Check that:
- a. packet is really destinted to a multicast group
- b. packet is not a NULL-REGISTER
- c. packet is not truncated
+ * Check that:
+ * a. packet is really sent to a multicast group
+ * b. packet is not a NULL-REGISTER
+ * c. packet is not truncated
*/
if (!ipv4_is_multicast(encap->daddr) ||
encap->tot_len == 0 ||
@@ -1860,7 +1867,7 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
* Handle IGMP messages of PIMv1
*/
-int pim_rcv_v1(struct sk_buff * skb)
+int pim_rcv_v1(struct sk_buff *skb)
{
struct igmphdr *pim;
struct net *net = dev_net(skb->dev);
@@ -1887,7 +1894,7 @@ drop:
#endif
#ifdef CONFIG_IP_PIMSM_V2
-static int pim_rcv(struct sk_buff * skb)
+static int pim_rcv(struct sk_buff *skb)
{
struct pimreghdr *pim;
struct net *net = dev_net(skb->dev);
@@ -1897,8 +1904,8 @@ static int pim_rcv(struct sk_buff * skb)
goto drop;
pim = (struct pimreghdr *)skb_transport_header(skb);
- if (pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) ||
- (pim->flags&PIM_NULL_REGISTER) ||
+ if (pim->type != ((PIM_VERSION << 4) | (PIM_REGISTER)) ||
+ (pim->flags & PIM_NULL_REGISTER) ||
(ip_compute_csum((void *)pim, sizeof(*pim)) != 0 &&
csum_fold(skb_checksum(skb, 0, skb->len, 0))))
goto drop;
@@ -1971,7 +1978,7 @@ int ipmr_get_route(struct net *net,
struct sk_buff *skb2;
struct iphdr *iph;
struct net_device *dev;
- int vif;
+ int vif = -1;
if (nowait) {
rcu_read_unlock();
@@ -1980,7 +1987,9 @@ int ipmr_get_route(struct net *net,
dev = skb->dev;
read_lock(&mrt_lock);
- if (dev == NULL || (vif = ipmr_find_vif(mrt, dev)) < 0) {
+ if (dev)
+ vif = ipmr_find_vif(mrt, dev);
+ if (vif < 0) {
read_unlock(&mrt_lock);
rcu_read_unlock();
return -ENODEV;
@@ -2098,7 +2107,8 @@ done:
#ifdef CONFIG_PROC_FS
/*
- * The /proc interfaces to multicast routing /proc/ip_mr_cache /proc/ip_mr_vif
+ * The /proc interfaces to multicast routing :
+ * /proc/net/ip_mr_cache & /proc/net/ip_mr_vif
*/
struct ipmr_vif_iter {
struct seq_net_private p;
@@ -2294,7 +2304,7 @@ static void *ipmr_mfc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
if (!list_empty(it->cache))
return list_first_entry(it->cache, struct mfc_cache, list);
- end_of_list:
+end_of_list:
spin_unlock_bh(&mfc_unres_lock);
it->cache = NULL;
@@ -2335,7 +2345,7 @@ static int ipmr_mfc_seq_show(struct seq_file *seq, void *v)
mfc->mfc_un.res.bytes,
mfc->mfc_un.res.wrong_if);
for (n = mfc->mfc_un.res.minvif;
- n < mfc->mfc_un.res.maxvif; n++ ) {
+ n < mfc->mfc_un.res.maxvif; n++) {
if (VIF_EXISTS(mrt, n) &&
mfc->mfc_un.res.ttls[n] < 255)
seq_printf(seq,
* [PATCH net-next 3/4] ipmr: RCU protection for mfc_cache_array
From: Eric Dumazet @ 2010-10-02 2:15 UTC
To: David Miller; +Cc: netdev
Use RCU & RTNL protection for mfc_cache_array[].
ipmr_cache_find() is called under rcu_read_lock().
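The deferred-free half of this change can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct cb_head, toy_call_deferred() and toy_grace_period_end() are hypothetical stand-ins for struct rcu_head, call_rcu() and the end of a grace period, and none of it is kernel code. What it does show accurately is the container_of() step that maps the embedded head back to its cache entry, and that the entry is not released at the time the free is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct rcu_head (hypothetical, not the kernel type). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy cache entry with the callback head embedded, as the patch does. */
struct toy_mfc {
	unsigned int origin;
	unsigned int mcastgrp;
	struct cb_head rcu;
};

static struct cb_head *pending;	/* callbacks waiting for the "grace period" */
static int freed;		/* number of entries actually released */

/* Analog of call_rcu(): queue the callback instead of freeing right away. */
static void toy_call_deferred(struct cb_head *head,
			      void (*func)(struct cb_head *head))
{
	assert(head && func);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Analog of the grace period ending: run every queued callback. */
static void toy_grace_period_end(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;	/* save next before the entry is freed */
		head->func(head);
	}
}

/* Analog of ipmr_cache_free_rcu(): map the head back to its entry. */
static void toy_mfc_free_cb(struct cb_head *head)
{
	struct toy_mfc *c = container_of(head, struct toy_mfc, rcu);

	free(c);
	freed++;
}

/* Analog of ipmr_cache_free(): only schedules the release. */
static void toy_mfc_free(struct toy_mfc *c)
{
	toy_call_deferred(&c->rcu, toy_mfc_free_cb);
}
```

In this model a reader that found the entry before toy_mfc_free() was called can keep using it until toy_grace_period_end() runs, which is the property the real conversion relies on.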
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/mroute.h | 1
net/ipv4/ipmr.c | 87 +++++++++++++++++++++------------------
2 files changed, 48 insertions(+), 40 deletions(-)
diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index fa04b24..0fa7a3a 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -213,6 +213,7 @@ struct mfc_cache {
unsigned char ttls[MAXVIFS]; /* TTL thresholds */
} res;
} mfc_un;
+ struct rcu_head rcu;
};
#define MFC_STATIC 1
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index e2db2ea..cbb6dab 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -577,11 +577,18 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
return 0;
}
-static inline void ipmr_cache_free(struct mfc_cache *c)
+static void ipmr_cache_free_rcu(struct rcu_head *head)
{
+ struct mfc_cache *c = container_of(head, struct mfc_cache, rcu);
+
kmem_cache_free(mrt_cachep, c);
}
+static inline void ipmr_cache_free(struct mfc_cache *c)
+{
+ call_rcu(&c->rcu, ipmr_cache_free_rcu);
+}
+
/* Destroy an unresolved cache entry, killing queued skbs
and reporting error to netlink readers.
*/
@@ -781,6 +788,7 @@ static int vif_add(struct net *net, struct mr_table *mrt,
return 0;
}
+/* called with rcu_read_lock() */
static struct mfc_cache *ipmr_cache_find(struct mr_table *mrt,
__be32 origin,
__be32 mcastgrp)
@@ -788,7 +796,7 @@ static struct mfc_cache *ipmr_cache_find(struct mr_table *mrt,
int line = MFC_HASH(mcastgrp, origin);
struct mfc_cache *c;
- list_for_each_entry(c, &mrt->mfc_cache_array[line], list) {
+ list_for_each_entry_rcu(c, &mrt->mfc_cache_array[line], list) {
if (c->mfc_origin == origin && c->mfc_mcastgrp == mcastgrp)
return c;
}
@@ -801,19 +809,20 @@ static struct mfc_cache *ipmr_cache_find(struct mr_table *mrt,
static struct mfc_cache *ipmr_cache_alloc(void)
{
struct mfc_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_KERNEL);
- if (c == NULL)
- return NULL;
- c->mfc_un.res.minvif = MAXVIFS;
+
+ if (c)
+ c->mfc_un.res.minvif = MAXVIFS;
return c;
}
static struct mfc_cache *ipmr_cache_alloc_unres(void)
{
struct mfc_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_ATOMIC);
- if (c == NULL)
- return NULL;
- skb_queue_head_init(&c->mfc_un.unres.unresolved);
- c->mfc_un.unres.expires = jiffies + 10*HZ;
+
+ if (c) {
+ skb_queue_head_init(&c->mfc_un.unres.unresolved);
+ c->mfc_un.unres.expires = jiffies + 10*HZ;
+ }
return c;
}
@@ -1040,9 +1049,7 @@ static int ipmr_mfc_delete(struct mr_table *mrt, struct mfcctl *mfc)
list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[line], list) {
if (c->mfc_origin == mfc->mfcc_origin.s_addr &&
c->mfc_mcastgrp == mfc->mfcc_mcastgrp.s_addr) {
- write_lock_bh(&mrt_lock);
- list_del(&c->list);
- write_unlock_bh(&mrt_lock);
+ list_del_rcu(&c->list);
ipmr_cache_free(c);
return 0;
@@ -1095,9 +1102,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
if (!mrtsock)
c->mfc_flags |= MFC_STATIC;
- write_lock_bh(&mrt_lock);
- list_add(&c->list, &mrt->mfc_cache_array[line]);
- write_unlock_bh(&mrt_lock);
+ list_add_rcu(&c->list, &mrt->mfc_cache_array[line]);
/*
* Check to see if we resolved a queued list. If so we
@@ -1149,12 +1154,9 @@ static void mroute_clean_tables(struct mr_table *mrt)
*/
for (i = 0; i < MFC_LINES; i++) {
list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
- if (c->mfc_flags&MFC_STATIC)
+ if (c->mfc_flags & MFC_STATIC)
continue;
- write_lock_bh(&mrt_lock);
- list_del(&c->list);
- write_unlock_bh(&mrt_lock);
-
+ list_del_rcu(&c->list);
ipmr_cache_free(c);
}
}
@@ -1422,19 +1424,19 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
if (copy_from_user(&sr, arg, sizeof(sr)))
return -EFAULT;
- read_lock(&mrt_lock);
+ rcu_read_lock();
c = ipmr_cache_find(mrt, sr.src.s_addr, sr.grp.s_addr);
if (c) {
sr.pktcnt = c->mfc_un.res.pkt;
sr.bytecnt = c->mfc_un.res.bytes;
sr.wrong_if = c->mfc_un.res.wrong_if;
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
if (copy_to_user(arg, &sr, sizeof(sr)))
return -EFAULT;
return 0;
}
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
return -EADDRNOTAVAIL;
default:
return -ENOIOCTLCMD;
@@ -1764,7 +1766,7 @@ int ip_mr_input(struct sk_buff *skb)
}
}
- read_lock(&mrt_lock);
+ /* already under rcu_read_lock() */
cache = ipmr_cache_find(mrt, ip_hdr(skb)->saddr, ip_hdr(skb)->daddr);
/*
@@ -1776,13 +1778,12 @@ int ip_mr_input(struct sk_buff *skb)
if (local) {
struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
ip_local_deliver(skb);
- if (skb2 == NULL) {
- read_unlock(&mrt_lock);
+ if (skb2 == NULL)
return -ENOBUFS;
- }
skb = skb2;
}
+ read_lock(&mrt_lock);
vif = ipmr_find_vif(mrt, skb->dev);
if (vif >= 0) {
int err2 = ipmr_cache_unresolved(mrt, vif, skb);
@@ -1795,8 +1796,8 @@ int ip_mr_input(struct sk_buff *skb)
return -ENODEV;
}
+ read_lock(&mrt_lock);
ip_mr_forward(net, mrt, skb, cache, local);
-
read_unlock(&mrt_lock);
if (local)
@@ -1963,7 +1964,7 @@ int ipmr_get_route(struct net *net,
if (mrt == NULL)
return -ENOENT;
- read_lock(&mrt_lock);
+ rcu_read_lock();
cache = ipmr_cache_find(mrt, rt->rt_src, rt->rt_dst);
if (cache == NULL) {
@@ -1973,18 +1974,21 @@ int ipmr_get_route(struct net *net,
int vif;
if (nowait) {
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
return -EAGAIN;
}
dev = skb->dev;
+ read_lock(&mrt_lock);
if (dev == NULL || (vif = ipmr_find_vif(mrt, dev)) < 0) {
read_unlock(&mrt_lock);
+ rcu_read_unlock();
return -ENODEV;
}
skb2 = skb_clone(skb, GFP_ATOMIC);
if (!skb2) {
read_unlock(&mrt_lock);
+ rcu_read_unlock();
return -ENOMEM;
}
@@ -1997,13 +2001,16 @@ int ipmr_get_route(struct net *net,
iph->version = 0;
err = ipmr_cache_unresolved(mrt, vif, skb2);
read_unlock(&mrt_lock);
+ rcu_read_unlock();
return err;
}
- if (!nowait && (rtm->rtm_flags&RTM_F_NOTIFY))
+ read_lock(&mrt_lock);
+ if (!nowait && (rtm->rtm_flags & RTM_F_NOTIFY))
cache->mfc_flags |= MFC_NOTIFY;
err = __ipmr_fill_mroute(mrt, skb, cache, rtm);
read_unlock(&mrt_lock);
+ rcu_read_unlock();
return err;
}
@@ -2055,14 +2062,14 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
s_h = cb->args[1];
s_e = cb->args[2];
- read_lock(&mrt_lock);
+ rcu_read_lock();
ipmr_for_each_table(mrt, net) {
if (t < s_t)
goto next_table;
if (t > s_t)
s_h = 0;
for (h = s_h; h < MFC_LINES; h++) {
- list_for_each_entry(mfc, &mrt->mfc_cache_array[h], list) {
+ list_for_each_entry_rcu(mfc, &mrt->mfc_cache_array[h], list) {
if (e < s_e)
goto next_entry;
if (ipmr_fill_mroute(mrt, skb,
@@ -2080,7 +2087,7 @@ next_table:
t++;
}
done:
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
cb->args[2] = e;
cb->args[1] = h;
@@ -2213,14 +2220,14 @@ static struct mfc_cache *ipmr_mfc_seq_idx(struct net *net,
struct mr_table *mrt = it->mrt;
struct mfc_cache *mfc;
- read_lock(&mrt_lock);
+ rcu_read_lock();
for (it->ct = 0; it->ct < MFC_LINES; it->ct++) {
it->cache = &mrt->mfc_cache_array[it->ct];
- list_for_each_entry(mfc, it->cache, list)
+ list_for_each_entry_rcu(mfc, it->cache, list)
if (pos-- == 0)
return mfc;
}
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
spin_lock_bh(&mfc_unres_lock);
it->cache = &mrt->mfc_unres_queue;
@@ -2279,7 +2286,7 @@ static void *ipmr_mfc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
}
/* exhausted cache_array, show unresolved */
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
it->cache = &mrt->mfc_unres_queue;
it->ct = 0;
@@ -2302,7 +2309,7 @@ static void ipmr_mfc_seq_stop(struct seq_file *seq, void *v)
if (it->cache == &mrt->mfc_unres_queue)
spin_unlock_bh(&mfc_unres_lock);
else if (it->cache == &mrt->mfc_cache_array[it->ct])
- read_unlock(&mrt_lock);
+ rcu_read_unlock();
}
static int ipmr_mfc_seq_show(struct seq_file *seq, void *v)
@@ -2426,7 +2433,7 @@ int __init ip_mr_init(void)
mrt_cachep = kmem_cache_create("ip_mrt_cache",
sizeof(struct mfc_cache),
- 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+ 0, SLAB_HWCACHE_ALIGN | SLAB_PANIC,
NULL);
if (!mrt_cachep)
return -ENOMEM;
* [PATCH net-next 2/4] ipmr: RCU conversion of mroute_sk
From: Eric Dumazet @ 2010-10-02 2:15 UTC
To: David Miller; +Cc: netdev
Use RCU and RTNL to protect (struct mr_table)->mroute_sk
Readers use RCU, writers use RTNL.
ip_ra_control() already uses an RCU grace period before
ip_ra_destroy_rcu(), so we don't need synchronize_rcu() in
mrtsock_destruct().
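The reader/writer split can be sketched in portable C11, assuming an atomic pointer as a rough userspace analog of an __rcu pointer. The toy_* names are hypothetical, and the sketch deliberately leaves out the grace period that keeps the socket alive for readers; it only illustrates the "load once into a local, test the local, use the local" shape that ipmr_cache_report() takes in this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy socket: just counts how many reports were queued to it. */
struct toy_sock {
	int queued;
};

struct toy_table {
	_Atomic(struct toy_sock *) mroute_sk;	/* NULL when no daemon is attached */
};

/* Writer side (serialized by RTNL in the patch): publish or clear the pointer. */
static void toy_set_sk(struct toy_table *t, struct toy_sock *sk)
{
	assert(t);
	atomic_store_explicit(&t->mroute_sk, sk, memory_order_release);
}

/* Reader side: load once, test the local copy, then use only that copy. */
static int toy_report(struct toy_table *t)
{
	struct toy_sock *sk;

	assert(t);
	sk = atomic_load_explicit(&t->mroute_sk, memory_order_acquire);
	if (sk == NULL)
		return -1;	/* nobody listening, drop the report */
	sk->queued++;
	return 0;
}
```

Reading the field twice, as the old code did under mrt_lock, would no longer be safe once the lock is gone, because the writer may clear the pointer between the two reads.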
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/ipmr.c | 91 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 49 insertions(+), 42 deletions(-)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 1a92ebd..e2db2ea 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -75,7 +75,7 @@ struct mr_table {
struct net *net;
#endif
u32 id;
- struct sock *mroute_sk;
+ struct sock __rcu *mroute_sk;
struct timer_list ipmr_expire_timer;
struct list_head mfc_unres_queue;
struct list_head mfc_cache_array[MFC_LINES];
@@ -867,6 +867,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
const int ihl = ip_hdrlen(pkt);
struct igmphdr *igmp;
struct igmpmsg *msg;
+ struct sock *mroute_sk;
int ret;
#ifdef CONFIG_IP_PIMSM
@@ -925,7 +926,10 @@ static int ipmr_cache_report(struct mr_table *mrt,
skb->transport_header = skb->network_header;
}
- if (mrt->mroute_sk == NULL) {
+ rcu_read_lock();
+ mroute_sk = rcu_dereference(mrt->mroute_sk);
+ if (mroute_sk == NULL) {
+ rcu_read_unlock();
kfree_skb(skb);
return -EINVAL;
}
@@ -933,7 +937,8 @@ static int ipmr_cache_report(struct mr_table *mrt,
/*
* Deliver to mrouted
*/
- ret = sock_queue_rcv_skb(mrt->mroute_sk, skb);
+ ret = sock_queue_rcv_skb(mroute_sk, skb);
+ rcu_read_unlock();
if (ret < 0) {
if (net_ratelimit())
printk(KERN_WARNING "mroute: pending queue full, dropping entries.\n");
@@ -1164,6 +1169,9 @@ static void mroute_clean_tables(struct mr_table *mrt)
}
}
+/* called from ip_ra_control(), before an RCU grace period,
+ * we dont need to call synchronize_rcu() here
+ */
static void mrtsock_destruct(struct sock *sk)
{
struct net *net = sock_net(sk);
@@ -1171,13 +1179,9 @@ static void mrtsock_destruct(struct sock *sk)
rtnl_lock();
ipmr_for_each_table(mrt, net) {
- if (sk == mrt->mroute_sk) {
+ if (sk == rtnl_dereference(mrt->mroute_sk)) {
IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
-
- write_lock_bh(&mrt_lock);
- mrt->mroute_sk = NULL;
- write_unlock_bh(&mrt_lock);
-
+ rcu_assign_pointer(mrt->mroute_sk, NULL);
mroute_clean_tables(mrt);
}
}
@@ -1204,7 +1208,8 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
return -ENOENT;
if (optname != MRT_INIT) {
- if (sk != mrt->mroute_sk && !capable(CAP_NET_ADMIN))
+ if (sk != rcu_dereference_raw(mrt->mroute_sk) &&
+ !capable(CAP_NET_ADMIN))
return -EACCES;
}
@@ -1217,23 +1222,20 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
return -ENOPROTOOPT;
rtnl_lock();
- if (mrt->mroute_sk) {
+ if (rtnl_dereference(mrt->mroute_sk)) {
rtnl_unlock();
return -EADDRINUSE;
}
ret = ip_ra_control(sk, 1, mrtsock_destruct);
if (ret == 0) {
- write_lock_bh(&mrt_lock);
- mrt->mroute_sk = sk;
- write_unlock_bh(&mrt_lock);
-
+ rcu_assign_pointer(mrt->mroute_sk, sk);
IPV4_DEVCONF_ALL(net, MC_FORWARDING)++;
}
rtnl_unlock();
return ret;
case MRT_DONE:
- if (sk != mrt->mroute_sk)
+ if (sk != rcu_dereference_raw(mrt->mroute_sk))
return -EACCES;
return ip_ra_control(sk, 0, NULL);
case MRT_ADD_VIF:
@@ -1246,7 +1248,8 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
return -ENFILE;
rtnl_lock();
if (optname == MRT_ADD_VIF) {
- ret = vif_add(net, mrt, &vif, sk == mrt->mroute_sk);
+ ret = vif_add(net, mrt, &vif,
+ sk == rtnl_dereference(mrt->mroute_sk));
} else {
ret = vif_delete(mrt, vif.vifc_vifi, 0, NULL);
}
@@ -1267,7 +1270,8 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
if (optname == MRT_DEL_MFC)
ret = ipmr_mfc_delete(mrt, &mfc);
else
- ret = ipmr_mfc_add(net, mrt, &mfc, sk == mrt->mroute_sk);
+ ret = ipmr_mfc_add(net, mrt, &mfc,
+ sk == rtnl_dereference(mrt->mroute_sk));
rtnl_unlock();
return ret;
/*
@@ -1309,14 +1313,16 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
return -EINVAL;
if (get_user(v, (u32 __user *)optval))
return -EFAULT;
- if (sk == mrt->mroute_sk)
- return -EBUSY;
rtnl_lock();
ret = 0;
- if (!ipmr_new_table(net, v))
- ret = -ENOMEM;
- raw_sk(sk)->ipmr_table = v;
+ if (sk == rtnl_dereference(mrt->mroute_sk)) {
+ ret = -EBUSY;
+ } else {
+ if (!ipmr_new_table(net, v))
+ ret = -ENOMEM;
+ raw_sk(sk)->ipmr_table = v;
+ }
rtnl_unlock();
return ret;
}
@@ -1713,6 +1719,7 @@ dont_forward:
/*
* Multicast packets for forwarding arrive here
+ * Called with rcu_read_lock();
*/
int ip_mr_input(struct sk_buff *skb)
@@ -1726,7 +1733,7 @@ int ip_mr_input(struct sk_buff *skb)
/* Packet is looped back after forward, it should not be
forwarded second time, but still can be delivered locally.
*/
- if (IPCB(skb)->flags&IPSKB_FORWARDED)
+ if (IPCB(skb)->flags & IPSKB_FORWARDED)
goto dont_forward;
err = ipmr_fib_lookup(net, &skb_rtable(skb)->fl, &mrt);
@@ -1736,24 +1743,24 @@ int ip_mr_input(struct sk_buff *skb)
}
if (!local) {
- if (IPCB(skb)->opt.router_alert) {
- if (ip_call_ra_chain(skb))
- return 0;
- } else if (ip_hdr(skb)->protocol == IPPROTO_IGMP){
- /* IGMPv1 (and broken IGMPv2 implementations sort of
- Cisco IOS <= 11.2(8)) do not put router alert
- option to IGMP packets destined to routable
- groups. It is very bad, because it means
- that we can forward NO IGMP messages.
- */
- read_lock(&mrt_lock);
- if (mrt->mroute_sk) {
- nf_reset(skb);
- raw_rcv(mrt->mroute_sk, skb);
- read_unlock(&mrt_lock);
- return 0;
- }
- read_unlock(&mrt_lock);
+ if (IPCB(skb)->opt.router_alert) {
+ if (ip_call_ra_chain(skb))
+ return 0;
+ } else if (ip_hdr(skb)->protocol == IPPROTO_IGMP) {
+ /* IGMPv1 (and broken IGMPv2 implementations sort of
+ * Cisco IOS <= 11.2(8)) do not put router alert
+ * option to IGMP packets destined to routable
+ * groups. It is very bad, because it means
+ * that we can forward NO IGMP messages.
+ */
+ struct sock *mroute_sk;
+
+ mroute_sk = rcu_dereference(mrt->mroute_sk);
+ if (mroute_sk) {
+ nf_reset(skb);
+ raw_rcv(mroute_sk, skb);
+ return 0;
+ }
}
}
* [PATCH net-next 1/4] ipmr: __pim_rcv() is called under rcu_read_lock
From: Eric Dumazet @ 2010-10-02 2:14 UTC
To: David Miller; +Cc: netdev
No need to get a reference on reg_dev and release it: we are in an
rcu_read_lock() protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/ipmr.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 10b24c0..1a92ebd 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1805,6 +1805,7 @@ dont_forward:
}
#ifdef CONFIG_IP_PIMSM
+/* called with rcu_read_lock() */
static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
unsigned int pimlen)
{
@@ -1826,26 +1827,23 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
read_lock(&mrt_lock);
if (mrt->mroute_reg_vif_num >= 0)
reg_dev = mrt->vif_table[mrt->mroute_reg_vif_num].dev;
- if (reg_dev)
- dev_hold(reg_dev);
read_unlock(&mrt_lock);
if (reg_dev == NULL)
return 1;
skb->mac_header = skb->network_header;
- skb_pull(skb, (u8*)encap - skb->data);
+ skb_pull(skb, (u8 *)encap - skb->data);
skb_reset_network_header(skb);
skb->protocol = htons(ETH_P_IP);
- skb->ip_summed = 0;
+ skb->ip_summed = CHECKSUM_NONE;
skb->pkt_type = PACKET_HOST;
skb_tunnel_rx(skb, reg_dev);
netif_rx(skb);
- dev_put(reg_dev);
- return 0;
+ return NET_RX_SUCCESS;
}
#endif
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: M. Warner Losh @ 2010-10-02 1:44 UTC
To: cl
Cc: christian, giometti, peterz, johnstul, devicetree-discuss,
linux-kernel, davem, netdev, linux-arm-kernel, linux-api, tglx,
linuxppc-dev, richardcochran, alan, khc
In-Reply-To: <alpine.DEB.2.00.1009271035110.9258@router.home>
In message: <alpine.DEB.2.00.1009271035110.9258@router.home>
Christoph Lameter <cl@linux.com> writes:
: On Thu, 23 Sep 2010, Christian Riesch wrote:
:
: > > > It implies clock tuning in userspace for a potential sub microsecond
: > > > accurate clock. The clock accuracy will be limited by user space
: > > > latencies and noise. You wont be able to discipline the system clock
: > > > accurately.
: > >
: > > Noise matters, latency doesn't.
: >
: > Well put! That's why we need hardware support for PTP timestamping to reduce
: > the noise, but get along well with the clock servo that is steering the PHC in
: > user space.
:
: Even if I buy into the catch phrase above: User space is subject to noise
: that the in kernel code is not. If you do the tuning over long intervals
: then it hopefully averages out but it still causes jitter effects that
: affects the degree of accuracy (or sync) that you can reach. And the noise
: varies with the load on the system.
Please see the earlier posts in this thread about why this doesn't
matter as much as you might think. What matters is the measurements
(which are done in hardware and the results buffered), not the latency
in processing those messages through your servo. This is due to the
fact that the errors that even long latencies introduce are
proportional to the change in fractional frequency[*] of the clock being
steered. This change is usually on the order of a part per million.
Even 10ms of latency would mean that you're introducing on the
order of sub-nanoseconds of phase error that will be measured in the
next cycle and steered out.
That's why latency doesn't matter. Do you have other math to show
that it does?
Warner
[*] abs(1 - (clock_freq_old / clock_freq_new)) where clock_freq_old is
the old estimate of the clock and clock_freq_new is the new frequency
estimate of the clock. Second to second, these change on the order of
a part per million or less...
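A back-of-the-envelope version of this argument, assuming the phase error picked up while a correction waits is simply the fractional frequency change multiplied by the latency. The function name and units below are illustrative only and do not come from any PTP code.

```c
#include <assert.h>

/*
 * Phase error, in nanoseconds, from applying a frequency correction late.
 *
 * frac_freq_change: dimensionless frequency step, e.g. 1e-6 for 1 ppm
 * latency_s:        delay before the servo applies the correction, seconds
 */
static double phase_error_ns(double frac_freq_change, double latency_s)
{
	assert(frac_freq_change >= 0.0 && latency_s >= 0.0);
	return frac_freq_change * latency_s * 1e9;
}
```

With these assumptions a full part-per-million step held for 10 ms works out to about 10 ns, and steps of 1e-7 or smaller bring the figure to 1 ns and below. Either way the error scales with the size of the frequency step, not with the latency alone, and it is measured and steered out in the next cycle.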
* Re: [patch v2 08/12] [PATCH 08/12] IPVS: Add persistence engine data to /proc/net/ip_vs_conn
From: Simon Horman @ 2010-10-02 1:58 UTC
To: Julian Anastasov
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <alpine.LFD.2.00.1010020048030.2462@ja.ssi.bg>
On Sat, Oct 02, 2010 at 12:50:09AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 1 Oct 2010, Simon Horman wrote:
>
> >Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
> >===================================================================
> >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:27:17.000000000 +0900
> >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:27:32.000000000 +0900
> >@@ -938,30 +938,44 @@ static int ip_vs_conn_seq_show(struct se
> >
> > if (v == SEQ_START_TOKEN)
> > seq_puts(seq,
> >- "Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires\n");
> >+ "Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires PEName PEData\n");
> > else {
> > const struct ip_vs_conn *cp = v;
> >+ char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3];
> >+ size_t len = 0;
> >+
>
> Add check for cp->dest, it is optional:
Done.
> >+ if (cp->dest->svc->pe && cp->dest->svc->pe->show_pe_data) {
> >+ pe_data[0] = ' ';
> >+ len = strlen(cp->dest->svc->pe->name);
> >+ memcpy(pe_data + 1, cp->dest->svc->pe->name, len);
> >+ pe_data[len + 1] = ' ';
> >+ len += 2;
> >+ len += cp->dest->svc->pe->show_pe_data(cp,
> >+ pe_data + len);
> >+ }
> >+ pe_data[len] = '\0';
>
I will repost the entire series a little later.
For reference here is the updated patch.
From bee0c73040b632723ff7a058f61a79063b18e882 Mon Sep 17 00:00:00 2001
From: Simon Horman <horms@verge.net.au>
Date: Sun, 22 Aug 2010 21:37:53 +0900
Subject: [patch v2.1 08/12] IPVS: Add persistence engine data to /proc/net/ip_vs_conn
This shouldn't break compatibility with userspace as the new data
is at the end of the line.
I have confirmed that this doesn't break ipvsadm, the main (only?)
user-space user of this data.
Signed-off-by: Simon Horman <horms@verge.net.au>
---
* Jan Engelhardt suggested using netlink to do this, but it seems like
overkill to me. I'm willing to be convinced otherwise.
v2
* Trivial rediff
v2.1
* As suggested by Julian Anastasov
- ip_vs_conn_seq_show(): cp->dest is optional
so test for its presence accordingly
- Re-diff for addition of inverse parameter to ip_vs_conn_hashkey_param()
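For reference, the way the extra columns are assembled can be modelled in userspace as below. This is a sketch only: the PENAME_MAXLEN/PEDATA_MAXLEN macros mirror the sizes used in the series, while show_fn, show_raw() and build_pe_suffix() are hypothetical names standing in for the show_pe_data() hook and the code in ip_vs_conn_seq_show(). The buffer needs room for a leading space, the name, a separating space, the data and the terminating NUL, hence the "+ 3".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PENAME_MAXLEN 16
#define PEDATA_MAXLEN 255

/*
 * Hypothetical stand-in for a persistence engine's show_pe_data() hook:
 * writes the engine data into buf and returns the number of bytes written.
 */
typedef size_t (*show_fn)(const char *data, size_t data_len, char *buf);

static size_t show_raw(const char *data, size_t data_len, char *buf)
{
	memcpy(buf, data, data_len);
	return data_len;
}

/*
 * Build " <name> <data>" in out, which must hold at least
 * PENAME_MAXLEN + PEDATA_MAXLEN + 3 bytes. Returns the string length;
 * without an engine the result is the empty string.
 */
static size_t build_pe_suffix(const char *name, const char *data,
			      size_t data_len, show_fn show, char *out)
{
	size_t len = 0;

	if (name && show) {
		size_t name_len = strlen(name);

		assert(name_len <= PENAME_MAXLEN);
		assert(data_len <= PEDATA_MAXLEN);
		out[0] = ' ';
		memcpy(out + 1, name, name_len);
		out[name_len + 1] = ' ';
		len = name_len + 2;
		len += show(data, data_len, out + len);
	}
	out[len] = '\0';
	return len;
}
```

Because the suffix is empty when no engine is bound, a connection without a persistence engine produces exactly the same line as before, which is why appending "%s" to the existing format string should not disturb existing parsers.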
Index: lvs-test-2.6/include/net/ip_vs.h
===================================================================
--- lvs-test-2.6.orig/include/net/ip_vs.h 2010-10-02 10:55:43.000000000 +0900
+++ lvs-test-2.6/include/net/ip_vs.h 2010-10-02 10:55:49.000000000 +0900
@@ -572,6 +572,7 @@ struct ip_vs_pe {
struct ip_vs_conn *ct);
u32 (*hashkey_raw)(const struct ip_vs_conn_param *p, u32 initval,
bool inverse);
+ int (*show_pe_data)(const struct ip_vs_conn *cp, char *buf);
};
/*
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-02 10:55:43.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-02 10:57:00.000000000 +0900
@@ -950,30 +950,45 @@ static int ip_vs_conn_seq_show(struct se
if (v == SEQ_START_TOKEN)
seq_puts(seq,
- "Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires\n");
+ "Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires PEName PEData\n");
else {
const struct ip_vs_conn *cp = v;
+ char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3];
+ size_t len = 0;
+
+ if (cp->dest && cp->dest->svc->pe &&
+ cp->dest->svc->pe->show_pe_data) {
+ pe_data[0] = ' ';
+ len = strlen(cp->dest->svc->pe->name);
+ memcpy(pe_data + 1, cp->dest->svc->pe->name, len);
+ pe_data[len + 1] = ' ';
+ len += 2;
+ len += cp->dest->svc->pe->show_pe_data(cp,
+ pe_data + len);
+ }
+ pe_data[len] = '\0';
#ifdef CONFIG_IP_VS_IPV6
if (cp->af == AF_INET6)
- seq_printf(seq, "%-3s %pI6 %04X %pI6 %04X %pI6 %04X %-11s %7lu\n",
+ seq_printf(seq, "%-3s %pI6 %04X %pI6 %04X "
+ "%pI6 %04X %-11s %7lu%s\n",
ip_vs_proto_name(cp->protocol),
&cp->caddr.in6, ntohs(cp->cport),
&cp->vaddr.in6, ntohs(cp->vport),
&cp->daddr.in6, ntohs(cp->dport),
ip_vs_state_name(cp->protocol, cp->state),
- (cp->timer.expires-jiffies)/HZ);
+ (cp->timer.expires-jiffies)/HZ, pe_data);
else
#endif
seq_printf(seq,
"%-3s %08X %04X %08X %04X"
- " %08X %04X %-11s %7lu\n",
+ " %08X %04X %-11s %7lu%s\n",
ip_vs_proto_name(cp->protocol),
ntohl(cp->caddr.ip), ntohs(cp->cport),
ntohl(cp->vaddr.ip), ntohs(cp->vport),
ntohl(cp->daddr.ip), ntohs(cp->dport),
ip_vs_state_name(cp->protocol, cp->state),
- (cp->timer.expires-jiffies)/HZ);
+ (cp->timer.expires-jiffies)/HZ, pe_data);
}
return 0;
}
* Re: [patch v2 07/12] [PATCH 07/12] IPVS: Add struct ip_vs_pe
From: Simon Horman @ 2010-10-02 1:55 UTC
To: Julian Anastasov
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <alpine.LFD.2.00.1010020030590.2320@ja.ssi.bg>
On Sat, Oct 02, 2010 at 12:45:52AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 1 Oct 2010, Simon Horman wrote:
>
> >===================================================================
> >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:48:42.000000000 +0900
> >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:49:15.000000000 +0900
> >@@ -148,6 +148,29 @@ static unsigned int ip_vs_conn_hashkey(i
> > & ip_vs_conn_tab_mask;
> >}
> >
> >+static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p)
> >+{
> >+ if (p->pe && p->pe->hashkey_raw)
> >+ return p->pe->hashkey_raw(p, ip_vs_conn_rnd) &
> >+ ip_vs_conn_tab_mask;
> >+ return ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
> >+}
> >+
> >+static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
> >+{
> >+ struct ip_vs_conn_param p;
> >+
> >+ ip_vs_conn_fill_param(cp->af, cp->protocol, &cp->caddr, cp->cport,
> >+ NULL, 0, &p);
> >+
>
> cp->dest is optional, line should be
> 'if (cp->dest && cp->dest->svc->pe) {':
Thanks, fixed.
> >+ if (cp->dest->svc->pe) {
> >+ p.pe = cp->dest->svc->pe;
> >+ p.pe_data = cp->pe_data;
> >+ p.pe_data_len = cp->pe_data_len;
> >+ }
> >+
> >+ return ip_vs_conn_hashkey_param(&p);
> >+}
> >
>
>
> >@@ -359,7 +387,7 @@ struct ip_vs_conn *ip_vs_conn_out_get(co
> > /*
> > * Check for "full" addressed entries
> > */
>
> Here ip_vs_conn_out_get expects client data in
> p->vaddr and p->vport (was daddr before) but
> ip_vs_conn_hashkey_param hashes client data from p->caddr and
> p->cport:
>
> >- hash = ip_vs_conn_hashkey(p->af, p->protocol, p->vaddr, p->vport);
> >+ hash = ip_vs_conn_hashkey_param(p);
> >
> > ct_read_lock(hash);
Thanks, I have resolved this by adding an inverse parameter to
ip_vs_conn_hashkey_param().
I will repost the entire series later, once I have addressed the
concerns you raised with other patches in the series. For reference
here is the revised patch.
From 69bb236dde5b48cf043f2e31dfd4a93cd126c690 Mon Sep 17 00:00:00 2001
From: Simon Horman <horms@verge.net.au>
Date: Sun, 22 Aug 2010 21:37:53 +0900
Subject: [patch v2.1 07/12] IPVS: Add struct ip_vs_pe
Signed-off-by: Simon Horman <horms@verge.net.au>
---
This is the first of several patches that add persistence engines.
v2
* Don't leak pe_data
- It wasn't being freed anywhere, ever
* Trivial rediff
v2.1
* As suggested by Julian Anastasov
- ip_vs_conn_fill_param(): cp->dest is optional
so check for its presence accordingly
- ip_vs_conn_out_get() needs to hash on vaddr/vport
whereas ip_vs_conn_hash() hashes on caddr/cport.
Add an inverse parameter to ip_vs_conn_hashkey_param()
to allow it to handle both cases.
* Trivial rediff
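The reason for the inverse parameter can be shown with a toy model. Everything here is an assumption made for illustration: toy_hashkey() is an arbitrary mixing function, not the hash used by ip_vs_conn_hashkey(), and struct toy_param is a cut-down IPv4-only stand-in for struct ip_vs_conn_param. The point is only that an entry hashed on caddr/cport is found again from the outgoing direction when the lookup hashes on vaddr/vport, which is where ip_vs_conn_out_get() carries the client address and port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TAB_MASK 0xfffu

/* Cut-down, IPv4-only stand-in for struct ip_vs_conn_param (hypothetical). */
struct toy_param {
	uint32_t caddr, vaddr;
	uint16_t cport, vport;
	uint16_t protocol;
};

/* Arbitrary mixing function, NOT the kernel's hash. */
static uint32_t toy_hashkey(uint16_t proto, uint32_t addr, uint16_t port)
{
	uint32_t h = addr * 2654435761u;

	h ^= ((uint32_t)port << 16) | proto;
	h ^= h >> 15;
	return h & TOY_TAB_MASK;
}

/* Hash on caddr/cport normally, on vaddr/vport when inverse is set. */
static uint32_t toy_hashkey_param(const struct toy_param *p, bool inverse)
{
	assert(p);
	if (!inverse)
		return toy_hashkey(p->protocol, p->caddr, p->cport);
	return toy_hashkey(p->protocol, p->vaddr, p->vport);
}
```

An incoming lookup with the client in caddr/cport and an outgoing lookup with the same client in vaddr/vport then land in the same bucket, provided the second one passes inverse = true.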
Index: lvs-test-2.6/include/linux/ip_vs.h
===================================================================
--- lvs-test-2.6.orig/include/linux/ip_vs.h 2010-10-02 10:29:43.000000000 +0900
+++ lvs-test-2.6/include/linux/ip_vs.h 2010-10-02 10:54:05.000000000 +0900
@@ -99,8 +99,10 @@
0)
#define IP_VS_SCHEDNAME_MAXLEN 16
+#define IP_VS_PENAME_MAXLEN 16
#define IP_VS_IFNAME_MAXLEN 16
+#define IP_VS_PEDATA_MAXLEN 255
/*
* The struct ip_vs_service_user and struct ip_vs_dest_user are
Index: lvs-test-2.6/include/net/ip_vs.h
===================================================================
--- lvs-test-2.6.orig/include/net/ip_vs.h 2010-10-02 10:32:41.000000000 +0900
+++ lvs-test-2.6/include/net/ip_vs.h 2010-10-02 10:54:06.000000000 +0900
@@ -364,6 +364,10 @@ struct ip_vs_conn_param {
__be16 vport;
__u16 protocol;
u16 af;
+
+ const struct ip_vs_pe *pe;
+ char *pe_data;
+ __u8 pe_data_len;
};
/*
@@ -416,6 +420,9 @@ struct ip_vs_conn {
void *app_data; /* Application private data */
struct ip_vs_seq in_seq; /* incoming seq. struct */
struct ip_vs_seq out_seq; /* outgoing seq. struct */
+
+ char *pe_data;
+ __u8 pe_data_len;
};
@@ -486,6 +493,9 @@ struct ip_vs_service {
struct ip_vs_scheduler *scheduler; /* bound scheduler object */
rwlock_t sched_lock; /* lock sched_data */
void *sched_data; /* scheduler application data */
+
+ /* alternate persistence engine */
+ struct ip_vs_pe *pe;
};
@@ -549,6 +559,20 @@ struct ip_vs_scheduler {
const struct sk_buff *skb);
};
+/* The persistence engine object */
+struct ip_vs_pe {
+ struct list_head n_list; /* d-linked list head */
+ char *name; /* scheduler name */
+ atomic_t refcnt; /* reference counter */
+ struct module *module; /* THIS_MODULE/NULL */
+
+ /* get the connection template, if any */
+ int (*fill_param)(struct ip_vs_conn_param *p, struct sk_buff *skb);
+ bool (*ct_match)(const struct ip_vs_conn_param *p,
+ struct ip_vs_conn *ct);
+ u32 (*hashkey_raw)(const struct ip_vs_conn_param *p, u32 initval,
+ bool inverse);
+};
/*
* The application module object (a.k.a. app incarnation)
@@ -648,6 +672,8 @@ static inline void ip_vs_conn_fill_param
p->cport = cport;
p->vaddr = vaddr;
p->vport = vport;
+ p->pe = NULL;
+ p->pe_data = NULL;
}
struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p);
@@ -803,7 +829,7 @@ extern int ip_vs_unbind_scheduler(struct
extern struct ip_vs_scheduler *ip_vs_scheduler_get(const char *sched_name);
extern void ip_vs_scheduler_put(struct ip_vs_scheduler *scheduler);
extern struct ip_vs_conn *
-ip_vs_schedule(struct ip_vs_service *svc, const struct sk_buff *skb);
+ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb);
extern int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
struct ip_vs_protocol *pp);
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-02 10:32:41.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-02 10:54:06.000000000 +0900
@@ -148,6 +148,42 @@ static unsigned int ip_vs_conn_hashkey(i
& ip_vs_conn_tab_mask;
}
+static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p,
+ bool inverse)
+{
+ const union nf_inet_addr *addr;
+ __be16 port;
+
+ if (p->pe && p->pe->hashkey_raw)
+ return p->pe->hashkey_raw(p, ip_vs_conn_rnd, inverse) &
+ ip_vs_conn_tab_mask;
+
+ if (likely(!inverse)) {
+ addr = p->caddr;
+ port = p->cport;
+ } else {
+ addr = p->vaddr;
+ port = p->vport;
+ }
+
+ return ip_vs_conn_hashkey(p->af, p->protocol, addr, port);
+}
+
+static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
+{
+ struct ip_vs_conn_param p;
+
+ ip_vs_conn_fill_param(cp->af, cp->protocol, &cp->caddr, cp->cport,
+ NULL, 0, &p);
+
+ if (cp->dest && cp->dest->svc->pe) {
+ p.pe = cp->dest->svc->pe;
+ p.pe_data = cp->pe_data;
+ p.pe_data_len = cp->pe_data_len;
+ }
+
+ return ip_vs_conn_hashkey_param(&p, false);
+}
/*
* Hashes ip_vs_conn in ip_vs_conn_tab by proto,addr,port.
@@ -162,7 +198,7 @@ static inline int ip_vs_conn_hash(struct
return 0;
/* Hash by protocol, client address and port */
- hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);
+ hash = ip_vs_conn_hashkey_conn(cp);
ct_write_lock(hash);
spin_lock(&cp->lock);
@@ -195,7 +231,7 @@ static inline int ip_vs_conn_unhash(stru
int ret;
/* unhash it and decrease its reference counter */
- hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);
+ hash = ip_vs_conn_hashkey_conn(cp);
ct_write_lock(hash);
spin_lock(&cp->lock);
@@ -227,7 +263,7 @@ __ip_vs_conn_in_get(const struct ip_vs_c
unsigned hash;
struct ip_vs_conn *cp;
- hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
+ hash = ip_vs_conn_hashkey_param(p, false);
ct_read_lock(hash);
@@ -312,11 +348,17 @@ struct ip_vs_conn *ip_vs_ct_in_get(const
unsigned hash;
struct ip_vs_conn *cp;
- hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
+ hash = ip_vs_conn_hashkey_param(p, false);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+ if (p->pe && p->pe->ct_match) {
+ if (p->pe->ct_match(p, cp))
+ goto out;
+ continue;
+ }
+
if (cp->af == p->af &&
ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
/* protocol should only be IPPROTO_IP if
@@ -325,15 +367,14 @@ struct ip_vs_conn *ip_vs_ct_in_get(const
p->af, p->vaddr, &cp->vaddr) &&
p->cport == cp->cport && p->vport == cp->vport &&
cp->flags & IP_VS_CONN_F_TEMPLATE &&
- p->protocol == cp->protocol) {
- /* HIT */
- atomic_inc(&cp->refcnt);
+ p->protocol == cp->protocol)
goto out;
- }
}
cp = NULL;
out:
+ if (cp)
+ atomic_inc(&cp->refcnt);
ct_read_unlock(hash);
IP_VS_DBG_BUF(9, "template lookup/in %s %s:%d->%s:%d %s\n",
@@ -357,7 +398,7 @@ struct ip_vs_conn *ip_vs_conn_out_get(co
/*
* Check for "full" addressed entries
*/
- hash = ip_vs_conn_hashkey(p->af, p->protocol, p->vaddr, p->vport);
+ hash = ip_vs_conn_hashkey_param(p, true);
ct_read_lock(hash);
@@ -722,6 +763,7 @@ static void ip_vs_conn_expire(unsigned l
if (cp->flags & IP_VS_CONN_F_NFCT)
ip_vs_conn_drop_conntrack(cp);
+ kfree(cp->pe_data);
if (unlikely(cp->app != NULL))
ip_vs_unbind_app(cp);
ip_vs_unbind_dest(cp);
@@ -782,6 +824,10 @@ ip_vs_conn_new(const struct ip_vs_conn_p
&cp->daddr, daddr);
cp->dport = dport;
cp->flags = flags;
+ if (flags & IP_VS_CONN_F_TEMPLATE && p->pe_data) {
+ cp->pe_data = p->pe_data;
+ cp->pe_data_len = p->pe_data_len;
+ }
spin_lock_init(&cp->lock);
/*
@@ -832,7 +878,6 @@ ip_vs_conn_new(const struct ip_vs_conn_p
return cp;
}
-
/*
* /proc/net/ip_vs_conn entries
*/
@@ -848,7 +893,7 @@ static void *ip_vs_conn_array(struct seq
list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
if (pos-- == 0) {
seq->private = &ip_vs_conn_tab[idx];
- return cp;
+ return cp;
}
}
ct_read_unlock_bh(idx);
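
Note on the hashing change above: ip_vs_conn_hashkey_param() lets a persistence engine supply its own raw hash and otherwise falls back to the address/port hash, taking the virtual address and port for the inverse case. The following is a minimal user-space sketch of that dispatch, not the kernel code. The names struct engine, struct param, TAB_MASK, default_hash() and callid_hash() are hypothetical stand-ins, and the mixing functions are toy arithmetic rather than the kernel's jhash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAB_MASK 0xfffu	/* hypothetical stand-in for ip_vs_conn_tab_mask */

struct param;

/* Hypothetical stand-in for a persistence engine. */
struct engine {
	/* Optional: hash the engine-specific key instead of addr/port. */
	uint32_t (*hashkey_raw)(const struct param *p, uint32_t initval,
				int inverse);
};

/* Hypothetical stand-in for struct ip_vs_conn_param (IPv4 only). */
struct param {
	uint32_t caddr, vaddr;
	uint16_t cport, vport;
	const struct engine *pe;
	const char *pe_data;
	size_t pe_data_len;
};

/* Toy address/port hash; the kernel uses jhash here. */
uint32_t default_hash(uint32_t addr, uint16_t port)
{
	return (addr ^ (addr >> 12) ^ port) & TAB_MASK;
}

/* Toy engine hash over the extra key only (e.g. a SIP Call-ID). */
uint32_t callid_hash(const struct param *p, uint32_t initval, int inverse)
{
	uint32_t h = initval;
	size_t i;

	(void)inverse;	/* the extra key is the same in both directions */
	for (i = 0; i < p->pe_data_len; i++)
		h = h * 31u + (unsigned char)p->pe_data[i];
	return h;
}

/* Engine hash if one is provided, otherwise the default hash. */
uint32_t hashkey_param(const struct param *p, uint32_t rnd, int inverse)
{
	assert(p != NULL);
	if (p->pe && p->pe->hashkey_raw)
		return p->pe->hashkey_raw(p, rnd, inverse) & TAB_MASK;
	if (!inverse)
		return default_hash(p->caddr, p->cport);
	return default_hash(p->vaddr, p->vport);
}
```

In this sketch two parameter sets that differ in client address but carry the same extra key land in the same bucket, which is the property a Call-ID based lookup relies on.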
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-02 10:32:41.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-02 10:54:05.000000000 +0900
@@ -176,6 +176,19 @@ ip_vs_set_state(struct ip_vs_conn *cp, i
return pp->state_transition(cp, direction, skb, pp);
}
+static inline int
+ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
+ struct sk_buff *skb, int protocol,
+ const union nf_inet_addr *caddr, __be16 cport,
+ const union nf_inet_addr *vaddr, __be16 vport,
+ struct ip_vs_conn_param *p)
+{
+ ip_vs_conn_fill_param(svc->af, protocol, caddr, cport, vaddr, vport, p);
+ p->pe = svc->pe;
+ if (p->pe && p->pe->fill_param)
+ return p->pe->fill_param(p, skb);
+ return 0;
+}
/*
* IPVS persistent scheduling function
@@ -186,7 +199,7 @@ ip_vs_set_state(struct ip_vs_conn *cp, i
*/
static struct ip_vs_conn *
ip_vs_sched_persist(struct ip_vs_service *svc,
- const struct sk_buff *skb,
+ struct sk_buff *skb,
__be16 ports[2])
{
struct ip_vs_conn *cp = NULL;
@@ -255,8 +268,9 @@ ip_vs_sched_persist(struct ip_vs_service
vaddr = &fwmark;
}
}
- ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
- vaddr, vport, &param);
+ if (ip_vs_conn_fill_param_persist(svc, skb, protocol, &snet, 0,
+ vaddr, vport, &param))
+ return NULL;
}
/* Check if a template already exists */
@@ -268,22 +282,31 @@ ip_vs_sched_persist(struct ip_vs_service
dest = svc->scheduler->schedule(svc, skb);
if (!dest) {
IP_VS_DBG(1, "p-schedule: no dest found.\n");
+ kfree(param.pe_data);
return NULL;
}
if (ports[1] == svc->port && svc->port != FTPPORT)
dport = dest->port;
- /* Create a template */
+ /* Create a template
+ * This adds param.pe_data to the template,
+ * and thus param.pe_data will be destroyed
+ * when the template expires */
ct = ip_vs_conn_new(&param, &dest->addr, dport,
IP_VS_CONN_F_TEMPLATE, dest);
- if (ct == NULL)
+ if (ct == NULL) {
+ kfree(param.pe_data);
return NULL;
+ }
ct->timeout = svc->timeout;
- } else
+ } else {
/* set destination with the found template */
dest = ct->dest;
+ kfree(param.pe_data);
+ }
+
dport = dest->port;
flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
@@ -319,7 +342,7 @@ ip_vs_sched_persist(struct ip_vs_service
* Protocols supported: TCP, UDP
*/
struct ip_vs_conn *
-ip_vs_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
+ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb)
{
struct ip_vs_conn *cp = NULL;
struct ip_vs_iphdr iph;
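
Note on pe_data ownership in the ip_vs_sched_persist() hunks above: the buffer filled in by the persistence engine is freed on every path except the one where a new template is created, in which case the template takes it over and frees it on expiry. The following is a minimal user-space sketch of that rule, not the kernel code. struct tmpl, struct param, tmpl_new(), tmpl_expire() and sched_persist() are hypothetical names, and clearing p->pe_data after the hand-over is an addition of the sketch that the patch itself does not make.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a connection template. */
struct tmpl {
	char *pe_data;
	size_t pe_data_len;
};

/* Hypothetical stand-in for struct ip_vs_conn_param. */
struct param {
	char *pe_data;
	size_t pe_data_len;
};

/* Create a template; on success it owns p->pe_data. */
struct tmpl *tmpl_new(const struct param *p)
{
	struct tmpl *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->pe_data = p->pe_data;
	t->pe_data_len = p->pe_data_len;
	return t;
}

/* Expiry releases the extra key together with the template. */
void tmpl_expire(struct tmpl *t)
{
	free(t->pe_data);
	free(t);
}

/*
 * found:     template returned by the lookup, or NULL
 * have_dest: non-zero if the scheduler found a destination
 */
struct tmpl *sched_persist(struct param *p, struct tmpl *found, int have_dest)
{
	struct tmpl *t;

	assert(p != NULL);
	if (found) {
		/* The key was only needed for the lookup. */
		free(p->pe_data);
		p->pe_data = NULL;
		return found;
	}
	if (!have_dest) {
		free(p->pe_data);
		p->pe_data = NULL;
		return NULL;
	}
	t = tmpl_new(p);
	if (!t) {
		free(p->pe_data);
		p->pe_data = NULL;
		return NULL;
	}
	p->pe_data = NULL;	/* ownership moved to the template */
	return t;
}
```

Each return path leaves exactly one owner for the buffer, or none once it has been freed.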
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_sync.c 2010-10-02 10:32:41.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c 2010-10-02 10:32:42.000000000 +0900
@@ -288,6 +288,16 @@ void ip_vs_sync_conn(struct ip_vs_conn *
ip_vs_sync_conn(cp->control);
}
+static inline int
+ip_vs_conn_fill_param_sync(int af, int protocol,
+ const union nf_inet_addr *caddr, __be16 cport,
+ const union nf_inet_addr *vaddr, __be16 vport,
+ struct ip_vs_conn_param *p)
+{
+ /* XXX: Need to take into account persistence engine */
+ ip_vs_conn_fill_param(af, protocol, caddr, cport, vaddr, vport, p);
+ return 0;
+}
/*
* Process received multicast message and create the corresponding
@@ -372,11 +382,14 @@ static void ip_vs_process_message(const
}
{
- ip_vs_conn_fill_param(AF_INET, s->protocol,
+ if (ip_vs_conn_fill_param_sync(AF_INET, s->protocol,
(union nf_inet_addr *)&s->caddr,
s->cport,
(union nf_inet_addr *)&s->vaddr,
- s->vport, &param);
+ s->vport, &param)) {
+ pr_err("ip_vs_conn_fill_param_sync failed\n");
+ return;
+ }
if (!(flags & IP_VS_CONN_F_TEMPLATE))
cp = ip_vs_conn_in_get(&param);
else
* Re: [patch v2 04/12] [PATCH 04/12] IPVS: Add struct ip_vs_conn_param
From: Simon Horman @ 2010-10-02 1:15 UTC (permalink / raw)
To: Julian Anastasov
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <20101002011350.GC2235@verge.net.au>
On Sat, Oct 02, 2010 at 10:13:50AM +0900, Simon Horman wrote:
> On Fri, Oct 01, 2010 at 11:58:04PM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Fri, 1 Oct 2010, Simon Horman wrote:
> >
> > >+static int
> > >+ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
> > >+ const struct ip_vs_iphdr *iph,
> > >+ unsigned int proto_off, int inverse,
> > >+ struct ip_vs_conn_param *p)
> > >+{
> > >+ __be16 _ports[2], *pptr;
> > >+
> > >+ pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> > >+ if (pptr == NULL)
> > >+ return 1;
> > >+
> > >+ if (likely(!inverse))
> > >+ ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> > >+ &iph->daddr, pptr[1], p);
> > >+ else
> >
> > Next line is wrong for inverse=1, must be
> > &iph->daddr, pptr[1], &iph->saddr, pptr[0]
>
> Thanks, fixed.
>
> > >+ ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> > >+ &iph->daddr, pptr[1], p);
> > >+ return 0;
> > >+}
> > >+
> >
> > > Maybe the comments before ip_vs_conn_out_get should be changed:
> >
> > >@@ -341,9 +351,7 @@ struct ip_vs_conn *ip_vs_ct_in_get
> > > * s_addr, s_port: pkt source address (inside host)
> > > * d_addr, d_port: pkt dest address (foreign host)
> > > */
> > >-struct ip_vs_conn *ip_vs_conn_out_get
> > >-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> > >- const union nf_inet_addr *d_addr, __be16 d_port)
> > >+struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
>
> I have updated it to:
>
> /* Gets ip_vs_conn associated with supplied parameters in the
> * ip_vs_conn_tab.
> * Called for pkts coming from inside-to-OUTside.
> * p->caddr, p->cport: pkt source address (inside host)
> * p->vaddr, p->vport: pkt dest address (foreign host) */
>
> > >===================================================================
> > >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:06:23.000000000 +0900
> > >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:10:46.000000000 +0900
> > >@@ -193,14 +193,11 @@ ip_vs_sched_persist(struct ip_vs_service
> > > struct ip_vs_iphdr iph;
> > > struct ip_vs_dest *dest;
> > > struct ip_vs_conn *ct;
> > >- int protocol = iph.protocol;
> > > __be16 dport = 0; /* destination port to forward */
> > >- __be16 vport = 0; /* virtual service port */
> > > unsigned int flags;
> > > union nf_inet_addr snet; /* source network of the client,
> > > after masking */
> > >- const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> > >- const union nf_inet_addr *vaddr = &iph.daddr;
> > >+ struct ip_vs_conn_param param;
> > >
> > > ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
> > >
> > >@@ -232,6 +229,11 @@ ip_vs_sched_persist(struct ip_vs_service
> > > * is created for other persistent services.
> > > */
> > > {
> > >+ int protocol = iph.protocol;
> > >+ const union nf_inet_addr *vaddr = &iph.daddr;
> > >+ const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> > >+ __be16 vport = 0;
> > >+
> > > if (ports[1] == svc->port) {
> > > /* non-FTP template:
> > > * <protocol, caddr, 0, vaddr, vport, daddr, dport>
> > >@@ -253,11 +255,12 @@ ip_vs_sched_persist(struct ip_vs_service
> > > vaddr = &fwmark;
> > > }
> > > }
> > >+ ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
> > >+ vaddr, vport, &param);
> > > }
> > >
> > > /* Check if a template already exists */
> > >- ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
> > >-
> > >+ ct = ip_vs_ct_in_get(&param);
> > > if (!ct || !ip_vs_check_template(ct)) {
> > > /* No template found or the dest of the connection
> > > * template is not available.
> > >@@ -272,8 +275,7 @@ ip_vs_sched_persist(struct ip_vs_service
> > > dport = dest->port;
> > >
> > > /* Create a template */
> > >- ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
> > >- &dest->addr, dport,
> > >+ ct = ip_vs_conn_new(&param, &dest->addr, dport,
> > > IP_VS_CONN_F_TEMPLATE, dest);
> > > if (ct == NULL)
> > > return NULL;
> > >@@ -291,12 +293,7 @@ ip_vs_sched_persist(struct ip_vs_service
> > > /*
> > > * Create a new connection according to the template
> > > */
> >
> > Missing ip_vs_conn_fill_param here?
>
> Ooops, yes. I think that for some reason I thought it wasn't necessary.
> I have added the following:
>
> ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, ports[0],
> &iph.daddr, ports[1], &param);
>
> > >- cp = ip_vs_conn_new(svc->af, iph.protocol,
> > >- &iph.saddr, ports[0],
> > >- &iph.daddr, ports[1],
> > >- &dest->addr, dport,
> > >- flags,
> > >- dest);
> > >+ cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest);
> > > if (cp == NULL) {
> > > ip_vs_conn_put(ct);
> > > return NULL;
> >
> > >===================================================================
> > >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 21:55:19.000000000 +0900
> > >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 22:23:33.000000000 +0900
> > >@@ -40,6 +40,19 @@ struct isakmp_hdr {
> > >
> > >#define PORT_ISAKMP 500
> > >
> > >+static void
> > >+ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
> > >+ int inverse, struct ip_vs_conn_param *p)
> > >+{
> > >+ if (likely(!inverse))
> > >+ ip_vs_conn_fill_param(af, IPPROTO_UDP,
> > >+ &iph->saddr, htons(PORT_ISAKMP),
> > >+ &iph->daddr, htons(PORT_ISAKMP), p);
> > >+ else
> >
> > For inverse=1 iph->protocol must be IPPROTO_UDP
> > and &iph->daddr before &iph->saddr:
>
> Thanks, fixed.
>
> > >+ ip_vs_conn_fill_param(af, iph->protocol,
> > >+ &iph->saddr, htons(PORT_ISAKMP),
> > >+ &iph->daddr, htons(PORT_ISAKMP), p);
> > >+}
>
> I will repost the entire series after addressing the concerns
> you raised with several of the other patches. But for reference
> here is the revised version of this patch.
Sorry, the previous post was not the new patch. Here it is:
From 335c0ae2b64d8071762c50ea20b4f55bb12c2a5b Mon Sep 17 00:00:00 2001
From: Simon Horman <horms@verge.net.au>
Date: Sun, 22 Aug 2010 21:37:52 +0900
Subject: [patch v2.2 04/12] IPVS: Add struct ip_vs_conn_param
Signed-off-by: Simon Horman <horms@verge.net.au>
---
The motivation for this is to allow persistence engine modules to
fill in the parameters.
v0.3
* Add missing changes to ip_vs_ftp.c
v2
* make "union nf_inet_addr fwmark" const
* Update for the recent addition of ip_vs_nfct.c
v2.2
* As suggested by Julian Anastasov
- Correct logic for inverse case in ip_vs_conn_fill_param_proto()
and ah_esp_conn_fill_param_proto()
- Update ip_vs_conn_out_get()'s comments as its parameters have changed
- Add missing call to ip_vs_conn_fill_param() before the second
invocation of ip_vs_conn_new() in ip_vs_sched_persist()
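
To illustrate the inverse-case correction listed above: when inverse is set, the packet's destination fills the client fields and its source fills the virtual-service fields. The following is a minimal user-space sketch under simplifying assumptions, not the kernel code. struct conn_param, struct pkt, fill_param() and fill_param_from_pkt() are hypothetical stand-ins, addresses are plain IPv4 integers, and ports are kept in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct ip_vs_conn_param. */
struct conn_param {
	const uint32_t *caddr;	/* client address */
	const uint32_t *vaddr;	/* virtual service address */
	uint16_t cport;
	uint16_t vport;
	uint16_t protocol;
	uint16_t af;
};

/* Hypothetical stand-in for the header fields read from a packet. */
struct pkt {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint16_t protocol;
};

/* Gather the lookup key into one structure. */
void fill_param(int af, int protocol,
		const uint32_t *caddr, uint16_t cport,
		const uint32_t *vaddr, uint16_t vport,
		struct conn_param *p)
{
	assert(p != NULL);
	p->af = (uint16_t)af;
	p->protocol = (uint16_t)protocol;
	p->caddr = caddr;
	p->cport = cport;
	p->vaddr = vaddr;
	p->vport = vport;
}

/* Normal case: source is the client. Inverse case: swap both pairs. */
void fill_param_from_pkt(int af, const struct pkt *pkt, int inverse,
			 struct conn_param *p)
{
	if (!inverse)
		fill_param(af, pkt->protocol, &pkt->saddr, pkt->sport,
			   &pkt->daddr, pkt->dport, p);
	else
		fill_param(af, pkt->protocol, &pkt->daddr, pkt->dport,
			   &pkt->saddr, pkt->sport, p);
}
```

The address and the port are swapped together, so a filled structure always describes a consistent endpoint pair.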
Index: lvs-test-2.6/include/net/ip_vs.h
===================================================================
--- lvs-test-2.6.orig/include/net/ip_vs.h 2010-10-02 09:35:34.000000000 +0900
+++ lvs-test-2.6/include/net/ip_vs.h 2010-10-02 10:14:05.000000000 +0900
@@ -357,6 +357,15 @@ struct ip_vs_protocol {
extern struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto);
+struct ip_vs_conn_param {
+ const union nf_inet_addr *caddr;
+ const union nf_inet_addr *vaddr;
+ __be16 cport;
+ __be16 vport;
+ __u16 protocol;
+ u16 af;
+};
+
/*
* IP_VS structure allocated for each dynamically scheduled connection
*/
@@ -626,13 +635,23 @@ enum {
IP_VS_DIR_LAST,
};
-extern struct ip_vs_conn *ip_vs_conn_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port);
-
-extern struct ip_vs_conn *ip_vs_ct_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port);
+static inline void ip_vs_conn_fill_param(int af, int protocol,
+ const union nf_inet_addr *caddr,
+ __be16 cport,
+ const union nf_inet_addr *vaddr,
+ __be16 vport,
+ struct ip_vs_conn_param *p)
+{
+ p->af = af;
+ p->protocol = protocol;
+ p->caddr = caddr;
+ p->cport = cport;
+ p->vaddr = vaddr;
+ p->vport = vport;
+}
+
+struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p);
+struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
struct ip_vs_protocol *pp,
@@ -640,9 +659,7 @@ struct ip_vs_conn * ip_vs_conn_in_get_pr
unsigned int proto_off,
int inverse);
-extern struct ip_vs_conn *ip_vs_conn_out_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port);
+struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
struct ip_vs_protocol *pp,
@@ -658,11 +675,10 @@ static inline void __ip_vs_conn_put(stru
extern void ip_vs_conn_put(struct ip_vs_conn *cp);
extern void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport);
-extern struct ip_vs_conn *
-ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
- const union nf_inet_addr *vaddr, __be16 vport,
- const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
- struct ip_vs_dest *dest);
+struct ip_vs_conn *ip_vs_conn_new(const struct ip_vs_conn_param *p,
+ const union nf_inet_addr *daddr,
+ __be16 dport, unsigned flags,
+ struct ip_vs_dest *dest);
extern void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
extern const char * ip_vs_state_name(__u16 proto, int state);
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-02 09:35:34.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-02 10:14:05.000000000 +0900
@@ -218,27 +218,26 @@ static inline int ip_vs_conn_unhash(stru
/*
* Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
* Called for pkts coming from OUTside-to-INside.
- * s_addr, s_port: pkt source address (foreign host)
- * d_addr, d_port: pkt dest address (load balancer)
+ * p->caddr, p->cport: pkt source address (foreign host)
+ * p->vaddr, p->vport: pkt dest address (load balancer)
*/
-static inline struct ip_vs_conn *__ip_vs_conn_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+static inline struct ip_vs_conn *
+__ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
{
unsigned hash;
struct ip_vs_conn *cp;
- hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
+ hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
- if (cp->af == af &&
- ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
- s_port == cp->cport && d_port == cp->vport &&
- ((!s_port) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
- protocol == cp->protocol) {
+ if (cp->af == p->af &&
+ ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
+ ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
+ p->cport == cp->cport && p->vport == cp->vport &&
+ ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
+ p->protocol == cp->protocol) {
/* HIT */
atomic_inc(&cp->refcnt);
ct_read_unlock(hash);
@@ -251,71 +250,82 @@ static inline struct ip_vs_conn *__ip_vs
return NULL;
}
-struct ip_vs_conn *ip_vs_conn_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
{
struct ip_vs_conn *cp;
- cp = __ip_vs_conn_in_get(af, protocol, s_addr, s_port, d_addr, d_port);
- if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt))
- cp = __ip_vs_conn_in_get(af, protocol, s_addr, 0, d_addr,
- d_port);
+ cp = __ip_vs_conn_in_get(p);
+ if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt)) {
+ struct ip_vs_conn_param cport_zero_p = *p;
+ cport_zero_p.cport = 0;
+ cp = __ip_vs_conn_in_get(&cport_zero_p);
+ }
IP_VS_DBG_BUF(9, "lookup/in %s %s:%d->%s:%d %s\n",
- ip_vs_proto_name(protocol),
- IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
- IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
+ ip_vs_proto_name(p->protocol),
+ IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
+ IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
cp ? "hit" : "not hit");
return cp;
}
+static int
+ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
+ const struct ip_vs_iphdr *iph,
+ unsigned int proto_off, int inverse,
+ struct ip_vs_conn_param *p)
+{
+ __be16 _ports[2], *pptr;
+
+ pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
+ if (pptr == NULL)
+ return 1;
+
+ if (likely(!inverse))
+ ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
+ &iph->daddr, pptr[1], p);
+ else
+ ip_vs_conn_fill_param(af, iph->protocol, &iph->daddr, pptr[1],
+ &iph->saddr, pptr[0], p);
+ return 0;
+}
+
struct ip_vs_conn *
ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
struct ip_vs_protocol *pp,
const struct ip_vs_iphdr *iph,
unsigned int proto_off, int inverse)
{
- __be16 _ports[2], *pptr;
+ struct ip_vs_conn_param p;
- pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
- if (pptr == NULL)
+ if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
return NULL;
- if (likely(!inverse))
- return ip_vs_conn_in_get(af, iph->protocol,
- &iph->saddr, pptr[0],
- &iph->daddr, pptr[1]);
- else
- return ip_vs_conn_in_get(af, iph->protocol,
- &iph->daddr, pptr[1],
- &iph->saddr, pptr[0]);
+ return ip_vs_conn_in_get(&p);
}
EXPORT_SYMBOL_GPL(ip_vs_conn_in_get_proto);
/* Get reference to connection template */
-struct ip_vs_conn *ip_vs_ct_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
{
unsigned hash;
struct ip_vs_conn *cp;
- hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
+ hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
- if (cp->af == af &&
- ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
+ if (cp->af == p->af &&
+ ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
/* protocol should only be IPPROTO_IP if
- * d_addr is a fwmark */
- ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
- d_addr, &cp->vaddr) &&
- s_port == cp->cport && d_port == cp->vport &&
+ * p->vaddr is a fwmark */
+ ip_vs_addr_equal(p->protocol == IPPROTO_IP ? AF_UNSPEC :
+ p->af, p->vaddr, &cp->vaddr) &&
+ p->cport == cp->cport && p->vport == cp->vport &&
cp->flags & IP_VS_CONN_F_TEMPLATE &&
- protocol == cp->protocol) {
+ p->protocol == cp->protocol) {
/* HIT */
atomic_inc(&cp->refcnt);
goto out;
@@ -327,23 +337,19 @@ struct ip_vs_conn *ip_vs_ct_in_get
ct_read_unlock(hash);
IP_VS_DBG_BUF(9, "template lookup/in %s %s:%d->%s:%d %s\n",
- ip_vs_proto_name(protocol),
- IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
- IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
+ ip_vs_proto_name(p->protocol),
+ IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
+ IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
cp ? "hit" : "not hit");
return cp;
}
-/*
- * Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
- * Called for pkts coming from inside-to-OUTside.
- * s_addr, s_port: pkt source address (inside host)
- * d_addr, d_port: pkt dest address (foreign host)
- */
-struct ip_vs_conn *ip_vs_conn_out_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+/* Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
+ * Called for pkts coming from inside-to-OUTside.
+ * p->caddr, p->cport: pkt source address (inside host)
+ * p->vaddr, p->vport: pkt dest address (foreign host) */
+struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
{
unsigned hash;
struct ip_vs_conn *cp, *ret=NULL;
@@ -351,16 +357,16 @@ struct ip_vs_conn *ip_vs_conn_out_get
/*
* Check for "full" addressed entries
*/
- hash = ip_vs_conn_hashkey(af, protocol, d_addr, d_port);
+ hash = ip_vs_conn_hashkey(p->af, p->protocol, p->vaddr, p->vport);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
- if (cp->af == af &&
- ip_vs_addr_equal(af, d_addr, &cp->caddr) &&
- ip_vs_addr_equal(af, s_addr, &cp->daddr) &&
- d_port == cp->cport && s_port == cp->dport &&
- protocol == cp->protocol) {
+ if (cp->af == p->af &&
+ ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
+ ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
+ p->vport == cp->cport && p->cport == cp->dport &&
+ p->protocol == cp->protocol) {
/* HIT */
atomic_inc(&cp->refcnt);
ret = cp;
@@ -371,9 +377,9 @@ struct ip_vs_conn *ip_vs_conn_out_get
ct_read_unlock(hash);
IP_VS_DBG_BUF(9, "lookup/out %s %s:%d->%s:%d %s\n",
- ip_vs_proto_name(protocol),
- IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
- IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
+ ip_vs_proto_name(p->protocol),
+ IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
+ IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
ret ? "hit" : "not hit");
return ret;
@@ -385,20 +391,12 @@ ip_vs_conn_out_get_proto(int af, const s
const struct ip_vs_iphdr *iph,
unsigned int proto_off, int inverse)
{
- __be16 _ports[2], *pptr;
+ struct ip_vs_conn_param p;
- pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
- if (pptr == NULL)
+ if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
return NULL;
- if (likely(!inverse))
- return ip_vs_conn_out_get(af, iph->protocol,
- &iph->saddr, pptr[0],
- &iph->daddr, pptr[1]);
- else
- return ip_vs_conn_out_get(af, iph->protocol,
- &iph->daddr, pptr[1],
- &iph->saddr, pptr[0]);
+ return ip_vs_conn_out_get(&p);
}
EXPORT_SYMBOL_GPL(ip_vs_conn_out_get_proto);
@@ -758,13 +756,12 @@ void ip_vs_conn_expire_now(struct ip_vs_
* Create a new connection entry and hash it into the ip_vs_conn_tab
*/
struct ip_vs_conn *
-ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
- const union nf_inet_addr *vaddr, __be16 vport,
+ip_vs_conn_new(const struct ip_vs_conn_param *p,
const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
struct ip_vs_dest *dest)
{
struct ip_vs_conn *cp;
- struct ip_vs_protocol *pp = ip_vs_proto_get(proto);
+ struct ip_vs_protocol *pp = ip_vs_proto_get(p->protocol);
cp = kmem_cache_zalloc(ip_vs_conn_cachep, GFP_ATOMIC);
if (cp == NULL) {
@@ -774,14 +771,14 @@ ip_vs_conn_new(int af, int proto, const
INIT_LIST_HEAD(&cp->c_list);
setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
- cp->af = af;
- cp->protocol = proto;
- ip_vs_addr_copy(af, &cp->caddr, caddr);
- cp->cport = cport;
- ip_vs_addr_copy(af, &cp->vaddr, vaddr);
- cp->vport = vport;
+ cp->af = p->af;
+ cp->protocol = p->protocol;
+ ip_vs_addr_copy(p->af, &cp->caddr, p->caddr);
+ cp->cport = p->cport;
+ ip_vs_addr_copy(p->af, &cp->vaddr, p->vaddr);
+ cp->vport = p->vport;
/* proto should only be IPPROTO_IP if d_addr is a fwmark */
- ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
+ ip_vs_addr_copy(p->protocol == IPPROTO_IP ? AF_UNSPEC : p->af,
&cp->daddr, daddr);
cp->dport = dport;
cp->flags = flags;
@@ -810,7 +807,7 @@ ip_vs_conn_new(int af, int proto, const
/* Bind its packet transmitter */
#ifdef CONFIG_IP_VS_IPV6
- if (af == AF_INET6)
+ if (p->af == AF_INET6)
ip_vs_bind_xmit_v6(cp);
else
#endif
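
Note on the ip_vs_conn_in_get() change above: because the parameters are now passed as a const structure, the fallback lookup for entries without a client port works on a local copy with cport cleared rather than on the caller's data. The following is a minimal user-space sketch of that two-step lookup, not the kernel code. struct param, struct conn, lookup_exact() and lookup_in() are hypothetical names, and a linear array scan stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ip_vs_conn_param (IPv4 only). */
struct param {
	uint32_t caddr, vaddr;
	uint16_t cport, vport;
};

/* Hypothetical stand-in for a connection entry. */
struct conn {
	uint32_t caddr, vaddr;
	uint16_t cport, vport;
	int no_cport;	/* stand-in for IP_VS_CONN_F_NO_CPORT */
};

/* Exact match; a zero cport only matches entries flagged no_cport. */
const struct conn *lookup_exact(const struct conn *tab, size_t n,
				const struct param *p)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct conn *c = &tab[i];

		if (c->caddr == p->caddr && c->vaddr == p->vaddr &&
		    c->cport == p->cport && c->vport == p->vport &&
		    ((!p->cport) ^ (!c->no_cport)))
			return c;
	}
	return NULL;
}

/* Try the full key first, then retry with the client port cleared. */
const struct conn *lookup_in(const struct conn *tab, size_t n,
			     const struct param *p, int no_cport_cnt)
{
	const struct conn *c;

	assert(p != NULL);
	c = lookup_exact(tab, n, p);
	if (!c && no_cport_cnt) {
		struct param cport_zero_p = *p;	/* caller's copy untouched */

		cport_zero_p.cport = 0;
		c = lookup_exact(tab, n, &cport_zero_p);
	}
	return c;
}
```

The second pass runs only when at least one entry without a client port exists, so the common case costs a single lookup.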
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-02 09:35:51.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-02 10:14:05.000000000 +0900
@@ -193,14 +193,11 @@ ip_vs_sched_persist(struct ip_vs_service
struct ip_vs_iphdr iph;
struct ip_vs_dest *dest;
struct ip_vs_conn *ct;
- int protocol = iph.protocol;
__be16 dport = 0; /* destination port to forward */
- __be16 vport = 0; /* virtual service port */
unsigned int flags;
+ struct ip_vs_conn_param param;
union nf_inet_addr snet; /* source network of the client,
after masking */
- const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
- const union nf_inet_addr *vaddr = &iph.daddr;
ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
@@ -232,6 +229,11 @@ ip_vs_sched_persist(struct ip_vs_service
* is created for other persistent services.
*/
{
+ int protocol = iph.protocol;
+ const union nf_inet_addr *vaddr = &iph.daddr;
+ const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
+ __be16 vport = 0;
+
if (ports[1] == svc->port) {
/* non-FTP template:
* <protocol, caddr, 0, vaddr, vport, daddr, dport>
@@ -253,11 +255,12 @@ ip_vs_sched_persist(struct ip_vs_service
vaddr = &fwmark;
}
}
+ ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
+ vaddr, vport, &param);
}
/* Check if a template already exists */
- ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
-
+ ct = ip_vs_ct_in_get(&param);
if (!ct || !ip_vs_check_template(ct)) {
/* No template found or the dest of the connection
* template is not available.
@@ -272,8 +275,7 @@ ip_vs_sched_persist(struct ip_vs_service
dport = dest->port;
/* Create a template */
- ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
- &dest->addr, dport,
+ ct = ip_vs_conn_new(&param, &dest->addr, dport,
IP_VS_CONN_F_TEMPLATE, dest);
if (ct == NULL)
return NULL;
@@ -291,12 +293,9 @@ ip_vs_sched_persist(struct ip_vs_service
/*
* Create a new connection according to the template
*/
- cp = ip_vs_conn_new(svc->af, iph.protocol,
- &iph.saddr, ports[0],
- &iph.daddr, ports[1],
- &dest->addr, dport,
- flags,
- dest);
+ ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, ports[0],
+ &iph.daddr, ports[1], &param);
+ cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest);
if (cp == NULL) {
ip_vs_conn_put(ct);
return NULL;
@@ -363,14 +362,16 @@ ip_vs_schedule(struct ip_vs_service *svc
/*
* Create a connection entry.
*/
- cp = ip_vs_conn_new(svc->af, iph.protocol,
- &iph.saddr, pptr[0],
- &iph.daddr, pptr[1],
- &dest->addr, dest->port ? dest->port : pptr[1],
- flags,
- dest);
- if (cp == NULL)
- return NULL;
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr,
+ pptr[0], &iph.daddr, pptr[1], &p);
+ cp = ip_vs_conn_new(&p, &dest->addr,
+ dest->port ? dest->port : pptr[1],
+ flags, dest);
+ if (!cp)
+ return NULL;
+ }
IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u "
"d:%s:%u conn->flags:%X conn->refcnt:%d\n",
@@ -426,14 +427,17 @@ int ip_vs_leave(struct ip_vs_service *sv
/* create a new connection entry */
IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
- cp = ip_vs_conn_new(svc->af, iph.protocol,
- &iph.saddr, pptr[0],
- &iph.daddr, pptr[1],
- &daddr, 0,
- IP_VS_CONN_F_BYPASS | flags,
- NULL);
- if (cp == NULL)
- return NF_DROP;
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(svc->af, iph.protocol,
+ &iph.saddr, pptr[0],
+ &iph.daddr, pptr[1], &p);
+ cp = ip_vs_conn_new(&p, &daddr, 0,
+ IP_VS_CONN_F_BYPASS | flags,
+ NULL);
+ if (!cp)
+ return NF_DROP;
+ }
/* statistics */
ip_vs_in_stats(cp, skb);
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_ftp.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_ftp.c 2010-10-02 09:35:34.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_ftp.c 2010-10-02 09:35:53.000000000 +0900
@@ -195,13 +195,17 @@ static int ip_vs_ftp_out(struct ip_vs_ap
/*
* Now update or create an connection entry for it
*/
- n_cp = ip_vs_conn_out_get(AF_INET, iph->protocol, &from, port,
- &cp->caddr, 0);
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(AF_INET, iph->protocol,
+ &from, port, &cp->caddr, 0, &p);
+ n_cp = ip_vs_conn_out_get(&p);
+ }
if (!n_cp) {
- n_cp = ip_vs_conn_new(AF_INET, IPPROTO_TCP,
- &cp->caddr, 0,
- &cp->vaddr, port,
- &from, port,
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(AF_INET, IPPROTO_TCP, &cp->caddr,
+ 0, &cp->vaddr, port, &p);
+ n_cp = ip_vs_conn_new(&p, &from, port,
IP_VS_CONN_F_NO_CPORT |
IP_VS_CONN_F_NFCT,
cp->dest);
@@ -347,21 +351,22 @@ static int ip_vs_ftp_in(struct ip_vs_app
ip_vs_proto_name(iph->protocol),
&to.ip, ntohs(port), &cp->vaddr.ip, 0);
- n_cp = ip_vs_conn_in_get(AF_INET, iph->protocol,
- &to, port,
- &cp->vaddr, htons(ntohs(cp->vport)-1));
- if (!n_cp) {
- n_cp = ip_vs_conn_new(AF_INET, IPPROTO_TCP,
- &to, port,
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(AF_INET, iph->protocol, &to, port,
&cp->vaddr, htons(ntohs(cp->vport)-1),
- &cp->daddr, htons(ntohs(cp->dport)-1),
- IP_VS_CONN_F_NFCT,
- cp->dest);
- if (!n_cp)
- return 0;
+ &p);
+ n_cp = ip_vs_conn_in_get(&p);
+ if (!n_cp) {
+ n_cp = ip_vs_conn_new(&p, &cp->daddr,
+ htons(ntohs(cp->dport)-1),
+ IP_VS_CONN_F_NFCT, cp->dest);
+ if (!n_cp)
+ return 0;
- /* add its controller */
- ip_vs_control_add(n_cp, cp);
+ /* add its controller */
+ ip_vs_control_add(n_cp, cp);
+ }
}
/*
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_nfct.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_nfct.c 2010-10-02 09:35:34.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_nfct.c 2010-10-02 09:35:53.000000000 +0900
@@ -140,6 +140,7 @@ static void ip_vs_nfct_expect_callback(s
{
struct nf_conntrack_tuple *orig, new_reply;
struct ip_vs_conn *cp;
+ struct ip_vs_conn_param p;
if (exp->tuple.src.l3num != PF_INET)
return;
@@ -154,9 +155,10 @@ static void ip_vs_nfct_expect_callback(s
/* RS->CLIENT */
orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
- cp = ip_vs_conn_out_get(exp->tuple.src.l3num, orig->dst.protonum,
- &orig->src.u3, orig->src.u.tcp.port,
- &orig->dst.u3, orig->dst.u.tcp.port);
+ ip_vs_conn_fill_param(exp->tuple.src.l3num, orig->dst.protonum,
+ &orig->src.u3, orig->src.u.tcp.port,
+ &orig->dst.u3, orig->dst.u.tcp.port, &p);
+ cp = ip_vs_conn_out_get(&p);
if (cp) {
/* Change reply CLIENT->RS to CLIENT->VS */
new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
@@ -176,9 +178,7 @@ static void ip_vs_nfct_expect_callback(s
}
/* CLIENT->VS */
- cp = ip_vs_conn_in_get(exp->tuple.src.l3num, orig->dst.protonum,
- &orig->src.u3, orig->src.u.tcp.port,
- &orig->dst.u3, orig->dst.u.tcp.port);
+ cp = ip_vs_conn_in_get(&p);
if (cp) {
/* Change reply VS->CLIENT to RS->CLIENT */
new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-02 09:35:34.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-02 10:09:54.000000000 +0900
@@ -40,6 +40,19 @@ struct isakmp_hdr {
#define PORT_ISAKMP 500
+static void
+ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
+ int inverse, struct ip_vs_conn_param *p)
+{
+ if (likely(!inverse))
+ ip_vs_conn_fill_param(af, IPPROTO_UDP,
+ &iph->saddr, htons(PORT_ISAKMP),
+ &iph->daddr, htons(PORT_ISAKMP), p);
+ else
+ ip_vs_conn_fill_param(af, IPPROTO_UDP,
+ &iph->daddr, htons(PORT_ISAKMP),
+ &iph->saddr, htons(PORT_ISAKMP), p);
+}
static struct ip_vs_conn *
ah_esp_conn_in_get(int af, const struct sk_buff *skb, struct ip_vs_protocol *pp,
@@ -47,21 +60,10 @@ ah_esp_conn_in_get(int af, const struct
int inverse)
{
struct ip_vs_conn *cp;
+ struct ip_vs_conn_param p;
- if (likely(!inverse)) {
- cp = ip_vs_conn_in_get(af, IPPROTO_UDP,
- &iph->saddr,
- htons(PORT_ISAKMP),
- &iph->daddr,
- htons(PORT_ISAKMP));
- } else {
- cp = ip_vs_conn_in_get(af, IPPROTO_UDP,
- &iph->daddr,
- htons(PORT_ISAKMP),
- &iph->saddr,
- htons(PORT_ISAKMP));
- }
-
+ ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+ cp = ip_vs_conn_in_get(&p);
if (!cp) {
/*
* We are not sure if the packet is from our
@@ -87,21 +89,10 @@ ah_esp_conn_out_get(int af, const struct
int inverse)
{
struct ip_vs_conn *cp;
+ struct ip_vs_conn_param p;
- if (likely(!inverse)) {
- cp = ip_vs_conn_out_get(af, IPPROTO_UDP,
- &iph->saddr,
- htons(PORT_ISAKMP),
- &iph->daddr,
- htons(PORT_ISAKMP));
- } else {
- cp = ip_vs_conn_out_get(af, IPPROTO_UDP,
- &iph->daddr,
- htons(PORT_ISAKMP),
- &iph->saddr,
- htons(PORT_ISAKMP));
- }
-
+ ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+ cp = ip_vs_conn_out_get(&p);
if (!cp) {
IP_VS_DBG_BUF(12, "Unknown ISAKMP entry for inout packet "
"%s%s %s->%s\n",
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_sync.c 2010-10-02 09:35:35.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c 2010-10-02 10:14:05.000000000 +0900
@@ -301,6 +301,7 @@ static void ip_vs_process_message(const
struct ip_vs_conn *cp;
struct ip_vs_protocol *pp;
struct ip_vs_dest *dest;
+ struct ip_vs_conn_param param;
char *p;
int i;
@@ -370,18 +371,17 @@ static void ip_vs_process_message(const
}
}
- if (!(flags & IP_VS_CONN_F_TEMPLATE))
- cp = ip_vs_conn_in_get(AF_INET, s->protocol,
- (union nf_inet_addr *)&s->caddr,
- s->cport,
- (union nf_inet_addr *)&s->vaddr,
- s->vport);
- else
- cp = ip_vs_ct_in_get(AF_INET, s->protocol,
- (union nf_inet_addr *)&s->caddr,
- s->cport,
- (union nf_inet_addr *)&s->vaddr,
- s->vport);
+ {
+ ip_vs_conn_fill_param(AF_INET, s->protocol,
+ (union nf_inet_addr *)&s->caddr,
+ s->cport,
+ (union nf_inet_addr *)&s->vaddr,
+ s->vport, &param);
+ if (!(flags & IP_VS_CONN_F_TEMPLATE))
+ cp = ip_vs_conn_in_get(&param);
+ else
+ cp = ip_vs_ct_in_get(&param);
+ }
if (!cp) {
/*
* Find the appropriate destination for the connection.
@@ -406,14 +406,9 @@ static void ip_vs_process_message(const
else
flags &= ~IP_VS_CONN_F_INACTIVE;
}
- cp = ip_vs_conn_new(AF_INET, s->protocol,
- (union nf_inet_addr *)&s->caddr,
- s->cport,
- (union nf_inet_addr *)&s->vaddr,
- s->vport,
+ cp = ip_vs_conn_new(&param,
(union nf_inet_addr *)&s->daddr,
- s->dport,
- flags, dest);
+ s->dport, flags, dest);
if (dest)
atomic_dec(&dest->refcnt);
if (!cp) {
* Re: [patch v2 04/12] [PATCH 04/12] IPVS: Add struct ip_vs_conn_param
From: Simon Horman @ 2010-10-02 1:13 UTC (permalink / raw)
To: Julian Anastasov
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <alpine.LFD.2.00.1010012347290.2194@ja.ssi.bg>
On Fri, Oct 01, 2010 at 11:58:04PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 1 Oct 2010, Simon Horman wrote:
>
> >+static int
> >+ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
> >+ const struct ip_vs_iphdr *iph,
> >+ unsigned int proto_off, int inverse,
> >+ struct ip_vs_conn_param *p)
> >+{
> >+ __be16 _ports[2], *pptr;
> >+
> >+ pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> >+ if (pptr == NULL)
> >+ return 1;
> >+
> >+ if (likely(!inverse))
> >+ ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> >+ &iph->daddr, pptr[1], p);
> >+ else
>
> Next line is wrong for inverse=1, must be
> &iph->daddr, pptr[1], &iph->saddr, pptr[0]
Thanks, fixed.
> >+ ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> >+ &iph->daddr, pptr[1], p);
> >+ return 0;
> >+}
> >+
>
> Maybe the comments before ip_vs_conn_out_get should be
> changed:
>
> >@@ -341,9 +351,7 @@ struct ip_vs_conn *ip_vs_ct_in_get
> > * s_addr, s_port: pkt source address (inside host)
> > * d_addr, d_port: pkt dest address (foreign host)
> > */
> >-struct ip_vs_conn *ip_vs_conn_out_get
> >-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> >- const union nf_inet_addr *d_addr, __be16 d_port)
> >+struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
I have updated it to:
/* Gets ip_vs_conn associated with supplied parameters in the
* ip_vs_conn_tab.
* Called for pkts coming from inside-to-OUTside.
* p->caddr, p->cport: pkt source address (inside host)
* p->vaddr, p->vport: pkt dest address (foreign host) */
> >===================================================================
> >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:06:23.000000000 +0900
> >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:10:46.000000000 +0900
> >@@ -193,14 +193,11 @@ ip_vs_sched_persist(struct ip_vs_service
> > struct ip_vs_iphdr iph;
> > struct ip_vs_dest *dest;
> > struct ip_vs_conn *ct;
> >- int protocol = iph.protocol;
> > __be16 dport = 0; /* destination port to forward */
> >- __be16 vport = 0; /* virtual service port */
> > unsigned int flags;
> > union nf_inet_addr snet; /* source network of the client,
> > after masking */
> >- const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> >- const union nf_inet_addr *vaddr = &iph.daddr;
> >+ struct ip_vs_conn_param param;
> >
> > ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
> >
> >@@ -232,6 +229,11 @@ ip_vs_sched_persist(struct ip_vs_service
> > * is created for other persistent services.
> > */
> > {
> >+ int protocol = iph.protocol;
> >+ const union nf_inet_addr *vaddr = &iph.daddr;
> >+ const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> >+ __be16 vport = 0;
> >+
> > if (ports[1] == svc->port) {
> > /* non-FTP template:
> > * <protocol, caddr, 0, vaddr, vport, daddr, dport>
> >@@ -253,11 +255,12 @@ ip_vs_sched_persist(struct ip_vs_service
> > vaddr = &fwmark;
> > }
> > }
> >+ ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
> >+ vaddr, vport, &param);
> > }
> >
> > /* Check if a template already exists */
> >- ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
> >-
> >+ ct = ip_vs_ct_in_get(&param);
> > if (!ct || !ip_vs_check_template(ct)) {
> > /* No template found or the dest of the connection
> > * template is not available.
> >@@ -272,8 +275,7 @@ ip_vs_sched_persist(struct ip_vs_service
> > dport = dest->port;
> >
> > /* Create a template */
> >- ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
> >- &dest->addr, dport,
> >+ ct = ip_vs_conn_new(&param, &dest->addr, dport,
> > IP_VS_CONN_F_TEMPLATE, dest);
> > if (ct == NULL)
> > return NULL;
> >@@ -291,12 +293,7 @@ ip_vs_sched_persist(struct ip_vs_service
> > /*
> > * Create a new connection according to the template
> > */
>
> Missing ip_vs_conn_fill_param here?
Oops, yes. For some reason I thought it wasn't necessary.
I have added the following:
ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, ports[0],
&iph.daddr, ports[1], &param);
> >- cp = ip_vs_conn_new(svc->af, iph.protocol,
> >- &iph.saddr, ports[0],
> >- &iph.daddr, ports[1],
> >- &dest->addr, dport,
> >- flags,
> >- dest);
> >+ cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest);
> > if (cp == NULL) {
> > ip_vs_conn_put(ct);
> > return NULL;
>
> >===================================================================
> >--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 21:55:19.000000000 +0900
> >+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 22:23:33.000000000 +0900
> >@@ -40,6 +40,19 @@ struct isakmp_hdr {
> >
> >#define PORT_ISAKMP 500
> >
> >+static void
> >+ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
> >+ int inverse, struct ip_vs_conn_param *p)
> >+{
> >+ if (likely(!inverse))
> >+ ip_vs_conn_fill_param(af, IPPROTO_UDP,
> >+ &iph->saddr, htons(PORT_ISAKMP),
> >+ &iph->daddr, htons(PORT_ISAKMP), p);
> >+ else
>
> For inverse=1 iph->protocol must be IPPROTO_UDP
> and &iph->daddr before &iph->saddr:
Thanks, fixed.
> >+ ip_vs_conn_fill_param(af, iph->protocol,
> >+ &iph->saddr, htons(PORT_ISAKMP),
> >+ &iph->daddr, htons(PORT_ISAKMP), p);
> >+}
I will repost the entire series after addressing the concerns
you raised with several of the other patches. But for reference
here is the revised version of this patch.
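To illustrate why the second ip_vs_conn_new() call in ip_vs_sched_persist() needs its own ip_vs_conn_fill_param(), here is a minimal userspace sketch. It is not kernel code: struct conn_param, struct conn, conn_new() and sched_persist_sketch() are hypothetical stand-ins, addresses are plain IPv4 integers copied by value (the real struct holds pointers), and the netmask argument stands in for the persistence mask. The point is only that the template is keyed on the masked client address with cport 0, while the connection itself must be keyed on the real client address and port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ip_vs_conn_param. */
struct conn_param {
	uint32_t caddr;
	uint32_t vaddr;
	uint16_t cport;
	uint16_t vport;
	uint16_t protocol;
};

/* Hypothetical stand-in for struct ip_vs_conn. */
struct conn {
	struct conn_param key;
	int is_template;
};

void conn_fill_param(uint16_t protocol, uint32_t caddr, uint16_t cport,
		     uint32_t vaddr, uint16_t vport, struct conn_param *p)
{
	assert(p != 0);
	p->protocol = protocol;
	p->caddr = caddr;
	p->cport = cport;
	p->vaddr = vaddr;
	p->vport = vport;
}

/* Copies the key out of *p, much as ip_vs_conn_new() copies p's fields. */
void conn_new(struct conn *cp, const struct conn_param *p, int is_template)
{
	cp->key = *p;
	cp->is_template = is_template;
}

/* Mirrors the two-step flow: create the template, then the connection. */
void sched_persist_sketch(uint16_t protocol, uint32_t saddr, uint16_t sport,
			  uint32_t daddr, uint16_t dport, uint32_t netmask,
			  struct conn *ct, struct conn *cp)
{
	struct conn_param param;

	/* Template key: masked client network, client port 0. */
	conn_fill_param(protocol, saddr & netmask, 0, daddr, dport, &param);
	conn_new(ct, &param, 1);

	/* Without this second fill, cp would inherit the template key. */
	conn_fill_param(protocol, saddr, sport, daddr, dport, &param);
	conn_new(cp, &param, 0);
}
```

If the second conn_fill_param() call is dropped, cp ends up with the masked address and port 0, so a later lookup keyed on the packet's real source address and port would not be expected to find it.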
From 335c0ae2b64d8071762c50ea20b4f55bb12c2a5b Mon Sep 17 00:00:00 2001
From: Simon Horman <horms@verge.net.au>
Date: Sun, 22 Aug 2010 21:37:52 +0900
Subject: [patch v2.1 04/12] IPVS: Add struct ip_vs_conn_param
Signed-off-by: Simon Horman <horms@verge.net.au>
---
The motivation for this is to allow persistence engine modules to
fill in the parameters.
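A minimal userspace sketch of the pattern follows, assuming IPv4-only keys held by value and a flat array in place of the hashed ip_vs_conn_tab. The names struct conn_param, struct conn, conn_lookup() and conn_in_get() are hypothetical stand-ins, not the kernel API. It shows the lookup taking a single const parameter pointer, and the cport-zero retry working on a local copy so that the caller's parameters are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ip_vs_conn_param. */
struct conn_param {
	uint32_t caddr;		/* client address */
	uint32_t vaddr;		/* virtual service address */
	uint16_t cport;
	uint16_t vport;
	uint16_t protocol;
	uint16_t af;
};

/* Hypothetical table entry; no_cport mirrors IP_VS_CONN_F_NO_CPORT. */
struct conn {
	struct conn_param key;
	int no_cport;
};

void conn_fill_param(uint16_t af, uint16_t protocol,
		     uint32_t caddr, uint16_t cport,
		     uint32_t vaddr, uint16_t vport,
		     struct conn_param *p)
{
	assert(p != NULL);
	p->af = af;
	p->protocol = protocol;
	p->caddr = caddr;
	p->cport = cport;
	p->vaddr = vaddr;
	p->vport = vport;
}

/* Exact-match lookup, analogous to __ip_vs_conn_in_get(). */
const struct conn *conn_lookup(const struct conn *tab, size_t n,
			       const struct conn_param *p)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct conn *cp = &tab[i];

		if (cp->key.af == p->af &&
		    cp->key.caddr == p->caddr &&
		    cp->key.vaddr == p->vaddr &&
		    cp->key.cport == p->cport &&
		    cp->key.vport == p->vport &&
		    ((!p->cport) ^ (!cp->no_cport)) &&
		    cp->key.protocol == p->protocol)
			return cp;
	}
	return NULL;
}

/*
 * Retry with cport 0 on a miss, as ip_vs_conn_in_get() does. The kernel
 * version only retries when ip_vs_conn_no_cport_cnt is non-zero; that
 * counter is omitted from this sketch.
 */
const struct conn *conn_in_get(const struct conn *tab, size_t n,
			       const struct conn_param *p)
{
	const struct conn *cp = conn_lookup(tab, n, p);

	if (!cp) {
		struct conn_param cport_zero_p = *p;	/* caller's *p stays intact */

		cport_zero_p.cport = 0;
		cp = conn_lookup(tab, n, &cport_zero_p);
	}
	return cp;
}
```

Passing one struct rather than six scalars means a persistence engine could later add its own key to the struct without every caller's signature changing again.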
v0.3
* Add missing changes to ip_vs_ftp.c
v2
* make "union nf_inet_addr fwmark" const
* Update for the recent addition of ip_vs_nfct.c
v2.1
* As suggested by Julian Anastasov
- Correct logic for inverse case in ip_vs_conn_fill_param_proto()
and ah_esp_conn_fill_param_proto()
- Update ip_vs_conn_out_get()'s comments as its parameters have changed
- Add missing call to ip_vs_conn_fill_param() before the second
invocation of ip_vs_conn_new() in ip_vs_sched_persist()
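The inverse-case correction listed above can be sketched in userspace as follows. This is a simplified illustration, not the kernel function: struct pkt_hdr and fill_param_proto() are hypothetical, and addresses and ports are plain integers. As far as I can tell, inverse is set when the header being examined describes the opposite direction of travel, so the source and destination pairs have to be swapped for the resulting key to match the forward one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened view of the IP header plus the two L4 ports. */
struct pkt_hdr {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint8_t protocol;
};

/* Hypothetical, simplified stand-in for struct ip_vs_conn_param. */
struct conn_param {
	uint32_t caddr;
	uint32_t vaddr;
	uint16_t cport;
	uint16_t vport;
	uint8_t protocol;
};

void fill_param_proto(const struct pkt_hdr *h, int inverse,
		      struct conn_param *p)
{
	assert(h != 0 && p != 0);
	p->protocol = h->protocol;
	if (!inverse) {
		/* Header is in the lookup direction: source is the client. */
		p->caddr = h->saddr;
		p->cport = h->sport;
		p->vaddr = h->daddr;
		p->vport = h->dport;
	} else {
		/* Header is reversed: destination is the client. */
		p->caddr = h->daddr;
		p->cport = h->dport;
		p->vaddr = h->saddr;
		p->vport = h->sport;
	}
}
```

With both branches filling the fields identically, as in the earlier revision, an inverse lookup would hash and compare on the wrong address and port pair.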
Index: lvs-test-2.6/include/net/ip_vs.h
===================================================================
--- lvs-test-2.6.orig/include/net/ip_vs.h 2010-10-01 21:56:39.000000000 +0900
+++ lvs-test-2.6/include/net/ip_vs.h 2010-10-01 22:07:22.000000000 +0900
@@ -357,6 +357,15 @@ struct ip_vs_protocol {
extern struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto);
+struct ip_vs_conn_param {
+ const union nf_inet_addr *caddr;
+ const union nf_inet_addr *vaddr;
+ __be16 cport;
+ __be16 vport;
+ __u16 protocol;
+ u16 af;
+};
+
/*
* IP_VS structure allocated for each dynamically scheduled connection
*/
@@ -626,13 +635,23 @@ enum {
IP_VS_DIR_LAST,
};
-extern struct ip_vs_conn *ip_vs_conn_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port);
-
-extern struct ip_vs_conn *ip_vs_ct_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port);
+static inline void ip_vs_conn_fill_param(int af, int protocol,
+ const union nf_inet_addr *caddr,
+ __be16 cport,
+ const union nf_inet_addr *vaddr,
+ __be16 vport,
+ struct ip_vs_conn_param *p)
+{
+ p->af = af;
+ p->protocol = protocol;
+ p->caddr = caddr;
+ p->cport = cport;
+ p->vaddr = vaddr;
+ p->vport = vport;
+}
+
+struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p);
+struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
struct ip_vs_protocol *pp,
@@ -640,9 +659,7 @@ struct ip_vs_conn * ip_vs_conn_in_get_pr
unsigned int proto_off,
int inverse);
-extern struct ip_vs_conn *ip_vs_conn_out_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port);
+struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
struct ip_vs_protocol *pp,
@@ -658,11 +675,10 @@ static inline void __ip_vs_conn_put(stru
extern void ip_vs_conn_put(struct ip_vs_conn *cp);
extern void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport);
-extern struct ip_vs_conn *
-ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
- const union nf_inet_addr *vaddr, __be16 vport,
- const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
- struct ip_vs_dest *dest);
+struct ip_vs_conn *ip_vs_conn_new(const struct ip_vs_conn_param *p,
+ const union nf_inet_addr *daddr,
+ __be16 dport, unsigned flags,
+ struct ip_vs_dest *dest);
extern void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
extern const char * ip_vs_state_name(__u16 proto, int state);
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 21:56:39.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:07:22.000000000 +0900
@@ -218,27 +218,26 @@ static inline int ip_vs_conn_unhash(stru
/*
* Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
* Called for pkts coming from OUTside-to-INside.
- * s_addr, s_port: pkt source address (foreign host)
- * d_addr, d_port: pkt dest address (load balancer)
+ * p->caddr, p->cport: pkt source address (foreign host)
+ * p->vaddr, p->vport: pkt dest address (load balancer)
*/
-static inline struct ip_vs_conn *__ip_vs_conn_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+static inline struct ip_vs_conn *
+__ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
{
unsigned hash;
struct ip_vs_conn *cp;
- hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
+ hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
- if (cp->af == af &&
- ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
- ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
- s_port == cp->cport && d_port == cp->vport &&
- ((!s_port) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
- protocol == cp->protocol) {
+ if (cp->af == p->af &&
+ ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
+ ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
+ p->cport == cp->cport && p->vport == cp->vport &&
+ ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
+ p->protocol == cp->protocol) {
/* HIT */
atomic_inc(&cp->refcnt);
ct_read_unlock(hash);
@@ -251,71 +250,82 @@ static inline struct ip_vs_conn *__ip_vs
return NULL;
}
-struct ip_vs_conn *ip_vs_conn_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
{
struct ip_vs_conn *cp;
- cp = __ip_vs_conn_in_get(af, protocol, s_addr, s_port, d_addr, d_port);
- if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt))
- cp = __ip_vs_conn_in_get(af, protocol, s_addr, 0, d_addr,
- d_port);
+ cp = __ip_vs_conn_in_get(p);
+ if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt)) {
+ struct ip_vs_conn_param cport_zero_p = *p;
+ cport_zero_p.cport = 0;
+ cp = __ip_vs_conn_in_get(&cport_zero_p);
+ }
IP_VS_DBG_BUF(9, "lookup/in %s %s:%d->%s:%d %s\n",
- ip_vs_proto_name(protocol),
- IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
- IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
+ ip_vs_proto_name(p->protocol),
+ IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
+ IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
cp ? "hit" : "not hit");
return cp;
}
+static int
+ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
+ const struct ip_vs_iphdr *iph,
+ unsigned int proto_off, int inverse,
+ struct ip_vs_conn_param *p)
+{
+ __be16 _ports[2], *pptr;
+
+ pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
+ if (pptr == NULL)
+ return 1;
+
+ if (likely(!inverse))
+ ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
+ &iph->daddr, pptr[1], p);
+ else
+ ip_vs_conn_fill_param(af, iph->protocol, &iph->daddr, pptr[1],
+ &iph->saddr, pptr[0], p);
+ return 0;
+}
+
struct ip_vs_conn *
ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
struct ip_vs_protocol *pp,
const struct ip_vs_iphdr *iph,
unsigned int proto_off, int inverse)
{
- __be16 _ports[2], *pptr;
+ struct ip_vs_conn_param p;
- pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
- if (pptr == NULL)
+ if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
return NULL;
- if (likely(!inverse))
- return ip_vs_conn_in_get(af, iph->protocol,
- &iph->saddr, pptr[0],
- &iph->daddr, pptr[1]);
- else
- return ip_vs_conn_in_get(af, iph->protocol,
- &iph->daddr, pptr[1],
- &iph->saddr, pptr[0]);
+ return ip_vs_conn_in_get(&p);
}
EXPORT_SYMBOL_GPL(ip_vs_conn_in_get_proto);
/* Get reference to connection template */
-struct ip_vs_conn *ip_vs_ct_in_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
{
unsigned hash;
struct ip_vs_conn *cp;
- hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
+ hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
- if (cp->af == af &&
- ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
+ if (cp->af == p->af &&
+ ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
/* protocol should only be IPPROTO_IP if
- * d_addr is a fwmark */
- ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
- d_addr, &cp->vaddr) &&
- s_port == cp->cport && d_port == cp->vport &&
+ * p->vaddr is a fwmark */
+ ip_vs_addr_equal(p->protocol == IPPROTO_IP ? AF_UNSPEC :
+ p->af, p->vaddr, &cp->vaddr) &&
+ p->cport == cp->cport && p->vport == cp->vport &&
cp->flags & IP_VS_CONN_F_TEMPLATE &&
- protocol == cp->protocol) {
+ p->protocol == cp->protocol) {
/* HIT */
atomic_inc(&cp->refcnt);
goto out;
@@ -327,9 +337,9 @@ struct ip_vs_conn *ip_vs_ct_in_get
ct_read_unlock(hash);
IP_VS_DBG_BUF(9, "template lookup/in %s %s:%d->%s:%d %s\n",
- ip_vs_proto_name(protocol),
- IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
- IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
+ ip_vs_proto_name(p->protocol),
+ IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
+ IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
cp ? "hit" : "not hit");
return cp;
@@ -341,9 +351,7 @@ struct ip_vs_conn *ip_vs_ct_in_get
* s_addr, s_port: pkt source address (inside host)
* d_addr, d_port: pkt dest address (foreign host)
*/
-struct ip_vs_conn *ip_vs_conn_out_get
-(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
- const union nf_inet_addr *d_addr, __be16 d_port)
+struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
{
unsigned hash;
struct ip_vs_conn *cp, *ret=NULL;
@@ -351,16 +359,16 @@ struct ip_vs_conn *ip_vs_conn_out_get
/*
* Check for "full" addressed entries
*/
- hash = ip_vs_conn_hashkey(af, protocol, d_addr, d_port);
+ hash = ip_vs_conn_hashkey(p->af, p->protocol, p->vaddr, p->vport);
ct_read_lock(hash);
list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
- if (cp->af == af &&
- ip_vs_addr_equal(af, d_addr, &cp->caddr) &&
- ip_vs_addr_equal(af, s_addr, &cp->daddr) &&
- d_port == cp->cport && s_port == cp->dport &&
- protocol == cp->protocol) {
+ if (cp->af == p->af &&
+ ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
+ ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
+ p->vport == cp->cport && p->cport == cp->dport &&
+ p->protocol == cp->protocol) {
/* HIT */
atomic_inc(&cp->refcnt);
ret = cp;
@@ -371,9 +379,9 @@ struct ip_vs_conn *ip_vs_conn_out_get
ct_read_unlock(hash);
IP_VS_DBG_BUF(9, "lookup/out %s %s:%d->%s:%d %s\n",
- ip_vs_proto_name(protocol),
- IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
- IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
+ ip_vs_proto_name(p->protocol),
+ IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
+ IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
ret ? "hit" : "not hit");
return ret;
@@ -385,20 +393,12 @@ ip_vs_conn_out_get_proto(int af, const s
const struct ip_vs_iphdr *iph,
unsigned int proto_off, int inverse)
{
- __be16 _ports[2], *pptr;
+ struct ip_vs_conn_param p;
- pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
- if (pptr == NULL)
+ if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
return NULL;
- if (likely(!inverse))
- return ip_vs_conn_out_get(af, iph->protocol,
- &iph->saddr, pptr[0],
- &iph->daddr, pptr[1]);
- else
- return ip_vs_conn_out_get(af, iph->protocol,
- &iph->daddr, pptr[1],
- &iph->saddr, pptr[0]);
+ return ip_vs_conn_out_get(&p);
}
EXPORT_SYMBOL_GPL(ip_vs_conn_out_get_proto);
@@ -758,13 +758,12 @@ void ip_vs_conn_expire_now(struct ip_vs_
* Create a new connection entry and hash it into the ip_vs_conn_tab
*/
struct ip_vs_conn *
-ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
- const union nf_inet_addr *vaddr, __be16 vport,
+ip_vs_conn_new(const struct ip_vs_conn_param *p,
const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
struct ip_vs_dest *dest)
{
struct ip_vs_conn *cp;
- struct ip_vs_protocol *pp = ip_vs_proto_get(proto);
+ struct ip_vs_protocol *pp = ip_vs_proto_get(p->protocol);
cp = kmem_cache_zalloc(ip_vs_conn_cachep, GFP_ATOMIC);
if (cp == NULL) {
@@ -774,14 +773,14 @@ ip_vs_conn_new(int af, int proto, const
INIT_LIST_HEAD(&cp->c_list);
setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
- cp->af = af;
- cp->protocol = proto;
- ip_vs_addr_copy(af, &cp->caddr, caddr);
- cp->cport = cport;
- ip_vs_addr_copy(af, &cp->vaddr, vaddr);
- cp->vport = vport;
+ cp->af = p->af;
+ cp->protocol = p->protocol;
+ ip_vs_addr_copy(p->af, &cp->caddr, p->caddr);
+ cp->cport = p->cport;
+ ip_vs_addr_copy(p->af, &cp->vaddr, p->vaddr);
+ cp->vport = p->vport;
/* proto should only be IPPROTO_IP if d_addr is a fwmark */
- ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
+ ip_vs_addr_copy(p->protocol == IPPROTO_IP ? AF_UNSPEC : p->af,
&cp->daddr, daddr);
cp->dport = dport;
cp->flags = flags;
@@ -810,7 +809,7 @@ ip_vs_conn_new(int af, int proto, const
/* Bind its packet transmitter */
#ifdef CONFIG_IP_VS_IPV6
- if (af == AF_INET6)
+ if (p->af == AF_INET6)
ip_vs_bind_xmit_v6(cp);
else
#endif
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:06:23.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:10:46.000000000 +0900
@@ -193,14 +193,11 @@ ip_vs_sched_persist(struct ip_vs_service
struct ip_vs_iphdr iph;
struct ip_vs_dest *dest;
struct ip_vs_conn *ct;
- int protocol = iph.protocol;
__be16 dport = 0; /* destination port to forward */
- __be16 vport = 0; /* virtual service port */
unsigned int flags;
union nf_inet_addr snet; /* source network of the client,
after masking */
- const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
- const union nf_inet_addr *vaddr = &iph.daddr;
+ struct ip_vs_conn_param param;
ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
@@ -232,6 +229,11 @@ ip_vs_sched_persist(struct ip_vs_service
* is created for other persistent services.
*/
{
+ int protocol = iph.protocol;
+ const union nf_inet_addr *vaddr = &iph.daddr;
+ const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
+ __be16 vport = 0;
+
if (ports[1] == svc->port) {
/* non-FTP template:
* <protocol, caddr, 0, vaddr, vport, daddr, dport>
@@ -253,11 +255,12 @@ ip_vs_sched_persist(struct ip_vs_service
vaddr = &fwmark;
}
}
+ ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
+ vaddr, vport, &param);
}
/* Check if a template already exists */
- ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
-
+ ct = ip_vs_ct_in_get(&param);
if (!ct || !ip_vs_check_template(ct)) {
/* No template found or the dest of the connection
* template is not available.
@@ -272,8 +275,7 @@ ip_vs_sched_persist(struct ip_vs_service
dport = dest->port;
/* Create a template */
- ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
- &dest->addr, dport,
+ ct = ip_vs_conn_new(&param, &dest->addr, dport,
IP_VS_CONN_F_TEMPLATE, dest);
if (ct == NULL)
return NULL;
@@ -291,12 +293,9 @@ ip_vs_sched_persist(struct ip_vs_service
/*
* Create a new connection according to the template
*/
- cp = ip_vs_conn_new(svc->af, iph.protocol,
- &iph.saddr, ports[0],
- &iph.daddr, ports[1],
- &dest->addr, dport,
- flags,
- dest);
+ ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, ports[0],
+ &iph.daddr, ports[1], &param);
+ cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest);
if (cp == NULL) {
ip_vs_conn_put(ct);
return NULL;
@@ -363,14 +360,16 @@ ip_vs_schedule(struct ip_vs_service *svc
/*
* Create a connection entry.
*/
- cp = ip_vs_conn_new(svc->af, iph.protocol,
- &iph.saddr, pptr[0],
- &iph.daddr, pptr[1],
- &dest->addr, dest->port ? dest->port : pptr[1],
- flags,
- dest);
- if (cp == NULL)
- return NULL;
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr,
+ pptr[0], &iph.daddr, pptr[1], &p);
+ cp = ip_vs_conn_new(&p, &dest->addr,
+ dest->port ? dest->port : pptr[1],
+ flags, dest);
+ if (!cp)
+ return NULL;
+ }
IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u "
"d:%s:%u conn->flags:%X conn->refcnt:%d\n",
@@ -426,14 +425,17 @@ int ip_vs_leave(struct ip_vs_service *sv
/* create a new connection entry */
IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
- cp = ip_vs_conn_new(svc->af, iph.protocol,
- &iph.saddr, pptr[0],
- &iph.daddr, pptr[1],
- &daddr, 0,
- IP_VS_CONN_F_BYPASS | flags,
- NULL);
- if (cp == NULL)
- return NF_DROP;
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(svc->af, iph.protocol,
+ &iph.saddr, pptr[0],
+ &iph.daddr, pptr[1], &p);
+ cp = ip_vs_conn_new(&p, &daddr, 0,
+ IP_VS_CONN_F_BYPASS | flags,
+ NULL);
+ if (!cp)
+ return NF_DROP;
+ }
/* statistics */
ip_vs_in_stats(cp, skb);
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_ftp.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_ftp.c 2010-10-01 22:14:10.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_ftp.c 2010-10-01 22:21:09.000000000 +0900
@@ -195,13 +195,17 @@ static int ip_vs_ftp_out(struct ip_vs_ap
/*
* Now update or create an connection entry for it
*/
- n_cp = ip_vs_conn_out_get(AF_INET, iph->protocol, &from, port,
- &cp->caddr, 0);
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(AF_INET, iph->protocol,
+ &from, port, &cp->caddr, 0, &p);
+ n_cp = ip_vs_conn_out_get(&p);
+ }
if (!n_cp) {
- n_cp = ip_vs_conn_new(AF_INET, IPPROTO_TCP,
- &cp->caddr, 0,
- &cp->vaddr, port,
- &from, port,
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(AF_INET, IPPROTO_TCP, &cp->caddr,
+ 0, &cp->vaddr, port, &p);
+ n_cp = ip_vs_conn_new(&p, &from, port,
IP_VS_CONN_F_NO_CPORT |
IP_VS_CONN_F_NFCT,
cp->dest);
@@ -347,21 +351,22 @@ static int ip_vs_ftp_in(struct ip_vs_app
ip_vs_proto_name(iph->protocol),
&to.ip, ntohs(port), &cp->vaddr.ip, 0);
- n_cp = ip_vs_conn_in_get(AF_INET, iph->protocol,
- &to, port,
- &cp->vaddr, htons(ntohs(cp->vport)-1));
- if (!n_cp) {
- n_cp = ip_vs_conn_new(AF_INET, IPPROTO_TCP,
- &to, port,
+ {
+ struct ip_vs_conn_param p;
+ ip_vs_conn_fill_param(AF_INET, iph->protocol, &to, port,
&cp->vaddr, htons(ntohs(cp->vport)-1),
- &cp->daddr, htons(ntohs(cp->dport)-1),
- IP_VS_CONN_F_NFCT,
- cp->dest);
- if (!n_cp)
- return 0;
+ &p);
+ n_cp = ip_vs_conn_in_get(&p);
+ if (!n_cp) {
+ n_cp = ip_vs_conn_new(&p, &cp->daddr,
+ htons(ntohs(cp->dport)-1),
+ IP_VS_CONN_F_NFCT, cp->dest);
+ if (!n_cp)
+ return 0;
- /* add its controller */
- ip_vs_control_add(n_cp, cp);
+ /* add its controller */
+ ip_vs_control_add(n_cp, cp);
+ }
}
/*
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_nfct.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_nfct.c 2010-10-01 22:11:53.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_nfct.c 2010-10-01 22:24:29.000000000 +0900
@@ -140,6 +140,7 @@ static void ip_vs_nfct_expect_callback(s
{
struct nf_conntrack_tuple *orig, new_reply;
struct ip_vs_conn *cp;
+ struct ip_vs_conn_param p;
if (exp->tuple.src.l3num != PF_INET)
return;
@@ -154,9 +155,10 @@ static void ip_vs_nfct_expect_callback(s
/* RS->CLIENT */
orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
- cp = ip_vs_conn_out_get(exp->tuple.src.l3num, orig->dst.protonum,
- &orig->src.u3, orig->src.u.tcp.port,
- &orig->dst.u3, orig->dst.u.tcp.port);
+ ip_vs_conn_fill_param(exp->tuple.src.l3num, orig->dst.protonum,
+ &orig->src.u3, orig->src.u.tcp.port,
+ &orig->dst.u3, orig->dst.u.tcp.port, &p);
+ cp = ip_vs_conn_out_get(&p);
if (cp) {
/* Change reply CLIENT->RS to CLIENT->VS */
new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
@@ -176,9 +178,7 @@ static void ip_vs_nfct_expect_callback(s
}
/* CLIENT->VS */
- cp = ip_vs_conn_in_get(exp->tuple.src.l3num, orig->dst.protonum,
- &orig->src.u3, orig->src.u.tcp.port,
- &orig->dst.u3, orig->dst.u.tcp.port);
+ cp = ip_vs_conn_in_get(&p);
if (cp) {
/* Change reply VS->CLIENT to RS->CLIENT */
new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 21:55:19.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 22:23:33.000000000 +0900
@@ -40,6 +40,19 @@ struct isakmp_hdr {
#define PORT_ISAKMP 500
+static void
+ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
+ int inverse, struct ip_vs_conn_param *p)
+{
+ if (likely(!inverse))
+ ip_vs_conn_fill_param(af, IPPROTO_UDP,
+ &iph->saddr, htons(PORT_ISAKMP),
+ &iph->daddr, htons(PORT_ISAKMP), p);
+ else
+ ip_vs_conn_fill_param(af, IPPROTO_UDP,
+ &iph->daddr, htons(PORT_ISAKMP),
+ &iph->saddr, htons(PORT_ISAKMP), p);
+}
static struct ip_vs_conn *
ah_esp_conn_in_get(int af, const struct sk_buff *skb, struct ip_vs_protocol *pp,
@@ -47,21 +60,10 @@ ah_esp_conn_in_get(int af, const struct
int inverse)
{
struct ip_vs_conn *cp;
+ struct ip_vs_conn_param p;
- if (likely(!inverse)) {
- cp = ip_vs_conn_in_get(af, IPPROTO_UDP,
- &iph->saddr,
- htons(PORT_ISAKMP),
- &iph->daddr,
- htons(PORT_ISAKMP));
- } else {
- cp = ip_vs_conn_in_get(af, IPPROTO_UDP,
- &iph->daddr,
- htons(PORT_ISAKMP),
- &iph->saddr,
- htons(PORT_ISAKMP));
- }
-
+ ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+ cp = ip_vs_conn_in_get(&p);
if (!cp) {
/*
* We are not sure if the packet is from our
@@ -87,21 +89,10 @@ ah_esp_conn_out_get(int af, const struct
int inverse)
{
struct ip_vs_conn *cp;
+ struct ip_vs_conn_param p;
- if (likely(!inverse)) {
- cp = ip_vs_conn_out_get(af, IPPROTO_UDP,
- &iph->saddr,
- htons(PORT_ISAKMP),
- &iph->daddr,
- htons(PORT_ISAKMP));
- } else {
- cp = ip_vs_conn_out_get(af, IPPROTO_UDP,
- &iph->daddr,
- htons(PORT_ISAKMP),
- &iph->saddr,
- htons(PORT_ISAKMP));
- }
-
+ ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+ cp = ip_vs_conn_out_get(&p);
if (!cp) {
IP_VS_DBG_BUF(12, "Unknown ISAKMP entry for inout packet "
"%s%s %s->%s\n",
Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_sync.c 2010-10-01 21:55:19.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c 2010-10-01 22:23:33.000000000 +0900
@@ -301,6 +301,7 @@ static void ip_vs_process_message(const
struct ip_vs_conn *cp;
struct ip_vs_protocol *pp;
struct ip_vs_dest *dest;
+ struct ip_vs_conn_param param;
char *p;
int i;
@@ -370,18 +371,17 @@ static void ip_vs_process_message(const
}
}
- if (!(flags & IP_VS_CONN_F_TEMPLATE))
- cp = ip_vs_conn_in_get(AF_INET, s->protocol,
- (union nf_inet_addr *)&s->caddr,
- s->cport,
- (union nf_inet_addr *)&s->vaddr,
- s->vport);
- else
- cp = ip_vs_ct_in_get(AF_INET, s->protocol,
- (union nf_inet_addr *)&s->caddr,
- s->cport,
- (union nf_inet_addr *)&s->vaddr,
- s->vport);
+ {
+ ip_vs_conn_fill_param(AF_INET, s->protocol,
+ (union nf_inet_addr *)&s->caddr,
+ s->cport,
+ (union nf_inet_addr *)&s->vaddr,
+ s->vport, &param);
+ if (!(flags & IP_VS_CONN_F_TEMPLATE))
+ cp = ip_vs_conn_in_get(&param);
+ else
+ cp = ip_vs_ct_in_get(&param);
+ }
if (!cp) {
/*
* Find the appropriate destination for the connection.
@@ -406,14 +406,9 @@ static void ip_vs_process_message(const
else
flags &= ~IP_VS_CONN_F_INACTIVE;
}
- cp = ip_vs_conn_new(AF_INET, s->protocol,
- (union nf_inet_addr *)&s->caddr,
- s->cport,
- (union nf_inet_addr *)&s->vaddr,
- s->vport,
+ cp = ip_vs_conn_new(&param,
(union nf_inet_addr *)&s->daddr,
- s->dport,
- flags, dest);
+ s->dport, flags, dest);
if (dest)
atomic_dec(&dest->refcnt);
if (!cp) {
^ permalink raw reply
* [PATCH net-next] gre: protocol table can be static
From: Stephen Hemminger @ 2010-10-01 23:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Herbert Xu
This table is only used in gre.c
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/ipv4/gre.c 2010-09-29 10:41:46.895895032 +0900
+++ b/net/ipv4/gre.c 2010-09-29 10:41:56.265898132 +0900
@@ -22,7 +22,7 @@
#include <net/gre.h>
-const struct gre_protocol *gre_proto[GREPROTO_MAX] __read_mostly;
+static const struct gre_protocol *gre_proto[GREPROTO_MAX] __read_mostly;
static DEFINE_SPINLOCK(gre_proto_lock);
int gre_add_protocol(const struct gre_protocol *proto, u8 version)
^ permalink raw reply
* Re: [PATCH] netfilter: unregister nf hooks, matches and targets in the reverse order
From: Jan Engelhardt @ 2010-10-01 23:57 UTC (permalink / raw)
To: Changli Gao; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <AANLkTi=AhK0jDM8WEf5tHhervKUpB6f4MR+mNx2moYuC@mail.gmail.com>
On Sunday 2010-09-19 07:58, Changli Gao wrote:
>On Thu, Sep 16, 2010 at 1:32 AM, Patrick McHardy <kaber@trash.net> wrote:
>> Am 02.09.2010 16:15, schrieb Changli Gao:
>>> Since we register nf hooks, matches and targets in order, we'd better
>>> unregister them in the reverse order.
>>
>> Why? Is there a specific bug you've noticed?
>>
>
>No, there isn't any bug. I just think unregistering them in the
>reverse order is more reasonable, like the rollback when failing. And
>the code patched generates less object:
I support this.
^ permalink raw reply
* Re: problem with flowi structure
From: Jan Engelhardt @ 2010-10-01 23:46 UTC (permalink / raw)
To: Nicola Padovano; +Cc: Eric Dumazet, AIJAZ BAIG, netfilter-devel, netdev
In-Reply-To: <AANLkTi=hXc_ijNgxW+Q6jkRP_T1zjyy2mvKAdTSgeh7w@mail.gmail.com>
On Friday 2010-09-17 20:25, Nicola Padovano wrote:
>
>[CODE]
>if (hooknumber == NF_INET_LOCAL_IN) fl.nl_u.ip4_u.saddr = niph->saddr;
> //niph is the pointer to ip header of the packet to send
>if (hooknumber == NF_INET_FORWARD) fl.nl_u.ip4_u.saddr = 0;
>[/CODE]
>
>so, i don't understand why saddr = 0 when the hooknumber is NF_INET_FORWARD....
>
>this is the real problem.
0 is automatic address selection.
^ permalink raw reply
* Re: [patch v2 03/12] [PATCH 03/12] IPVS: compact ip_vs_sched_persist()
From: Julian Anastasov @ 2010-10-01 22:35 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <20101001143941.949669349@akiko.akashicho.tokyo.vergenet.net>
Hello,
On Fri, 1 Oct 2010, Simon Horman wrote:
> Compact ip_vs_sched_persist() by setting up parameters
> and calling functions once.
>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
>
> v2
> * Make "union nf_inet_addr fwmark" const
> * Don't remove the comment next to the declaration of dport
> * Add a comment to the declaration of vport
>
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 21:56:39.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:02:41.000000000 +0900
> @@ -193,10 +193,14 @@ ip_vs_sched_persist(struct ip_vs_service
> struct ip_vs_iphdr iph;
> struct ip_vs_dest *dest;
> struct ip_vs_conn *ct;
> - __be16 dport; /* destination port to forward */
> + int protocol = iph.protocol;
> + __be16 dport = 0; /* destination port to forward */
> + __be16 vport = 0; /* virtual service port */
> unsigned int flags;
> union nf_inet_addr snet; /* source network of the client,
> after masking */
> + const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> + const union nf_inet_addr *vaddr = &iph.daddr;
>
> ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
>
> @@ -227,119 +231,58 @@ ip_vs_sched_persist(struct ip_vs_service
> * service, and a template like <caddr, 0, vaddr, vport, daddr, dport>
> * is created for other persistent services.
> */
> - if (ports[1] == svc->port) {
> - /* Check if a template already exists */
> - if (svc->port != FTPPORT)
> - ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
> - &iph.daddr, ports[1]);
> - else
> - ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
> - &iph.daddr, 0);
> -
> - if (!ct || !ip_vs_check_template(ct)) {
> - /*
> - * No template found or the dest of the connection
> - * template is not available.
> - */
> - dest = svc->scheduler->schedule(svc, skb);
> - if (dest == NULL) {
> - IP_VS_DBG(1, "p-schedule: no dest found.\n");
> - return NULL;
> - }
> -
> - /*
> - * Create a template like <protocol,caddr,0,
> - * vaddr,vport,daddr,dport> for non-ftp service,
> - * and <protocol,caddr,0,vaddr,0,daddr,0>
> - * for ftp service.
> + {
> + if (ports[1] == svc->port) {
> + /* non-FTP template:
> + * <protocol, caddr, 0, vaddr, vport, daddr, dport>
> + * FTP template:
> + * <protocol, caddr, 0, vaddr, 0, daddr, 0>
> */
> if (svc->port != FTPPORT)
> - ct = ip_vs_conn_new(svc->af, iph.protocol,
> - &snet, 0,
> - &iph.daddr,
> - ports[1],
> - &dest->addr, dest->port,
> - IP_VS_CONN_F_TEMPLATE,
> - dest);
> - else
> - ct = ip_vs_conn_new(svc->af, iph.protocol,
> - &snet, 0,
> - &iph.daddr, 0,
> - &dest->addr, 0,
> - IP_VS_CONN_F_TEMPLATE,
> - dest);
> - if (ct == NULL)
> - return NULL;
> -
> - ct->timeout = svc->timeout;
> + vport = ports[1];
> } else {
> - /* set destination with the found template */
> - dest = ct->dest;
> - }
> - dport = dest->port;
> - } else {
> - /*
> - * Note: persistent fwmark-based services and persistent
> - * port zero service are handled here.
> - * fwmark template: <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
> - * port zero template: <protocol,caddr,0,vaddr,0,daddr,0>
> - */
> - if (svc->fwmark) {
> - union nf_inet_addr fwmark = {
> - .ip = htonl(svc->fwmark)
> - };
> -
> - ct = ip_vs_ct_in_get(svc->af, IPPROTO_IP, &snet, 0,
> - &fwmark, 0);
> - } else
> - ct = ip_vs_ct_in_get(svc->af, iph.protocol, &snet, 0,
> - &iph.daddr, 0);
> -
> - if (!ct || !ip_vs_check_template(ct)) {
> - /*
> - * If it is not persistent port zero, return NULL,
> - * otherwise create a connection template.
> + /* Note: persistent fwmark-based services and
> + * persistent port zero service are handled here.
> + * fwmark template:
> + * <IPPROTO_IP,caddr,0,fwmark,0,daddr,0>
> + * port zero template:
> + * <protocol,caddr,0,vaddr,0,daddr,0>
> */
> - if (svc->port)
> - return NULL;
> -
> - dest = svc->scheduler->schedule(svc, skb);
> - if (dest == NULL) {
> - IP_VS_DBG(1, "p-schedule: no dest found.\n");
> - return NULL;
> + if (svc->fwmark) {
> + protocol = IPPROTO_IP;
> + vaddr = &fwmark;
> }
> + }
> + }
>
> - /*
> - * Create a template according to the service
> - */
> - if (svc->fwmark) {
> - union nf_inet_addr fwmark = {
> - .ip = htonl(svc->fwmark)
> - };
> -
> - ct = ip_vs_conn_new(svc->af, IPPROTO_IP,
> - &snet, 0,
> - &fwmark, 0,
> - &dest->addr, 0,
> - IP_VS_CONN_F_TEMPLATE,
> - dest);
> - } else
> - ct = ip_vs_conn_new(svc->af, iph.protocol,
> - &snet, 0,
> - &iph.daddr, 0,
> - &dest->addr, 0,
> - IP_VS_CONN_F_TEMPLATE,
> - dest);
> - if (ct == NULL)
> - return NULL;
> + /* Check if a template already exists */
> + ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
>
> - ct->timeout = svc->timeout;
> - } else {
> - /* set destination with the found template */
> - dest = ct->dest;
> + if (!ct || !ip_vs_check_template(ct)) {
> + /* No template found or the dest of the connection
> + * template is not available.
> + */
> + dest = svc->scheduler->schedule(svc, skb);
> + if (!dest) {
> + IP_VS_DBG(1, "p-schedule: no dest found.\n");
> + return NULL;
> }
> - dport = ports[1];
> - }
> +
> + if (ports[1] == svc->port && svc->port != FTPPORT)
> + dport = dest->port;
> +
> + /* Create a template */
> + ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
> + &dest->addr, dport,
> + IP_VS_CONN_F_TEMPLATE, dest);
> + if (ct == NULL)
> + return NULL;
> +
> + ct->timeout = svc->timeout;
> + } else
> + /* set destination with the found template */
> + dest = ct->dest;
Here dport:
> + dport = dest->port;
should be:
dport = ports[1];
if (dport == svc->port && dest->port)
dport = dest->port;
> flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
> && iph.protocol == IPPROTO_UDP)?
Regards
--
Julian Anastasov <ja@ssi.bg>
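
For readers following the thread, the port-selection rule suggested above can be sketched in isolation. This is a minimal illustration, not the kernel code: `svc_sketch`, `dest_sketch` and `pick_dport()` are hypothetical stand-ins, and ports are plain integers rather than `__be16` (the comparisons used here do not depend on byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct svc_sketch  { uint16_t port; };   /* virtual service port */
struct dest_sketch { uint16_t port; };   /* real server port, 0 if unset */

/*
 * Default to the port the packet was addressed to. Switch to the real
 * server's port only when the packet hit the service port and the real
 * server has a non-zero port configured.
 */
static uint16_t pick_dport(const struct svc_sketch *svc,
                           const struct dest_sketch *dest,
                           uint16_t pkt_dport)
{
    uint16_t dport = pkt_dport;

    assert(svc != NULL && dest != NULL);
    if (dport == svc->port && dest->port)
        dport = dest->port;
    return dport;
}
```

With this rule a packet to a port other than the service port (the fwmark and port-zero cases) keeps its original destination port instead of inheriting `dest->port`.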
^ permalink raw reply
* Re: [PATCH] Fix out-of-bounds reading in sctp_asoc_get_hmac()
From: Vlad Yasevich @ 2010-10-01 22:13 UTC (permalink / raw)
To: Dan Rosenberg; +Cc: sri, linux-sctp, linux-kernel, security, stable, netdev
In-Reply-To: <1285969907.2814.49.camel@Dan>
On 10/01/2010 05:51 PM, Dan Rosenberg wrote:
> The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
> array and attempts to ensure that only a supported hmac entry is
> returned. The current code fails to do this properly - if the last id
> in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
> id integer remains set after exiting the loop, and the address of an
> out-of-bounds entry will be returned and subsequently used in the parent
> function, causing potentially ugly memory corruption. This patch resets
> the id integer to 0 on encountering an invalid id so that NULL will be
> returned after finishing the loop if no valid ids are found.
>
> Signed-off-by: Dan Rosenberg<drosenberg@vsecurity.com>
Good catch.
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-vlad
>
> --- linux-2.6.35.5.orig/net/sctp/auth.c 2010-09-20 16:59:09.000000000 -0400
> +++ linux-2.6.35.5/net/sctp/auth.c 2010-10-01 16:48:58.000000000 -0400
> @@ -543,16 +543,20 @@ struct sctp_hmac *sctp_auth_asoc_get_hma
> id = ntohs(hmacs->hmac_ids[i]);
>
> /* Check the id is in the supported range */
> - if (id> SCTP_AUTH_HMAC_ID_MAX)
> + if (id> SCTP_AUTH_HMAC_ID_MAX) {
> + id = 0;
> continue;
> + }
>
> /* See is we support the id. Supported IDs have name and
> * length fields set, so that we can allocated and use
> * them. We can safely just check for name, for without the
> * name, we can't allocate the TFM.
> */
> - if (!sctp_hmac_list[id].hmac_name)
> + if (!sctp_hmac_list[id].hmac_name) {
> + id = 0;
> continue;
> + }
>
> break;
> }
>
>
^ permalink raw reply
* [PATCH] Fix out-of-bounds reading in sctp_asoc_get_hmac()
From: Dan Rosenberg @ 2010-10-01 21:51 UTC (permalink / raw)
To: vladislav.yasevich, sri
Cc: linux-sctp, linux-kernel, security, stable, netdev
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned. The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption. This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
--- linux-2.6.35.5.orig/net/sctp/auth.c 2010-09-20 16:59:09.000000000 -0400
+++ linux-2.6.35.5/net/sctp/auth.c 2010-10-01 16:48:58.000000000 -0400
@@ -543,16 +543,20 @@ struct sctp_hmac *sctp_auth_asoc_get_hma
id = ntohs(hmacs->hmac_ids[i]);
/* Check the id is in the supported range */
- if (id > SCTP_AUTH_HMAC_ID_MAX)
+ if (id > SCTP_AUTH_HMAC_ID_MAX) {
+ id = 0;
continue;
+ }
/* See is we support the id. Supported IDs have name and
* length fields set, so that we can allocated and use
* them. We can safely just check for name, for without the
* name, we can't allocate the TFM.
*/
- if (!sctp_hmac_list[id].hmac_name)
+ if (!sctp_hmac_list[id].hmac_name) {
+ id = 0;
continue;
+ }
break;
}
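
The effect of resetting the id can be sketched outside the kernel. This is a simplified model under stated assumptions: `HMAC_ID_MAX`, `hmac_list` and `first_supported()` are hypothetical names, the table contents are invented for illustration, the ids are already in host byte order, and the final `id == 0` check is inferred from the description above rather than copied from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HMAC_ID_MAX 3   /* hypothetical bound, stands in for SCTP_AUTH_HMAC_ID_MAX */

struct hmac_sketch { const char *name; };

/* Slot 0 is reserved; a NULL name marks an unsupported id. */
static const struct hmac_sketch hmac_list[HMAC_ID_MAX + 1] = {
    { NULL }, { "sha1" }, { NULL }, { "sha256" },
};

/* Return the first supported entry named in ids[], or NULL if none is. */
static const struct hmac_sketch *first_supported(const uint16_t *ids, size_t n)
{
    uint16_t id = 0;
    size_t i;

    assert(ids != NULL || n == 0);
    for (i = 0; i < n; i++) {
        id = ids[i];
        if (id > HMAC_ID_MAX) {      /* out of range: forget it */
            id = 0;
            continue;
        }
        if (!hmac_list[id].name) {   /* in range but unsupported */
            id = 0;
            continue;
        }
        break;
    }
    if (id == 0)
        return NULL;
    return &hmac_list[id];
}
```

Without the two `id = 0` resets, a trailing out-of-range id would survive the loop and index past the end of the table.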
^ permalink raw reply
* Re: [patch v2 08/12] [PATCH 08/12] IPVS: Add persistence engine data to /proc/net/ip_vs_conn
From: Julian Anastasov @ 2010-10-01 21:50 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <20101001143942.382352880@akiko.akashicho.tokyo.vergenet.net>
Hello,
On Fri, 1 Oct 2010, Simon Horman wrote:
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:27:17.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:27:32.000000000 +0900
> @@ -938,30 +938,44 @@ static int ip_vs_conn_seq_show(struct se
>
> if (v == SEQ_START_TOKEN)
> seq_puts(seq,
> - "Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires\n");
> + "Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires PEName PEData\n");
> else {
> const struct ip_vs_conn *cp = v;
> + char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3];
> + size_t len = 0;
> +
Add a check for cp->dest; it is optional:
> + if (cp->dest->svc->pe && cp->dest->svc->pe->show_pe_data) {
> + pe_data[0] = ' ';
> + len = strlen(cp->dest->svc->pe->name);
> + memcpy(pe_data + 1, cp->dest->svc->pe->name, len);
> + pe_data[len + 1] = ' ';
> + len += 2;
> + len += cp->dest->svc->pe->show_pe_data(cp,
> + pe_data + len);
> + }
> + pe_data[len] = '\0';
Regards
--
Julian Anastasov <ja@ssi.bg>
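
A guarded version of the lookup chain might look like the sketch below. It is an illustration only: the `*_sketch` structures and `format_pe_name()` are hypothetical, only the engine name is formatted (the `show_pe_data` callback is left out), and the extra `svc` check is a defensive choice of this sketch rather than something the review asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; only the pointer chain matters here. */
struct pe_sketch   { const char *name; };
struct svc_sketch  { const struct pe_sketch *pe; };
struct dest_sketch { const struct svc_sketch *svc; };
struct conn_sketch { const struct dest_sketch *dest; /* may be NULL */ };

/*
 * Write " <pe-name>" into buf, or an empty string when the connection has
 * no destination or no persistence engine. Returns the string length.
 */
static size_t format_pe_name(const struct conn_sketch *cp, char *buf, size_t size)
{
    int n;

    assert(cp != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    if (!cp->dest || !cp->dest->svc || !cp->dest->svc->pe)
        return 0;

    n = snprintf(buf, size, " %s", cp->dest->svc->pe->name);
    if (n < 0) {
        buf[0] = '\0';
        return 0;
    }
    /* snprintf reports the untruncated length; clamp to what fits. */
    return (size_t)n < size ? (size_t)n : size - 1;
}
```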
^ permalink raw reply
* Re: [patch v2 07/12] [PATCH 07/12] IPVS: Add struct ip_vs_pe
From: Julian Anastasov @ 2010-10-01 21:45 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <20101001143942.297844368@akiko.akashicho.tokyo.vergenet.net>
Hello,
On Fri, 1 Oct 2010, Simon Horman wrote:
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:48:42.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c 2010-10-01 22:49:15.000000000 +0900
> @@ -148,6 +148,29 @@ static unsigned int ip_vs_conn_hashkey(i
> & ip_vs_conn_tab_mask;
> }
>
> +static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p)
> +{
> + if (p->pe && p->pe->hashkey_raw)
> + return p->pe->hashkey_raw(p, ip_vs_conn_rnd) &
> + ip_vs_conn_tab_mask;
> + return ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
> +}
> +
> +static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
> +{
> + struct ip_vs_conn_param p;
> +
> + ip_vs_conn_fill_param(cp->af, cp->protocol, &cp->caddr, cp->cport,
> + NULL, 0, &p);
> +
cp->dest is optional, line should be
'if (cp->dest && cp->dest->svc->pe) {':
> + if (cp->dest->svc->pe) {
> + p.pe = cp->dest->svc->pe;
> + p.pe_data = cp->pe_data;
> + p.pe_data_len = cp->pe_data_len;
> + }
> +
> + return ip_vs_conn_hashkey_param(&p);
> +}
>
> @@ -359,7 +387,7 @@ struct ip_vs_conn *ip_vs_conn_out_get(co
> /*
> * Check for "full" addressed entries
> */
Here ip_vs_conn_out_get expects client data in
p->vaddr and p->vport (was daddr before) but ip_vs_conn_hashkey_param
hashes client data from p->caddr and p->cport:
> - hash = ip_vs_conn_hashkey(p->af, p->protocol, p->vaddr, p->vport);
> + hash = ip_vs_conn_hashkey_param(p);
>
> ct_read_lock(hash);
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* [patch 2/2] drivers/net/stmmac/: add HAS_IOMEM dependency
From: akpm @ 2010-10-01 21:17 UTC (permalink / raw)
To: davem; +Cc: netdev, akpm, schwidefsky, heiko.carstens, peppe.cavallaro
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
The stmmac driver does not compile on s390:
drivers/net/stmmac/stmmac_main.c: In function 'stmmac_adjust_link':
drivers/net/stmmac/stmmac_main.c:210: error: implicit declaration of function 'readl'
drivers/net/stmmac/stmmac_main.c:263: error: implicit declaration of function 'writel'
drivers/net/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe':
drivers/net/stmmac/stmmac_main.c:1674: error: implicit declaration of function 'ioremap'
drivers/net/stmmac/stmmac_main.c:1674: warning: assignment makes pointer from integer without a cast
drivers/net/stmmac/stmmac_main.c:1761: error: implicit declaration of function 'iounmap'
make[3]: *** [drivers/net/stmmac/stmmac_main.o] Error 1
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/net/stmmac/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN drivers/net/stmmac/Kconfig~drivers-net-stmmac-add-has_iomem-dependency drivers/net/stmmac/Kconfig
--- a/drivers/net/stmmac/Kconfig~drivers-net-stmmac-add-has_iomem-dependency
+++ a/drivers/net/stmmac/Kconfig
@@ -3,7 +3,7 @@ config STMMAC_ETH
select MII
select PHYLIB
select CRC32
- depends on NETDEVICES
+ depends on NETDEVICES && HAS_IOMEM
help
This is the driver for the Ethernet IPs are built around a
Synopsys IP Core and only tested on the STMicroelectronics
_
^ permalink raw reply
* [patch 1/2] drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixes
From: akpm @ 2010-10-01 21:17 UTC (permalink / raw)
To: davem; +Cc: netdev, akpm, dan.j.rosenberg, grundler, jeffm
From: Andrew Morton <akpm@linux-foundation.org>
ERROR: trailing statements should be on next line
#23: FILE: drivers/net/tulip/de4x5.c:5477:
+ if (copy_to_user(ioc->data, tmp.lval, ioc->len)) return -EFAULT;
total: 1 errors, 0 warnings, 8 lines checked
./patches/drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl.patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/net/tulip/de4x5.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN drivers/net/tulip/de4x5.c~drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixes drivers/net/tulip/de4x5.c
--- a/drivers/net/tulip/de4x5.c~drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixes
+++ a/drivers/net/tulip/de4x5.c
@@ -5474,7 +5474,8 @@ de4x5_ioctl(struct net_device *dev, stru
tmp.lval[6] = inl(DE4X5_STRR); j+=4;
tmp.lval[7] = inl(DE4X5_SIGR); j+=4;
ioc->len = j;
- if (copy_to_user(ioc->data, tmp.lval, ioc->len)) return -EFAULT;
+ if (copy_to_user(ioc->data, tmp.lval, ioc->len))
+ return -EFAULT;
break;
#define DE4X5_DUMP 0x0f /* Dump the DE4X5 Status */
_
^ permalink raw reply
* [patch 1/1] sctp: prevent reading out-of-bounds memory
From: akpm @ 2010-10-01 21:16 UTC (permalink / raw)
To: davem; +Cc: netdev, akpm, dan.j.rosenberg, vladislav.yasevich
From: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Two user-controlled allocations in SCTP are subsequently dereferenced as
sockaddr structs, without checking if the dereferenced struct members fall
beyond the end of the allocated chunk. There doesn't appear to be any
information leakage here based on how these members are used and
additional checking, but it's still worth fixing.
[akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion]
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
net/sctp/socket.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff -puN net/sctp/socket.c~sctp-prevent-reading-out-of-bounds-memory net/sctp/socket.c
--- a/net/sctp/socket.c~sctp-prevent-reading-out-of-bounds-memory
+++ a/net/sctp/socket.c
@@ -918,6 +918,11 @@ SCTP_STATIC int sctp_setsockopt_bindx(st
/* Walk through the addrs buffer and count the number of addresses. */
addr_buf = kaddrs;
while (walk_size < addrs_size) {
+ if (walk_size + sizeof(sa_family_t) > addrs_size) {
+ kfree(kaddrs);
+ return -EINVAL;
+ }
+
sa_addr = (struct sockaddr *)addr_buf;
af = sctp_get_af_specific(sa_addr->sa_family);
@@ -1004,9 +1009,13 @@ static int __sctp_connect(struct sock* s
/* Walk through the addrs buffer and count the number of addresses. */
addr_buf = kaddrs;
while (walk_size < addrs_size) {
+ if (walk_size + sizeof(sa_family_t) > addrs_size) {
+ err = -EINVAL;
+ goto out_free;
+ }
+
sa_addr = (union sctp_addr *)addr_buf;
af = sctp_get_af_specific(sa_addr->sa.sa_family);
- port = ntohs(sa_addr->v4.sin_port);
/* If the address family is not supported or if this address
* causes the address buffer to overflow return EINVAL.
@@ -1016,6 +1025,8 @@ static int __sctp_connect(struct sock* s
goto out_free;
}
+ port = ntohs(sa_addr->v4.sin_port);
+
/* Save current address so we can work with it */
memcpy(&to, sa_addr, af->sockaddr_len);
_
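
The added check follows a general pattern for walking a buffer of variable-length records: confirm that the field which determines the record type lies inside the buffer before reading it. A minimal sketch of that pattern, assuming an invented record layout (`record_len()` and `count_records()` are hypothetical and unrelated to the SCTP address structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 2-byte family tag, total record size set by family. */
static size_t record_len(uint16_t family)
{
    switch (family) {
    case 1:  return 8;
    case 2:  return 20;
    default: return 0;   /* unknown family */
    }
}

/* Count the records in buf, or return -1 if the buffer is malformed. */
static int count_records(const unsigned char *buf, size_t size)
{
    size_t walk = 0;
    int count = 0;

    assert(buf != NULL || size == 0);
    while (walk < size) {
        uint16_t family;
        size_t len;

        /* The tag itself must fit before it is read. */
        if (walk + sizeof(family) > size)
            return -1;
        memcpy(&family, buf + walk, sizeof(family));

        /* The whole record must fit as well. */
        len = record_len(family);
        if (len == 0 || walk + len > size)
            return -1;

        walk += len;
        count++;
    }
    return count;
}
```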
^ permalink raw reply
* Re: [patch v2 04/12] [PATCH 04/12] IPVS: Add struct ip_vs_conn_param
From: Julian Anastasov @ 2010-10-01 20:58 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter, netfilter-devel, Jan Engelhardt,
Stephen Hemminger, Wensong Zhang, Patrick McHardy
In-Reply-To: <20101001143942.035496196@akiko.akashicho.tokyo.vergenet.net>
Hello,
On Fri, 1 Oct 2010, Simon Horman wrote:
> +static int
> +ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
> + const struct ip_vs_iphdr *iph,
> + unsigned int proto_off, int inverse,
> + struct ip_vs_conn_param *p)
> +{
> + __be16 _ports[2], *pptr;
> +
> + pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> + if (pptr == NULL)
> + return 1;
> +
> + if (likely(!inverse))
> + ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> + &iph->daddr, pptr[1], p);
> + else
Next line is wrong for inverse=1, must be
&iph->daddr, pptr[1], &iph->saddr, pptr[0]
> + ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> + &iph->daddr, pptr[1], p);
> + return 0;
> +}
> +
Maybe the comments before ip_vs_conn_out_get should be
changed:
> @@ -341,9 +351,7 @@ struct ip_vs_conn *ip_vs_ct_in_get
> * s_addr, s_port: pkt source address (inside host)
> * d_addr, d_port: pkt dest address (foreign host)
> */
> -struct ip_vs_conn *ip_vs_conn_out_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port)
> +struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:06:23.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c 2010-10-01 22:10:46.000000000 +0900
> @@ -193,14 +193,11 @@ ip_vs_sched_persist(struct ip_vs_service
> struct ip_vs_iphdr iph;
> struct ip_vs_dest *dest;
> struct ip_vs_conn *ct;
> - int protocol = iph.protocol;
> __be16 dport = 0; /* destination port to forward */
> - __be16 vport = 0; /* virtual service port */
> unsigned int flags;
> union nf_inet_addr snet; /* source network of the client,
> after masking */
> - const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> - const union nf_inet_addr *vaddr = &iph.daddr;
> + struct ip_vs_conn_param param;
>
> ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
>
> @@ -232,6 +229,11 @@ ip_vs_sched_persist(struct ip_vs_service
> * is created for other persistent services.
> */
> {
> + int protocol = iph.protocol;
> + const union nf_inet_addr *vaddr = &iph.daddr;
> + const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> + __be16 vport = 0;
> +
> if (ports[1] == svc->port) {
> /* non-FTP template:
> * <protocol, caddr, 0, vaddr, vport, daddr, dport>
> @@ -253,11 +255,12 @@ ip_vs_sched_persist(struct ip_vs_service
> vaddr = &fwmark;
> }
> }
> + ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
> + vaddr, vport, &param);
> }
>
> /* Check if a template already exists */
> - ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
> -
> + ct = ip_vs_ct_in_get(&param);
> if (!ct || !ip_vs_check_template(ct)) {
> /* No template found or the dest of the connection
> * template is not available.
> @@ -272,8 +275,7 @@ ip_vs_sched_persist(struct ip_vs_service
> dport = dest->port;
>
> /* Create a template */
> - ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
> - &dest->addr, dport,
> + ct = ip_vs_conn_new(&param, &dest->addr, dport,
> IP_VS_CONN_F_TEMPLATE, dest);
> if (ct == NULL)
> return NULL;
> @@ -291,12 +293,7 @@ ip_vs_sched_persist(struct ip_vs_service
> /*
> * Create a new connection according to the template
> */
Missing ip_vs_conn_fill_param here?
> - cp = ip_vs_conn_new(svc->af, iph.protocol,
> - &iph.saddr, ports[0],
> - &iph.daddr, ports[1],
> - &dest->addr, dport,
> - flags,
> - dest);
> + cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest);
> if (cp == NULL) {
> ip_vs_conn_put(ct);
> return NULL;
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 21:55:19.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c 2010-10-01 22:23:33.000000000 +0900
> @@ -40,6 +40,19 @@ struct isakmp_hdr {
>
> #define PORT_ISAKMP 500
>
> +static void
> +ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
> + int inverse, struct ip_vs_conn_param *p)
> +{
> + if (likely(!inverse))
> + ip_vs_conn_fill_param(af, IPPROTO_UDP,
> + &iph->saddr, htons(PORT_ISAKMP),
> + &iph->daddr, htons(PORT_ISAKMP), p);
> + else
For inverse=1 iph->protocol must be IPPROTO_UDP
and &iph->daddr before &iph->saddr:
> + ip_vs_conn_fill_param(af, iph->protocol,
> + &iph->saddr, htons(PORT_ISAKMP),
> + &iph->daddr, htons(PORT_ISAKMP), p);
> +}
Regards
--
Julian Anastasov <ja@ssi.bg>
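
The inverse handling pointed out in both helpers comes down to swapping which packet endpoint is treated as the client and which as the virtual address. A small sketch of that idea, with hypothetical types (`endpoint_sketch`, `param_sketch`) and a hypothetical `fill_param_sketch()` standing in for the real helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct endpoint_sketch { uint32_t addr; uint16_t port; };

/* Hypothetical, reduced version of a connection lookup key. */
struct param_sketch {
    int protocol;
    struct endpoint_sketch client;   /* caddr/cport */
    struct endpoint_sketch vip;      /* vaddr/vport */
};

/*
 * Normal direction: packet source is the client, destination is the
 * virtual address. Inverse direction: the two endpoints are swapped.
 */
static void fill_param_sketch(struct param_sketch *p, int protocol,
                              struct endpoint_sketch src,
                              struct endpoint_sketch dst, int inverse)
{
    assert(p != NULL);
    p->protocol = protocol;
    if (!inverse) {
        p->client = src;
        p->vip = dst;
    } else {
        p->client = dst;
        p->vip = src;
    }
}
```

Passing the endpoints in the same order on both branches, as the quoted patch does, makes the inverse lookup identical to the normal one.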
^ permalink raw reply
* Re: sysctl_{tcp,udp,sctp}_mem overflow on 16TB system.
From: Willy Tarreau @ 2010-10-01 20:30 UTC (permalink / raw)
To: Robin Holt
Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Vlad Yasevich,
Sridhar Samudrala, linux-kernel, netdev, linux-decnet-user,
linux-sctp
In-Reply-To: <20101001193958.GP14068@sgi.com>
Hello Robin,
On Fri, Oct 01, 2010 at 02:39:58PM -0500, Robin Holt wrote:
>
> On a 16TB system, we noticed that sysctl_tcp_mem[2] and sysctl_udp_mem[2]
> were negative. Code review indicates that the same should occur with
> sysctl_sctp_mem[2].
>
> There are a couple ways we could address this. The one which appears most
> reasonable would be to change the struct proto definition for sysctl_mem
> from an int to a long and handle all the associated fallout.
>
> An alternative is to limit the calculation to 1/2 INT_MAX. The downside
> being that the administrator could not tune the system to use more than
> INT_MAX memory when much more is available.
>
> Is there a compelling reason to not change the structure's definition
> over to longs instead of ints and deal with the fallout from that change?
Could we not look at it differently? Is there any reason someone would
want to assign more than 8 TB of RAM to the network buffers in the near
future? Even at 100 Gbps, that's still about 10 minutes of traffic stuck
in buffers. By the time we need buffers that large, Linux probably won't
support 32-bit systems anymore and all such limits will have switched
to 64-bit.
So limiting the value to INT_MAX/2 sounds reasonable?
Regards,
Willy
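
For reference, the clamping alternative can be sketched as below. This is only an illustration of the idea, not a proposed patch: `clamp_mem_limit()` is a hypothetical helper, and the limit is assumed to be a page count computed in 64 bits before being stored in an int. Assuming 4 KiB pages, 16 TB corresponds to 2^32 pages, which does not fit in a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Clamp a page count to INT_MAX / 2 so that it can be stored in an
 * int-typed sysctl slot without going negative.
 */
static int clamp_mem_limit(uint64_t pages)
{
    const uint64_t cap = (uint64_t)INT_MAX / 2;
    uint64_t clamped = pages > cap ? cap : pages;

    assert(clamped <= (uint64_t)INT_MAX);   /* safe to narrow */
    return (int)clamped;
}
```

One possible reason for capping at half of INT_MAX rather than INT_MAX is to leave headroom if the stored value is later doubled; that reading is an assumption, not something stated in the thread.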
^ permalink raw reply
* sysctl_{tcp,udp,sctp}_mem overflow on 16TB system.
From: Robin Holt @ 2010-10-01 19:39 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
James Morris, Hideaki YOSHIFUJI <yosh
Cc: linux-kernel, netdev, linux-decnet-user, linux-sctp
On a 16TB system, we noticed that sysctl_tcp_mem[2] and sysctl_udp_mem[2]
were negative. Code review indicates that the same should occur with
sysctl_sctp_mem[2].
There are a couple ways we could address this. The one which appears most
reasonable would be to change the struct proto definition for sysctl_mem
from an int to a long and handle all the associated fallout.
An alternative is to limit the calculation to 1/2 INT_MAX. The downside
being that the administrator could not tune the system to use more than
INT_MAX memory when much more is available.
Is there a compelling reason to not change the structure's definition
over to longs instead of ints and deal with the fallout from that change?
Thanks,
Robin Holt
^ permalink raw reply