* [PATCH 01/25] netfilter: nf_ct_ecache: refactor notifier registration
From: pablo @ 2012-05-08 18:37 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Tony Zelenoff <antonz@parallels.com>
* ret variable initialization removed as useless
* similar code strings concatenated and functions code
flow became more plain
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_ecache.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index b924f3a..e7be79e 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -84,7 +84,7 @@ EXPORT_SYMBOL_GPL(nf_ct_deliver_cached_events);
int nf_conntrack_register_notifier(struct net *net,
struct nf_ct_event_notifier *new)
{
- int ret = 0;
+ int ret;
struct nf_ct_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
@@ -95,8 +95,7 @@ int nf_conntrack_register_notifier(struct net *net,
goto out_unlock;
}
rcu_assign_pointer(net->ct.nf_conntrack_event_cb, new);
- mutex_unlock(&nf_ct_ecache_mutex);
- return ret;
+ ret = 0;
out_unlock:
mutex_unlock(&nf_ct_ecache_mutex);
@@ -121,7 +120,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_unregister_notifier);
int nf_ct_expect_register_notifier(struct net *net,
struct nf_exp_event_notifier *new)
{
- int ret = 0;
+ int ret;
struct nf_exp_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
@@ -132,8 +131,7 @@ int nf_ct_expect_register_notifier(struct net *net,
goto out_unlock;
}
rcu_assign_pointer(net->ct.nf_expect_event_cb, new);
- mutex_unlock(&nf_ct_ecache_mutex);
- return ret;
+ ret = 0;
out_unlock:
mutex_unlock(&nf_ct_ecache_mutex);
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH] pch_gbe: Adding read memory barriers
From: David Miller @ 2012-05-08 18:39 UTC (permalink / raw)
To: erwanaliasr1; +Cc: netdev, linux-kernel, tshimizu818
In-Reply-To: <4FA96770.9070009@gmail.com>
From: Erwan Velu <erwanaliasr1@gmail.com>
Date: Tue, 08 May 2012 20:35:28 +0200
> Le 07/05/2012 21:30, Erwan Velu a écrit :
>> From bb1e271db0fa1a29df19bede69faf8004389132d Mon Sep 17 00:00:00 2001
>> From: Erwan Velu <erwan.velu@zodiacaerospace.com>
>> Date: Mon, 7 May 2012 19:15:29 +0000
>> Subject: [PATCH 1/1] pch_gbe: Adding read memory barriers
>
> Does my patch can be considered as acceptable or shall I rework
> something on it ?
You never need to ask questions like this.
Your patch is queued up to be reviewed in patchwork:
http://patchwork.ozlabs.org/project/netdev/list/
Therefore you only make more work for maintainers and irritate them by
asking this, and therefore it will take even longer for them to get to
your patch.
^ permalink raw reply
* [PATCH 15/25] ipvs: always update some of the flags bits in backup
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/linux/ip_vs.h | 5 +++
net/netfilter/ipvs/ip_vs_sync.c | 65 ++++++++++++++-------------------------
2 files changed, 28 insertions(+), 42 deletions(-)
diff --git a/include/linux/ip_vs.h b/include/linux/ip_vs.h
index be0ef3d..8a2d438 100644
--- a/include/linux/ip_vs.h
+++ b/include/linux/ip_vs.h
@@ -89,6 +89,7 @@
#define IP_VS_CONN_F_TEMPLATE 0x1000 /* template, not connection */
#define IP_VS_CONN_F_ONE_PACKET 0x2000 /* forward only one packet */
+/* Initial bits allowed in backup server */
#define IP_VS_CONN_F_BACKUP_MASK (IP_VS_CONN_F_FWD_MASK | \
IP_VS_CONN_F_NOOUTPUT | \
IP_VS_CONN_F_INACTIVE | \
@@ -97,6 +98,10 @@
IP_VS_CONN_F_TEMPLATE \
)
+/* Bits allowed to update in backup server */
+#define IP_VS_CONN_F_BACKUP_UPD_MASK (IP_VS_CONN_F_INACTIVE | \
+ IP_VS_CONN_F_SEQ_MASK)
+
/* Flags that are not sent to backup server start from bit 16 */
#define IP_VS_CONN_F_NFCT (1 << 16) /* use netfilter conntrack */
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index bf5e538..d2df694 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -731,9 +731,30 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
else
cp = ip_vs_ct_in_get(param);
- if (cp && param->pe_data) /* Free pe_data */
+ if (cp) {
+ /* Free pe_data */
kfree(param->pe_data);
- if (!cp) {
+
+ dest = cp->dest;
+ if ((cp->flags ^ flags) & IP_VS_CONN_F_INACTIVE &&
+ !(flags & IP_VS_CONN_F_TEMPLATE) && dest) {
+ if (flags & IP_VS_CONN_F_INACTIVE) {
+ atomic_dec(&dest->activeconns);
+ atomic_inc(&dest->inactconns);
+ } else {
+ atomic_inc(&dest->activeconns);
+ atomic_dec(&dest->inactconns);
+ }
+ }
+ flags &= IP_VS_CONN_F_BACKUP_UPD_MASK;
+ flags |= cp->flags & ~IP_VS_CONN_F_BACKUP_UPD_MASK;
+ cp->flags = flags;
+ if (!dest) {
+ dest = ip_vs_try_bind_dest(cp);
+ if (dest)
+ atomic_dec(&dest->refcnt);
+ }
+ } else {
/*
* Find the appropriate destination for the connection.
* If it is not found the connection will remain unbound
@@ -742,18 +763,6 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
dest = ip_vs_find_dest(net, type, daddr, dport, param->vaddr,
param->vport, protocol, fwmark, flags);
- /* Set the approprite ativity flag */
- if (protocol == IPPROTO_TCP) {
- if (state != IP_VS_TCP_S_ESTABLISHED)
- flags |= IP_VS_CONN_F_INACTIVE;
- else
- flags &= ~IP_VS_CONN_F_INACTIVE;
- } else if (protocol == IPPROTO_SCTP) {
- if (state != IP_VS_SCTP_S_ESTABLISHED)
- flags |= IP_VS_CONN_F_INACTIVE;
- else
- flags &= ~IP_VS_CONN_F_INACTIVE;
- }
cp = ip_vs_conn_new(param, daddr, dport, flags, dest, fwmark);
if (dest)
atomic_dec(&dest->refcnt);
@@ -763,34 +772,6 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
IP_VS_DBG(2, "BACKUP, add new conn. failed\n");
return;
}
- } else if (!cp->dest) {
- dest = ip_vs_try_bind_dest(cp);
- if (dest)
- atomic_dec(&dest->refcnt);
- } else if ((cp->dest) && (cp->protocol == IPPROTO_TCP) &&
- (cp->state != state)) {
- /* update active/inactive flag for the connection */
- dest = cp->dest;
- if (!(cp->flags & IP_VS_CONN_F_INACTIVE) &&
- (state != IP_VS_TCP_S_ESTABLISHED)) {
- atomic_dec(&dest->activeconns);
- atomic_inc(&dest->inactconns);
- cp->flags |= IP_VS_CONN_F_INACTIVE;
- } else if ((cp->flags & IP_VS_CONN_F_INACTIVE) &&
- (state == IP_VS_TCP_S_ESTABLISHED)) {
- atomic_inc(&dest->activeconns);
- atomic_dec(&dest->inactconns);
- cp->flags &= ~IP_VS_CONN_F_INACTIVE;
- }
- } else if ((cp->dest) && (cp->protocol == IPPROTO_SCTP) &&
- (cp->state != state)) {
- dest = cp->dest;
- if (!(cp->flags & IP_VS_CONN_F_INACTIVE) &&
- (state != IP_VS_SCTP_S_ESTABLISHED)) {
- atomic_dec(&dest->activeconns);
- atomic_inc(&dest->inactconns);
- cp->flags &= ~IP_VS_CONN_F_INACTIVE;
- }
}
if (opt)
--
1.7.9.5
^ permalink raw reply related
* [PATCH 23/25] netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Kelvie Wong <kelvie@ieee.org>
This refreshes the "timeout" attribute in existing expectations if one is
given.
The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.
I use this specifically for forwarding DCERPC traffic through:
DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.
Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_netlink.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 462ec2d..6f4b00a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2080,7 +2080,15 @@ static int
ctnetlink_change_expect(struct nf_conntrack_expect *x,
const struct nlattr * const cda[])
{
- return -EOPNOTSUPP;
+ if (cda[CTA_EXPECT_TIMEOUT]) {
+ if (!del_timer(&x->timeout))
+ return -ETIME;
+
+ x->timeout.expires = jiffies +
+ ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ;
+ add_timer(&x->timeout);
+ }
+ return 0;
}
static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
--
1.7.9.5
^ permalink raw reply related
* [PATCH 14/25] ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
Initially, when the synced connection is created we
use the forwarding method provided by master but once we
bind to destination it can be changed. As result, we must
update the application and the transmitter.
As ip_vs_try_bind_dest is called always for connections
that require dest binding, there is no need to validate the
cp and dest pointers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_conn.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 1c1bb30..fd74f88 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -613,14 +613,33 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
{
struct ip_vs_dest *dest;
- if ((cp) && (!cp->dest)) {
- dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
- cp->dport, &cp->vaddr, cp->vport,
- cp->protocol, cp->fwmark, cp->flags);
+ dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
+ cp->dport, &cp->vaddr, cp->vport,
+ cp->protocol, cp->fwmark, cp->flags);
+ if (dest) {
+ struct ip_vs_proto_data *pd;
+
+ /* Applications work depending on the forwarding method
+ * but better to reassign them always when binding dest */
+ if (cp->app)
+ ip_vs_unbind_app(cp);
+
ip_vs_bind_dest(cp, dest);
- return dest;
- } else
- return NULL;
+
+ /* Update its packet transmitter */
+ cp->packet_xmit = NULL;
+#ifdef CONFIG_IP_VS_IPV6
+ if (cp->af == AF_INET6)
+ ip_vs_bind_xmit_v6(cp);
+ else
+#endif
+ ip_vs_bind_xmit(cp);
+
+ pd = ip_vs_proto_data_get(ip_vs_conn_net(cp), cp->protocol);
+ if (pd && atomic_read(&pd->appcnt))
+ ip_vs_bind_app(cp, pd->pp);
+ }
+ return dest;
}
--
1.7.9.5
^ permalink raw reply related
* [PATCH 18/25] ipvs: add support for sync threads
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.
The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.
Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.
Make sure the backup socks do not block on reading.
Special thanks to Aleksey Chudov for helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 34 +++-
net/netfilter/ipvs/ip_vs_conn.c | 7 +
net/netfilter/ipvs/ip_vs_ctl.c | 29 ++-
net/netfilter/ipvs/ip_vs_sync.c | 401 ++++++++++++++++++++++++---------------
4 files changed, 305 insertions(+), 166 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index d3a4b93..d6146b4 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -784,6 +784,16 @@ struct ip_vs_app {
void (*timeout_change)(struct ip_vs_app *app, int flags);
};
+struct ipvs_master_sync_state {
+ struct list_head sync_queue;
+ struct ip_vs_sync_buff *sync_buff;
+ int sync_queue_len;
+ unsigned int sync_queue_delay;
+ struct task_struct *master_thread;
+ struct delayed_work master_wakeup_work;
+ struct netns_ipvs *ipvs;
+};
+
/* IPVS in network namespace */
struct netns_ipvs {
int gen; /* Generation */
@@ -870,6 +880,7 @@ struct netns_ipvs {
#endif
int sysctl_snat_reroute;
int sysctl_sync_ver;
+ int sysctl_sync_ports;
int sysctl_sync_qlen_max;
int sysctl_sync_sock_size;
int sysctl_cache_bypass;
@@ -893,16 +904,11 @@ struct netns_ipvs {
spinlock_t est_lock;
struct timer_list est_timer; /* Estimation timer */
/* ip_vs_sync */
- struct list_head sync_queue;
- int sync_queue_len;
- unsigned int sync_queue_delay;
- struct delayed_work master_wakeup_work;
spinlock_t sync_lock;
- struct ip_vs_sync_buff *sync_buff;
+ struct ipvs_master_sync_state *ms;
spinlock_t sync_buff_lock;
- struct sockaddr_in sync_mcast_addr;
- struct task_struct *master_thread;
- struct task_struct *backup_thread;
+ struct task_struct **backup_threads;
+ int threads_mask;
int send_mesg_maxlen;
int recv_mesg_maxlen;
volatile int sync_state;
@@ -926,6 +932,7 @@ struct netns_ipvs {
#define IPVS_SYNC_SEND_DELAY (HZ / 50)
#define IPVS_SYNC_CHECK_PERIOD HZ
#define IPVS_SYNC_FLUSH_TIME (HZ * 2)
+#define IPVS_SYNC_PORTS_MAX (1 << 6)
#ifdef CONFIG_SYSCTL
@@ -954,6 +961,11 @@ static inline int sysctl_sync_ver(struct netns_ipvs *ipvs)
return ipvs->sysctl_sync_ver;
}
+static inline int sysctl_sync_ports(struct netns_ipvs *ipvs)
+{
+ return ACCESS_ONCE(ipvs->sysctl_sync_ports);
+}
+
static inline int sysctl_sync_qlen_max(struct netns_ipvs *ipvs)
{
return ipvs->sysctl_sync_qlen_max;
@@ -991,6 +1003,11 @@ static inline int sysctl_sync_ver(struct netns_ipvs *ipvs)
return DEFAULT_SYNC_VER;
}
+static inline int sysctl_sync_ports(struct netns_ipvs *ipvs)
+{
+ return 1;
+}
+
static inline int sysctl_sync_qlen_max(struct netns_ipvs *ipvs)
{
return IPVS_SYNC_QLEN_MAX;
@@ -1240,7 +1257,6 @@ extern void ip_vs_scheduler_err(struct ip_vs_service *svc, const char *msg);
extern struct ip_vs_stats ip_vs_stats;
extern int sysctl_ip_vs_sync_ver;
-extern void ip_vs_sync_switch_mode(struct net *net, int mode);
extern struct ip_vs_service *
ip_vs_service_get(struct net *net, int af, __u32 fwmark, __u16 protocol,
const union nf_inet_addr *vaddr, __be16 vport);
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 4f3205d..c7edf20 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -619,12 +619,19 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
if (dest) {
struct ip_vs_proto_data *pd;
+ spin_lock(&cp->lock);
+ if (cp->dest) {
+ spin_unlock(&cp->lock);
+ return dest;
+ }
+
/* Applications work depending on the forwarding method
* but better to reassign them always when binding dest */
if (cp->app)
ip_vs_unbind_app(cp);
ip_vs_bind_dest(cp, dest);
+ spin_unlock(&cp->lock);
/* Update its packet transmitter */
cp->packet_xmit = NULL;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a77b9bd..dd811b8 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1657,9 +1657,24 @@ proc_do_sync_mode(ctl_table *table, int write,
if ((*valp < 0) || (*valp > 1)) {
/* Restore the correct value */
*valp = val;
- } else {
- struct net *net = current->nsproxy->net_ns;
- ip_vs_sync_switch_mode(net, val);
+ }
+ }
+ return rc;
+}
+
+static int
+proc_do_sync_ports(ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int *valp = table->data;
+ int val = *valp;
+ int rc;
+
+ rc = proc_dointvec(table, write, buffer, lenp, ppos);
+ if (write && (*valp != val)) {
+ if (*valp < 1 || !is_power_of_2(*valp)) {
+ /* Restore the correct value */
+ *valp = val;
}
}
return rc;
@@ -1723,6 +1738,12 @@ static struct ctl_table vs_vars[] = {
.proc_handler = &proc_do_sync_mode,
},
{
+ .procname = "sync_ports",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_do_sync_ports,
+ },
+ {
.procname = "sync_qlen_max",
.maxlen = sizeof(int),
.mode = 0644,
@@ -3686,6 +3707,8 @@ int __net_init ip_vs_control_net_init_sysctl(struct net *net)
tbl[idx++].data = &ipvs->sysctl_snat_reroute;
ipvs->sysctl_sync_ver = 1;
tbl[idx++].data = &ipvs->sysctl_sync_ver;
+ ipvs->sysctl_sync_ports = 1;
+ tbl[idx++].data = &ipvs->sysctl_sync_ports;
ipvs->sysctl_sync_qlen_max = nr_free_buffer_pages() / 32;
tbl[idx++].data = &ipvs->sysctl_sync_qlen_max;
ipvs->sysctl_sync_sock_size = 0;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 8d6a421..effa10c 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -196,6 +196,7 @@ struct ip_vs_sync_thread_data {
struct net *net;
struct socket *sock;
char *buf;
+ int id;
};
/* Version 0 definition of packet sizes */
@@ -271,13 +272,6 @@ struct ip_vs_sync_buff {
unsigned char *end;
};
-/* multicast addr */
-static struct sockaddr_in mcast_addr = {
- .sin_family = AF_INET,
- .sin_port = cpu_to_be16(IP_VS_SYNC_PORT),
- .sin_addr.s_addr = cpu_to_be32(IP_VS_SYNC_GROUP),
-};
-
/*
* Copy of struct ip_vs_seq
* From unaligned network order to aligned host order
@@ -300,22 +294,22 @@ static void hton_seq(struct ip_vs_seq *ho, struct ip_vs_seq *no)
put_unaligned_be32(ho->previous_delta, &no->previous_delta);
}
-static inline struct ip_vs_sync_buff *sb_dequeue(struct netns_ipvs *ipvs)
+static inline struct ip_vs_sync_buff *
+sb_dequeue(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms)
{
struct ip_vs_sync_buff *sb;
spin_lock_bh(&ipvs->sync_lock);
- if (list_empty(&ipvs->sync_queue)) {
+ if (list_empty(&ms->sync_queue)) {
sb = NULL;
__set_current_state(TASK_INTERRUPTIBLE);
} else {
- sb = list_entry(ipvs->sync_queue.next,
- struct ip_vs_sync_buff,
+ sb = list_entry(ms->sync_queue.next, struct ip_vs_sync_buff,
list);
list_del(&sb->list);
- ipvs->sync_queue_len--;
- if (!ipvs->sync_queue_len)
- ipvs->sync_queue_delay = 0;
+ ms->sync_queue_len--;
+ if (!ms->sync_queue_len)
+ ms->sync_queue_delay = 0;
}
spin_unlock_bh(&ipvs->sync_lock);
@@ -338,7 +332,7 @@ ip_vs_sync_buff_create(struct netns_ipvs *ipvs)
kfree(sb);
return NULL;
}
- sb->mesg->reserved = 0; /* old nr_conns i.e. must be zeo now */
+ sb->mesg->reserved = 0; /* old nr_conns i.e. must be zero now */
sb->mesg->version = SYNC_PROTO_VER;
sb->mesg->syncid = ipvs->master_syncid;
sb->mesg->size = sizeof(struct ip_vs_sync_mesg);
@@ -357,20 +351,21 @@ static inline void ip_vs_sync_buff_release(struct ip_vs_sync_buff *sb)
kfree(sb);
}
-static inline void sb_queue_tail(struct netns_ipvs *ipvs)
+static inline void sb_queue_tail(struct netns_ipvs *ipvs,
+ struct ipvs_master_sync_state *ms)
{
- struct ip_vs_sync_buff *sb = ipvs->sync_buff;
+ struct ip_vs_sync_buff *sb = ms->sync_buff;
spin_lock(&ipvs->sync_lock);
if (ipvs->sync_state & IP_VS_STATE_MASTER &&
- ipvs->sync_queue_len < sysctl_sync_qlen_max(ipvs)) {
- if (!ipvs->sync_queue_len)
- schedule_delayed_work(&ipvs->master_wakeup_work,
+ ms->sync_queue_len < sysctl_sync_qlen_max(ipvs)) {
+ if (!ms->sync_queue_len)
+ schedule_delayed_work(&ms->master_wakeup_work,
max(IPVS_SYNC_SEND_DELAY, 1));
- ipvs->sync_queue_len++;
- list_add_tail(&sb->list, &ipvs->sync_queue);
- if ((++ipvs->sync_queue_delay) == IPVS_SYNC_WAKEUP_RATE)
- wake_up_process(ipvs->master_thread);
+ ms->sync_queue_len++;
+ list_add_tail(&sb->list, &ms->sync_queue);
+ if ((++ms->sync_queue_delay) == IPVS_SYNC_WAKEUP_RATE)
+ wake_up_process(ms->master_thread);
} else
ip_vs_sync_buff_release(sb);
spin_unlock(&ipvs->sync_lock);
@@ -381,15 +376,15 @@ static inline void sb_queue_tail(struct netns_ipvs *ipvs)
* than the specified time or the specified time is zero.
*/
static inline struct ip_vs_sync_buff *
-get_curr_sync_buff(struct netns_ipvs *ipvs, unsigned long time)
+get_curr_sync_buff(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms,
+ unsigned long time)
{
struct ip_vs_sync_buff *sb;
spin_lock_bh(&ipvs->sync_buff_lock);
- if (ipvs->sync_buff &&
- time_after_eq(jiffies - ipvs->sync_buff->firstuse, time)) {
- sb = ipvs->sync_buff;
- ipvs->sync_buff = NULL;
+ sb = ms->sync_buff;
+ if (sb && time_after_eq(jiffies - sb->firstuse, time)) {
+ ms->sync_buff = NULL;
__set_current_state(TASK_RUNNING);
} else
sb = NULL;
@@ -397,31 +392,10 @@ get_curr_sync_buff(struct netns_ipvs *ipvs, unsigned long time)
return sb;
}
-/*
- * Switch mode from sending version 0 or 1
- * - must handle sync_buf
- */
-void ip_vs_sync_switch_mode(struct net *net, int mode)
+static inline int
+select_master_thread_id(struct netns_ipvs *ipvs, struct ip_vs_conn *cp)
{
- struct netns_ipvs *ipvs = net_ipvs(net);
- struct ip_vs_sync_buff *sb;
-
- spin_lock_bh(&ipvs->sync_buff_lock);
- if (!(ipvs->sync_state & IP_VS_STATE_MASTER))
- goto unlock;
- sb = ipvs->sync_buff;
- if (mode == sysctl_sync_ver(ipvs) || !sb)
- goto unlock;
-
- /* Buffer empty ? then let buf_create do the job */
- if (sb->mesg->size <= sizeof(struct ip_vs_sync_mesg)) {
- ip_vs_sync_buff_release(sb);
- ipvs->sync_buff = NULL;
- } else
- sb_queue_tail(ipvs);
-
-unlock:
- spin_unlock_bh(&ipvs->sync_buff_lock);
+ return ((long) cp >> (1 + ilog2(sizeof(*cp)))) & ipvs->threads_mask;
}
/*
@@ -543,6 +517,9 @@ static void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp,
struct netns_ipvs *ipvs = net_ipvs(net);
struct ip_vs_sync_mesg_v0 *m;
struct ip_vs_sync_conn_v0 *s;
+ struct ip_vs_sync_buff *buff;
+ struct ipvs_master_sync_state *ms;
+ int id;
int len;
if (unlikely(cp->af != AF_INET))
@@ -555,20 +532,37 @@ static void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp,
return;
spin_lock(&ipvs->sync_buff_lock);
- if (!ipvs->sync_buff) {
- ipvs->sync_buff =
- ip_vs_sync_buff_create_v0(ipvs);
- if (!ipvs->sync_buff) {
+ if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) {
+ spin_unlock(&ipvs->sync_buff_lock);
+ return;
+ }
+
+ id = select_master_thread_id(ipvs, cp);
+ ms = &ipvs->ms[id];
+ buff = ms->sync_buff;
+ if (buff) {
+ m = (struct ip_vs_sync_mesg_v0 *) buff->mesg;
+ /* Send buffer if it is for v1 */
+ if (!m->nr_conns) {
+ sb_queue_tail(ipvs, ms);
+ ms->sync_buff = NULL;
+ buff = NULL;
+ }
+ }
+ if (!buff) {
+ buff = ip_vs_sync_buff_create_v0(ipvs);
+ if (!buff) {
spin_unlock(&ipvs->sync_buff_lock);
pr_err("ip_vs_sync_buff_create failed.\n");
return;
}
+ ms->sync_buff = buff;
}
len = (cp->flags & IP_VS_CONN_F_SEQ_MASK) ? FULL_CONN_SIZE :
SIMPLE_CONN_SIZE;
- m = (struct ip_vs_sync_mesg_v0 *)ipvs->sync_buff->mesg;
- s = (struct ip_vs_sync_conn_v0 *)ipvs->sync_buff->head;
+ m = (struct ip_vs_sync_mesg_v0 *) buff->mesg;
+ s = (struct ip_vs_sync_conn_v0 *) buff->head;
/* copy members */
s->reserved = 0;
@@ -589,12 +583,12 @@ static void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp,
m->nr_conns++;
m->size += len;
- ipvs->sync_buff->head += len;
+ buff->head += len;
/* check if there is a space for next one */
- if (ipvs->sync_buff->head + FULL_CONN_SIZE > ipvs->sync_buff->end) {
- sb_queue_tail(ipvs);
- ipvs->sync_buff = NULL;
+ if (buff->head + FULL_CONN_SIZE > buff->end) {
+ sb_queue_tail(ipvs, ms);
+ ms->sync_buff = NULL;
}
spin_unlock(&ipvs->sync_buff_lock);
@@ -619,6 +613,9 @@ void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp, int pkts)
struct netns_ipvs *ipvs = net_ipvs(net);
struct ip_vs_sync_mesg *m;
union ip_vs_sync_conn *s;
+ struct ip_vs_sync_buff *buff;
+ struct ipvs_master_sync_state *ms;
+ int id;
__u8 *p;
unsigned int len, pe_name_len, pad;
@@ -645,6 +642,13 @@ sloop:
}
spin_lock(&ipvs->sync_buff_lock);
+ if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) {
+ spin_unlock(&ipvs->sync_buff_lock);
+ return;
+ }
+
+ id = select_master_thread_id(ipvs, cp);
+ ms = &ipvs->ms[id];
#ifdef CONFIG_IP_VS_IPV6
if (cp->af == AF_INET6)
@@ -663,27 +667,32 @@ sloop:
/* check if there is a space for this one */
pad = 0;
- if (ipvs->sync_buff) {
- pad = (4 - (size_t)ipvs->sync_buff->head) & 3;
- if (ipvs->sync_buff->head + len + pad > ipvs->sync_buff->end) {
- sb_queue_tail(ipvs);
- ipvs->sync_buff = NULL;
+ buff = ms->sync_buff;
+ if (buff) {
+ m = buff->mesg;
+ pad = (4 - (size_t) buff->head) & 3;
+ /* Send buffer if it is for v0 */
+ if (buff->head + len + pad > buff->end || m->reserved) {
+ sb_queue_tail(ipvs, ms);
+ ms->sync_buff = NULL;
+ buff = NULL;
pad = 0;
}
}
- if (!ipvs->sync_buff) {
- ipvs->sync_buff = ip_vs_sync_buff_create(ipvs);
- if (!ipvs->sync_buff) {
+ if (!buff) {
+ buff = ip_vs_sync_buff_create(ipvs);
+ if (!buff) {
spin_unlock(&ipvs->sync_buff_lock);
pr_err("ip_vs_sync_buff_create failed.\n");
return;
}
+ ms->sync_buff = buff;
+ m = buff->mesg;
}
- m = ipvs->sync_buff->mesg;
- p = ipvs->sync_buff->head;
- ipvs->sync_buff->head += pad + len;
+ p = buff->head;
+ buff->head += pad + len;
m->size += pad + len;
/* Add ev. padding from prev. sync_conn */
while (pad--)
@@ -834,6 +843,7 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
kfree(param->pe_data);
dest = cp->dest;
+ spin_lock(&cp->lock);
if ((cp->flags ^ flags) & IP_VS_CONN_F_INACTIVE &&
!(flags & IP_VS_CONN_F_TEMPLATE) && dest) {
if (flags & IP_VS_CONN_F_INACTIVE) {
@@ -847,6 +857,7 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
flags &= IP_VS_CONN_F_BACKUP_UPD_MASK;
flags |= cp->flags & ~IP_VS_CONN_F_BACKUP_UPD_MASK;
cp->flags = flags;
+ spin_unlock(&cp->lock);
if (!dest) {
dest = ip_vs_try_bind_dest(cp);
if (dest)
@@ -1399,9 +1410,15 @@ static int bind_mcastif_addr(struct socket *sock, char *ifname)
/*
* Set up sending multicast socket over UDP
*/
-static struct socket *make_send_sock(struct net *net)
+static struct socket *make_send_sock(struct net *net, int id)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ /* multicast addr */
+ struct sockaddr_in mcast_addr = {
+ .sin_family = AF_INET,
+ .sin_port = cpu_to_be16(IP_VS_SYNC_PORT + id),
+ .sin_addr.s_addr = cpu_to_be32(IP_VS_SYNC_GROUP),
+ };
struct socket *sock;
int result;
@@ -1453,9 +1470,15 @@ error:
/*
* Set up receiving multicast socket over UDP
*/
-static struct socket *make_receive_sock(struct net *net)
+static struct socket *make_receive_sock(struct net *net, int id)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ /* multicast addr */
+ struct sockaddr_in mcast_addr = {
+ .sin_family = AF_INET,
+ .sin_port = cpu_to_be16(IP_VS_SYNC_PORT + id),
+ .sin_addr.s_addr = cpu_to_be32(IP_VS_SYNC_GROUP),
+ };
struct socket *sock;
int result;
@@ -1549,10 +1572,10 @@ ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen)
iov.iov_base = buffer;
iov.iov_len = (size_t)buflen;
- len = kernel_recvmsg(sock, &msg, &iov, 1, buflen, 0);
+ len = kernel_recvmsg(sock, &msg, &iov, 1, buflen, MSG_DONTWAIT);
if (len < 0)
- return -1;
+ return len;
LeaveFunction(7);
return len;
@@ -1561,44 +1584,47 @@ ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen)
/* Wakeup the master thread for sending */
static void master_wakeup_work_handler(struct work_struct *work)
{
- struct netns_ipvs *ipvs = container_of(work, struct netns_ipvs,
- master_wakeup_work.work);
+ struct ipvs_master_sync_state *ms =
+ container_of(work, struct ipvs_master_sync_state,
+ master_wakeup_work.work);
+ struct netns_ipvs *ipvs = ms->ipvs;
spin_lock_bh(&ipvs->sync_lock);
- if (ipvs->sync_queue_len &&
- ipvs->sync_queue_delay < IPVS_SYNC_WAKEUP_RATE) {
- ipvs->sync_queue_delay = IPVS_SYNC_WAKEUP_RATE;
- wake_up_process(ipvs->master_thread);
+ if (ms->sync_queue_len &&
+ ms->sync_queue_delay < IPVS_SYNC_WAKEUP_RATE) {
+ ms->sync_queue_delay = IPVS_SYNC_WAKEUP_RATE;
+ wake_up_process(ms->master_thread);
}
spin_unlock_bh(&ipvs->sync_lock);
}
/* Get next buffer to send */
static inline struct ip_vs_sync_buff *
-next_sync_buff(struct netns_ipvs *ipvs)
+next_sync_buff(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms)
{
struct ip_vs_sync_buff *sb;
- sb = sb_dequeue(ipvs);
+ sb = sb_dequeue(ipvs, ms);
if (sb)
return sb;
/* Do not delay entries in buffer for more than 2 seconds */
- return get_curr_sync_buff(ipvs, IPVS_SYNC_FLUSH_TIME);
+ return get_curr_sync_buff(ipvs, ms, IPVS_SYNC_FLUSH_TIME);
}
static int sync_thread_master(void *data)
{
struct ip_vs_sync_thread_data *tinfo = data;
struct netns_ipvs *ipvs = net_ipvs(tinfo->net);
+ struct ipvs_master_sync_state *ms = &ipvs->ms[tinfo->id];
struct sock *sk = tinfo->sock->sk;
struct ip_vs_sync_buff *sb;
pr_info("sync thread started: state = MASTER, mcast_ifn = %s, "
- "syncid = %d\n",
- ipvs->master_mcast_ifn, ipvs->master_syncid);
+ "syncid = %d, id = %d\n",
+ ipvs->master_mcast_ifn, ipvs->master_syncid, tinfo->id);
for (;;) {
- sb = next_sync_buff(ipvs);
+ sb = next_sync_buff(ipvs, ms);
if (unlikely(kthread_should_stop()))
break;
if (!sb) {
@@ -1624,12 +1650,12 @@ done:
ip_vs_sync_buff_release(sb);
/* clean up the sync_buff queue */
- while ((sb = sb_dequeue(ipvs)))
+ while ((sb = sb_dequeue(ipvs, ms)))
ip_vs_sync_buff_release(sb);
__set_current_state(TASK_RUNNING);
/* clean up the current sync_buff */
- sb = get_curr_sync_buff(ipvs, 0);
+ sb = get_curr_sync_buff(ipvs, ms, 0);
if (sb)
ip_vs_sync_buff_release(sb);
@@ -1648,8 +1674,8 @@ static int sync_thread_backup(void *data)
int len;
pr_info("sync thread started: state = BACKUP, mcast_ifn = %s, "
- "syncid = %d\n",
- ipvs->backup_mcast_ifn, ipvs->backup_syncid);
+ "syncid = %d, id = %d\n",
+ ipvs->backup_mcast_ifn, ipvs->backup_syncid, tinfo->id);
while (!kthread_should_stop()) {
wait_event_interruptible(*sk_sleep(tinfo->sock->sk),
@@ -1661,7 +1687,8 @@ static int sync_thread_backup(void *data)
len = ip_vs_receive(tinfo->sock, tinfo->buf,
ipvs->recv_mesg_maxlen);
if (len <= 0) {
- pr_err("receiving message error\n");
+ if (len != -EAGAIN)
+ pr_err("receiving message error\n");
break;
}
@@ -1685,90 +1712,140 @@ static int sync_thread_backup(void *data)
int start_sync_thread(struct net *net, int state, char *mcast_ifn, __u8 syncid)
{
struct ip_vs_sync_thread_data *tinfo;
- struct task_struct **realtask, *task;
+ struct task_struct **array = NULL, *task;
struct socket *sock;
struct netns_ipvs *ipvs = net_ipvs(net);
- char *name, *buf = NULL;
+ char *name;
int (*threadfn)(void *data);
+ int id, count;
int result = -ENOMEM;
IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
IP_VS_DBG(7, "Each ip_vs_sync_conn entry needs %Zd bytes\n",
sizeof(struct ip_vs_sync_conn_v0));
+ if (!ipvs->sync_state) {
+ count = clamp(sysctl_sync_ports(ipvs), 1, IPVS_SYNC_PORTS_MAX);
+ ipvs->threads_mask = count - 1;
+ } else
+ count = ipvs->threads_mask + 1;
if (state == IP_VS_STATE_MASTER) {
- if (ipvs->master_thread)
+ if (ipvs->ms)
return -EEXIST;
strlcpy(ipvs->master_mcast_ifn, mcast_ifn,
sizeof(ipvs->master_mcast_ifn));
ipvs->master_syncid = syncid;
- realtask = &ipvs->master_thread;
- name = "ipvs_master:%d";
+ name = "ipvs-m:%d:%d";
threadfn = sync_thread_master;
- ipvs->sync_queue_len = 0;
- ipvs->sync_queue_delay = 0;
- INIT_DELAYED_WORK(&ipvs->master_wakeup_work,
- master_wakeup_work_handler);
- sock = make_send_sock(net);
} else if (state == IP_VS_STATE_BACKUP) {
- if (ipvs->backup_thread)
+ if (ipvs->backup_threads)
return -EEXIST;
strlcpy(ipvs->backup_mcast_ifn, mcast_ifn,
sizeof(ipvs->backup_mcast_ifn));
ipvs->backup_syncid = syncid;
- realtask = &ipvs->backup_thread;
- name = "ipvs_backup:%d";
+ name = "ipvs-b:%d:%d";
threadfn = sync_thread_backup;
- sock = make_receive_sock(net);
} else {
return -EINVAL;
}
- if (IS_ERR(sock)) {
- result = PTR_ERR(sock);
- goto out;
- }
+ if (state == IP_VS_STATE_MASTER) {
+ struct ipvs_master_sync_state *ms;
- set_sync_mesg_maxlen(net, state);
- if (state == IP_VS_STATE_BACKUP) {
- buf = kmalloc(ipvs->recv_mesg_maxlen, GFP_KERNEL);
- if (!buf)
- goto outsocket;
+ ipvs->ms = kzalloc(count * sizeof(ipvs->ms[0]), GFP_KERNEL);
+ if (!ipvs->ms)
+ goto out;
+ ms = ipvs->ms;
+ for (id = 0; id < count; id++, ms++) {
+ INIT_LIST_HEAD(&ms->sync_queue);
+ ms->sync_queue_len = 0;
+ ms->sync_queue_delay = 0;
+ INIT_DELAYED_WORK(&ms->master_wakeup_work,
+ master_wakeup_work_handler);
+ ms->ipvs = ipvs;
+ }
+ } else {
+ array = kzalloc(count * sizeof(struct task_struct *),
+ GFP_KERNEL);
+ if (!array)
+ goto out;
}
+ set_sync_mesg_maxlen(net, state);
- tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL);
- if (!tinfo)
- goto outbuf;
-
- tinfo->net = net;
- tinfo->sock = sock;
- tinfo->buf = buf;
+ tinfo = NULL;
+ for (id = 0; id < count; id++) {
+ if (state == IP_VS_STATE_MASTER)
+ sock = make_send_sock(net, id);
+ else
+ sock = make_receive_sock(net, id);
+ if (IS_ERR(sock)) {
+ result = PTR_ERR(sock);
+ goto outtinfo;
+ }
+ tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL);
+ if (!tinfo)
+ goto outsocket;
+ tinfo->net = net;
+ tinfo->sock = sock;
+ if (state == IP_VS_STATE_BACKUP) {
+ tinfo->buf = kmalloc(ipvs->recv_mesg_maxlen,
+ GFP_KERNEL);
+ if (!tinfo->buf)
+ goto outtinfo;
+ }
+ tinfo->id = id;
- task = kthread_run(threadfn, tinfo, name, ipvs->gen);
- if (IS_ERR(task)) {
- result = PTR_ERR(task);
- goto outtinfo;
+ task = kthread_run(threadfn, tinfo, name, ipvs->gen, id);
+ if (IS_ERR(task)) {
+ result = PTR_ERR(task);
+ goto outtinfo;
+ }
+ tinfo = NULL;
+ if (state == IP_VS_STATE_MASTER)
+ ipvs->ms[id].master_thread = task;
+ else
+ array[id] = task;
}
/* mark as active */
- *realtask = task;
+
+ if (state == IP_VS_STATE_BACKUP)
+ ipvs->backup_threads = array;
+ spin_lock_bh(&ipvs->sync_buff_lock);
ipvs->sync_state |= state;
+ spin_unlock_bh(&ipvs->sync_buff_lock);
/* increase the module use count */
ip_vs_use_count_inc();
return 0;
-outtinfo:
- kfree(tinfo);
-outbuf:
- kfree(buf);
outsocket:
sk_release_kernel(sock->sk);
+
+outtinfo:
+ if (tinfo) {
+ sk_release_kernel(tinfo->sock->sk);
+ kfree(tinfo->buf);
+ kfree(tinfo);
+ }
+ count = id;
+ while (count-- > 0) {
+ if (state == IP_VS_STATE_MASTER)
+ kthread_stop(ipvs->ms[count].master_thread);
+ else
+ kthread_stop(array[count]);
+ }
+ kfree(array);
+
out:
+ if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) {
+ kfree(ipvs->ms);
+ ipvs->ms = NULL;
+ }
return result;
}
@@ -1776,39 +1853,60 @@ out:
int stop_sync_thread(struct net *net, int state)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ struct task_struct **array;
+ int id;
int retc = -EINVAL;
IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
if (state == IP_VS_STATE_MASTER) {
- if (!ipvs->master_thread)
+ if (!ipvs->ms)
return -ESRCH;
- pr_info("stopping master sync thread %d ...\n",
- task_pid_nr(ipvs->master_thread));
-
/*
* The lock synchronizes with sb_queue_tail(), so that we don't
* add sync buffers to the queue, when we are already in
* progress of stopping the master sync daemon.
*/
- spin_lock_bh(&ipvs->sync_lock);
+ spin_lock_bh(&ipvs->sync_buff_lock);
+ spin_lock(&ipvs->sync_lock);
ipvs->sync_state &= ~IP_VS_STATE_MASTER;
- spin_unlock_bh(&ipvs->sync_lock);
- cancel_delayed_work_sync(&ipvs->master_wakeup_work);
- retc = kthread_stop(ipvs->master_thread);
- ipvs->master_thread = NULL;
+ spin_unlock(&ipvs->sync_lock);
+ spin_unlock_bh(&ipvs->sync_buff_lock);
+
+ retc = 0;
+ for (id = ipvs->threads_mask; id >= 0; id--) {
+ struct ipvs_master_sync_state *ms = &ipvs->ms[id];
+ int ret;
+
+ pr_info("stopping master sync thread %d ...\n",
+ task_pid_nr(ms->master_thread));
+ cancel_delayed_work_sync(&ms->master_wakeup_work);
+ ret = kthread_stop(ms->master_thread);
+ if (retc >= 0)
+ retc = ret;
+ }
+ kfree(ipvs->ms);
+ ipvs->ms = NULL;
} else if (state == IP_VS_STATE_BACKUP) {
- if (!ipvs->backup_thread)
+ if (!ipvs->backup_threads)
return -ESRCH;
- pr_info("stopping backup sync thread %d ...\n",
- task_pid_nr(ipvs->backup_thread));
-
ipvs->sync_state &= ~IP_VS_STATE_BACKUP;
- retc = kthread_stop(ipvs->backup_thread);
- ipvs->backup_thread = NULL;
+ array = ipvs->backup_threads;
+ retc = 0;
+ for (id = ipvs->threads_mask; id >= 0; id--) {
+ int ret;
+
+ pr_info("stopping backup sync thread %d ...\n",
+ task_pid_nr(array[id]));
+ ret = kthread_stop(array[id]);
+ if (retc >= 0)
+ retc = ret;
+ }
+ kfree(array);
+ ipvs->backup_threads = NULL;
}
/* decrease the module use count */
@@ -1825,13 +1923,8 @@ int __net_init ip_vs_sync_net_init(struct net *net)
struct netns_ipvs *ipvs = net_ipvs(net);
__mutex_init(&ipvs->sync_mutex, "ipvs->sync_mutex", &__ipvs_sync_key);
- INIT_LIST_HEAD(&ipvs->sync_queue);
spin_lock_init(&ipvs->sync_lock);
spin_lock_init(&ipvs->sync_buff_lock);
-
- ipvs->sync_mcast_addr.sin_family = AF_INET;
- ipvs->sync_mcast_addr.sin_port = cpu_to_be16(IP_VS_SYNC_PORT);
- ipvs->sync_mcast_addr.sin_addr.s_addr = cpu_to_be32(IP_VS_SYNC_GROUP);
return 0;
}
--
1.7.9.5
^ permalink raw reply related
* [PATCH 08/25] ipvs: WRR scheduler does not need GFP_ATOMIC allocation
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_wrr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
index fd0d4e0..231be7d 100644
--- a/net/netfilter/ipvs/ip_vs_wrr.c
+++ b/net/netfilter/ipvs/ip_vs_wrr.c
@@ -84,7 +84,7 @@ static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
/*
* Allocate the mark variable for WRR scheduling
*/
- mark = kmalloc(sizeof(struct ip_vs_wrr_mark), GFP_ATOMIC);
+ mark = kmalloc(sizeof(struct ip_vs_wrr_mark), GFP_KERNEL);
if (mark == NULL)
return -ENOMEM;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 10/25] ipvs: SH scheduler does not need GFP_ATOMIC allocation
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_sh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 91e97ee..0512652 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -162,7 +162,7 @@ static int ip_vs_sh_init_svc(struct ip_vs_service *svc)
/* allocate the SH table for this service */
tbl = kmalloc(sizeof(struct ip_vs_sh_bucket)*IP_VS_SH_TAB_SIZE,
- GFP_ATOMIC);
+ GFP_KERNEL);
if (tbl == NULL)
return -ENOMEM;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 03/25] netfilter: nf_conntrack: use this_cpu_inc()
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Eric Dumazet <edumazet@google.com>
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack.h | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index ab86036..cce7f6a 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -321,14 +321,8 @@ extern unsigned int nf_conntrack_max;
extern unsigned int nf_conntrack_hash_rnd;
void init_nf_conntrack_hash_rnd(void);
-#define NF_CT_STAT_INC(net, count) \
- __this_cpu_inc((net)->ct.stat->count)
-#define NF_CT_STAT_INC_ATOMIC(net, count) \
-do { \
- local_bh_disable(); \
- __this_cpu_inc((net)->ct.stat->count); \
- local_bh_enable(); \
-} while (0)
+#define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count)
+#define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
#define MODULE_ALIAS_NFCT_HELPER(helper) \
MODULE_ALIAS("nfct-helper-" helper)
--
1.7.9.5
^ permalink raw reply related
* [PATCH 07/25] ipvs: DH scheduler does not need GFP_ATOMIC allocation
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_dh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 1a53a7a..8b7dca9 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -149,7 +149,7 @@ static int ip_vs_dh_init_svc(struct ip_vs_service *svc)
/* allocate the DH table for this service */
tbl = kmalloc(sizeof(struct ip_vs_dh_bucket)*IP_VS_DH_TAB_SIZE,
- GFP_ATOMIC);
+ GFP_KERNEL);
if (tbl == NULL)
return -ENOMEM;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 19/25] ipvs: optimize the use of flags in ip_vs_bind_dest
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
cp->flags is marked volatile but ip_vs_bind_dest
can safely modify the flags, so save some CPU cycles by
using temp variable.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_conn.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index c7edf20..1548df9 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -548,6 +548,7 @@ static inline void
ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
{
unsigned int conn_flags;
+ __u32 flags;
/* if dest is NULL, then return directly */
if (!dest)
@@ -559,17 +560,19 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
conn_flags = atomic_read(&dest->conn_flags);
if (cp->protocol != IPPROTO_UDP)
conn_flags &= ~IP_VS_CONN_F_ONE_PACKET;
+ flags = cp->flags;
/* Bind with the destination and its corresponding transmitter */
- if (cp->flags & IP_VS_CONN_F_SYNC) {
+ if (flags & IP_VS_CONN_F_SYNC) {
/* if the connection is not template and is created
* by sync, preserve the activity flag.
*/
- if (!(cp->flags & IP_VS_CONN_F_TEMPLATE))
+ if (!(flags & IP_VS_CONN_F_TEMPLATE))
conn_flags &= ~IP_VS_CONN_F_INACTIVE;
/* connections inherit forwarding method from dest */
- cp->flags &= ~(IP_VS_CONN_F_FWD_MASK | IP_VS_CONN_F_NOOUTPUT);
+ flags &= ~(IP_VS_CONN_F_FWD_MASK | IP_VS_CONN_F_NOOUTPUT);
}
- cp->flags |= conn_flags;
+ flags |= conn_flags;
+ cp->flags = flags;
cp->dest = dest;
IP_VS_DBG_BUF(7, "Bind-dest %s c:%s:%d v:%s:%d "
@@ -584,12 +587,12 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
atomic_read(&dest->refcnt));
/* Update the connection counters */
- if (!(cp->flags & IP_VS_CONN_F_TEMPLATE)) {
+ if (!(flags & IP_VS_CONN_F_TEMPLATE)) {
/* It is a normal connection, so modify the counters
* according to the flags, later the protocol can
* update them on state change
*/
- if (!(cp->flags & IP_VS_CONN_F_INACTIVE))
+ if (!(flags & IP_VS_CONN_F_INACTIVE))
atomic_inc(&dest->activeconns);
else
atomic_inc(&dest->inactconns);
--
1.7.9.5
^ permalink raw reply related
* [PATCH 16/25] ipvs: wakeup master thread
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,
Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.
As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).
Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.
Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.
Thanks to Pablo Neira Ayuso for his valuable feedback!
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 29 ++++++++
net/netfilter/ipvs/ip_vs_ctl.c | 16 +++++
net/netfilter/ipvs/ip_vs_sync.c | 149 ++++++++++++++++++++++++++++++---------
3 files changed, 162 insertions(+), 32 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 93b81aa..30e43c8 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -869,6 +869,8 @@ struct netns_ipvs {
#endif
int sysctl_snat_reroute;
int sysctl_sync_ver;
+ int sysctl_sync_qlen_max;
+ int sysctl_sync_sock_size;
int sysctl_cache_bypass;
int sysctl_expire_nodest_conn;
int sysctl_expire_quiescent_template;
@@ -889,6 +891,9 @@ struct netns_ipvs {
struct timer_list est_timer; /* Estimation timer */
/* ip_vs_sync */
struct list_head sync_queue;
+ int sync_queue_len;
+ unsigned int sync_queue_delay;
+ struct delayed_work master_wakeup_work;
spinlock_t sync_lock;
struct ip_vs_sync_buff *sync_buff;
spinlock_t sync_buff_lock;
@@ -911,6 +916,10 @@ struct netns_ipvs {
#define DEFAULT_SYNC_THRESHOLD 3
#define DEFAULT_SYNC_PERIOD 50
#define DEFAULT_SYNC_VER 1
+#define IPVS_SYNC_WAKEUP_RATE 8
+#define IPVS_SYNC_QLEN_MAX (IPVS_SYNC_WAKEUP_RATE * 4)
+#define IPVS_SYNC_SEND_DELAY (HZ / 50)
+#define IPVS_SYNC_CHECK_PERIOD HZ
#ifdef CONFIG_SYSCTL
@@ -929,6 +938,16 @@ static inline int sysctl_sync_ver(struct netns_ipvs *ipvs)
return ipvs->sysctl_sync_ver;
}
+static inline int sysctl_sync_qlen_max(struct netns_ipvs *ipvs)
+{
+ return ipvs->sysctl_sync_qlen_max;
+}
+
+static inline int sysctl_sync_sock_size(struct netns_ipvs *ipvs)
+{
+ return ipvs->sysctl_sync_sock_size;
+}
+
#else
static inline int sysctl_sync_threshold(struct netns_ipvs *ipvs)
@@ -946,6 +965,16 @@ static inline int sysctl_sync_ver(struct netns_ipvs *ipvs)
return DEFAULT_SYNC_VER;
}
+static inline int sysctl_sync_qlen_max(struct netns_ipvs *ipvs)
+{
+ return IPVS_SYNC_QLEN_MAX;
+}
+
+static inline int sysctl_sync_sock_size(struct netns_ipvs *ipvs)
+{
+ return 0;
+}
+
#endif
/*
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 37b9199..bd3827e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1718,6 +1718,18 @@ static struct ctl_table vs_vars[] = {
.proc_handler = &proc_do_sync_mode,
},
{
+ .procname = "sync_qlen_max",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "sync_sock_size",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
.procname = "cache_bypass",
.maxlen = sizeof(int),
.mode = 0644,
@@ -3655,6 +3667,10 @@ int __net_init ip_vs_control_net_init_sysctl(struct net *net)
tbl[idx++].data = &ipvs->sysctl_snat_reroute;
ipvs->sysctl_sync_ver = 1;
tbl[idx++].data = &ipvs->sysctl_sync_ver;
+ ipvs->sysctl_sync_qlen_max = nr_free_buffer_pages() / 32;
+ tbl[idx++].data = &ipvs->sysctl_sync_qlen_max;
+ ipvs->sysctl_sync_sock_size = 0;
+ tbl[idx++].data = &ipvs->sysctl_sync_sock_size;
tbl[idx++].data = &ipvs->sysctl_cache_bypass;
tbl[idx++].data = &ipvs->sysctl_expire_nodest_conn;
tbl[idx++].data = &ipvs->sysctl_expire_quiescent_template;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index d2df694..b3235b2 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -307,11 +307,15 @@ static inline struct ip_vs_sync_buff *sb_dequeue(struct netns_ipvs *ipvs)
spin_lock_bh(&ipvs->sync_lock);
if (list_empty(&ipvs->sync_queue)) {
sb = NULL;
+ __set_current_state(TASK_INTERRUPTIBLE);
} else {
sb = list_entry(ipvs->sync_queue.next,
struct ip_vs_sync_buff,
list);
list_del(&sb->list);
+ ipvs->sync_queue_len--;
+ if (!ipvs->sync_queue_len)
+ ipvs->sync_queue_delay = 0;
}
spin_unlock_bh(&ipvs->sync_lock);
@@ -358,9 +362,16 @@ static inline void sb_queue_tail(struct netns_ipvs *ipvs)
struct ip_vs_sync_buff *sb = ipvs->sync_buff;
spin_lock(&ipvs->sync_lock);
- if (ipvs->sync_state & IP_VS_STATE_MASTER)
+ if (ipvs->sync_state & IP_VS_STATE_MASTER &&
+ ipvs->sync_queue_len < sysctl_sync_qlen_max(ipvs)) {
+ if (!ipvs->sync_queue_len)
+ schedule_delayed_work(&ipvs->master_wakeup_work,
+ max(IPVS_SYNC_SEND_DELAY, 1));
+ ipvs->sync_queue_len++;
list_add_tail(&sb->list, &ipvs->sync_queue);
- else
+ if ((++ipvs->sync_queue_delay) == IPVS_SYNC_WAKEUP_RATE)
+ wake_up_process(ipvs->master_thread);
+ } else
ip_vs_sync_buff_release(sb);
spin_unlock(&ipvs->sync_lock);
}
@@ -379,6 +390,7 @@ get_curr_sync_buff(struct netns_ipvs *ipvs, unsigned long time)
time_after_eq(jiffies - ipvs->sync_buff->firstuse, time)) {
sb = ipvs->sync_buff;
ipvs->sync_buff = NULL;
+ __set_current_state(TASK_RUNNING);
} else
sb = NULL;
spin_unlock_bh(&ipvs->sync_buff_lock);
@@ -392,26 +404,23 @@ get_curr_sync_buff(struct netns_ipvs *ipvs, unsigned long time)
void ip_vs_sync_switch_mode(struct net *net, int mode)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ struct ip_vs_sync_buff *sb;
+ spin_lock_bh(&ipvs->sync_buff_lock);
if (!(ipvs->sync_state & IP_VS_STATE_MASTER))
- return;
- if (mode == sysctl_sync_ver(ipvs) || !ipvs->sync_buff)
- return;
+ goto unlock;
+ sb = ipvs->sync_buff;
+ if (mode == sysctl_sync_ver(ipvs) || !sb)
+ goto unlock;
- spin_lock_bh(&ipvs->sync_buff_lock);
/* Buffer empty ? then let buf_create do the job */
- if (ipvs->sync_buff->mesg->size <= sizeof(struct ip_vs_sync_mesg)) {
- kfree(ipvs->sync_buff);
+ if (sb->mesg->size <= sizeof(struct ip_vs_sync_mesg)) {
+ ip_vs_sync_buff_release(sb);
ipvs->sync_buff = NULL;
- } else {
- spin_lock_bh(&ipvs->sync_lock);
- if (ipvs->sync_state & IP_VS_STATE_MASTER)
- list_add_tail(&ipvs->sync_buff->list,
- &ipvs->sync_queue);
- else
- ip_vs_sync_buff_release(ipvs->sync_buff);
- spin_unlock_bh(&ipvs->sync_lock);
- }
+ } else
+ sb_queue_tail(ipvs);
+
+unlock:
spin_unlock_bh(&ipvs->sync_buff_lock);
}
@@ -1130,6 +1139,28 @@ static void ip_vs_process_message(struct net *net, __u8 *buffer,
/*
+ * Setup sndbuf (mode=1) or rcvbuf (mode=0)
+ */
+static void set_sock_size(struct sock *sk, int mode, int val)
+{
+ /* setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val)); */
+ /* setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val)); */
+ lock_sock(sk);
+ if (mode) {
+ val = clamp_t(int, val, (SOCK_MIN_SNDBUF + 1) / 2,
+ sysctl_wmem_max);
+ sk->sk_sndbuf = val * 2;
+ sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
+ } else {
+ val = clamp_t(int, val, (SOCK_MIN_RCVBUF + 1) / 2,
+ sysctl_rmem_max);
+ sk->sk_rcvbuf = val * 2;
+ sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+ }
+ release_sock(sk);
+}
+
+/*
* Setup loopback of outgoing multicasts on a sending socket
*/
static void set_mcast_loop(struct sock *sk, u_char loop)
@@ -1305,6 +1336,9 @@ static struct socket *make_send_sock(struct net *net)
set_mcast_loop(sock->sk, 0);
set_mcast_ttl(sock->sk, 1);
+ result = sysctl_sync_sock_size(ipvs);
+ if (result > 0)
+ set_sock_size(sock->sk, 1, result);
result = bind_mcastif_addr(sock, ipvs->master_mcast_ifn);
if (result < 0) {
@@ -1350,6 +1384,9 @@ static struct socket *make_receive_sock(struct net *net)
sk_change_net(sock->sk, net);
/* it is equivalent to the REUSEADDR option in user-space */
sock->sk->sk_reuse = SK_CAN_REUSE;
+ result = sysctl_sync_sock_size(ipvs);
+ if (result > 0)
+ set_sock_size(sock->sk, 0, result);
result = sock->ops->bind(sock, (struct sockaddr *) &mcast_addr,
sizeof(struct sockaddr));
@@ -1392,18 +1429,22 @@ ip_vs_send_async(struct socket *sock, const char *buffer, const size_t length)
return len;
}
-static void
+static int
ip_vs_send_sync_msg(struct socket *sock, struct ip_vs_sync_mesg *msg)
{
int msize;
+ int ret;
msize = msg->size;
/* Put size in network byte order */
msg->size = htons(msg->size);
- if (ip_vs_send_async(sock, (char *)msg, msize) != msize)
- pr_err("ip_vs_send_async error\n");
+ ret = ip_vs_send_async(sock, (char *)msg, msize);
+ if (ret >= 0 || ret == -EAGAIN)
+ return ret;
+ pr_err("ip_vs_send_async error %d\n", ret);
+ return 0;
}
static int
@@ -1428,36 +1469,75 @@ ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen)
return len;
}
+/* Wakeup the master thread for sending */
+static void master_wakeup_work_handler(struct work_struct *work)
+{
+ struct netns_ipvs *ipvs = container_of(work, struct netns_ipvs,
+ master_wakeup_work.work);
+
+ spin_lock_bh(&ipvs->sync_lock);
+ if (ipvs->sync_queue_len &&
+ ipvs->sync_queue_delay < IPVS_SYNC_WAKEUP_RATE) {
+ ipvs->sync_queue_delay = IPVS_SYNC_WAKEUP_RATE;
+ wake_up_process(ipvs->master_thread);
+ }
+ spin_unlock_bh(&ipvs->sync_lock);
+}
+
+/* Get next buffer to send */
+static inline struct ip_vs_sync_buff *
+next_sync_buff(struct netns_ipvs *ipvs)
+{
+ struct ip_vs_sync_buff *sb;
+
+ sb = sb_dequeue(ipvs);
+ if (sb)
+ return sb;
+ /* Do not delay entries in buffer for more than 2 seconds */
+ return get_curr_sync_buff(ipvs, 2 * HZ);
+}
static int sync_thread_master(void *data)
{
struct ip_vs_sync_thread_data *tinfo = data;
struct netns_ipvs *ipvs = net_ipvs(tinfo->net);
+ struct sock *sk = tinfo->sock->sk;
struct ip_vs_sync_buff *sb;
pr_info("sync thread started: state = MASTER, mcast_ifn = %s, "
"syncid = %d\n",
ipvs->master_mcast_ifn, ipvs->master_syncid);
- while (!kthread_should_stop()) {
- while ((sb = sb_dequeue(ipvs))) {
- ip_vs_send_sync_msg(tinfo->sock, sb->mesg);
- ip_vs_sync_buff_release(sb);
+ for (;;) {
+ sb = next_sync_buff(ipvs);
+ if (unlikely(kthread_should_stop()))
+ break;
+ if (!sb) {
+ schedule_timeout(IPVS_SYNC_CHECK_PERIOD);
+ continue;
}
-
- /* check if entries stay in ipvs->sync_buff for 2 seconds */
- sb = get_curr_sync_buff(ipvs, 2 * HZ);
- if (sb) {
- ip_vs_send_sync_msg(tinfo->sock, sb->mesg);
- ip_vs_sync_buff_release(sb);
+ while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) {
+ int ret = 0;
+
+ __wait_event_interruptible(*sk_sleep(sk),
+ sock_writeable(sk) ||
+ kthread_should_stop(),
+ ret);
+ if (unlikely(kthread_should_stop()))
+ goto done;
}
-
- schedule_timeout_interruptible(HZ);
+ ip_vs_sync_buff_release(sb);
}
+done:
+ __set_current_state(TASK_RUNNING);
+ if (sb)
+ ip_vs_sync_buff_release(sb);
+
/* clean up the sync_buff queue */
while ((sb = sb_dequeue(ipvs)))
ip_vs_sync_buff_release(sb);
+ __set_current_state(TASK_RUNNING);
/* clean up the current sync_buff */
sb = get_curr_sync_buff(ipvs, 0);
@@ -1538,6 +1618,10 @@ int start_sync_thread(struct net *net, int state, char *mcast_ifn, __u8 syncid)
realtask = &ipvs->master_thread;
name = "ipvs_master:%d";
threadfn = sync_thread_master;
+ ipvs->sync_queue_len = 0;
+ ipvs->sync_queue_delay = 0;
+ INIT_DELAYED_WORK(&ipvs->master_wakeup_work,
+ master_wakeup_work_handler);
sock = make_send_sock(net);
} else if (state == IP_VS_STATE_BACKUP) {
if (ipvs->backup_thread)
@@ -1623,6 +1707,7 @@ int stop_sync_thread(struct net *net, int state)
spin_lock_bh(&ipvs->sync_lock);
ipvs->sync_state &= ~IP_VS_STATE_MASTER;
spin_unlock_bh(&ipvs->sync_lock);
+ cancel_delayed_work_sync(&ipvs->master_wakeup_work);
retc = kthread_stop(ipvs->master_thread);
ipvs->master_thread = NULL;
} else if (state == IP_VS_STATE_BACKUP) {
--
1.7.9.5
^ permalink raw reply related
* [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kind of connections we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for first time. Now
logic will look just like in ip_vs_unbind_dest.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_conn.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index f562e63..1c1bb30 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -585,11 +585,11 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
/* Update the connection counters */
if (!(cp->flags & IP_VS_CONN_F_TEMPLATE)) {
- /* It is a normal connection, so increase the inactive
- connection counter because it is in TCP SYNRECV
- state (inactive) or other protocol inacive state */
- if ((cp->flags & IP_VS_CONN_F_SYNC) &&
- (!(cp->flags & IP_VS_CONN_F_INACTIVE)))
+ /* It is a normal connection, so modify the counters
+ * according to the flags, later the protocol can
+ * update them on state change
+ */
+ if (!(cp->flags & IP_VS_CONN_F_INACTIVE))
atomic_inc(&dest->activeconns);
else
atomic_inc(&dest->inactconns);
--
1.7.9.5
^ permalink raw reply related
* [PATCH 12/25] ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method we should get it from conn_flags just
like we do it for IP_VS_CONN_F_FWD_MASK bits when binding
to real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_conn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 4a09b78..f562e63 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -567,7 +567,7 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
if (!(cp->flags & IP_VS_CONN_F_TEMPLATE))
conn_flags &= ~IP_VS_CONN_F_INACTIVE;
/* connections inherit forwarding method from dest */
- cp->flags &= ~IP_VS_CONN_F_FWD_MASK;
+ cp->flags &= ~(IP_VS_CONN_F_FWD_MASK | IP_VS_CONN_F_NOOUTPUT);
}
cp->flags |= conn_flags;
cp->dest = dest;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 11/25] ipvs: use GFP_KERNEL allocation where possible
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Sasha Levin <levinsasha928@gmail.com>
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.
This is safe since it will always run from a process context.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipvs/ip_vs_proto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index ca16476..e91c898 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -68,7 +68,7 @@ register_ip_vs_proto_netns(struct net *net, struct ip_vs_protocol *pp)
struct netns_ipvs *ipvs = net_ipvs(net);
unsigned int hash = IP_VS_PROTO_HASH(pp->protocol);
struct ip_vs_proto_data *pd =
- kzalloc(sizeof(struct ip_vs_proto_data), GFP_ATOMIC);
+ kzalloc(sizeof(struct ip_vs_proto_data), GFP_KERNEL);
if (!pd)
return -ENOMEM;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 09/25] ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 9dcd39a..570e31e 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -511,7 +511,7 @@ static int ip_vs_lblcr_init_svc(struct ip_vs_service *svc)
/*
* Allocate the ip_vs_lblcr_table for this service
*/
- tbl = kmalloc(sizeof(*tbl), GFP_ATOMIC);
+ tbl = kmalloc(sizeof(*tbl), GFP_KERNEL);
if (tbl == NULL)
return -ENOMEM;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 06/25] ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 9b0de9a..df646cc 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -342,7 +342,7 @@ static int ip_vs_lblc_init_svc(struct ip_vs_service *svc)
/*
* Allocate the ip_vs_lblc_table for this service
*/
- tbl = kmalloc(sizeof(*tbl), GFP_ATOMIC);
+ tbl = kmalloc(sizeof(*tbl), GFP_KERNEL);
if (tbl == NULL)
return -ENOMEM;
--
1.7.9.5
^ permalink raw reply related
* [PATCH 00/25] [v3] netfilter updates for net-next (upcoming 3.5)
From: pablo @ 2012-05-08 18:37 UTC (permalink / raw)
To: netdev; +Cc: davem, openbsc
From: Pablo Neira Ayuso <pablo@netfilter.org>
Hi David,
Third version, now based on your fresh current net-next tree.
The following patchset contains the Netfilter updates for net-next.
Most notably:
* The new /proc/sys/net/netfilter/nf_conntrack_helper entry that
allows to disable the automatic conntrack helper assignment from
Eric Leblond. This patch also spots a warning to inform the user
that this behaviour will be removed at some point. The automatic
conntrack helper assignment may allows attackers to open hole in
the firewall to access the protected network segments (with
incorrect configurations). More information on this issue at:
https://home.regit.org/netfilter-en/secure-use-of-helpers/
In the near future, all conntrack helpers will be explicitly
attached via the CT target, as we longing discussed during
the last netfilter workshop.
* One new sysctl to translate the input device to vlan device name
from Florian Westphal. He required this to get the REDIRECT target
working with another sysctl vlan-on-top-of-bridge.
* Major improvements in the ip_vs_sync code from Julian Anastasov.
They aim to improve scalability and to address possible message
loss due to socket overrun under high rate of synchronization
messages.
* Several minor memory allocation flags fixes from IPVS people
contributors.
* Eric Leblond's patch spotted one problem that becomes noticeable
if a) automatic helper assignment is disabled, and b) if NAT is
in use, and c) the CT target is used to attach a non-standard
conntrack helper port. This fix comes from myself.
* One small update to allow updating the expectation timeout from
Kelvie Wong.
* Finally, remove ip[6]_queue support since they have been marked
as obsolete since long time ago. Now, we have nfnetlink_queue
which is way more flexible from myself.
You can pull these changes from:
git://1984.lsi.us.es/net-next master
If time allows, I'd like to send a second batch. There a several patches
that are very close to get into shape still on netfilter-devel.
Thanks!
Eric Dumazet (1):
netfilter: nf_conntrack: use this_cpu_inc()
Eric Leblond (1):
netfilter: nf_ct_helper: allow to disable automatic helper assignment
Florian Westphal (1):
netfilter: bridge: optionally set indev to vlan
H Hartley Sweeten (2):
ipvs: ip_vs_ftp: local functions should not be exposed globally
ipvs: ip_vs_proto: local functions should not be exposed globally
Hans Schillstrom (1):
net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync
Julian Anastasov (14):
ipvs: timeout tables do not need GFP_ATOMIC allocation
ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init
ipvs: DH scheduler does not need GFP_ATOMIC allocation
ipvs: WRR scheduler does not need GFP_ATOMIC allocation
ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init
ipvs: SH scheduler does not need GFP_ATOMIC allocation
ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server
ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter
ipvs: always update some of the flags bits in backup
ipvs: wakeup master thread
ipvs: reduce sync rate with time thresholds
ipvs: add support for sync threads
ipvs: optimize the use of flags in ip_vs_bind_dest
Kelvie Wong (1):
netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
Pablo Neira Ayuso (2):
netfilter: nf_conntrack: fix explicit helper attachment and NAT
netfilter: remove ip_queue support
Sasha Levin (1):
ipvs: use GFP_KERNEL allocation where possible
Tony Zelenoff (1):
netfilter: nf_ct_ecache: refactor notifier registration
Documentation/ABI/removed/ip_queue | 9 +
Documentation/networking/ip-sysctl.txt | 13 +-
include/linux/ip_vs.h | 5 +
include/linux/netfilter/nf_conntrack_common.h | 4 +
include/linux/netfilter_ipv4/Kbuild | 1 -
include/linux/netfilter_ipv4/ip_queue.h | 72 ---
include/linux/netlink.h | 2 +-
include/net/ip_vs.h | 87 +++-
include/net/netfilter/nf_conntrack.h | 10 +-
include/net/netfilter/nf_conntrack_helper.h | 4 +-
include/net/netns/conntrack.h | 3 +
net/bridge/br_netfilter.c | 26 +-
net/core/sock.c | 2 +
net/ipv4/netfilter/Makefile | 3 -
net/ipv4/netfilter/ip_queue.c | 639 ------------------------
net/ipv6/netfilter/Kconfig | 22 -
net/ipv6/netfilter/Makefile | 1 -
net/ipv6/netfilter/ip6_queue.c | 641 ------------------------
net/netfilter/ipvs/ip_vs_conn.c | 69 ++-
net/netfilter/ipvs/ip_vs_core.c | 30 +-
net/netfilter/ipvs/ip_vs_ctl.c | 70 ++-
net/netfilter/ipvs/ip_vs_dh.c | 2 +-
net/netfilter/ipvs/ip_vs_ftp.c | 2 +-
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
net/netfilter/ipvs/ip_vs_proto.c | 6 +-
net/netfilter/ipvs/ip_vs_sh.c | 2 +-
net/netfilter/ipvs/ip_vs_sync.c | 662 +++++++++++++++++--------
net/netfilter/ipvs/ip_vs_wrr.c | 2 +-
net/netfilter/nf_conntrack_core.c | 15 +-
net/netfilter/nf_conntrack_ecache.c | 10 +-
net/netfilter/nf_conntrack_helper.c | 120 ++++-
net/netfilter/nf_conntrack_netlink.c | 10 +-
security/selinux/nlmsgtab.c | 13 -
34 files changed, 853 insertions(+), 1708 deletions(-)
create mode 100644 Documentation/ABI/removed/ip_queue
delete mode 100644 include/linux/netfilter_ipv4/ip_queue.h
delete mode 100644 net/ipv4/netfilter/ip_queue.c
delete mode 100644 net/ipv6/netfilter/ip6_queue.c
--
1.7.9.5
^ permalink raw reply
* Re: [PATCH] pch_gbe: Adding read memory barriers
From: Erwan Velu @ 2012-05-08 18:35 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tshimizu818
In-Reply-To: <4FA822C4.60809@gmail.com>
Le 07/05/2012 21:30, Erwan Velu a écrit :
> From bb1e271db0fa1a29df19bede69faf8004389132d Mon Sep 17 00:00:00 2001
> From: Erwan Velu <erwan.velu@zodiacaerospace.com>
> Date: Mon, 7 May 2012 19:15:29 +0000
> Subject: [PATCH 1/1] pch_gbe: Adding read memory barriers
Hey Fellows,
Does my patch can be considered as acceptable or shall I rework
something on it ?
Cheers,
Erwan
^ permalink raw reply
* Re: batostr() function
From: Luis R. Rodriguez @ 2012-05-08 18:26 UTC (permalink / raw)
To: Johannes Berg
Cc: Joe Perches, Andrei Emeltchenko, David Herrmann,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev
In-Reply-To: <1336499313.4320.4.camel-8upI4CBIZJIJvtFkdXX2HixXY32XiHfO@public.gmane.org>
On Tue, May 8, 2012 at 10:48 AM, Johannes Berg
<johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
> On Tue, 2012-05-08 at 10:18 -0700, Joe Perches wrote:
>> On Tue, 2012-05-08 at 17:30 +0300, Andrei Emeltchenko wrote:
>> > On Tue, May 08, 2012 at 04:25:08PM +0200, Johannes Berg wrote:
>> > > On Tue, 2012-05-08 at 15:30 +0200, David Herrmann wrote:
>> > > > Hi Johannes
>> > > >
>> > > > On Mon, May 7, 2012 at 1:49 PM, Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
>> > > > > Really? 2 static buffers that are used alternately based on a static
>> > > > > variable? How can that possibly be thread-safe? That may work in very
>> > > > > restricted scenarios, but ...
>> > > >
>> > > > Looking at "git blame" it seems the whole function is still from
>> > > > linux-2.4. Looks like no-one ever noticed. I've sent a patchset fixing
>> > > > it, thanks.
>> > >
>> > > I was thinking you could use %pM, but it seems BT addresses are stored
>> > > the wrong way around for some reason ...
>> >
>> > This looks like better idea then allocating buffers, we can use swap to
>> > take care about "wrong order".
>>
>> https://lkml.org/lkml/2010/12/3/358
>
> Pretty much what I had in mind, thanks. Luis, you'll notice that this
> will be a pain to backport in compat. :-)
Mumble grumble. Oh well! :) I'm starting to enjoy the curve balls.
Luis
^ permalink raw reply
* Re: [PATCH] net: update the usage of CHECKSUM_UNNECESSARY
From: Ben Hutchings @ 2012-05-08 18:20 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
devel-s9riP+hp16TNLxjTenLetw
In-Reply-To: <20120508174831.GA27406-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Tue, 2012-05-08 at 20:48 +0300, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2012 at 02:12:41PM -0700, Yi Zou wrote:
> > As suggested by Ben, this adds the clarification on the usage of
> > CHECKSUM_UNNECESSARY on the outgoing patch. Also add the usage
> > description of NETIF_F_FCOE_CRC and CHECKSUM_UNNECESSARY
> > for the kernel FCoE protocol driver.
> >
> > This is a follow-up to the following:
> > http://patchwork.ozlabs.org/patch/147315/
> >
> > Signed-off-by: Yi Zou <yi.zou-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > Cc: Ben Hutchings <bhutchings-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
> > Cc: Jeff Kirsher <jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > Cc: www.Open-FCoE.org <devel-s9riP+hp16TNLxjTenLetw@public.gmane.org>
> > ---
> >
> > include/linux/skbuff.h | 7 +++++++
> > 1 files changed, 7 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 8dc8257..a2b9953 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -94,6 +94,13 @@
> > * about CHECKSUM_UNNECESSARY. 8)
> > * NETIF_F_IPV6_CSUM about as dumb as the last one but does IPv6 instead.
> > *
> > + * UNNECESSARY: device will do per protocol specific csum. Protocol drivers
> > + * that do not want net to perform the checksum calculation should use
> > + * this flag in their outgoing skbs.
> > + * NETIF_F_FCOE_CRC this indicates the device can do FCoE FC CRC
> > + * offload. Correspondingly, the FCoE protocol driver
> > + * stack should use CHECKSUM_UNNECESSARY.
> > + *
> > * Any questions? No questions, good. --ANK
> > */
> >
>
> So just to make sure I understand, you never get
> UNNECESSARY packets on tx unless you declared NETIF_F_FCOE_CRC?
>
> Maybe the comment says this somehow but could not figure it out.
That's what should happen now. In future CHECKSUM_UNNECESSARY could be
used on output by other protocols which don't use TCP/IP-style
checksums, but always dependent on the output device supporting the
relevant offload feature.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH] net: update the usage of CHECKSUM_UNNECESSARY
From: Michael S. Tsirkin @ 2012-05-08 17:48 UTC (permalink / raw)
To: Yi Zou; +Cc: netdev, devel, bhutchings, jeffrey.t.kirsher
In-Reply-To: <20120319211241.11291.53271.stgit@localhost6.localdomain6>
On Mon, Mar 19, 2012 at 02:12:41PM -0700, Yi Zou wrote:
> As suggested by Ben, this adds the clarification on the usage of
> CHECKSUM_UNNECESSARY on the outgoing patch. Also add the usage
> description of NETIF_F_FCOE_CRC and CHECKSUM_UNNECESSARY
> for the kernel FCoE protocol driver.
>
> This is a follow-up to the following:
> http://patchwork.ozlabs.org/patch/147315/
>
> Signed-off-by: Yi Zou <yi.zou@intel.com>
> Cc: Ben Hutchings <bhutchings@solarflare.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: www.Open-FCoE.org <devel@open-fcoe.org>
> ---
>
> include/linux/skbuff.h | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 8dc8257..a2b9953 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -94,6 +94,13 @@
> * about CHECKSUM_UNNECESSARY. 8)
> * NETIF_F_IPV6_CSUM about as dumb as the last one but does IPv6 instead.
> *
> + * UNNECESSARY: device will do per protocol specific csum. Protocol drivers
> + * that do not want net to perform the checksum calculation should use
> + * this flag in their outgoing skbs.
> + * NETIF_F_FCOE_CRC this indicates the device can do FCoE FC CRC
> + * offload. Correspondingly, the FCoE protocol driver
> + * stack should use CHECKSUM_UNNECESSARY.
> + *
> * Any questions? No questions, good. --ANK
> */
>
So just to make sure I understand, you never get
UNNECESSARY packets on tx unless you declared NETIF_F_FCOE_CRC?
Maybe the comment says this somehow but could not figure it out.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: batostr() function
From: Johannes Berg @ 2012-05-08 17:48 UTC (permalink / raw)
To: Joe Perches
Cc: Andrei Emeltchenko, David Herrmann, linux-bluetooth, netdev,
Luis R. Rodriguez
In-Reply-To: <1336497532.29640.24.camel@joe2Laptop>
On Tue, 2012-05-08 at 10:18 -0700, Joe Perches wrote:
> On Tue, 2012-05-08 at 17:30 +0300, Andrei Emeltchenko wrote:
> > On Tue, May 08, 2012 at 04:25:08PM +0200, Johannes Berg wrote:
> > > On Tue, 2012-05-08 at 15:30 +0200, David Herrmann wrote:
> > > > Hi Johannes
> > > >
> > > > On Mon, May 7, 2012 at 1:49 PM, Johannes Berg <johannes@sipsolutions.net> wrote:
> > > > > Really? 2 static buffers that are used alternately based on a static
> > > > > variable? How can that possibly be thread-safe? That may work in very
> > > > > restricted scenarios, but ...
> > > >
> > > > Looking at "git blame" it seems the whole function is still from
> > > > linux-2.4. Looks like no-one ever noticed. I've sent a patchset fixing
> > > > it, thanks.
> > >
> > > I was thinking you could use %pM, but it seems BT addresses are stored
> > > the wrong way around for some reason ...
> >
> > This looks like better idea then allocating buffers, we can use swap to
> > take care about "wrong order".
>
> https://lkml.org/lkml/2010/12/3/358
Pretty much what I had in mind, thanks. Luis, you'll notice that this
will be a pain to backport in compat. :-)
johannes
^ permalink raw reply
* Re: batostr() function
From: Joe Perches @ 2012-05-08 17:18 UTC (permalink / raw)
To: Andrei Emeltchenko
Cc: Johannes Berg, David Herrmann,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev
In-Reply-To: <20120508143011.GD29352@aemeltch-MOBL1>
On Tue, 2012-05-08 at 17:30 +0300, Andrei Emeltchenko wrote:
> On Tue, May 08, 2012 at 04:25:08PM +0200, Johannes Berg wrote:
> > On Tue, 2012-05-08 at 15:30 +0200, David Herrmann wrote:
> > > Hi Johannes
> > >
> > > On Mon, May 7, 2012 at 1:49 PM, Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
> > > > Really? 2 static buffers that are used alternately based on a static
> > > > variable? How can that possibly be thread-safe? That may work in very
> > > > restricted scenarios, but ...
> > >
> > > Looking at "git blame" it seems the whole function is still from
> > > linux-2.4. Looks like no-one ever noticed. I've sent a patchset fixing
> > > it, thanks.
> >
> > I was thinking you could use %pM, but it seems BT addresses are stored
> > the wrong way around for some reason ...
>
> This looks like better idea then allocating buffers, we can use swap to
> take care about "wrong order".
https://lkml.org/lkml/2010/12/3/358
^ permalink raw reply
* Re: [PATCH 00/25] [v2] netfilter updates for net-next (upcoming 3.5)
From: David Miller @ 2012-05-08 17:12 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20120508171049.GA16205@1984>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 8 May 2012 19:10:49 +0200
> I can rebase all my patches on top of fresh tree, fix all conflicts
> myself and send you another patchset that will apply to your current
> tree.
>
> Please, let me know how to proceed.
Yes, please build a tree that compiles properly :-)
The error is pretty trivial to reproduce, just "allmodconfig" like I
do.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox