* [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
From: pablo @ 2012-05-08 0:22 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336436539-5880-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kinds of connections, we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for the first time. Now
the logic will look just like in ip_vs_unbind_dest.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_conn.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index f562e63..7647f3b 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -585,11 +585,10 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
/* Update the connection counters */
if (!(cp->flags & IP_VS_CONN_F_TEMPLATE)) {
- /* It is a normal connection, so increase the inactive
- connection counter because it is in TCP SYNRECV
- state (inactive) or other protocol inacive state */
- if ((cp->flags & IP_VS_CONN_F_SYNC) &&
- (!(cp->flags & IP_VS_CONN_F_INACTIVE)))
+ /* It is a normal connection, so modify the counters
+ * according to the flags, later the protocol can
+ * update them on state change */
+ if (!(cp->flags & IP_VS_CONN_F_INACTIVE))
atomic_inc(&dest->activeconns);
else
atomic_inc(&dest->inactconns);
--
1.7.9.5
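The counter selection described in the ip_vs_bind_dest patch above can be
sketched in user space as follows. This is only a model under stated
assumptions: the flag values, the struct and the model_* function names are
made up for illustration, plain ints stand in for the kernel's atomic
counters, and template connections are simply skipped rather than modeled.

```c
#include <assert.h>

/* Illustrative flag values for this sketch; the kernel defines the real ones. */
#define CONN_F_INACTIVE 0x0100u
#define CONN_F_TEMPLATE 0x1000u

/* Plain ints stand in for the atomic_t counters of struct ip_vs_dest. */
struct dest_counters {
	int activeconns;
	int inactconns;
};

/* Model of the counter update after the patch: only the INACTIVE bit
 * decides which counter is bumped, synced or not. */
static void model_bind_dest(unsigned int flags, struct dest_counters *d)
{
	assert(d);
	if (flags & CONN_F_TEMPLATE)
		return;		/* templates are handled separately, not modeled */
	if (!(flags & CONN_F_INACTIVE))
		d->activeconns++;
	else
		d->inactconns++;
}

/* Mirror image of the bind path: undo exactly what bind did. */
static void model_unbind_dest(unsigned int flags, struct dest_counters *d)
{
	assert(d);
	if (flags & CONN_F_TEMPLATE)
		return;
	if (!(flags & CONN_F_INACTIVE))
		d->activeconns--;
	else
		d->inactconns--;
}
```

With the same test on both paths, a bind followed by an unbind with unchanged
flags leaves both counters where they started, which is the symmetry the
commit message refers to.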
* [PATCH 11/25] ipvs: use GFP_KERNEL allocation where possible
From: pablo @ 2012-05-08 0:22 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336436539-5880-1-git-send-email-pablo@netfilter.org>
From: Sasha Levin <levinsasha928@gmail.com>
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.
This is safe since it will always run from a process context.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipvs/ip_vs_proto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index a981b7c..8726488 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -71,7 +71,7 @@ register_ip_vs_proto_netns(struct net *net, struct ip_vs_protocol *pp)
struct netns_ipvs *ipvs = net_ipvs(net);
unsigned int hash = IP_VS_PROTO_HASH(pp->protocol);
struct ip_vs_proto_data *pd =
- kzalloc(sizeof(struct ip_vs_proto_data), GFP_ATOMIC);
+ kzalloc(sizeof(struct ip_vs_proto_data), GFP_KERNEL);
if (!pd)
return -ENOMEM;
--
1.7.9.5
* [PATCH 08/25] ipvs: WRR scheduler does not need GFP_ATOMIC allocation
From: pablo @ 2012-05-08 0:22 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336436539-5880-1-git-send-email-pablo@netfilter.org>
From: Julian Anastasov <ja@ssi.bg>
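Schedulers are initialized and bound to services only
in response to user-space commands, which run in process context,
so the allocation may sleep and GFP_KERNEL is sufficient.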
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_wrr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
index fd0d4e0..231be7d 100644
--- a/net/netfilter/ipvs/ip_vs_wrr.c
+++ b/net/netfilter/ipvs/ip_vs_wrr.c
@@ -84,7 +84,7 @@ static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
/*
* Allocate the mark variable for WRR scheduling
*/
- mark = kmalloc(sizeof(struct ip_vs_wrr_mark), GFP_ATOMIC);
+ mark = kmalloc(sizeof(struct ip_vs_wrr_mark), GFP_KERNEL);
if (mark == NULL)
return -ENOMEM;
--
1.7.9.5
* [PATCH 03/25] netfilter: nf_conntrack: use this_cpu_inc()
From: pablo @ 2012-05-08 0:21 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336436539-5880-1-git-send-email-pablo@netfilter.org>
From: Eric Dumazet <edumazet@google.com>
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack.h | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index ab86036..cce7f6a 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -321,14 +321,8 @@ extern unsigned int nf_conntrack_max;
extern unsigned int nf_conntrack_hash_rnd;
void init_nf_conntrack_hash_rnd(void);
-#define NF_CT_STAT_INC(net, count) \
- __this_cpu_inc((net)->ct.stat->count)
-#define NF_CT_STAT_INC_ATOMIC(net, count) \
-do { \
- local_bh_disable(); \
- __this_cpu_inc((net)->ct.stat->count); \
- local_bh_enable(); \
-} while (0)
+#define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count)
+#define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
#define MODULE_ALIAS_NFCT_HELPER(helper) \
MODULE_ALIAS("nfct-helper-" helper)
--
1.7.9.5
* [PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment
From: pablo @ 2012-05-08 0:21 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336436539-5880-1-git-send-email-pablo@netfilter.org>
From: Eric Leblond <eric@regit.org>
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, e.g.:
echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]
Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach a helper to flows.
There are good reasons to stop supporting automatic helper
assignment. For further information, please read:
http://www.netfilter.org/news.html#2012-04-03
This patch also adds one message to inform that automatic helper
assignment is deprecated and that it will be removed soon (this is
printed only once, with the first flow that gets a helper attached,
to make it as unobtrusive as possible).
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack_helper.h | 4 +-
include/net/netns/conntrack.h | 3 +
net/netfilter/nf_conntrack_core.c | 15 ++--
net/netfilter/nf_conntrack_helper.c | 107 ++++++++++++++++++++++++---
4 files changed, 108 insertions(+), 21 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 5767dc2..1d18894 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -60,8 +60,8 @@ static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
return nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
}
-extern int nf_conntrack_helper_init(void);
-extern void nf_conntrack_helper_fini(void);
+extern int nf_conntrack_helper_init(struct net *net);
+extern void nf_conntrack_helper_fini(struct net *net);
extern int nf_conntrack_broadcast_help(struct sk_buff *skb,
unsigned int protoff,
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 7a911ec..a053a19 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -26,11 +26,14 @@ struct netns_ct {
int sysctl_tstamp;
int sysctl_checksum;
unsigned int sysctl_log_invalid; /* Log invalid packets */
+ int sysctl_auto_assign_helper;
+ bool auto_assign_helper_warned;
#ifdef CONFIG_SYSCTL
struct ctl_table_header *sysctl_header;
struct ctl_table_header *acct_sysctl_header;
struct ctl_table_header *tstamp_sysctl_header;
struct ctl_table_header *event_sysctl_header;
+ struct ctl_table_header *helper_sysctl_header;
#endif
char *slabname;
};
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index cf0747c..32c5909 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1336,7 +1336,6 @@ static void nf_conntrack_cleanup_init_net(void)
while (untrack_refs() > 0)
schedule();
- nf_conntrack_helper_fini();
nf_conntrack_proto_fini();
#ifdef CONFIG_NF_CONNTRACK_ZONES
nf_ct_extend_unregister(&nf_ct_zone_extend);
@@ -1354,6 +1353,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
}
nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
+ nf_conntrack_helper_fini(net);
nf_conntrack_timeout_fini(net);
nf_conntrack_ecache_fini(net);
nf_conntrack_tstamp_fini(net);
@@ -1504,10 +1504,6 @@ static int nf_conntrack_init_init_net(void)
if (ret < 0)
goto err_proto;
- ret = nf_conntrack_helper_init();
- if (ret < 0)
- goto err_helper;
-
#ifdef CONFIG_NF_CONNTRACK_ZONES
ret = nf_ct_extend_register(&nf_ct_zone_extend);
if (ret < 0)
@@ -1525,10 +1521,8 @@ static int nf_conntrack_init_init_net(void)
#ifdef CONFIG_NF_CONNTRACK_ZONES
err_extend:
- nf_conntrack_helper_fini();
-#endif
-err_helper:
nf_conntrack_proto_fini();
+#endif
err_proto:
return ret;
}
@@ -1589,9 +1583,14 @@ static int nf_conntrack_init_net(struct net *net)
ret = nf_conntrack_timeout_init(net);
if (ret < 0)
goto err_timeout;
+ ret = nf_conntrack_helper_init(net);
+ if (ret < 0)
+ goto err_helper;
return 0;
+err_helper:
+ nf_conntrack_timeout_fini(net);
err_timeout:
nf_conntrack_ecache_fini(net);
err_ecache:
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 436b7cb..52ff897 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -34,6 +34,66 @@ static struct hlist_head *nf_ct_helper_hash __read_mostly;
static unsigned int nf_ct_helper_hsize __read_mostly;
static unsigned int nf_ct_helper_count __read_mostly;
+static bool nf_ct_auto_assign_helper __read_mostly = true;
+module_param_named(nf_conntrack_helper, nf_ct_auto_assign_helper, bool, 0644);
+MODULE_PARM_DESC(nf_conntrack_helper,
+ "Enable automatic conntrack helper assignment (default 1)");
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table helper_sysctl_table[] = {
+ {
+ .procname = "nf_conntrack_helper",
+ .data = &init_net.ct.sysctl_auto_assign_helper,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {}
+};
+
+static int nf_conntrack_helper_init_sysctl(struct net *net)
+{
+ struct ctl_table *table;
+
+ table = kmemdup(helper_sysctl_table, sizeof(helper_sysctl_table),
+ GFP_KERNEL);
+ if (!table)
+ goto out;
+
+ table[0].data = &net->ct.sysctl_auto_assign_helper;
+
+ net->ct.helper_sysctl_header = register_net_sysctl_table(net,
+ nf_net_netfilter_sysctl_path, table);
+ if (!net->ct.helper_sysctl_header) {
+ printk(KERN_ERR "nf_conntrack_helper: can't register to sysctl.\n");
+ goto out_register;
+ }
+ return 0;
+
+out_register:
+ kfree(table);
+out:
+ return -ENOMEM;
+}
+
+static void nf_conntrack_helper_fini_sysctl(struct net *net)
+{
+ struct ctl_table *table;
+
+ table = net->ct.helper_sysctl_header->ctl_table_arg;
+ unregister_net_sysctl_table(net->ct.helper_sysctl_header);
+ kfree(table);
+}
+#else
+static int nf_conntrack_helper_init_sysctl(struct net *net)
+{
+ return 0;
+}
+
+static void nf_conntrack_helper_fini_sysctl(struct net *net)
+{
+}
+#endif /* CONFIG_SYSCTL */
/* Stupid hash, but collision free for the default registrations of the
* helpers currently in the kernel. */
@@ -118,6 +178,7 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
{
struct nf_conntrack_helper *helper = NULL;
struct nf_conn_help *help;
+ struct net *net = nf_ct_net(ct);
int ret = 0;
if (tmpl != NULL) {
@@ -127,8 +188,16 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
}
help = nfct_help(ct);
- if (helper == NULL)
+ if (net->ct.sysctl_auto_assign_helper && helper == NULL) {
helper = __nf_ct_helper_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+ if (unlikely(!net->ct.auto_assign_helper_warned && helper)) {
+ printk(KERN_INFO "nf_conntrack: automatic helper "
+ "assignment is deprecated. Please, read "
+ "http://www.netfilter.org/news.html#2012-04-03\n");
+ net->ct.auto_assign_helper_warned = true;
+ }
+ }
+
if (helper == NULL) {
if (help)
RCU_INIT_POINTER(help->helper, NULL);
@@ -315,28 +384,44 @@ static struct nf_ct_ext_type helper_extend __read_mostly = {
.id = NF_CT_EXT_HELPER,
};
-int nf_conntrack_helper_init(void)
+int nf_conntrack_helper_init(struct net *net)
{
int err;
- nf_ct_helper_hsize = 1; /* gets rounded up to use one page */
- nf_ct_helper_hash = nf_ct_alloc_hashtable(&nf_ct_helper_hsize, 0);
- if (!nf_ct_helper_hash)
- return -ENOMEM;
+ net->ct.auto_assign_helper_warned = false;
+ net->ct.sysctl_auto_assign_helper = nf_ct_auto_assign_helper;
- err = nf_ct_extend_register(&helper_extend);
+ if (net_eq(net, &init_net)) {
+ nf_ct_helper_hsize = 1; /* gets rounded up to use one page */
+ nf_ct_helper_hash =
+ nf_ct_alloc_hashtable(&nf_ct_helper_hsize, 0);
+ if (!nf_ct_helper_hash)
+ return -ENOMEM;
+
+ err = nf_ct_extend_register(&helper_extend);
+ if (err < 0)
+ goto err1;
+ }
+
+ err = nf_conntrack_helper_init_sysctl(net);
if (err < 0)
- goto err1;
+ goto out_sysctl;
return 0;
+out_sysctl:
+ if (net_eq(net, &init_net))
+ nf_ct_extend_unregister(&helper_extend);
err1:
nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
return err;
}
-void nf_conntrack_helper_fini(void)
+void nf_conntrack_helper_fini(struct net *net)
{
- nf_ct_extend_unregister(&helper_extend);
- nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+ nf_conntrack_helper_fini_sysctl(net);
+ if (net_eq(net, &init_net)) {
+ nf_ct_extend_unregister(&helper_extend);
+ nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+ }
}
--
1.7.9.5
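The helper selection order that the nf_ct_helper patch above introduces in
__nf_ct_try_assign_helper() can be sketched as a small user-space model.
Everything named model_* is hypothetical, helpers are represented by plain
strings, and a counter stands in for the printk() call; only the decision
order is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-namespace state, mirroring the two fields added to struct netns_ct. */
struct model_net {
	int sysctl_auto_assign_helper;	/* the nf_conntrack_helper sysctl */
	bool auto_assign_helper_warned;
	int warnings_printed;		/* stands in for printk() */
};

/*
 * tmpl_helper: helper attached explicitly through a template, or NULL.
 * port_helper: what the port-based lookup would return, or NULL.
 * Returns the helper that ends up assigned, or NULL for none.
 */
static const char *model_pick_helper(struct model_net *net,
				     const char *tmpl_helper,
				     const char *port_helper)
{
	const char *helper = tmpl_helper;

	assert(net);
	/* Fall back to the automatic lookup only if the sysctl allows it. */
	if (net->sysctl_auto_assign_helper && helper == NULL) {
		helper = port_helper;
		/* Warn once per namespace, on the first automatic match. */
		if (!net->auto_assign_helper_warned && helper) {
			net->warnings_printed++;
			net->auto_assign_helper_warned = true;
		}
	}
	return helper;
}
```

In this model an explicitly attached helper always wins, and with the sysctl
set to 0 a flow without a template helper gets no helper at all.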
* Re: [PATCH] r8169: fix problem with TSO (TX_BUFFS_AVAIL negative value)
From: Francois Romieu @ 2012-05-07 23:42 UTC (permalink / raw)
To: Alex Villacís Lasso
Cc: Thomas Pilarski, Julien Ducourthial,
Realtek linux nic maintainers, netdev, linux-kernel
In-Reply-To: <1336430367.11363.38.camel@blackbone.local>
Julien Ducourthial <jducourt@free.fr> :
> The r8169 may get stuck or show bad behaviour after activating TSO :
> the net_device is not stopped when it has no more TX descriptors.
> This problem comes from TX_BUFFS_AVAIL which may reach -1 when all
> transmit descriptors are in use. The patch simply tries to keep positive
> values.
It seems more than good.
Alex, Thomas, can you check if Julien's patch below fixes your broken
kernels as well ?
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f545093..d1e3c51 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -61,8 +61,12 @@
#define R8169_MSG_DEFAULT \
(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
-#define TX_BUFFS_AVAIL(tp) \
- (tp->dirty_tx + NUM_TX_DESC - tp->cur_tx - 1)
+#define TX_SLOTS_AVAIL(tp) \
+ (tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)
+
+/* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
+#define TX_FRAGS_READY_FOR(tp,nr_frags) \
+ (TX_SLOTS_AVAIL(tp) >= (nr_frags + 1))
/* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
   The RTL chips use a 64 element hash table based on the Ethernet CRC. */
@@ -5115,7 +5119,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
u32 opts[2];
int frags;
- if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
+ if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
goto err_stop_0;
}
@@ -5169,7 +5173,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
mmiowb();
- if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
+ if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
* not miss a ring update when it notices a stopped queue.
*/
@@ -5183,7 +5187,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
* can't.
*/
smp_mb();
- if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
+ if (TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS))
netif_wake_queue(dev);
}
@@ -5306,7 +5310,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
*/
smp_mb();
if (netif_queue_stopped(dev) &&
- (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
+ TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
netif_wake_queue(dev);
}
/*
--
1.7.7.6
* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: David Miller @ 2012-05-07 23:20 UTC (permalink / raw)
To: johannes; +Cc: eric.dumazet, netdev
In-Reply-To: <1336399961.4325.30.camel@jlt3.sipsolutions.net>
From: Johannes Berg <johannes@sipsolutions.net>
Date: Mon, 07 May 2012 16:12:41 +0200
> On Mon, 2012-05-07 at 15:53 +0200, Eric Dumazet wrote:
>> On Mon, 2012-05-07 at 15:39 +0200, Johannes Berg wrote:
>> > From: Johannes Berg <johannes.berg@intel.com>
>> >
>> > Neither compare_ether_addr() nor compare_ether_addr_64bits()
>> > (as it can fall back to the former) have comparison semantics
>> > like memcmp() where the sign of the return value indicates sort
>> > order. We had a bug in the wireless code due to a blind memcmp
>> > replacement because of this.
>> >
>> > A cursory look suggests that the wireless bug was the only one
>> > due to this semantic difference.
>> >
>> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>> > ---
>> > include/linux/etherdevice.h | 11 ++++++-----
>> > 1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> The right way to avoid this kind of problems is to change these
>> functions to return a bool
>
> Well, I guess so, but that'd be a weird thing for a compare_ function...
> should probably be named equal_... then, but I'm not really able to do
> such a huge change on the first day after my vacation :-)
It's true the name could be improved, but changing the name is quite
a large undertaking even with automated scripts.
Even the bool change is slightly painful, since all of the explicit
tests against integers (99.999% of these are in wireless BTW :-) would
need to be adjusted.
For now, I'll just apply Johannes's comment fix.
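The semantic difference discussed in the compare_ether_addr thread above can
be shown with a small user-space model. The function below is an assumption
written for illustration, not the kernel source: it reports only whether two
addresses differ, whereas memcmp() also reports which one sorts first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/*
 * Model of an equality-only comparison: returns 0 when the addresses
 * are equal and 1 when they differ. The result carries no ordering.
 */
static unsigned int model_compare_ether_addr(const uint8_t *addr1,
					     const uint8_t *addr2)
{
	uint16_t a[3], b[3];

	assert(addr1 && addr2);
	/* Copy first so the 16-bit accesses are always aligned. */
	memcpy(a, addr1, ETH_ALEN);
	memcpy(b, addr2, ETH_ALEN);
	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
}
```

Swapping the arguments never changes the model's result, while the sign of
memcmp() flips. Code that sorts or searches ordered structures therefore
cannot replace memcmp() with such a function, which is the kind of bug the
comment fix warns about.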
* Re: [PATCH v2 00/17] netfilter: add namespace support for netfilter protos
From: Pablo Neira Ayuso @ 2012-05-07 23:19 UTC (permalink / raw)
To: Gao feng; +Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <1335519484-6089-1-git-send-email-gaofeng@cn.fujitsu.com>
Hi,
On Fri, Apr 27, 2012 at 05:37:47PM +0800, Gao feng wrote:
> Currently the sysctls of the netfilter protos are not isolated, so
> changing a proto's sysctl in a container changes the host's sysctl
> too. This is not expected.
>
> This patch set adds the namespace support for netfilter protos.
>
> Implement four pernet_operations to register the sysctls and initialize
> the pernet data for each proto.
>
> -ipv4_net_ops is used to register tcp4(compat),
> udp4(compat),icmp(compat),ipv4(compat).
> -ipv6_net_ops is used to register tcp6,udp6 and icmpv6.
> -sctp_net_ops is used to register sctp4(compat) and sctp6.
> -udplite_net_ops is used to register udplite4 and udplite6
>
> Extend the l[3,4]proto (sysctl) register functions to make them support
> namespaces.
>
> Finally, add namespace support for cttimeout.
>
> Gao feng (17):
> netfilter: add struct nf_proto_net for register l4proto sysctl
> netfilter: add namespace support for l4proto
> netfilter: add namespace support for l3proto
> netfilter: add namespace support for l4proto_generic
> netfilter: add namespace support for l4proto_tcp
> netfilter: add namespace support for l4proto_udp
> netfilter: add namespace support for l4proto_icmp
> netfilter: add namespace support for l4proto_icmpv6
> netfilter: add namespace support for l3proto_ipv4
> netfilter: add namespace support for l3proto_ipv6
> netfilter: add namespace support for l4proto_sctp
> netfilter: add namespace support for l4proto_udplite
> netfilter: adjust l4proto_dccp to the nf_conntrack_l4proto_register
> netfilter: adjust l4proto_gre4 to the nf_conntrack_l4proto_register
> netfilter: cleanup sysctl for l4proto and l3proto
> netfilter: add namespace support for cttimeout
> netfilter: cttimeout use pernet data of l4proto
I've been having a look at this patchset several times since last
week. The logic that it follows to split changes into patches is not
correct. This breaks the compilation of my tree from patch 2 until
the entire patchset is applied.
This has to start with one patch that adds the basic infrastructure to
register the layer 3 and 4 conntrack timeout per-net support and that
prepares the per-protocol per-net support. This implies propagating
the minimal set of changes needed to make sure it compiles, i.e.
modifying clients to use the new interface to register init_net.
Then, follow-up per-protocol patches that use the new infrastructure
implement the per-protocol support. All this without breaking the
compilation of my tree between patches.
I'm all for fixing the existing unfinished container support for
Netfilter, but this needs to be done appropriately.
* [PATCH] r8169: fix problem with TSO (TX_BUFFS_AVAIL negative value)
From: Julien Ducourthial @ 2012-05-07 22:39 UTC (permalink / raw)
To: Francois Romieu, Realtek linux nic maintainers, netdev; +Cc: linux-kernel
The r8169 may get stuck or show bad behaviour after activating TSO :
the net_device is not stopped when it has no more TX descriptors.
This problem comes from TX_BUFFS_AVAIL which may reach -1 when all
transmit descriptors are in use. The patch simply tries to keep positive
values.
Tested with 8111d(onboard) on a D510MO, and with 8111e(onboard) on a
Zotac 890GXITX.
Signed-off-by: Julien Ducourthial <jducourt@free.fr>
---
drivers/net/ethernet/realtek/r8169.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f545093..d1e3c51 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -61,8 +61,12 @@
#define R8169_MSG_DEFAULT \
(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
-#define TX_BUFFS_AVAIL(tp) \
- (tp->dirty_tx + NUM_TX_DESC - tp->cur_tx - 1)
+#define TX_SLOTS_AVAIL(tp) \
+ (tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)
+
+/* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
+#define TX_FRAGS_READY_FOR(tp,nr_frags) \
+ (TX_SLOTS_AVAIL(tp) >= (nr_frags+1))
/* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
   The RTL chips use a 64 element hash table based on the Ethernet CRC. */
@@ -5115,7 +5119,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
u32 opts[2];
int frags;
- if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
+ if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
goto err_stop_0;
}
@@ -5169,7 +5173,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
mmiowb();
- if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
+ if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
* not miss a ring update when it notices a stopped queue.
*/
@@ -5183,7 +5187,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
* can't.
*/
smp_mb();
- if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
+ if (TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS))
netif_wake_queue(dev);
}
@@ -5306,7 +5310,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
*/
smp_mb();
if (netif_queue_stopped(dev) &&
- (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
+ TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
netif_wake_queue(dev);
}
/*
--
1.7.7.6
* r8169: problem with TSO (TX_BUFFS_AVAIL negative value)
From: Julien Ducourthial @ 2012-05-07 22:35 UTC (permalink / raw)
To: nic_swsd, romieu; +Cc: netdev, kernel
Hi,
Whenever I activate TSO on my Realtek-based NIC, it gets stuck after a
while under heavy traffic (both on a Zotac AMD board with an onboard
8111e and on an Intel D510MO Atom board with an onboard 8111d).
After doing some investigation (with systemtap), the problem seems to be
that the net_device is not stopped when all TX descriptors are in use.
This behavior comes from the macro TX_BUFFS_AVAIL(tp). It is an unsigned
expression but returns -1 when the transmit queue is full (the macro
handles the room for the skb and the frags).
This condition only happens when you have skbs with lots of frags
(otherwise the NIC is stopped before all TX descriptors are in use),
hence only in the TSO case for me.
I made a small patch to avoid negative values, and the driver works
great with TSO. With my Atom-based board I can now reach maximum
throughput as an NFS server.
Best regards,
Julien Ducourthial (1):
r8169: fix problem with TSO (TX_BUFFS_AVAIL negative value)
drivers/net/ethernet/realtek/r8169.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
--
1.7.7.6
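The wrap-around described in the r8169 report above can be reproduced in user
space. The sketch below assumes 32-bit unsigned ring indices and a ring of 64
descriptors; the struct and function names are invented for the example, and
only the arithmetic is taken from the old and new macros in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_DESC 64u		/* ring size assumed for this sketch */

struct ring {
	uint32_t cur_tx;	/* index of the next descriptor to fill */
	uint32_t dirty_tx;	/* index of the next descriptor to reclaim */
};

/* Old TX_BUFFS_AVAIL arithmetic: wraps to UINT32_MAX on a full ring. */
static uint32_t old_tx_buffs_avail(const struct ring *tp)
{
	assert(tp);
	return tp->dirty_tx + NUM_TX_DESC - tp->cur_tx - 1;
}

/* New TX_SLOTS_AVAIL arithmetic: 0 on a full ring, never "negative". */
static uint32_t tx_slots_avail(const struct ring *tp)
{
	assert(tp);
	return tp->dirty_tx + NUM_TX_DESC - tp->cur_tx;
}

/* New TX_FRAGS_READY_FOR: an skb with nr_frags needs nr_frags + 1 slots. */
static int tx_frags_ready_for(const struct ring *tp, uint32_t nr_frags)
{
	return tx_slots_avail(tp) >= nr_frags + 1;
}
```

On a full ring the old expression yields a huge unsigned value, so a test of
the form "avail < limit" is false and the queue is never stopped. The new
expression yields 0, so the readiness check fails as intended.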
* Re: Netlink for kernel<->user space communication?
From: Stephen Hemminger @ 2012-05-07 22:33 UTC (permalink / raw)
To: Arvid Brodin; +Cc: netdev@vger.kernel.org
In-Reply-To: <4FA817C8.9040204@xdin.com>
On Mon, 7 May 2012 18:43:23 +0000
Arvid Brodin <Arvid.Brodin@xdin.com> wrote:
> On Tue, 24 Apr 2012 16:57:55 -0700
> Stephen Hemminger <shemminger@xxxxxxxxxx> wrote:
> > On Tue, 24 Apr 2012 23:52:34 +0000
> > Arvid Brodin <Arvid.Brodin@xxxxxxxx> wrote:
> >
> >> Hi.
> >>
> >> I'm writing a kernel driver for the HSR protocol, a standard for high availability
> >> networks. I want to send messages from the kernel to user space about broken network
> >> links. I also want user space to be able to ask the kernel about its view of the status of
> >> nodes on the network.
> >>
> >> Netlink seems like a good tool for this. (Is it?)
> >
> > Yes.
> >
> >> But do I use raw netlink? (Described here: http://www.linuxjournal.com/article/7356 - but
> >> this seems a bit out of date, the kernel API description differs from today's kernel
> >> implementation.)
> >
> > No. Your driver probably looks like a device so you should be
> > using rtnetlink messages.
>
> I'm already using rtnetlink messages to add and remove my device, which works fine (see
> e.g. http://www.spinics.net/lists/netdev/msg192817.html - although I didn't think it
> meaningful to include the iproute2 patch here, until the kernel part is ready).
>
> The protocol specifies transmission of "supervision frames" every 2 seconds, e.g. to check
> link integrity. Every such frame should be received from two directions in the ring - if
> only one is received, then there is a link problem.
Why not just manipulate the carrier or operational state (see Documentation/networking/operstate)
and use the existing notification on link changes. If you don't get heartbeat then change
the state of the device to indicate lower device is down with set_operstate(), the necessary
link events propagate back as netlink events.
> I'd like to notify user space about every such occurrence. Is there a rtnetlink message
> type that fits this? The stuff in rtnetlink.h seems to be mostly concerned with specific
> user space commands (there is something called RTNLGRP_NOTIFY but I couldn't find any
> instances of it being used in the kernel, nor any documentation).
>
I am trying to steer you to use existing APIs because then existing programs and
infrastructure can deal with the new device type.
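The suggestion in the netlink thread above, to derive the device state from
missing supervision frames, can be sketched as a user-space model. Only the
2 second interval comes from the thread. The tolerance of two missed frames,
the names, and the choice to report the degraded state as soon as one ring
direction goes quiet are assumptions made for this sketch; a real driver
would set the carrier or operstate through the kernel's own interfaces
instead of returning an enum.

```c
#include <assert.h>
#include <stdbool.h>

#define SUPERVISION_INTERVAL_MS 2000L	/* one supervision frame every 2 s */
#define MISSED_FRAMES_ALLOWED   2L	/* hypothetical tolerance */

/* Stand-ins for the operational states a driver would report. */
enum model_operstate { MODEL_OPER_UP, MODEL_OPER_LOWERLAYERDOWN };

struct ring_port {
	long last_seen_ms;	/* arrival time of the last supervision frame */
};

/* A port counts as alive while frames keep arriving within the tolerance. */
static bool port_alive(const struct ring_port *p, long now_ms)
{
	assert(p);
	return now_ms - p->last_seen_ms <=
	       SUPERVISION_INTERVAL_MS * MISSED_FRAMES_ALLOWED;
}

/* Report a degraded state when either ring direction has gone quiet. */
static enum model_operstate model_state(const struct ring_port *a,
					const struct ring_port *b,
					long now_ms)
{
	if (port_alive(a, now_ms) && port_alive(b, now_ms))
		return MODEL_OPER_UP;
	return MODEL_OPER_LOWERLAYERDOWN;
}
```

Because the result is an ordinary link state, existing tools that already
listen for link change notifications would pick it up without a new message
type.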
* [PATCH] net/bluetooth/bnep/core.c: use constant for ethertype
From: Eldad Zack @ 2012-05-07 22:09 UTC (permalink / raw)
To: Marcel Holtmann, Johan Hedberg, David S. Miller
Cc: linux-bluetooth, netdev, linux-kernel, Eldad Zack
The dot1q ethertype number (0x8100) is embedded in the code, although
it is already defined in included headers.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
net/bluetooth/bnep/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
index a779ec7..4fab436 100644
--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -340,7 +340,7 @@ static inline int bnep_rx_frame(struct bnep_session *s, struct sk_buff *skb)
}
/* Strip 802.1p header */
- if (ntohs(s->eh.h_proto) == 0x8100) {
+ if (ntohs(s->eh.h_proto) == ETH_P_8021Q) {
if (!skb_pull(skb, 4))
goto badframe;
s->eh.h_proto = get_unaligned((__be16 *) (skb->data - 2));
--
1.7.10
* Re: [PATCH 1/2] vhost: basic tracepoints
From: Michael S. Tsirkin @ 2012-05-07 21:10 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20120410025819.49693.32870.stgit@amd-6168-8-1.englab.nay.redhat.com>
On Tue, Apr 10, 2012 at 10:58:19AM +0800, Jason Wang wrote:
> To help with performance optimization and debugging, this patch adds tracepoints
> for vhost. Note that the tracepoints are only for vhost; the net code is
> not touched.
>
> Two kinds of activities were traced: virtio and vhost work.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Thanks for looking into this.
Some questions:
Do we need to prefix traces with vhost_virtio_?
How about a trace for enabling/disabling interrupts?
Trace for a suppressed interrupt?
I think we need a vq # not pointer.
Also need some id for when there are many guests.
Use the vhost thread name (includes owner pid)? Its pid? Both?
Also, traces do add very small overhead when compiled but not
enabled mainly due to increasing register pressure.
Need to test to make sure perf is not hurt.
Some traces are just for debugging so build them on
debug kernel only?
Further, there are many events: some are rare,
some are common. I think we need some naming scheme
so that really useful and low overhead stuff is easier
to enable, ignoring the rarely useful/higher overhead traces.
> ---
> drivers/vhost/trace.h | 153 +++++++++++++++++++++++++++++++++++++++++++++++++
> drivers/vhost/vhost.c | 17 +++++
> 2 files changed, 168 insertions(+), 2 deletions(-)
> create mode 100644 drivers/vhost/trace.h
>
> diff --git a/drivers/vhost/trace.h b/drivers/vhost/trace.h
> new file mode 100644
> index 0000000..0423899
> --- /dev/null
> +++ b/drivers/vhost/trace.h
> @@ -0,0 +1,153 @@
> +#if !defined(_TRACE_VHOST_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_VHOST_H
> +
> +#include <linux/tracepoint.h>
> +#include "vhost.h"
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM vhost
> +
> +/*
> + * Tracepoint for updating used flag.
> + */
> +TRACE_EVENT(vhost_virtio_update_used_flags,
> + TP_PROTO(struct vhost_virtqueue *vq),
> + TP_ARGS(vq),
> +
> + TP_STRUCT__entry(
> + __field(struct vhost_virtqueue *, vq)
> + __field(u16, used_flags)
> + ),
> +
> + TP_fast_assign(
> + __entry->vq = vq;
> + __entry->used_flags = vq->used_flags;
> + ),
> +
> + TP_printk("vhost update used flag %x to vq %p notify %s",
> + __entry->used_flags, __entry->vq,
> + (__entry->used_flags & VRING_USED_F_NO_NOTIFY) ?
> + "disabled" : "enabled")
> +);
> +
> +/*
> + * Tracepoint for updating avail event.
> + */
> +TRACE_EVENT(vhost_virtio_update_avail_event,
> + TP_PROTO(struct vhost_virtqueue *vq),
> + TP_ARGS(vq),
> +
> + TP_STRUCT__entry(
> + __field(struct vhost_virtqueue *, vq)
> + __field(u16, avail_idx)
> + ),
> +
> + TP_fast_assign(
> + __entry->vq = vq;
> + __entry->avail_idx = vq->avail_idx;
> + ),
> +
> + TP_printk("vhost update avail idx %u(%u) for vq %p",
> + __entry->avail_idx, __entry->avail_idx %
> + __entry->vq->num, __entry->vq)
> +);
> +
> +/*
> + * Tracepoint for processing descriptor.
> + */
> +TRACE_EVENT(vhost_virtio_get_vq_desc,
> + TP_PROTO(struct vhost_virtqueue *vq, unsigned int index,
> + unsigned out, unsigned int in),
> + TP_ARGS(vq, index, out, in),
> +
> + TP_STRUCT__entry(
> + __field(struct vhost_virtqueue *, vq)
> + __field(unsigned int, head)
> + __field(unsigned int, out)
> + __field(unsigned int, in)
> + ),
> +
> + TP_fast_assign(
> + __entry->vq = vq;
> + __entry->head = index;
> + __entry->out = out;
> + __entry->in = in;
> + ),
> +
> + TP_printk("vhost get vq %p head %u out %u in %u",
> + __entry->vq, __entry->head, __entry->out, __entry->in)
> +
> +);
> +
> +/*
> + * Tracepoint for signal guest.
> + */
> +TRACE_EVENT(vhost_virtio_signal,
> + TP_PROTO(struct vhost_virtqueue *vq),
> + TP_ARGS(vq),
> +
> + TP_STRUCT__entry(
> + __field(struct vhost_virtqueue *, vq)
> + ),
> +
> + TP_fast_assign(
> + __entry->vq = vq;
> + ),
> +
> + TP_printk("vhost signal vq %p", __entry->vq)
> +);
> +
> +DECLARE_EVENT_CLASS(vhost_work_template,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work),
> +
> + TP_STRUCT__entry(
> + __field(struct vhost_dev *, dev)
> + __field(struct vhost_work *, work)
> + __field(void *, function)
> + ),
> +
> + TP_fast_assign(
> + __entry->dev = dev;
> + __entry->work = work;
> + __entry->function = work->fn;
> + ),
> +
> + TP_printk("%pf for work %p dev %p",
> + __entry->function, __entry->work, __entry->dev)
> +);
> +
> +DEFINE_EVENT(vhost_work_template, vhost_work_queue_wakeup,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work));
> +
> +DEFINE_EVENT(vhost_work_template, vhost_work_queue_coalesce,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work));
> +
> +DEFINE_EVENT(vhost_work_template, vhost_poll_start,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work));
> +
> +DEFINE_EVENT(vhost_work_template, vhost_poll_stop,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work));
> +
> +DEFINE_EVENT(vhost_work_template, vhost_work_execute_start,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work));
> +
> +DEFINE_EVENT(vhost_work_template, vhost_work_execute_end,
> + TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
> + TP_ARGS(dev, work));
> +
> +#endif /* _TRACE_VHOST_H */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH ../../drivers/vhost
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE trace
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> +
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c14c42b..23f8d85 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -31,6 +31,8 @@
> #include <linux/if_arp.h>
>
> #include "vhost.h"
> +#define CREATE_TRACE_POINTS
> +#include "trace.h"
>
> enum {
> VHOST_MEMORY_MAX_NREGIONS = 64,
> @@ -50,6 +52,7 @@ static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
> poll = container_of(pt, struct vhost_poll, table);
> poll->wqh = wqh;
> add_wait_queue(wqh, &poll->wait);
> + trace_vhost_poll_start(NULL, &poll->work);
> }
>
> static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> @@ -101,6 +104,7 @@ void vhost_poll_start(struct vhost_poll *poll, struct file *file)
> void vhost_poll_stop(struct vhost_poll *poll)
> {
> remove_wait_queue(poll->wqh, &poll->wait);
> + trace_vhost_poll_stop(NULL, &poll->work);
> }
>
> static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
> @@ -147,7 +151,9 @@ static inline void vhost_work_queue(struct vhost_dev *dev,
> list_add_tail(&work->node, &dev->work_list);
> work->queue_seq++;
> wake_up_process(dev->worker);
> - }
> + trace_vhost_work_queue_wakeup(dev, work);
> + } else
> + trace_vhost_work_queue_coalesce(dev, work);
> spin_unlock_irqrestore(&dev->work_lock, flags);
> }
>
> @@ -221,7 +227,9 @@ static int vhost_worker(void *data)
>
> if (work) {
> __set_current_state(TASK_RUNNING);
> + trace_vhost_work_execute_start(dev, work);
> work->fn(work);
> + trace_vhost_work_execute_end(dev, work);
> } else
> schedule();
>
> @@ -1011,6 +1019,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
> if (vq->log_ctx)
> eventfd_signal(vq->log_ctx, 1);
> }
> + trace_vhost_virtio_update_used_flags(vq);
> return 0;
> }
>
> @@ -1030,6 +1039,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
> if (vq->log_ctx)
> eventfd_signal(vq->log_ctx, 1);
> }
> + trace_vhost_virtio_update_avail_event(vq);
> return 0;
> }
>
> @@ -1319,6 +1329,7 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
> /* Assume notifications from guest are disabled at this point,
> * if they aren't we would need to update avail_event index. */
> BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
> + trace_vhost_virtio_get_vq_desc(vq, head, *out_num, *in_num);
> return head;
> }
>
> @@ -1485,8 +1496,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> {
> /* Signal the Guest tell them we used something up. */
> - if (vq->call_ctx && vhost_notify(dev, vq))
> + if (vq->call_ctx && vhost_notify(dev, vq)) {
> eventfd_signal(vq->call_ctx, 1);
> + trace_vhost_virtio_signal(vq);
> + }
> }
>
> /* And here's the combo meal deal. Supersize me! */
^ permalink raw reply
* [PULL net-next] macvtap, vhost and virtio tools updates
From: Michael S. Tsirkin @ 2012-05-07 20:55 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kvm, jasowang, mst
There are mostly bugfixes here.
I hope to merge some more patches by 3.5, in particular
vlan support fixes are waiting for Eric's ack,
and a version of tracepoint patch might be
ready in time, but let's merge what's ready so it's testable.
This includes a ton of zerocopy fixes by Jason -
good stuff but too intrusive for 3.4 and zerocopy is experimental
anyway.
virtio has supported delayed interrupts for a while now,
so adding support to the virtio tool made sense.
Please pull into net-next and merge for 3.5.
Thanks!
MST
The following changes since commit e4ae004b84b315dd4b762e474f97403eac70f76a:
netem: add ECN capability (2012-05-01 09:39:48 -0400)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next
for you to fetch changes up to c70aa540c7a9f67add11ad3161096fb95233aa2e:
vhost: zerocopy: poll vq in zerocopy callback (2012-05-02 18:22:25 +0300)
----------------------------------------------------------------
Jason Wang (9):
macvtap: zerocopy: fix offset calculation when building skb
macvtap: zerocopy: fix truesize underestimation
macvtap: zerocopy: put page when fail to get all requested user pages
macvtap: zerocopy: set SKBTX_DEV_ZEROCOPY only when skb is built successfully
macvtap: zerocopy: validate vectors before building skb
vhost_net: zerocopy: fix possible NULL pointer dereference of vq->bufs
vhost_net: re-poll only on EAGAIN or ENOBUFS
vhost_net: zerocopy: adding and signalling immediately when fully copied
vhost: zerocopy: poll vq in zerocopy callback
Michael S. Tsirkin (1):
virtio/tools: add delayed interupt mode
drivers/net/macvtap.c | 57 ++++++++++++++++++++++++++++++-------------
drivers/vhost/net.c | 7 ++++-
drivers/vhost/vhost.c | 1 +
tools/virtio/linux/virtio.h | 1 +
tools/virtio/virtio_test.c | 26 ++++++++++++++++---
5 files changed, 69 insertions(+), 23 deletions(-)
^ permalink raw reply
* Re: [PATCH resend] [IPV6] remove sysctl accept_source_route
From: Eldad Zack @ 2012-05-07 20:52 UTC (permalink / raw)
To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, linux-kernel, netdev
In-Reply-To: <20120506.124151.864202935548493756.davem@davemloft.net>
I'm sorry, I wasn't aware of that.
I will check it from now on.
Eldad
On 6 May 2012 18:41, David Miller <davem@davemloft.net> wrote:
>
> Why are you resending this?
>
> Your patch is sitting in patchwork waiting to be reviewed and applied.
>
> When you needlessly resend a patch it makes more work for me, so do
> not do this. Check the queue at:
>
> http://patchwork.ozlabs.org/project/netdev/list/
>
> first.
^ permalink raw reply
* [PATCH] pch_gbe: Adding read memory barriers
From: Erwan Velu @ 2012-05-07 19:30 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tshimizu818
From bb1e271db0fa1a29df19bede69faf8004389132d Mon Sep 17 00:00:00 2001
From: Erwan Velu <erwan.velu@zodiacaerospace.com>
Date: Mon, 7 May 2012 19:15:29 +0000
Subject: [PATCH 1/1] pch_gbe: Adding read memory barriers
Under a strong incoming packet stream and/or high cpu usage,
the pch_gbe driver reports "Receive CRC Error" and discards packets.
It occurred on an Intel ATOM E620T while running a 300mbit/sec multicast
network stream leading to a ~100% cpu usage.
Adding rmb() calls before considering the network card's status solves
this issue.
Getting it into stable would be perfect as it solves reliability issues.
Signed-off-by: Erwan Velu <erwan.velu@zodiacaerospace.com>
---
.../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 8035e5f..ace3654 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1413,6 +1413,7 @@ static irqreturn_t pch_gbe_intr(int irq, void *data)
pch_gbe_mac_set_pause_packet(hw);
}
}
+ smp_rmb(); /* prevent any other reads before*/
/* When request status is Receive interruption */
if ((int_st & (PCH_GBE_INT_RX_DMA_CMPLT | PCH_GBE_INT_TX_CMPLT)) ||
@@ -1582,6 +1584,7 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
i = tx_ring->next_to_clean;
tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
+ rmb(); /* prevent any other reads before*/
pr_debug("gbec_status:0x%04x dma_status:0x%04x\n",
tx_desc->gbec_status, tx_desc->dma_status);
@@ -1682,6 +1685,7 @@ pch_gbe_clean_rx(struct pch_gbe_adapter *adapter,
while (*work_done < work_to_do) {
/* Check Rx descriptor status */
rx_desc = PCH_GBE_RX_DESC(*rx_ring, i);
+ rmb(); /* prevent any other reads before*/
if (rx_desc->gbec_status == DSC_INIT16)
break;
cleaned = true;
--
1.7.3.4
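
As a rough user-space analogy of the ordering concern behind this patch, the usual "check status, then read the rest of the descriptor" pattern can be sketched with C11 atomics. This is only a sketch under stated assumptions: it is not the pch_gbe code, it does not model DMA, and every name in it (example_desc, example_publish, example_consume, EXAMPLE_DSC_INIT) is hypothetical. The acquire fence stands in for the kernel's rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_DSC_INIT 0x0000u	/* hypothetical "not yet written" status */

/* Hypothetical descriptor: the producer publishes status last. */
struct example_desc {
	uint32_t length;
	_Atomic uint16_t status;
};

/* Producer side (stands in for the device): fill payload, then publish. */
static void example_publish(struct example_desc *d, uint32_t len, uint16_t st)
{
	d->length = len;
	atomic_store_explicit(&d->status, st, memory_order_release);
}

/*
 * Consumer side: read the status first; only after an acquire fence
 * (the rough analogue of a read barrier) read the other fields.
 * Returns false if the descriptor has not been completed yet.
 */
static bool example_consume(struct example_desc *d, uint32_t *len)
{
	uint16_t st = atomic_load_explicit(&d->status, memory_order_relaxed);

	if (st == EXAMPLE_DSC_INIT)
		return false;
	atomic_thread_fence(memory_order_acquire);
	*len = d->length;
	return true;
}
```

The point of the sketch is only the ordering: the length is never read until the status has been observed as completed and the fence has been crossed.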
^ permalink raw reply related
* Re[2]: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07 19:09 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Hans Schillstrom, kaber@trash.net, jengelh@medozas.de,
netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
>On Mon, May 07, 2012 at 02:57:30PM +0200, Hans Schillstrom wrote:
>> On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
>> > On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
>> > > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
>> > > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
>> > > > > > > We have plenty of rules where just source port mask is zero.
>> > > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
>> > > > > >
>> > > > > > 0xffff and 0x0000 means on/off respectively.
>> > > > > >
>> > > > > > Still curious, how can 0xfffc be useful?
>> > > > >
>> > > > > That's a special case where an appl is using 4 ports.
>> > > > > But in general, have not seen other than "on/off" except for above.
>> > > >
>> > > > I see. Well I'm fine with this way to switch on/off things, just
>> > > > wanted some clarification.
>> > > >
>> > > > Still one final thing I'd like to remove before inclusion:
>> > > >
>> > > > + union hmark_ports port_mask;
>> > > > + union hmark_ports port_set;
>> > > > + __u32 spi_mask;
>> > > > + __u32 spi_set;
>> > > >
>> > > > the spi_mask seems redundant. The port_mask already provides u32 for
>> > > > it.
>> > >
>> > > No problems, I'll remove it.
>> >
>> > OK. As a nice side-effect, this will lead to removing the branch that
>> > tests ESP/AH in hmark_set_tuple_ports.
>> >
>> Yes, only check if not ESP or AH to swap src/dst
>
>Do you really need that branch? I mean, unless I'm missing anything, swapping
>them shouldn't be a problem.
Well,
that was just to keep backward compatibility and make my tests happy.
I'll remove them and change my test setup.
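
For anyone following along, the 0xfffc case from the quoted discussion can be sketched in plain C. This only illustrates the masking arithmetic and is not the xt_hmark code; the helper names are made up for this example, and ports are taken in host byte order for simplicity.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of xt_hmark): apply a 16-bit mask to a
 * port number in host byte order, as a port mask would be applied
 * before the value is fed into a hash.
 */
static uint16_t hmark_example_mask_port(uint16_t port, uint16_t mask)
{
	return port & mask;
}

/*
 * Returns 1 if two ports reduce to the same value under the mask,
 * i.e. they would contribute identically to the hash.
 */
static int hmark_example_same_bucket(uint16_t a, uint16_t b, uint16_t mask)
{
	return hmark_example_mask_port(a, mask) ==
	       hmark_example_mask_port(b, mask);
}
```

With mask 0xfffc the low two bits are cleared, so four consecutive ports starting at a multiple of four (5060 through 5063, say) all reduce to the same value, which fits the "appl is using 4 ports" case. 0xffff keeps the port unchanged and 0x0000 drops it entirely, which is the on/off behaviour.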
^ permalink raw reply
* Re: Netlink for kernel<->user space communication?
From: Arvid Brodin @ 2012-05-07 18:43 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
On Tue, 24 Apr 2012 16:57:55 -0700
Stephen Hemminger <shemminger@xxxxxxxxxx> wrote:
> On Tue, 24 Apr 2012 23:52:34 +0000
> Arvid Brodin <Arvid.Brodin@xxxxxxxx> wrote:
>
>> Hi.
>>
>> I'm writing a kernel driver for the HSR protocol, a standard for high availability
>> networks. I want to send messages from the kernel to user space about broken network
>> links. I also want user space to be able to ask the kernel about its view of the status of
>> nodes on the network.
>>
>> Netlink seems like a good tool for this. (Is it?)
>
> Yes.
>
>> But do I use raw netlink? (Described here: http://www.linuxjournal.com/article/7356 - but
>> this seems a bit out of date, the kernel API description differs from today's kernel
>> implementation.)
>
> No. Your driver probably looks like a device so you should be
> using rtnetlink messages.
I'm already using rtnetlink messages to add and remove my device, which works fine (see
e.g. http://www.spinics.net/lists/netdev/msg192817.html - although I didn't think it
meaningful to include the iproute2 patch here, until the kernel part is ready).
The protocol specifies transmission of "supervision frames" every 2 seconds, e.g. to check
link integrity. Every such frame should be received from two directions in the ring - if
only one is received, then there is a link problem.
I'd like to notify user space about every such occurrence. Is there an rtnetlink message
type that fits this? The stuff in rtnetlink.h seems to be mostly concerned with specific
user space commands (there is something called RTNLGRP_NOTIFY but I couldn't find any
instances of it being used in the kernel, nor any documentation).
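
To make the detection rule concrete, here is a minimal user-space sketch in C. It assumes only what is described above: a supervision frame from a node is expected from both ring directions within each interval, and seeing it from only one direction indicates a link problem. The struct, the function names and the two-port indexing are hypothetical and are not taken from the actual driver.

```c
#include <assert.h>
#include <stdbool.h>

enum { HSR_EXAMPLE_PORTS = 2 };	/* the two ring directions (assumed) */

/* Per-node bookkeeping for one supervision interval (hypothetical). */
struct hsr_example_node {
	bool seen[HSR_EXAMPLE_PORTS];	/* frame seen on this port */
};

/* Record that a supervision frame from this node arrived on 'port'. */
static void hsr_example_frame_received(struct hsr_example_node *node, int port)
{
	assert(port >= 0 && port < HSR_EXAMPLE_PORTS);
	node->seen[port] = true;
}

/*
 * Called once per supervision interval. Returns true if exactly one
 * direction delivered the frame (the "link problem" case), then clears
 * the state for the next interval. If neither direction delivered a
 * frame this returns false; a missing node is a different condition.
 */
static bool hsr_example_interval_end(struct hsr_example_node *node)
{
	bool link_problem = node->seen[0] != node->seen[1];

	node->seen[0] = false;
	node->seen[1] = false;
	return link_problem;
}
```

A true return from hsr_example_interval_end() is the event that would be reported to user space.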
>> Or do I use the "Kernel Connector" (Documentation/connector/connector.txt)?
> no.
Your reply didn't reach me for some reason - I found it just yesterday on spinics - and in
the meantime I've implemented the notification using the connector protocol... :-|
--
Arvid Brodin
Enea Services Stockholm AB - since February 16 a part of Xdin in the Alten Group. Soon we
will be working under the common brand name Xdin. Read more at www.xdin.com.
^ permalink raw reply
* Re: [PATCH] mwl8k: Add 0x2a02 PCI device-id (Marvell 88W8361)
From: Adrian Chadd @ 2012-05-07 18:26 UTC (permalink / raw)
To: Dan Williams
Cc: Lennert Buytenhek, sedat.dilek, John W. Linville, linux-wireless,
netdev, linux-kernel, lautriv, Jim Cromie
In-Reply-To: <1336408965.2385.25.camel@dcbw.foobar.com>
Hi,
Let me see if the topdog firmware that FreeBSD ships with supports hostap mode.
I'm having aggregation issues but I think that's driver side, not firmware side.
Adrian
^ permalink raw reply
* Re: [PATCH] pch_gbe: Adding read memory barriers
From: David Miller @ 2012-05-07 17:57 UTC (permalink / raw)
To: erwanaliasr1; +Cc: netdev, linux-kernel, tshimizu818
In-Reply-To: <4FA80B46.5070405@gmail.com>
Your patch doesn't apply to the net-next tree which is what you should
be basing all of your networking patches on:
[davem@bql net-next]$ git am --signoff pch_gbe-Adding-read-memory-barriers.patch
Applying: pch_gbe: Adding read memory barriers
error: patch failed: drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:1222
error: drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c: patch does not apply
Patch failed at 0001 pch_gbe: Adding read memory barriers
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".
Stop rushing things and take your time learning the process.
Otherwise you're going to make more work for maintainers and they
end up grumpy as a result, which you don't want.
^ permalink raw reply
* [PATCH] pch_gbe: Adding read memory barriers
From: Erwan Velu @ 2012-05-07 17:49 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: tshimizu818
From 3b65802e4c5a8827a84022066f10dec4d61c1f22 Mon Sep 17 00:00:00 2001
From: Erwan Velu <erwan.velu@zodiacaerospace.com>
Date: Mon, 7 May 2012 14:53:17 +0200
Subject: [PATCH 1/1] pch_gbe: Adding read memory barriers
Under a strong incoming packet stream and/or high cpu usage,
the pch_gbe driver reports "Receive CRC Error" and discards packets.
It occurred on an Intel ATOM E620T while running a 300mbit/sec multicast
network stream leading to a ~100% cpu usage.
Adding rmb() calls before considering the network card's status solves
this issue.
This patch was validated on the 3.2.16 kernel but also applies to the 3.x
family.
Getting it into stable would be perfect as it solves reliability issues.
Signed-off-by: Erwan Velu <erwan.velu@zodiacaerospace.com>
---
.../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 48406ca..7746ca3 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1222,6 +1222,8 @@ static irqreturn_t pch_gbe_intr(int irq, void *data)
}
}
+ smp_rmb(); /* prevent any other reads before*/
+
/* When request status is Receive interruption */
if ((int_st & (PCH_GBE_INT_RX_DMA_CMPLT | PCH_GBE_INT_TX_CMPLT)) ||
(adapter->rx_stop_flag == true)) {
@@ -1390,6 +1392,9 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
i = tx_ring->next_to_clean;
tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
+
+ rmb(); /* prevent any other reads before*/
+
pr_debug("gbec_status:0x%04x dma_status:0x%04x\n",
tx_desc->gbec_status, tx_desc->dma_status);
@@ -1490,6 +1495,9 @@ pch_gbe_clean_rx(struct pch_gbe_adapter *adapter,
while (*work_done < work_to_do) {
/* Check Rx descriptor status */
rx_desc = PCH_GBE_RX_DESC(*rx_ring, i);
+
+ rmb(); /* prevent any other reads before*/
+
if (rx_desc->gbec_status == DSC_INIT16)
break;
cleaned = true;
--
1.7.4.4
^ permalink raw reply related
* Re: [PATCH] pch_gbe: Adding read memory barriers
From: David Miller @ 2012-05-07 17:44 UTC (permalink / raw)
To: erwanaliasr1; +Cc: netdev, linux-kernel, stable, tshimizu818
In-Reply-To: <4FA808E3.1030908@gmail.com>
From: Erwan Velu <erwanaliasr1@gmail.com>
Date: Mon, 07 May 2012 19:39:47 +0200
> Please find attached
First of all, for a patch which is not accepted yet you do not
CC: stable.
Second of all, do not put text in the main body of your email which is
unrelated to the patch and should not end up in the commit message.
Instead, post your patch to the appropriate primary mailing lists,
and if it's accepted it can then be submitted to -stable at some
later time.
Your patch posting email should be composed purely of the commit
log message in the message body, followed by the actual patch.
Otherwise the maintainer that applies your patch has to edit out
all of this other flowery text that is unrelated to the commit
and that makes more work for them.
^ permalink raw reply
* [PATCH] pch_gbe: Adding read memory barriers
From: Erwan Velu @ 2012-05-07 17:39 UTC (permalink / raw)
To: netdev, linux-kernel, stable, tshimizu818
[-- Attachment #1: Type: text/plain, Size: 437 bytes --]
Dear Linux Kernel Developers,
Please find attached a patch to solve "Received CRC" errors reported by
the pch_gbe driver under heavy load. It occurred on an Intel ATOM E620T
while running a 300mbit/sec multicast network stream leading to a ~100%
cpu usage.
This patch was validated on the 3.2.16 kernel but also applies to the 3.x
family.
Getting it into stable would be perfect as it solves reliability issues.
Cheers,
Erwan Velu
[-- Attachment #2: 0001-pch_gbe-Adding-read-memory-barriers.patch --]
[-- Type: application/octet-stream, Size: 1805 bytes --]
From 3b65802e4c5a8827a84022066f10dec4d61c1f22 Mon Sep 17 00:00:00 2001
From: Erwan Velu <erwan.velu@zodiacaerospace.com>
Date: Mon, 7 May 2012 14:53:17 +0200
Subject: [PATCH 1/1] pch_gbe: Adding read memory barriers
Under a strong incoming packet stream and/or high cpu usage,
the pch_gbe driver reports "Receive CRC Error" and discards packets.
Adding rmb() calls before considering the network card's status solves
this issue.
Signed-off-by: Erwan Velu <erwan.velu@zodiacaerospace.com>
---
.../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 48406ca..7746ca3 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1222,6 +1222,8 @@ static irqreturn_t pch_gbe_intr(int irq, void *data)
}
}
+ smp_rmb(); /* prevent any other reads before*/
+
/* When request status is Receive interruption */
if ((int_st & (PCH_GBE_INT_RX_DMA_CMPLT | PCH_GBE_INT_TX_CMPLT)) ||
(adapter->rx_stop_flag == true)) {
@@ -1390,6 +1392,9 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
i = tx_ring->next_to_clean;
tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
+
+ rmb(); /* prevent any other reads before*/
+
pr_debug("gbec_status:0x%04x dma_status:0x%04x\n",
tx_desc->gbec_status, tx_desc->dma_status);
@@ -1490,6 +1495,9 @@ pch_gbe_clean_rx(struct pch_gbe_adapter *adapter,
while (*work_done < work_to_do) {
/* Check Rx descriptor status */
rx_desc = PCH_GBE_RX_DESC(*rx_ring, i);
+
+ rmb(); /* prevent any other reads before*/
+
if (rx_desc->gbec_status == DSC_INIT16)
break;
cleaned = true;
--
1.7.4.4
^ permalink raw reply related
* Re: [PATCH] mwl8k: Add 0x2a02 PCI device-id (Marvell 88W8361)
From: Dan Williams @ 2012-05-07 16:42 UTC (permalink / raw)
To: Lennert Buytenhek
Cc: sedat.dilek-Re5JQEeQqe8AvxtiuMwx3w, John W. Linville,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, lautriv, Jim Cromie
In-Reply-To: <1336406944.2385.24.camel-wKZy7rqYPVb5EHUCmHmTqw@public.gmane.org>
On Mon, 2012-05-07 at 11:09 -0500, Dan Williams wrote:
> On Tue, 2012-05-01 at 14:51 +0200, Lennert Buytenhek wrote:
> > On Sun, Apr 29, 2012 at 12:25:21AM +0200, Sedat Dilek wrote:
> >
> > > > On 1st sight, logs look fine:
> > > >
> > > > [21:52:52] <lautriv> [ 6.050967] ieee80211 phy0: 88w8361p v4,
> > > > 00173f3bdde3, STA firmware 2.1.4.25
> > > >
> > > > But WLAN connection is not that fast and stable as lautriv reports
> > > > (several abnormalities were observed).
> > > >
> > > > I requested a tarball which includes:
> > > > * dmesg (Linux-3.3.3)
> > > > * e_n_a (/etc/network/interfaces)
> > > > * ifconfig output
> > > > * iwconfig output
> > > > * iw_phy output
> > > > * ps_axu (WPA) output
> > > >
> > > > lautriv will be so kind to be around on #linux-wireless/Freenode the
> > > > next days (UTC+2: German/Swiss local-time).
> > > > Just ping him.
> > > >
> > > > Hope you have fun, together!
> > > >
> > > > - Sedat -
> > >
> > > A new tarball from lautriv with same outputs as before, but now tested
> > > with Linux-3.4-rc4.
> >
> > The output looks good enough for me to ACK adding the PCI ID.
> >
> > Can the firmware being used here be submitted to the linux-firmware
> > git tree?
>
> So Marvell sent John a driver for TopDog a long time ago, which he put
> up on kernel.org. That driver was reworked by Louis and put up in a git
> tree, but both were lost to the kernel.org hack. I have git backups of
> both git trees. I put Louis' cleanup here:
>
> http://people.redhat.com/dcbw/mrvl_cb82.tar.bz2
>
> That driver (mrvl_cb82) has the following PCI IDs:
>
> static const struct pci_device_id mwl_id_tbl[] __devinitdata = {
> { PCI_VDEVICE(MARVELL, 0x2a02), 0 },
> { PCI_VDEVICE(MARVELL, 0x2a03), 1 },
> { PCI_VDEVICE(MARVELL, 0x2a06), 2 },
> { PCI_VDEVICE(MARVELL, 0x2a07), 3 },
> { PCI_VDEVICE(MARVELL, 0x2a04), 4 },
> { PCI_VDEVICE(MARVELL, 0x2a08), 5 },
> { PCI_VDEVICE(MARVELL, 0x2a0a), 6 },
> { PCI_VDEVICE(MARVELL, 0x2a0b), 7 },
> { PCI_VDEVICE(MARVELL, 0x2a0c), 8 },
> { 0 }
> };
>
> and supposedly works for CB82 + CB85. The firmware helper for CB82
> looks pretty close to the mwl8k one.
>
> The firmware API exposed by mrvl_cb82 looks very close to mwl8k
> actually. I only checked the HostCmd bits, not the structures, so I
> would expect a few differences. There are some commands that mwl8k
> exposes that mrvl_cb82 does not and vice versa, but I'm not sure if the
> drivers actually use those commands.
As for AP mode, the Marvell Extranet does have AP-mode drivers for the
8361 and 8363, but the zipfiles are password-protected and I have no
idea what license they are supposed to be under due to that. Thus I
cannot get the AP mode firmware for those either to submit to
linux-firmware.
Dan
^ permalink raw reply
* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-05-07 16:33 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1336374778.2386.1.camel@jtkirshe-mobl>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 07 May 2012 00:12:58 -0700
> On Sun, 2012-05-06 at 13:25 -0400, David Miller wrote:
>> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Date: Sat, 5 May 2012 05:38:09 -0700
>>
>> > This series of patches contains updates for e1000e and ixgbe.
>> >
>> > NOTE- The ixgbe patch can and probably should be applied to
>> > David Miller's net tree as well.
>> >
>> > The following are changes since commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24:
>> > tcp: be more strict before accepting ECN negociation
>> > and are available in the git repository at:
>> > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>>
>> No new changes there?
>>
>> [davem@drr net-next]$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>> From git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
>> * branch master -> FETCH_HEAD
>> Already up-to-date.
>
> Sorry Dave, I thought I had pushed the changes but it appears I did not.
> I have rectified that and now my net-next tree contains the four
> patches.
That looks better, pulled, thanks Jeff.
^ permalink raw reply