Netdev List
* [PATCH net-next-2.6] net/core: neighbour update Oops
From: Doug Kehn @ 2010-07-13 15:23 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet

When DMVPN (GRE + openNHRP) is set up and a GRE remote
address is configured, a kernel Oops is observed.  The
Oops is caused by a NULL header_ops pointer
(neigh->dev->header_ops) in neigh_update_hhs() when

void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
= neigh->dev->header_ops->cache_update;

is executed.  The dev associated with the NULL header_ops is
the GRE interface.  This patch guards against the
possibility that header_ops is NULL.

This Oops was first observed in kernel version 2.6.26.8.

Signed-off-by: Doug Kehn <rdkehn@yahoo.com>
---
 net/core/neighbour.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6ba1c0e..a4e0a74 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -949,7 +949,10 @@ static void neigh_update_hhs(struct neighbour *neigh)
 {
 	struct hh_cache *hh;
 	void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
-		= neigh->dev->header_ops->cache_update;
+		= NULL;
+
+	if (neigh->dev->header_ops)
+		update = neigh->dev->header_ops->cache_update;
 
 	if (update) {
 		for (hh = neigh->hh; hh; hh = hh->hh_next) {
-- 
1.7.0.4
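
For readers following the fix, the guard can be sketched outside the kernel as below. This is a minimal userspace illustration under stated assumptions, not the kernel code: the structures are cut-down stand-ins that model only the fields involved, and neigh_cache_update() and noop_update() are hypothetical names introduced for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to show the guard are modelled here. */
struct hh_cache;
struct net_device;

typedef void (*cache_update_fn)(struct hh_cache *, const struct net_device *,
				const unsigned char *);

struct header_ops {
	cache_update_fn cache_update;
};

struct net_device {
	const struct header_ops *header_ops;	/* may be NULL, as on the GRE device */
};

/* Dummy callback, used only to have a non-NULL value to return. */
static void noop_update(struct hh_cache *hh, const struct net_device *dev,
			const unsigned char *haddr)
{
	(void)hh;
	(void)dev;
	(void)haddr;
}

/* Return the cache_update callback, or NULL when the device has no
 * header_ops at all, instead of dereferencing a NULL pointer. */
static cache_update_fn neigh_cache_update(const struct net_device *dev)
{
	if (!dev->header_ops)
		return NULL;
	return dev->header_ops->cache_update;
}
```

A caller would then test the returned pointer before walking the hh_cache list, which is what the existing "if (update)" check in neigh_update_hhs() already does.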



      


* [PATCH v2] netfilter: xtables: userspace notification target
From: Samuel Ortiz @ 2010-07-13 14:57 UTC (permalink / raw)
  To: Patrick McHardy, David S. Miller
  Cc: netdev, netfilter-devel, Luciano Coelho, sameo, Jan Engelhardt,
	Changli Gao, Pablo Neira Ayuso
In-Reply-To: <20100713001115.GA3751@sortiz-mobl>


The userspace notification Xtables target sends a netlink notification
whenever a packet hits the target. Each notification carries a label
attribute that userspace can match against a previously set rule. Rules also
take an --all option to choose between sending a notification for every
packet or for the first one only.
Userspace can also send a netlink message to toggle this switch while the
target is in place. This target uses the netfilter netlink framework.

Combined with various matches (quota, rateest, etc.), this target allows
userspace to make decisions about interface handling. One could, for example,
decide to switch between power saving modes depending on estimated rate
thresholds.

Reviewed-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
---
v2:
- Remove the work structure and make the netlink sending routine atomic.
- Reformat struct nfnotif_tg to get rid of unnecessary padding holes.
---
 include/linux/netfilter/Kbuild             |    1 +
 include/linux/netfilter/nfnetlink.h        |    5 +-
 include/linux/netfilter/nfnetlink_compat.h |    1 +
 include/linux/netfilter/xt_NFNOTIF.h       |   55 ++++++
 net/netfilter/Kconfig                      |   17 ++
 net/netfilter/Makefile                     |    1 +
 net/netfilter/xt_NFNOTIF.c                 |  287 ++++++++++++++++++++++++++++
 7 files changed, 366 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/netfilter/xt_NFNOTIF.h
 create mode 100644 net/netfilter/xt_NFNOTIF.c

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..1b80b27 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -12,6 +12,7 @@ header-y += xt_IDLETIMER.h
 header-y += xt_LED.h
 header-y += xt_MARK.h
 header-y += xt_NFLOG.h
+header-y += xt_NFNOTIF.h
 header-y += xt_NFQUEUE.h
 header-y += xt_RATEEST.h
 header-y += xt_SECMARK.h
diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 361d6b5..e336f03 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -18,6 +18,8 @@ enum nfnetlink_groups {
 #define NFNLGRP_CONNTRACK_EXP_UPDATE	NFNLGRP_CONNTRACK_EXP_UPDATE
 	NFNLGRP_CONNTRACK_EXP_DESTROY,
 #define NFNLGRP_CONNTRACK_EXP_DESTROY	NFNLGRP_CONNTRACK_EXP_DESTROY
+	NFNLGRP_NFNOTIF,
+#define NFNLGRP_NFNOTIF	                NFNLGRP_NFNOTIF
 	__NFNLGRP_MAX,
 };
 #define NFNLGRP_MAX	(__NFNLGRP_MAX - 1)
@@ -47,7 +49,8 @@ struct nfgenmsg {
 #define NFNL_SUBSYS_QUEUE		3
 #define NFNL_SUBSYS_ULOG		4
 #define NFNL_SUBSYS_OSF			5
-#define NFNL_SUBSYS_COUNT		6
+#define NFNL_SUBSYS_NFNOTIF		6
+#define NFNL_SUBSYS_COUNT		7
 
 #ifdef __KERNEL__
 
diff --git a/include/linux/netfilter/nfnetlink_compat.h b/include/linux/netfilter/nfnetlink_compat.h
index ffb9503..dca8ab2 100644
--- a/include/linux/netfilter/nfnetlink_compat.h
+++ b/include/linux/netfilter/nfnetlink_compat.h
@@ -13,6 +13,7 @@
 #define NF_NETLINK_CONNTRACK_EXP_NEW		0x00000008
 #define NF_NETLINK_CONNTRACK_EXP_UPDATE		0x00000010
 #define NF_NETLINK_CONNTRACK_EXP_DESTROY	0x00000020
+#define NF_NETLINK_NFNOTIF			0x00000040
 
 /* Generic structure for encapsulation optional netfilter information.
  * It is reminiscent of sockaddr, but with sa_family replaced
diff --git a/include/linux/netfilter/xt_NFNOTIF.h b/include/linux/netfilter/xt_NFNOTIF.h
new file mode 100644
index 0000000..8fae827
--- /dev/null
+++ b/include/linux/netfilter/xt_NFNOTIF.h
@@ -0,0 +1,55 @@
+/*
+ * linux/include/linux/netfilter/xt_NFNOTIF.h
+ *
+ * Header file for Xtables notification target module.
+ *
+ * Copyright (C) 2010 Intel Corporation
+ * Samuel Ortiz <samuel.ortiz@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA.
+ */
+
+#ifndef _XT_NFNOTIF_H
+#define _XT_NFNOTIF_H
+
+#include <linux/types.h>
+
+enum nfnotif_msg_type {
+	NFNOTIF_TG_MSG_PACKETS,
+
+	NFNOTIF_TG_MSG_MAX
+};
+
+enum nfnotif_attr_type {
+	NFNOTIF_TG_ATTR_UNSPEC,
+	NFNOTIF_TG_ATTR_LABEL,
+	NFNOTIF_TG_ATTR_SEND_NOTIF,
+
+	__NFNOTIF_TG_ATTR_AFTER_LAST
+};
+#define NFNOTIF_TG_ATTR_MAX (__NFNOTIF_TG_ATTR_AFTER_LAST - 1)
+
+#define MAX_NFNOTIF_LABEL_SIZE 31
+
+struct nfnotif_tg_info {
+	__u8 all_packets;
+
+	char label[MAX_NFNOTIF_LABEL_SIZE];
+
+	/* for kernel module internal use only */
+	struct nfnotif_tg *notif __attribute((aligned(8)));
+};
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..0e2de36 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -469,6 +469,23 @@ config NETFILTER_XT_TARGET_NFQUEUE
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_TARGET_NFNOTIF
+	tristate '"NFNOTIF" target Support'
+	depends on NETFILTER_ADVANCED
+	select NETFILTER_NETLINK
+	help
+
+	  This option adds the `NFNOTIF' target, which sends netfilter
+	  netlink messages when packets hit the target.
+
+	  This target comes with an option to specify whether all
+	  packets hitting the target trigger the netlink message
+	  transmission, or only the first one.
+	  It also listens on its netfilter netlink subsystem for messages
+	  that reset the above option.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_NOTRACK
 	tristate  '"NOTRACK" target support'
 	depends on IP_NF_RAW || IP6_NF_RAW
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..5d9c9e9 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_NFNOTIF) += xt_NFNOTIF.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
diff --git a/net/netfilter/xt_NFNOTIF.c b/net/netfilter/xt_NFNOTIF.c
new file mode 100644
index 0000000..75ddeba
--- /dev/null
+++ b/net/netfilter/xt_NFNOTIF.c
@@ -0,0 +1,287 @@
+/*
+ * linux/net/netfilter/xt_NFNOTIF.c
+ *
+ * Copyright (C) 2010 Intel Corporation
+ * Samuel Ortiz <samuel.ortiz@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/xt_NFNOTIF.h>
+
+struct nfnotif_tg {
+	struct list_head entry;
+
+	char *label;
+	struct net *net;
+	unsigned int refcnt;
+	__u8 all_packets;
+	__u8 send_notif;
+};
+
+static LIST_HEAD(nfnotif_tg_list);
+static DEFINE_MUTEX(list_mutex);
+
+static int __nfnotif_tg_netlink_send(struct nfnotif_tg *nfnotif)
+{
+	struct nlmsghdr *nlh;
+	struct nfgenmsg *nfmsg;
+	struct sk_buff *skb;
+	struct net *net = nfnotif->net;
+	unsigned int type;
+	int flags;
+
+	type = NFNL_SUBSYS_NFNOTIF << 8;
+	flags = NLM_F_CREATE;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (skb == NULL)
+		goto error_out;
+
+	nlh = nlmsg_put(skb, 0, 0, type, sizeof(*nfmsg), flags);
+	if (nlh == NULL)
+		goto nlmsg_put_failure;
+
+	nfmsg = nlmsg_data(nlh);
+	nfmsg->version	    = NFNETLINK_V0;
+	nfmsg->res_id	    = 0;
+
+	NLA_PUT_STRING(skb, NFNOTIF_TG_ATTR_LABEL, nfnotif->label);
+
+	nlmsg_end(skb, nlh);
+
+	return nfnetlink_send(skb, net, 0, NFNLGRP_NFNOTIF, 0, GFP_ATOMIC);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+
+nlmsg_put_failure:
+	kfree_skb(skb);
+
+error_out:
+	return nfnetlink_set_err(net, 0, 0, -ENOBUFS);
+}
+
+static struct nfnotif_tg *__nfnotif_tg_find_by_label(const char *label)
+{
+	struct nfnotif_tg *entry;
+
+	BUG_ON(!label);
+
+	list_for_each_entry(entry, &nfnotif_tg_list, entry) {
+		if (!strcmp(label, entry->label))
+			return entry;
+	}
+
+	return NULL;
+}
+
+static int nfnotif_tg_create(struct nfnotif_tg_info *info)
+{
+	info->notif = kmalloc(sizeof(*info->notif), GFP_KERNEL);
+	if (!info->notif) {
+		pr_debug("Couldn't allocate notification\n");
+		return -ENOMEM;
+	}
+
+	info->notif->label = kstrdup(info->label, GFP_KERNEL);
+	if (!info->notif->label) {
+		pr_debug("Couldn't allocate label\n");
+		kfree(info->notif);
+		return -ENOMEM;
+	}
+
+	info->notif->all_packets = info->all_packets;
+	info->notif->send_notif = 1;
+
+	list_add(&info->notif->entry, &nfnotif_tg_list);
+
+	info->notif->refcnt = 1;
+
+	return 0;
+}
+
+static unsigned int nfnotif_tg_target(struct sk_buff *skb,
+				      const struct xt_action_param *par)
+{
+	const struct nfnotif_tg_info *info = par->targinfo;
+
+	BUG_ON(!info->notif);
+
+	if (!info->notif->send_notif)
+		return XT_CONTINUE;
+
+	pr_debug("Sending notification for %s\n", info->label);
+
+	if (__nfnotif_tg_netlink_send(info->notif) < 0)
+		pr_debug("Could not send notification\n");
+
+	if (!info->notif->all_packets)
+		info->notif->send_notif = 0;
+
+	return XT_CONTINUE;
+}
+
+static int nfnotif_tg_checkentry(const struct xt_tgchk_param *par)
+{
+	struct nfnotif_tg_info *info = par->targinfo;
+	int ret;
+
+	pr_debug("Checkentry targinfo %s\n", info->label);
+
+	if (info->label[0] == '\0' ||
+	    strnlen(info->label,
+		    MAX_NFNOTIF_LABEL_SIZE) == MAX_NFNOTIF_LABEL_SIZE) {
+		pr_debug("Label is empty or not nul-terminated\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&list_mutex);
+
+	info->notif = __nfnotif_tg_find_by_label(info->label);
+	if (info->notif) {
+		info->notif->refcnt++;
+
+		pr_debug("Increased refcnt for %s to %u\n",
+			 info->label, info->notif->refcnt);
+	} else {
+		ret = nfnotif_tg_create(info);
+		if (ret < 0) {
+			pr_debug("Failed to create notification\n");
+			mutex_unlock(&list_mutex);
+			return ret;
+		}
+	}
+
+	info->notif->net = par->net;
+
+	mutex_unlock(&list_mutex);
+	return 0;
+}
+
+static void nfnotif_tg_destroy(const struct xt_tgdtor_param *par)
+{
+	const struct nfnotif_tg_info *info = par->targinfo;
+
+	pr_debug("Destroy targinfo %s\n", info->label);
+
+	mutex_lock(&list_mutex);
+
+	if (--info->notif->refcnt == 0) {
+		pr_debug("Deleting notification %s\n", info->label);
+
+		list_del(&info->notif->entry);
+		kfree(info->notif->label);
+		kfree(info->notif);
+	}
+
+	mutex_unlock(&list_mutex);
+}
+
+static struct xt_target nfnotif_tg __read_mostly = {
+	.name		= "NFNOTIF",
+	.family		= NFPROTO_UNSPEC,
+	.target		= nfnotif_tg_target,
+	.targetsize     = sizeof(struct nfnotif_tg_info),
+	.checkentry	= nfnotif_tg_checkentry,
+	.destroy        = nfnotif_tg_destroy,
+	.me		= THIS_MODULE,
+};
+
+static int nfnotif_msg_send_notif(struct sock *nfnl, struct sk_buff *skb,
+				  const struct nlmsghdr *nlh,
+				  const struct nlattr * const attrs[])
+{
+	struct nfnotif_tg *notif;
+	char *label;
+	u8 send_notif;
+
+	if (attrs[NFNOTIF_TG_ATTR_LABEL] == NULL ||
+	    attrs[NFNOTIF_TG_ATTR_SEND_NOTIF] == NULL)
+		return -EINVAL;
+
+	label = nla_data(attrs[NFNOTIF_TG_ATTR_LABEL]);
+	send_notif = nla_get_u8(attrs[NFNOTIF_TG_ATTR_SEND_NOTIF]);
+
+	pr_debug("Label %s send %d\n", label, send_notif);
+
+	notif = __nfnotif_tg_find_by_label(label);
+	if (notif == NULL)
+		return -EINVAL;
+
+	notif->send_notif = send_notif;
+
+	return 0;
+}
+
+
+static const struct nla_policy nfnotif_nla_policy[NFNOTIF_TG_ATTR_MAX + 1] = {
+	[NFNOTIF_TG_ATTR_LABEL]            = { .type = NLA_NUL_STRING },
+	[NFNOTIF_TG_ATTR_SEND_NOTIF]	   = { .type = NLA_U8 },
+};
+
+static const struct nfnl_callback nfnotif_cb[NFNOTIF_TG_MSG_MAX] = {
+	[NFNOTIF_TG_MSG_PACKETS]   = { .call = nfnotif_msg_send_notif,
+				       .attr_count = NFNOTIF_TG_ATTR_MAX,
+				       .policy = nfnotif_nla_policy },
+};
+
+static const struct nfnetlink_subsystem nfnotif_subsys = {
+	.name				= "nfnotif",
+	.subsys_id			= NFNL_SUBSYS_NFNOTIF,
+	.cb_count			= NFNOTIF_TG_MSG_MAX,
+	.cb				= nfnotif_cb,
+};
+
+static int __init nfnotif_tg_init(void)
+{
+	int ret;
+
+	ret = nfnetlink_subsys_register(&nfnotif_subsys);
+	if (ret < 0) {
+		pr_err("%s: Cannot register with nfnetlink\n", __func__);
+		return ret;
+	}
+
+	ret = xt_register_target(&nfnotif_tg);
+	if (ret < 0) {
+		pr_err("%s: Cannot register target\n", __func__);
+		nfnetlink_subsys_unregister(&nfnotif_subsys);
+	}
+
+	return ret;
+}
+
+static void __exit nfnotif_tg_exit(void)
+{
+	nfnetlink_subsys_unregister(&nfnotif_subsys);
+	xt_unregister_target(&nfnotif_tg);
+}
+
+module_init(nfnotif_tg_init);
+module_exit(nfnotif_tg_exit);
+
+MODULE_AUTHOR("Samuel Ortiz <samuel.ortiz@intel.com>");
+MODULE_DESCRIPTION("Xtables: userspace notification");
+MODULE_LICENSE("GPL v2");
-- 
1.7.1

-- 
Intel Open Source Technology Centre
http://oss.intel.com/
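
As a reading aid for the patch above, two of its small pieces of logic can be sketched in plain userspace C: the label check done in nfnotif_tg_checkentry() and the "first packet only" switch handled in nfnotif_tg_target(). This is only an illustration under assumptions. The names nfnotif_label_is_valid(), struct notif_state and notif_on_packet() are hypothetical, and memchr() is used here as a portable stand-in for the strnlen() comparison in the patch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NFNOTIF_LABEL_SIZE 31

/* Return 1 if the label is non-empty and has a NUL byte inside the
 * fixed-size buffer, 0 otherwise. */
static int nfnotif_label_is_valid(const char label[MAX_NFNOTIF_LABEL_SIZE])
{
	if (label[0] == '\0')
		return 0;
	/* No NUL within the buffer means the label is not terminated. */
	return memchr(label, '\0', MAX_NFNOTIF_LABEL_SIZE) != NULL;
}

/* Hypothetical reduced state: just the two flags the target consults. */
struct notif_state {
	unsigned char all_packets;	/* set when the rule uses --all */
	unsigned char send_notif;	/* 1 while notifications are armed */
};

/* Return 1 if this packet should trigger a notification. In
 * first-packet-only mode the state is disarmed after one hit and stays
 * so until something (userspace, via netlink) sets send_notif again. */
static int notif_on_packet(struct notif_state *s)
{
	if (!s->send_notif)
		return 0;
	if (!s->all_packets)
		s->send_notif = 0;
	return 1;
}
```

With all_packets cleared, only the first call to notif_on_packet() reports a hit; setting send_notif back to 1 models the netlink message that re-arms the target.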


* Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
From: Felipe W Damasio @ 2010-07-13 14:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Avi Kivity, David Miller, Patrick McHardy, linux-kernel, netdev
In-Reply-To: <1279032023.2634.384.camel@edumazet-laptop>

Hi Mr. Dumazet,

2010/7/13 Eric Dumazet <eric.dumazet@gmail.com>:
> I currently have no fresh ideas. If you want this problem to be solved,
> it's important to set up in your lab a workload that triggers the bug
> again and again, in order to provide us with more crash information.

 Right. I've been running non-stop since the first bug happened, but
so far the problem hasn't surfaced again :-(

 I've been using the kernel with the patch that you provided me
(nf_tproxy.c). Is there a chance that patch fixed the problem?

 Cheers,

Felipe Damasio


* RE: Splice status
From: Ofer Heifetz @ 2010-07-13 14:40 UTC (permalink / raw)
  To: Changli Gao; +Cc: Eric Dumazet, Jens Axboe, netdev@vger.kernel.org
In-Reply-To: <AANLkTimZyIhl34NzAZXMNUEmGy0KFCBs_LMk5qSBwrF2@mail.gmail.com>

I profiled the splice iometer write run and noticed that blk_end_request_err is being called many times. It looks like a good candidate for the high iowait; I still need to debug its root cause.

-Ofer

-----Original Message-----
From: Changli Gao [mailto:xiaosuo@gmail.com] 
Sent: Tuesday, July 13, 2010 4:58 PM
To: Ofer Heifetz
Cc: Eric Dumazet; Jens Axboe; netdev@vger.kernel.org
Subject: Re: Splice status

On Tue, Jul 13, 2010 at 8:42 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Write and re-write numbers are in MBps.
> Iozone performs a re-write, meaning it reads a chunk of data and writes it back, so the performance for this operation should be quite high thanks to kernel caching.
>
> I forgot to mention that I used EXT4 fs.

Maybe it is caused by this line in generic_file_splice_write():

                balance_dirty_pages_ratelimited_nr(mapping, nr_pages);

Please try to test it again without this line.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
From: Eric Dumazet @ 2010-07-13 14:40 UTC (permalink / raw)
  To: Felipe W Damasio
  Cc: Avi Kivity, David Miller, Patrick McHardy, linux-kernel, netdev
In-Reply-To: <AANLkTimn9x17g-4K7fS4MjmCgzHuBARurEoFVd_gId7m@mail.gmail.com>

Le mardi 13 juillet 2010 à 11:24 -0300, Felipe W Damasio a écrit :
> Hi Mr. Dumazet,
> 
> 2010/7/12 Felipe W Damasio <felipewd@gmail.com>:
> > Here's the result using ethtool-2.6.34:
> >
> > ./ethtool -k eth1
> >
> > Offload parameters for eth1:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp-segmentation-offload: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: off
> > large-receive-offload: off
> > ntuple-filters: off
> > receive-hashing: off
> >
> >
> > ./ethtool -k eth2
> >
> > Offload parameters for eth2:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp-segmentation-offload: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: off
> > large-receive-offload: off
> > ntuple-filters: off
> > receive-hashing: off
> 
> Did these help you track down the issue?
> 
> Sorry to insist, it's just that my bosses are kind of pressuring me to
> solve the problem and put the squid machine back online :-)
> 
> Is there a test I can run to try and trigger the issue?
> 
> I have the same scenario (hardware and network setup) on my lab...
> 

I currently have no fresh ideas. If you want this problem to be solved,
it's important to set up in your lab a workload that triggers the bug
again and again, in order to provide us with more crash information.

When code review doesn't spot obvious bugs, it is time for brute-force
hunting, using git bisection for example...





* Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
From: Felipe W Damasio @ 2010-07-13 14:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Avi Kivity, David Miller, Patrick McHardy, linux-kernel, netdev
In-Reply-To: <AANLkTilbEkJWPqvJE72r9HdFQSU84S02BKZB0CH-8QwB@mail.gmail.com>

Hi Mr. Dumazet,

2010/7/12 Felipe W Damasio <felipewd@gmail.com>:
> Here's the result using ethtool-2.6.34:
>
> ./ethtool -k eth1
>
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off
>
>
> ./ethtool -k eth2
>
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off

Did these help you track down the issue?

Sorry to insist, it's just that my bosses are kind of pressuring me to
solve the problem and put the squid machine back online :-)

Is there a test I can run to try and trigger the issue?

I have the same scenario (hardware and network setup) on my lab...

Cheers,

Felipe Damasio


* RE: Splice status
From: Eric Dumazet @ 2010-07-13 14:11 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Changli Gao, Jens Axboe, netdev@vger.kernel.org
In-Reply-To: <EE71107DF0D1F24FA2D95041E64AB9E8ED254E7B66@IL-MB01.marvell.com>

Le mardi 13 juillet 2010 à 14:41 +0300, Ofer Heifetz a écrit :
> Hi,
> 
> I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also that iowait is high when re-writing a file.
> 
> iometer using 2G file (file is created before test)
> 
> Splice  write cpu% iow%
> -----------------------
>  No     58    98    0
> Yes     14   100   48
> 
> iozone using 2G file (file created during test)
> 
> Splice  write cpu% iow%  re-write cpu% iow%  
> -------------------------------------------
>  No     35    85    4    58.2     70    0
> Yes     33    85    4    15.7    100   58
> 
> Any clue why splice introduces a high iowait?
> I noticed Samba uses up to 16K per splice syscall; changing Samba to request more did not help, so I guess it is a kernel limitation.
> 

splice(socket -> pipe) provides partial buffers (depending on the MTU)

With typical MTU=1500 and tcp timestamps, each network frame contains
1448 bytes of payload, partially filling one page (of 4096 bytes)

When doing the splice(pipe -> file), kernel has to coalesce partial
data, but amount of written data per syscall() is small (about 20
Kbytes)

Without splice(), the write() syscall provides more data, and vfs
overhead is smaller as buffer size is a power of two.

Samba uses a 128 KBytes TRANSFER_BUF_SIZE in its default_sys_recvfile()
implementation, it easily outperforms splice() implementation.

You could try extending pipe size (fcntl(fd, F_SETPIPE_SZ, 256)), maybe
it will be a bit better. (and ask 256*4096 bytes to splice())

I tried this and got about 256Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead         Command      Shared Object  Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll
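
The pipe-resizing suggestion and the per-syscall arithmetic above can be sketched as follows. This is a hedged illustration, not the test program behind the perf report: grow_pipe() and bytes_per_splice() are hypothetical helper names, the figure of 16 buffers for a default pipe is an assumption, and the size is passed in bytes as fcntl(2) documents F_SETPIPE_SZ for released kernels (the call in the message passes a page count, so the unit should be checked against the kernel actually in use).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for F_SETPIPE_SZ / F_GETPIPE_SZ */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Fallbacks in case the libc headers do not expose the Linux-specific
 * fcntl commands. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/* Ask the kernel to grow the pipe to at least `bytes`. Returns the
 * resulting capacity in bytes, or -1 if the request was refused. */
static long grow_pipe(int pipe_fd, long bytes)
{
	if (fcntl(pipe_fd, F_SETPIPE_SZ, (int)bytes) < 0)
		return -1;
	return fcntl(pipe_fd, F_GETPIPE_SZ);
}

/* Rough upper bound on the data one splice(socket -> pipe) call can
 * move when every pipe buffer holds a single, partially filled page. */
static long bytes_per_splice(long pipe_buffers, long payload_per_frame)
{
	return pipe_buffers * payload_per_frame;
}
```

With 16 buffers and 1448 bytes of payload per frame, bytes_per_splice() gives 23168 bytes, which is in line with the "about 20 Kbytes" per syscall mentioned above. A caller would use grow_pipe(fd, 256 * 4096) on either end of the pipe before starting the splice loop and fall back to the default size if it returns -1.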




* Re: Splice status
From: Changli Gao @ 2010-07-13 13:58 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev@vger.kernel.org
In-Reply-To: <EE71107DF0D1F24FA2D95041E64AB9E8ED254E7BAB@IL-MB01.marvell.com>

On Tue, Jul 13, 2010 at 8:42 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Write and re-write numbers are in MBps.
> Iozone performs a re-write, meaning it reads a chunk of data and writes it back, so the performance for this operation should be quite high thanks to kernel caching.
>
> I forgot to mention that I used EXT4 fs.

Maybe it is caused by this line in generic_file_splice_write():

                balance_dirty_pages_ratelimited_nr(mapping, nr_pages);

Please try to test it again without this line.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] tc35815: fix iomap leak
From: Atsushi Nemoto @ 2010-07-13 13:14 UTC (permalink / raw)
  To: segooon; +Cc: kernel-janitors, davem, jpirko, eric.dumazet, adobriyan, netdev
In-Reply-To: <1278756199-4636-1-git-send-email-segooon@gmail.com>

On Sat, 10 Jul 2010 14:03:18 +0400, Kulikov Vasiliy <segooon@gmail.com> wrote:
> If tc35815_init_one() fails we must unmap mapped regions.
> 
> Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
> ---
>  drivers/net/tc35815.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)

No, pcim_xxx APIs are _managed_ interfaces.  These resources are
released automatically.  In fact, nobody in the kernel currently
calls pcim_iounmap_regions().

And _if_ there is any reason to call pcim_iounmap_regions()
explicitly, it should be called in tc35815_remove_one() too.

So, NAK.

---
Atsushi Nemoto


* High process latencies due to MPC5200 FEC hard- soft-irq processing
From: Wolfgang Grandegger @ 2010-07-13 13:29 UTC (permalink / raw)
  To: Netdev; +Cc: linuxppc-dev@ozlabs.org, LKML

Hello,

We realized that multiple ping floods (ping -f) can cause very large
high-priority process latencies (up to many seconds) on a MPC5200
PowerPC system with FEC NAPI support. The latencies are measured with

  # cyclictest -p 80 -n

The problem is that processing of the ICMP packets in the hard-IRQ and
soft-IRQ context can last for a long time without returning to the
scheduler. Reducing MAX_SOFTIRQ_RESTART from 10 to 2 helps - the latency
goes down to 35 ms with 2 "ping -f" - but it's not a configurable
parameter, even if it somehow depends on the CPU power. And using the
-rt patches seems overkill to me. Any other ideas or comments on how to
get rid of such high process latencies?

Wolfgang.


* Re: [PATCH] netfilter: xtables: userspace notification target
From: Samuel Ortiz @ 2010-07-13 13:28 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, netdev, netfilter-devel,
	Luciano Coelho
In-Reply-To: <AANLkTil2EgQbzUqYNHAYpIWJvyyE6AWq1TpvxrqVsD7k@mail.gmail.com>

On Tue, Jul 13, 2010 at 02:18:26PM +0800, Changli Gao wrote:
> On Tue, Jul 13, 2010 at 8:11 AM, Samuel Ortiz <sameo@linux.intel.com> wrote:
> >
> > The userspace notification Xtables target sends a netlink notification
> > whenever a packet hits the target. Notifications have a label attribute
> > for userspace to match it against a previously set rule. The rules also
> > take a --all option to switch between sending a notification for all
> > packets or for the first one only.
> > Userspace can also send a netlink message to toggle this switch while the
> > target is in place. This target uses the netfilter netlink framework.
> >
> > This target combined with various matches (quota, rateest, etc..) allows
> > userspace to make decisions on interfaces handling. One could for example
> > decide to switch between power saving modes depending on estimated rate
> > thresholds.
> >
> 
> It is much like the following iptables rules.
> 
> iptables -N log_and_drop
> iptables -A log_and_drop -j NFLOG --nflog-group 1 --nflog-prefix "log_and_drop"
> iptables -A log_and_drop -j DROP
> 
> ...
> iptables ... -m quota --quota-bytes 20000 -j log_and_drop
> ...
We'd still be missing the possibility of having only the first packet logged,
and we'd also have to send an initial netlink message to switch the copy_mode
to COPY_NONE. We're not interested in the actual packet, only in the match
hit.
I know it's not a big deal after all, I'm just trying to have one simple target
for that simple task of notifying userspace of a match hit.

> > +static unsigned int nfnotif_tg_target(struct sk_buff *skb,
> > +                                     const struct xt_action_param *par)
> > +{
> > +       const struct nfnotif_tg_info *info = par->targinfo;
> > +
> > +       BUG_ON(!info->notif);
> > +
> > +       if (!info->notif->send_notif)
> > +               return XT_CONTINUE;
> > +
> > +       pr_debug("Sending notification for %s\n", info->label);
> > +
> > +       schedule_work(&info->notif->work);
> > +
> 
> Why do you use another kernel activity: kernel thread? netlink
> messages can be sent in atomic context.
That's right, I should have used the ATOMIC gfp flags from my sending routine.
I'll fix that with my next revision of the patch.

Thanks for the review.

Cheers,
Samuel.

-- 
Intel Open Source Technology Centre
http://oss.intel.com/


* Re: [PATCH] netfilter: xtables: userspace notification target
From: Luciano Coelho @ 2010-07-13 13:24 UTC (permalink / raw)
  To: ext Jan Engelhardt
  Cc: ext Pablo Neira Ayuso, Changli Gao, Samuel Ortiz, Patrick McHardy,
	David S. Miller, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org
In-Reply-To: <alpine.LSU.2.01.1007131349120.778@obet.zrqbmnf.qr>

On Tue, 2010-07-13 at 13:49 +0200, ext Jan Engelhardt wrote:
> On Tuesday 2010-07-13 12:23, Luciano Coelho wrote:
> >> 
> >> Indeed, this looks to me like something that you can do with NFLOG and
> >> some combination of matches.
> >
> >Is it possible to have the NFLOG send only one notification to the
> >userspace? In the example above, once the quota exceeds, the userspace
> >will be notified of every packet arriving, won't it?  That would cause
> >unnecessary processing in the userspace.
> >
> >The userspace could remove the rule when it gets the first notification
> >and only add it again when it needs to get the information again (as a
> >"toggle" functionality), but I think that would take too long and there
> >would be several packets going through before the rule could be removed.
> 
> With xt_condition that should not be a problem
> (-A INPUT -m condition --name ruleXYZ -j NFLOG..)
> This is settable through procfs.

Right.  I didn't know about the condition match, because I can't see it
on either net-next-2.6 or nf-next-2.6.  I found your patch in the
netfilter-devel archives, though.  Any idea when it will be applied?


-- 
Cheers,
Luca.



* [patch] net/sched: potential data corruption
From: Dan Carpenter @ 2010-07-13 13:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David S. Miller, Stephen Hemminger, netdev, kernel-janitors,
	matthew

reset_policy() does:
        memset(d->tcfd_defdata, 0, SIMP_MAX_DATA);
        strlcpy(d->tcfd_defdata, defdata, SIMP_MAX_DATA);

In the original code, the size of d->tcfd_defdata wasn't fixed and if
strlen(defdata) was less than 31, reset_policy() would cause memory
corruption.

Please Note:  The original alloc_defdata() assumes defdata is 32
characters and a NUL terminator while reset_policy() assumes defdata is
31 characters and a NUL.  This patch updates alloc_defdata() to match
reset_policy() (ie a shorter string).  I'm not very familiar with this
code so please review carefully.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 1b4bc69..4a1d640 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -73,10 +73,10 @@ static int tcf_simp_release(struct tcf_defact *d, int bind)
 
 static int alloc_defdata(struct tcf_defact *d, char *defdata)
 {
-	d->tcfd_defdata = kstrndup(defdata, SIMP_MAX_DATA, GFP_KERNEL);
+	d->tcfd_defdata = kzalloc(SIMP_MAX_DATA, GFP_KERNEL);
 	if (unlikely(!d->tcfd_defdata))
 		return -ENOMEM;
-
+	strlcpy(d->tcfd_defdata, defdata, SIMP_MAX_DATA);
 	return 0;
 }
 

^ permalink raw reply related
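
The size mismatch described in the net/sched patch above can be illustrated
in userspace. This is only a sketch under stated assumptions: SIMP_MAX_DATA
is taken to be 32 as in the discussion, old_alloc_size() is a hypothetical
helper that models how many bytes a kstrndup()-style call would allocate,
and calloc()/strncpy() stand in for the kernel's kzalloc()/strlcpy().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIMP_MAX_DATA 32	/* assumed value, taken from the discussion */

/* Bytes a kstrndup(defdata, SIMP_MAX_DATA)-style call would allocate. */
static size_t old_alloc_size(const char *defdata)
{
	size_t len;

	assert(defdata);
	len = strlen(defdata);
	if (len > SIMP_MAX_DATA)
		len = SIMP_MAX_DATA;
	return len + 1;		/* string plus NUL terminator */
}

/* Patched scheme: fixed-size zeroed buffer plus a truncating copy. */
static char *alloc_defdata_fixed(const char *defdata)
{
	char *buf;

	assert(defdata);
	buf = calloc(1, SIMP_MAX_DATA);
	if (!buf)
		return NULL;
	strncpy(buf, defdata, SIMP_MAX_DATA - 1);	/* last byte stays NUL */
	return buf;
}
```

For a short string such as "hi", old_alloc_size() returns 3, so a later
memset() of SIMP_MAX_DATA bytes would write well past the allocation. With
the fixed-size buffer, any later clear-and-copy of SIMP_MAX_DATA bytes stays
in bounds.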

* Re: [PATCH] netfilter: xtables: userspace notification target
From: Samuel Ortiz @ 2010-07-13 13:19 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Patrick McHardy, David S. Miller, netdev, netfilter-devel,
	Luciano Coelho
In-Reply-To: <alpine.LSU.2.01.1007130751360.19472@obet.zrqbmnf.qr>

Hi Jan,

On Tue, Jul 13, 2010 at 07:56:31AM +0200, Jan Engelhardt wrote:
> 
> On Tuesday 2010-07-13 02:11, Samuel Ortiz wrote:
> >
> >The userspace notification Xtables target sends a netlink notification
> >whenever a packet hits the target. Notifications have a label attribute
> >for userspace to match it against a previously set rule. The rules also
> >take a --all option to switch between sending a notification for all
> >packets or for the first one only.
> >Userspace can also send a netlink message to toggle this switch while the
> >target is in place. This target uses the netfilter netlink framework.
> 
> Would it not make sense to modify that module?
> Sounds an awful lot like NFQUEUE without passing the payload :)
Yes, except for the payload, the missing "send one" packet toggle, and the
verdict we'd have to send back, it's almost identical ;)

What I'm trying to achieve with this target is a simple way to send a
notification to userspace, without having to define a complex set of
rules and matches, or having to pass some initial netlink message to set the
target up properly (to avoid the payload passing in the NFLOG case).


> >+++ b/net/netfilter/xt_NFNOTIF.c
> >+struct nfnotif_tg {
> >+	struct list_head entry;
> >+	struct work_struct work;
> >+
> >+	char *label;
> >+	__u8 all_packets;
> >+	struct net *net;
> >+
> >+	__u8 send_notif;
> >+
> >+	unsigned int refcnt;
> >+};
> 
> Has unnecessary padding holes.

Right, I will send a v2 later today.

Thanks for your comments and review.

Cheers,
Samuel.

-- 
Intel Open Source Technology Centre
http://oss.intel.com/

^ permalink raw reply
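
The "padding holes" remark in the review above can be checked with a small
userspace sketch. The stub types below are hypothetical stand-ins for
struct list_head, struct work_struct and struct net (only their size and
alignment matter here), and the reordered layout is just one possible
arrangement, not necessarily what a v2 of the patch would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types. */
struct list_head_stub { void *next, *prev; };
struct work_stub { void *data; struct list_head_stub entry; void (*func)(void *); };
struct net_stub;

/* Field order as posted: each __u8 is followed by a wider member. */
struct nfnotif_tg_posted {
	struct list_head_stub entry;
	struct work_stub work;
	char *label;
	uint8_t all_packets;
	struct net_stub *net;
	uint8_t send_notif;
	unsigned int refcnt;
};

/* Possible reordering: widest members first, the two bytes adjacent. */
struct nfnotif_tg_packed {
	struct list_head_stub entry;
	struct work_stub work;
	char *label;
	struct net_stub *net;
	unsigned int refcnt;
	uint8_t all_packets;
	uint8_t send_notif;
};

static_assert(sizeof(struct nfnotif_tg_packed) <= sizeof(struct nfnotif_tg_posted),
	      "reordering must not grow the structure");

/* Bytes recovered by the reordering on the current ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct nfnotif_tg_posted) - sizeof(struct nfnotif_tg_packed);
}
```

On a typical 64-bit ABI the compiler inserts padding after each single-byte
member in the posted layout so that the following pointer or int is aligned;
grouping the small members at the end removes most of it. In the kernel tree
the pahole tool is commonly used for the same kind of inspection.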

* Re: [PATCH -mmotm 00/30] [RFC] swap over nfs -v21
From: Américo Wang @ 2010-07-13 12:53 UTC (permalink / raw)
  To: Xiaotian Feng
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, riel-H+wXaHxf7aLQT0dZR+AlfA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, lwang-H+wXaHxf7aLQT0dZR+AlfA,
	penberg-bbCR+/B0CizivPeTLB3BmA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20100713101650.2835.15245.sendpatchset-bd3XojVv5Xrm7B5McmCzzQ@public.gmane.org>

On Tue, Jul 13, 2010 at 6:16 PM, Xiaotian Feng <dfeng-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Hi,
>
> Here's the latest version of swap over NFS series since -v20 last October. We decided to push
> this feature as it is useful for NAS or virtualized environments.
>
> The patches are against the mmotm-2010-07-01. We can split the patchset into following parts:
>
> Patch 1 - 12: provides a generic reserve framework. This framework
> could also be used to get rid of some of the __GFP_NOFAIL users.
>
> Patch 13 - 15: Provide some generic network infrastructure needed later on.
>
> Patch 16 - 21: reserve a little pool to act as a receive buffer, this allows us to
> inspect packets before tossing them.
>
> Patch 22 - 23: Generic vm infrastructure to handle swapping to a filesystem instead of a block
> device.
>
> Patch 24 - 27: convert NFS to make use of the new network and vm infrastructure to
> provide swap over NFS.
>
> Patch 28 - 30: minor bug fixing with latest -mmotm.
>
> [some history]
> v19: http://lwn.net/Articles/301915/
> v20: http://lwn.net/Articles/355350/
>
> Changes since v20:
>        - rebased to mmotm-2010-07-01
>        - dropped the null pointer deref patch for the root cause is wrong SWP_FILE enum
>        - some minor build fixes
>        - fix a null pointer deref with mmotm-2010-07-01
>        - fix a bug when swap with multi files on the same nfs server

Please use the "From:" line correctly, as stated in
Documentation/SubmittingPatches:

The "from" line must be the very first line in the message body,
and has the form:

        From: Original Author <author-hcDgGtZH8xNBDgjK7y7TUQ@public.gmane.org>

The "from" line specifies who will be credited as the author of the
patch in the permanent changelog.  If the "from" line is missing,
then the "From:" line from the email header will be used to determine
the patch author in the changelog.


I think you are using git format-patch to generate those patches; please supply
--author=<author> to git commit when you commit them to your local tree (or use
git am if the patches you received already had the correct From: line).

Thanks.

^ permalink raw reply

* RE: Splice status
From: Ofer Heifetz @ 2010-07-13 12:42 UTC (permalink / raw)
  To: Changli Gao; +Cc: Eric Dumazet, Jens Axboe, netdev@vger.kernel.org
In-Reply-To: <AANLkTimRXMivugXHJswrT5FA93PIjSjU-RneFKs1bkLL@mail.gmail.com>

Write and re-write numbers are in MBps.
Iozone's re-write test reads a chunk of data and writes it back, so the performance for this operation should be quite high because the kernel cache is used.

I forgot to mention that I used EXT4 fs.

-Ofer

-----Original Message-----
From: Changli Gao [mailto:xiaosuo@gmail.com] 
Sent: Tuesday, July 13, 2010 3:32 PM
To: Ofer Heifetz
Cc: Eric Dumazet; Jens Axboe; netdev@vger.kernel.org
Subject: Re: Splice status

On Tue, Jul 13, 2010 at 7:41 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Hi,
>
> I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also that when re-writing a file the iowait is high.
>
> iometer using 2G file (file is created before test)
>
> Splice  write cpu% iow%
> -----------------------
>  No     58    98    0
> Yes     14   100   48
>
> iozone using 2G file (file created during test)
>
> Splice  write cpu% iow%  re-write cpu% iow%
> -------------------------------------------
>  No     35    85    4    58.2     70    0
> Yes     33    85    4    15.7    100   58
>
> Any clue why splice introduces a high iowait?
> I noticed Samba uses up to 16K per splice syscall; changing Samba to try more did not help, so I guess it is a kernel limitation.
>
> -Ofer
>

What does the "write" column mean? And what do you mean by
"re-write"? Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: Splice status
From: Changli Gao @ 2010-07-13 12:32 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev@vger.kernel.org
In-Reply-To: <EE71107DF0D1F24FA2D95041E64AB9E8ED254E7B66@IL-MB01.marvell.com>

On Tue, Jul 13, 2010 at 7:41 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Hi,
>
> I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also that when re-writing a file the iowait is high.
>
> iometer using 2G file (file is created before test)
>
> Splice  write cpu% iow%
> -----------------------
>  No     58    98    0
> Yes     14   100   48
>
> iozone using 2G file (file created during test)
>
> Splice  write cpu% iow%  re-write cpu% iow%
> -------------------------------------------
>  No     35    85    4    58.2     70    0
> Yes     33    85    4    15.7    100   58
>
> Any clue why splice introduces a high iowait?
> I noticed Samba uses up to 16K per splice syscall; changing Samba to try more did not help, so I guess it is a kernel limitation.
>
> -Ofer
>

What does the "write" column mean? And what do you mean by
"re-write"? Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: IPVS scheduler algorithms (was: [PATCH] ipvs: Kconfig cleanup)
From: Simon Horman @ 2010-07-13 12:29 UTC (permalink / raw)
  To: Ismael Luque Valencia
  Cc: Patrick McHardy, Michal Marek, lvs-devel, netdev,
	Julian Anastasov, Wensong Zhang, linux-kernel
In-Reply-To: <AANLkTim3FfgxTzQ36YYYA8QG605oNW3skHZlSLKAZcjl@mail.gmail.com>

On Sun, Jul 11, 2010 at 07:15:21PM -0500, Ismael Luque Valencia wrote:
> Hi, my name is Ismael. I am writing a paper about LVS algorithms, but I don't
> understand two of them: Locality-Based Least-Connection and Locality-Based
> Least-Connection with Replication.
> Could someone explain those algorithms to me, please?
> If it is in Spanish, that would be better, because my English is not good.

Hi,

Firstly, please don't reply to emails unless you are actually
replying to the topic at hand. Instead just compose a fresh
message to the people/lists that you want to address.
This helps keep threads in mail-readers that support them sane.

In any case this question would be better sent to the lvs-users list.

But to your question.

These schedulers are described briefly in the ipvsadm(8) man page
and in pseudo code in the source files (ip_vs_lblc.c and ip_vs_lblcr.c)
in the Linux kernel tree. And there is some discussion in the howto
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.ipvsadm.html#DH

The way that I think of these schedulers is as enhanced versions of wlc
designed for use with transparent proxies. That is, situations where
there will be a lot of destination addresses.

lblc works by keeping a cache that associates destination addresses with
a real server. This allows accesses, potentially from different end-users,
to be sent to the same real-server. As this is designed to be used
with proxies, this means the request will be sent to a proxy that may
have already retrieved the result.

lblcr is similar, but instead of one real-server per destination, it
allows for multiple real-servers per destination.


^ permalink raw reply
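
The lblc idea described above (a cache from destination address to real
server, falling back to least-connection when there is no usable entry) can
be sketched as follows. This is a deliberately simplified illustration, not
the code in ip_vs_lblc.c: all names are hypothetical, weights and overload
handling are omitted, and the cache is a small direct-mapped table.

```c
#include <assert.h>
#include <stdint.h>

#define NSERVERS    3
#define CACHE_SLOTS 64

struct server { int alive; unsigned int conns; };
struct lblc_entry { int used; uint32_t dest; int server; };

struct lblc {
	struct server servers[NSERVERS];
	struct lblc_entry cache[CACHE_SLOTS];
};

/* Plain least-connection pick among live servers; -1 if none is alive. */
static int least_conn(const struct lblc *lb)
{
	int best = -1;

	for (int i = 0; i < NSERVERS; i++) {
		if (!lb->servers[i].alive)
			continue;
		if (best < 0 || lb->servers[i].conns < lb->servers[best].conns)
			best = i;
	}
	return best;
}

/* Return the server for this destination, reusing the cached one if usable. */
static int lblc_schedule(struct lblc *lb, uint32_t dest)
{
	struct lblc_entry *e;

	assert(lb);
	e = &lb->cache[dest % CACHE_SLOTS];
	if (!e->used || e->dest != dest || !lb->servers[e->server].alive) {
		int s = least_conn(lb);

		if (s < 0)
			return -1;
		e->used = 1;
		e->dest = dest;
		e->server = s;
	}
	lb->servers[e->server].conns++;
	return e->server;
}
```

Repeated requests for the same destination land on the same real server (in
the proxy case, the one likely to hold the cached object), while a new
destination goes to whichever live server currently has the fewest
connections. lblcr would keep a set of servers per destination instead of a
single one.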

* Re: [PATCH] netfilter: xtables: userspace notification target
From: Jan Engelhardt @ 2010-07-13 11:49 UTC (permalink / raw)
  To: Luciano Coelho
  Cc: ext Pablo Neira Ayuso, Changli Gao, Samuel Ortiz, Patrick McHardy,
	David S. Miller, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org
In-Reply-To: <1279016596.12673.11.camel@chilepepper>


On Tuesday 2010-07-13 12:23, Luciano Coelho wrote:
>> 
>> Indeed, this looks to me like something that you can do with NFLOG and
>> some combination of matches.
>
>Is it possible to have the NFLOG send only one notification to the
>userspace? In the example above, once the quota is exceeded, the userspace
>will be notified of every packet arriving, won't it?  That would cause
>unnecessary processing in the userspace.
>
>The userspace could remove the rule when it gets the first notification
>and only add it again when it needs to get the information again (as a
>"toggle" functionality), but I think that would take too long and there
>would be several packets going through before the rule could be removed.

With xt_condition that should not be a problem
(-A INPUT -m condition --name ruleXYZ -j NFLOG..)
This is settable through procfs.


^ permalink raw reply

* Re: [PATCH] eth16i: fix memory leak
From: Dan Carpenter @ 2010-07-13 11:43 UTC (permalink / raw)
  To: Kulikov Vasiliy
  Cc: kernel-janitors, Mika Kuoppala, David S. Miller,
	Stephen Hemminger, Eric Dumazet, Tejun Heo, Jiri Pirko, netdev
In-Reply-To: <1279020138-9398-1-git-send-email-segooon@gmail.com>

On Tue, Jul 13, 2010 at 03:22:18PM +0400, Kulikov Vasiliy wrote:
> Free allocated netdev if no probe is expected.
> 
> Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
> ---
>  drivers/net/eth16i.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/eth16i.c b/drivers/net/eth16i.c
> index 874973f..2bdd394 100644
> --- a/drivers/net/eth16i.c
> +++ b/drivers/net/eth16i.c
> @@ -1442,8 +1442,10 @@ int __init init_module(void)
>  		dev->if_port = eth16i_parse_mediatype(mediatype[this_dev]);
>  
>  		if(io[this_dev] == 0) {
> -			if(this_dev != 0) /* Only autoprobe 1st one */
> +			if (this_dev != 0) { /* Only autoprobe 1st one */
> +				free_netdev(def);
					    ^^^
				free_netdev(dev);

regards,
dan carpenter

>  				break;
> +			}
>  
>  			printk(KERN_NOTICE "eth16i.c: Presently autoprobing (not recommended) for a single card.\n");
>  		}
> -- 


^ permalink raw reply

* RE: Splice status
From: Ofer Heifetz @ 2010-07-13 11:41 UTC (permalink / raw)
  To: Changli Gao, Eric Dumazet; +Cc: Jens Axboe, netdev@vger.kernel.org
In-Reply-To: <AANLkTinRnDwMOS5NrmzKqYZpczG4nfHSvKA9Gw-_UUzO@mail.gmail.com>

Hi,

I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also that when re-writing a file the iowait is high.

iometer using 2G file (file is created before test)

Splice  write cpu% iow%
-----------------------
 No     58    98    0
Yes     14   100   48

iozone using 2G file (file created during test)

Splice  write cpu% iow%  re-write cpu% iow%  
-------------------------------------------
 No     35    85    4    58.2     70    0
Yes     33    85    4    15.7    100   58

Any clue why splice introduces a high iowait?
I noticed Samba uses up to 16K per splice syscall; changing Samba to try more did not help, so I guess it is a kernel limitation.

-Ofer

-----Original Message-----
From: Changli Gao [mailto:xiaosuo@gmail.com] 
Sent: Sunday, July 11, 2010 4:09 PM
To: Eric Dumazet
Cc: Jens Axboe; Ofer Heifetz; netdev@vger.kernel.org
Subject: Re: Splice status

On Tue, Jul 6, 2010 at 11:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 06 juillet 2010 à 10:01 +0800, Changli Gao a écrit :
>>
>> If we don't drain the pipe before calling splice(2), the data spliced
>> from the pipe may not be what we expect. Then data corruption occurs.
>>
>
> This is not true. A pipe is a pipe is a buffer. You don't need it to be
> empty when using it. Nowhere in the documentation is that stated.

Do you mean splice(2) empties the pipe buffer before using it as an
output buffer? If not, the pipe draining is needed to avoid data
corruption.

>
> However, a single skb can fill a pipe, even if "it's empty"
>

Yes, because tcp_splice_read() doesn't know whether __tcp_splice_read()
returned because the pipe was full.

>
>> >
>> > splice(sock, pipe) can block if the caller doesn't use the appropriate
>> > "non blocking pipe" splice() mode, even if the pipe is empty before a
>> > splice() call.
>>
>> I don't think it is expected. The code of sys_recvfile is much like
>> the sendfile(2) implementation in kernel. If sys_recvfile may block
>> without non_block flag, sendfile(2) may block too.
>
> Then it would be a bug. You might fix it easily.

It seems reasonable. I'll fix it.

>
> Using splice() correctly (ie, not blocking on sock->pipe) should work
> too.
>
> Again, you can block on splice(sock, pipe), iff you have a second thread
> doing the opposite (pipe->file) in parallel to unblock you. But samba
> recvfile algo is using a single thread.
>




-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply
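
The single-threaded pattern discussed in this thread (input -> pipe without
blocking on the pipe, then pipe -> file) might look roughly like the sketch
below. It is an illustration only, not Samba's recvfile code: splice_chunk()
is a hypothetical helper, and per splice(2), SPLICE_F_NONBLOCK makes only the
pipe operations non-blocking, so the input descriptor may still block unless
it has O_NONBLOCK set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Move up to len bytes from in_fd to out_fd through an existing pipe.
 * Returns the number of bytes moved, 0 on end of input, -1 on failure.
 */
static ssize_t splice_chunk(int in_fd, int pipe_rd, int pipe_wr,
			    int out_fd, size_t len)
{
	ssize_t n, left, w;

	assert(len > 0);

	/* Stage 1: input -> pipe, without blocking on a full pipe. */
	n = splice(in_fd, NULL, pipe_wr, NULL, len,
		   SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
	if (n <= 0)
		return n;

	/* Stage 2: drain everything just queued, leaving the pipe empty. */
	for (left = n; left > 0; left -= w) {
		w = splice(pipe_rd, NULL, out_fd, NULL, (size_t)left,
			   SPLICE_F_MOVE);
		if (w <= 0)
			return -1;
	}
	return n;
}
```

Because stage 2 always drains what stage 1 queued, the same thread never
waits on a pipe that only it could empty. A caller would loop on
splice_chunk() and treat a -1 return with errno set to EAGAIN as "nothing
available right now".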

* [PATCH NEXT 1/1] netxen: fix for kdump
From: Amit Kumar Salecha @ 2010-07-13 11:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Rajesh Borundia

From: Rajesh Borundia <rajesh.borundia@qlogic.com>

When the crash kernel is loaded after a crash, the device is in an unknown state.
So reset the device contexts prior to their creation in case of kdump,
depending on the kernel parameter reset_devices.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
 drivers/net/netxen/netxen_nic_ctx.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_ctx.c b/drivers/net/netxen/netxen_nic_ctx.c
index 3a41b6a..1261212 100644
--- a/drivers/net/netxen/netxen_nic_ctx.c
+++ b/drivers/net/netxen/netxen_nic_ctx.c
@@ -255,6 +255,19 @@ out_free_rq:
 }
 
 static void
+nx_fw_cmd_reset_ctx(struct netxen_adapter *adapter)
+{
+
+	netxen_issue_cmd(adapter, adapter->ahw.pci_func, NXHAL_VERSION,
+			adapter->ahw.pci_func, NX_DESTROY_CTX_RESET, 0,
+			NX_CDRP_CMD_DESTROY_RX_CTX);
+
+	netxen_issue_cmd(adapter, adapter->ahw.pci_func, NXHAL_VERSION,
+			adapter->ahw.pci_func, NX_DESTROY_CTX_RESET, 0,
+			NX_CDRP_CMD_DESTROY_TX_CTX);
+}
+
+static void
 nx_fw_cmd_destroy_rx_ctx(struct netxen_adapter *adapter)
 {
 	struct netxen_recv_context *recv_ctx = &adapter->recv_ctx;
@@ -685,7 +698,8 @@ int netxen_alloc_hw_resources(struct netxen_adapter *adapter)
 	if (!NX_IS_REVISION_P2(adapter->ahw.revision_id)) {
 		if (test_and_set_bit(__NX_FW_ATTACHED, &adapter->state))
 			goto done;
-
+		if (reset_devices)
+			nx_fw_cmd_reset_ctx(adapter);
 		err = nx_fw_cmd_create_rx_ctx(adapter);
 		if (err)
 			goto err_out_free;
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH] wireless: airo: delete netdev from list after it is freed
From: Kulikov Vasiliy @ 2010-07-13 11:23 UTC (permalink / raw)
  To: kernel-janitors
  Cc: John W. Linville, David S. Miller, Matthieu CASTET,
	Stanislaw Gruszka, Roel Kluin, linux-wireless, netdev

We must call del_airo_dev() before free_netdev() since we call
add_airo_dev() exactly after alloc_netdev().

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
---
 drivers/net/wireless/airo.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index 6b605df..cce2f8f 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -2931,8 +2931,8 @@ err_out_res:
 	        release_region( dev->base_addr, 64 );
 err_out_nets:
 	airo_networks_free(ai);
-	del_airo_dev(ai);
 err_out_free:
+	del_airo_dev(ai);
 	free_netdev(dev);
 	return NULL;
 }
-- 
1.7.0.4


^ permalink raw reply related
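
The label ordering issue fixed by the airo patch above is a general property
of goto-based unwinding: every error label that frees an object must also
undo whatever was done right after its allocation. The userspace sketch
below uses hypothetical names (add_dev(), del_dev(), init_dev()) and models
the device list as a simple counter; it is not the airo driver's code.

```c
#include <assert.h>
#include <stdlib.h>

struct dev { int on_list; void *networks; };

static int list_len;	/* hypothetical global device list, as a counter */

static void add_dev(struct dev *d) { d->on_list = 1; list_len++; }

static void del_dev(struct dev *d)
{
	if (d->on_list) {
		assert(list_len > 0);
		d->on_list = 0;
		list_len--;
	}
}

/* fail_step: 0 = succeed, 1 = fail before networks, 2 = fail after. */
static struct dev *init_dev(int fail_step)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	add_dev(d);		/* registered right after allocation */

	if (fail_step == 1)
		goto err_out_free;

	d->networks = malloc(64);
	if (!d->networks || fail_step == 2)
		goto err_out_nets;

	return d;

err_out_nets:
	free(d->networks);
err_out_free:
	del_dev(d);		/* runs on every path that frees d */
	free(d);
	return NULL;
}

static void release_dev(struct dev *d)
{
	free(d->networks);
	del_dev(d);
	free(d);
}
```

If del_dev() sat above the err_out_free label instead, the early failure
path (fail_step == 1) would free the device while it was still counted on
the list, which is the situation the patch avoids.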

* [PATCH] wd: fix memory leak
From: Kulikov Vasiliy @ 2010-07-13 11:23 UTC (permalink / raw)
  To: kernel-janitors; +Cc: David S. Miller, Joe Perches, netdev

Unmap mapped IO in wd_probe1() if register_netdev() failed.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
---
 drivers/net/wd.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wd.c b/drivers/net/wd.c
index 746a5ee..eb72c67 100644
--- a/drivers/net/wd.c
+++ b/drivers/net/wd.c
@@ -358,8 +358,10 @@ static int __init wd_probe1(struct net_device *dev, int ioaddr)
 #endif
 
 	err = register_netdev(dev);
-	if (err)
+	if (err) {
 		free_irq(dev->irq, dev);
+		iounmap(ei_status.mem);
+	}
 	return err;
 }
 
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] eth16i: fix memory leak
From: Kulikov Vasiliy @ 2010-07-13 11:22 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Mika Kuoppala, David S. Miller, Stephen Hemminger, Eric Dumazet,
	Tejun Heo, Jiri Pirko, netdev

Free allocated netdev if no probe is expected.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
---
 drivers/net/eth16i.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/eth16i.c b/drivers/net/eth16i.c
index 874973f..2bdd394 100644
--- a/drivers/net/eth16i.c
+++ b/drivers/net/eth16i.c
@@ -1442,8 +1442,10 @@ int __init init_module(void)
 		dev->if_port = eth16i_parse_mediatype(mediatype[this_dev]);
 
 		if(io[this_dev] == 0) {
-			if(this_dev != 0) /* Only autoprobe 1st one */
+			if (this_dev != 0) { /* Only autoprobe 1st one */
+				free_netdev(def);
 				break;
+			}
 
 			printk(KERN_NOTICE "eth16i.c: Presently autoprobing (not recommended) for a single card.\n");
 		}
-- 
1.7.0.4


^ permalink raw reply related

