Netdev List
 help / color / mirror / Atom feed
* Re: Raise initial congestion window size / speedup slow start?
From: Alan Cox @ 2010-07-15 10:33 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: David Miller, rick.jones2, lists, davidsen, linux-kernel, netdev
In-Reply-To: <20100714221301.GI6682@nuttenaction>

On Thu, 15 Jul 2010 00:13:01 +0200
Hagen Paul Pfeifer <hagen@jauu.net> wrote:

> * David Miller | 2010-07-14 14:55:47 [-0700]:
> 
> >Although section 3 of RFC 5681 is a great text, it does not say at all
> >that increasing the initial CWND would lead to fairness issues.
> 
> Because it is only one side of the medal, probing conservative the available
> link capacity in conjunction with n simultaneous probing TCP/SCTP/DCCP
> instances is another.
> 
> >To be honest, I think google's proposal holds a lot of weight.  If
> >over time link sizes and speeds are increasing (they are) then nudging
> >the initial CWND every so often is a legitimate proposal.  Were
> >someone to claim that utilization is lower than it could be because of
> >the currenttly specified initial CWND, I would have no problem
> >believing them.
> >
> >And I'm happy to make Linux use an increased value once it has
> >traction in the standardization community.
> 
> Currently I know no working link capacity probing approach, without active
> network feedback, to conservatively probing the available link capacity with a
> high CWND. I am curious about any future trends.

Given perfect information from the network nodes you still need to
traverse the network each direction and then return an answer which means
with a 0.5sec end to end time as in the original posting causality itself
demands 1.5 seconds to get an answer which is itself incomplete and
obsolete.

Causality isn't showing any signs of going away soon.

^ permalink raw reply

* Re: [PATCH -mmotm 12/30] selinux: tag avc cache alloc as non-critical
From: Xiaotian Feng @ 2010-07-15 11:51 UTC (permalink / raw)
  To: Mitchell Erblich
  Cc: linux-mm, linux-nfs, netdev, riel, cl, a.p.zijlstra, linux-kernel,
	lwang, penberg, akpm, davem
In-Reply-To: <6E38A74E-B033-4D2A-9620-2A8BDF9E0AD1@earthlink.net>

On 07/13/2010 06:55 PM, Mitchell Erblich wrote:
>
> On Jul 13, 2010, at 3:19 AM, Xiaotian Feng wrote:
>
>>  From 6c3a91091b2910c23908a9f9953efcf3df14e522 Mon Sep 17 00:00:00 2001
>> From: Xiaotian Feng<dfeng@redhat.com>
>> Date: Tue, 13 Jul 2010 11:02:41 +0800
>> Subject: [PATCH 12/30] selinux: tag avc cache alloc as non-critical
>>
>> Failing to allocate a cache entry will only harm performance not correctness.
>> Do not consume valuable reserve pages for something like that.
>>
>> Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
>> Signed-off-by: Suresh Jayaraman<sjayaraman@suse.de>
>> Signed-off-by: Xiaotian Feng<dfeng@redhat.com>
>> ---
>> security/selinux/avc.c |    2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
>> index 3662b0f..9029395 100644
>> --- a/security/selinux/avc.c
>> +++ b/security/selinux/avc.c
>> @@ -284,7 +284,7 @@ static struct avc_node *avc_alloc_node(void)
>> {
>> 	struct avc_node *node;
>>
>> -	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
>> +	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
>> 	if (!node)
>> 		goto out;
>>
>> -- 
>> 1.7.1.1
>>
>
> Why not just replace GFP_ATOMIC with GFP_NOWAIT?
>
> This would NOT consume the valuable last pages.

But replace GFP_ATOMIC with GFP_NOWAIT can not prevent avc_alloc_node 
consume reserved pages.

>
> Mitchell Erblich
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCHv4] netfilter: add CHECKSUM target
From: Michael S. Tsirkin @ 2010-07-15 11:52 UTC (permalink / raw)
  To: Daniel P. Berrange, Jes Sorensen
  Cc: Patrick McHardy, Michael S. Tsirkin, David S. Miller,
	Jan Engelhardt, Randy Dunlap, netfilter-devel, netfilter,
	coreteam, linux-kernel, netdev

This adds a `CHECKSUM' target, which can be used in the iptables mangle
table.

You can use this target to compute and fill in the checksum in
a packet that lacks a checksum.  This is particularly useful,
if you need to work around old applications such as dhcp clients,
that do not work well with checksum offloads, but don't want to
disable checksum offload in your device.

The problem happens in the field with virtualized applications.
For reference, see Red Hat bz 605555, as well as
http://www.spinics.net/lists/kvm/msg37660.html

Typical expected use (helps old dhclient binary running in a VM):
iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
	-j CHECKSUM --checksum-fill

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Includes fixes by Jan Engelhardt <jengelh@medozas.de>
---

Changes from v3: fix issues pointed out by Patrick McHardy:
fixed by Jan Engelhardt
	use __u8 in public header files
	remove in.h
fixed by me
	kill change that does not belong into this patch
	add a dependency on the mangle table

 include/linux/netfilter/xt_CHECKSUM.h |   18 ++++++++
 net/netfilter/Kconfig                 |   16 +++++++
 net/netfilter/Makefile                |    1 +
 net/netfilter/xt_CHECKSUM.c           |   70 +++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_CHECKSUM.h
 create mode 100644 net/netfilter/xt_CHECKSUM.c

diff --git a/include/linux/netfilter/xt_CHECKSUM.h b/include/linux/netfilter/xt_CHECKSUM.h
new file mode 100644
index 0000000..3b4fb77
--- /dev/null
+++ b/include/linux/netfilter/xt_CHECKSUM.h
@@ -0,0 +1,18 @@
+/* Header file for iptables ipt_CHECKSUM target
+ *
+ * (C) 2002 by Harald Welte <laforge@gnumonks.org>
+ * (C) 2010 Red Hat Inc
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This software is distributed under GNU GPL v2, 1991
+*/
+#ifndef _IPT_CHECKSUM_TARGET_H
+#define _IPT_CHECKSUM_TARGET_H
+
+#define XT_CHECKSUM_OP_FILL	0x01	/* fill in checksum in IP header */
+
+struct xt_CHECKSUM_info {
+	__u8 operation;	/* bitset of operations */
+};
+
+#endif /* _IPT_CHECKSUM_TARGET_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8593a77..16b2078 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -347,6 +347,22 @@ config NETFILTER_XT_CONNMARK
 
 comment "Xtables targets"
 
+config NETFILTER_XT_TARGET_CHECKSUM
+	tristate "CHECKSUM target support"
+	depends on IP_NF_MANGLE || IP6_NF_MANGLE
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option adds a `CHECKSUM' target, which can be used in the iptables mangle
+	  table.  
+
+	  You can use this target to compute and fill in the checksum in
+	  a packet that lacks a checksum.  This is particularly useful,
+	  if you need to work around old applications such as dhcp clients,
+	  that do not work well with checksum offloads, but don't want to disable
+	  checksum offload in your device.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_CLASSIFY
 	tristate '"CLASSIFY" target support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 14e3a8f..8eb541d 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_NETFILTER_XT_MARK) += xt_mark.o
 obj-$(CONFIG_NETFILTER_XT_CONNMARK) += xt_connmark.o
 
 # targets
+obj-$(CONFIG_NETFILTER_XT_TARGET_CHECKSUM) += xt_CHECKSUM.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CLASSIFY) += xt_CLASSIFY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
diff --git a/net/netfilter/xt_CHECKSUM.c b/net/netfilter/xt_CHECKSUM.c
new file mode 100644
index 0000000..0f642ef
--- /dev/null
+++ b/net/netfilter/xt_CHECKSUM.c
@@ -0,0 +1,70 @@
+/* iptables module for the packet checksum mangling
+ *
+ * (C) 2002 by Harald Welte <laforge@netfilter.org>
+ * (C) 2010 Red Hat, Inc.
+ *
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/module.h>
+#include <linux/skbuff.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_CHECKSUM.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Michael S. Tsirkin <mst@redhat.com>");
+MODULE_DESCRIPTION("Xtables: checksum modification");
+MODULE_ALIAS("ipt_CHECKSUM");
+MODULE_ALIAS("ip6t_CHECKSUM");
+
+static unsigned int
+checksum_tg(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		skb_checksum_help(skb);
+
+	return XT_CONTINUE;
+}
+
+static int checksum_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_CHECKSUM_info *einfo = par->targinfo;
+
+	if (einfo->operation & ~XT_CHECKSUM_OP_FILL) {
+		pr_info("unsupported CHECKSUM operation %x\n", einfo->operation);
+		return -EINVAL;
+	}
+	if (!einfo->operation) {
+		pr_info("no CHECKSUM operation enabled\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target checksum_tg_reg __read_mostly = {
+	.name		= "CHECKSUM",
+	.family		= NFPROTO_UNSPEC,
+	.target		= checksum_tg,
+	.targetsize	= sizeof(struct xt_CHECKSUM_info),
+	.table		= "mangle",
+	.checkentry	= checksum_tg_check,
+	.me		= THIS_MODULE,
+};
+
+static int __init checksum_tg_init(void)
+{
+	return xt_register_target(&checksum_tg_reg);
+}
+
+static void __exit checksum_tg_exit(void)
+{
+	xt_unregister_target(&checksum_tg_reg);
+}
+
+module_init(checksum_tg_init);
+module_exit(checksum_tg_exit);
-- 
1.7.2.rc0.14.g41c1c

^ permalink raw reply related

* [PATCHv4] extensions: libxt_CHECKSUM extension
From: Michael S. Tsirkin @ 2010-07-15 11:52 UTC (permalink / raw)
  To: Daniel P. Berrange, Jes Sorensen
  Cc: Patrick McHardy, Michael S. Tsirkin, David S. Miller,
	Jan Engelhardt, Randy Dunlap, netfilter-devel, netfilter,
	coreteam, linux-kernel, netdev
In-Reply-To: <20100715114434.GA6415@redhat.com>

This adds a `CHECKSUM' target, which can be used in the iptables mangle
table.

You can use this target to compute and fill in the checksum in
a packet that lacks a checksum.  This is particularly useful,
if you need to work around old applications such as dhcp clients,
that do not work well with checksum offloads, but don't want to disable
checksum offload in your device.

The problem happens in the field with virtualized applications.
For reference, see Red Hat bz 605555, as well as
http://www.spinics.net/lists/kvm/msg37660.html

Typical expected use (helps old dhclient binary running in a VM):
iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
-j CHECKSUM --checksum-fill

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Includes fixes by Jan Engelhardt <jengelh@medozas.de>
---

Changes from v3: fixed issues from review by Patrick McHardy
Applied these fixes by Jan Engelhardt
	Remove $$ cvs tags
	Use xtables_param_act

 extensions/libxt_CHECKSUM.c   |   96 +++++++++++++++++++++++++++++++++++++++++
 extensions/libxt_CHECKSUM.man |    8 +++
 2 files changed, 104 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_CHECKSUM.c
 create mode 100644 extensions/libxt_CHECKSUM.man

diff --git a/extensions/libxt_CHECKSUM.c b/extensions/libxt_CHECKSUM.c
new file mode 100644
index 0000000..9a24443
--- /dev/null
+++ b/extensions/libxt_CHECKSUM.c
@@ -0,0 +1,96 @@
+/* Shared library add-on to xtables for CHECKSUM
+ *
+ * (C) 2002 by Harald Welte <laforge@gnumonks.org>
+ * (C) 2010 by Red Hat, Inc
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This program is distributed under the terms of GNU GPL v2, 1991
+ *
+ * libxt_CHECKSUM.c borrowed some bits from libipt_ECN.c
+ */
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <getopt.h>
+
+#include <xtables.h>
+#include <linux/netfilter/xt_CHECKSUM.h>
+
+static void CHECKSUM_help(void)
+{
+	printf(
+"CHECKSUM target options\n"
+"  --checksum-fill			Fill in packet checksum.\n");
+}
+
+static const struct option CHECKSUM_opts[] = {
+	{ "checksum-fill", 0, NULL, 'F' },
+	{ .name = NULL }
+};
+
+static int CHECKSUM_parse(int c, char **argv, int invert, unsigned int *flags,
+                     const void *entry, struct xt_entry_target **target)
+{
+	struct xt_CHECKSUM_info *einfo
+		= (struct xt_CHECKSUM_info *)(*target)->data;
+
+	switch (c) {
+	case 'F':
+		xtables_param_act(XTF_ONLY_ONCE, "CHECKSUM", "--checksum-fill",
+			*flags & XT_CHECKSUM_OP_FILL);
+		einfo->operation = XT_CHECKSUM_OP_FILL;
+		*flags |= XT_CHECKSUM_OP_FILL;
+		break;
+	default:
+		return 0;
+	}
+
+	return 1;
+}
+
+static void CHECKSUM_check(unsigned int flags)
+{
+	if (!flags)
+		xtables_error(PARAMETER_PROBLEM,
+		           "CHECKSUM target: Parameter --checksum-fill is required");
+}
+
+static void CHECKSUM_print(const void *ip, const struct xt_entry_target *target,
+                      int numeric)
+{
+	const struct xt_CHECKSUM_info *einfo =
+		(const struct xt_CHECKSUM_info *)target->data;
+
+	printf("CHECKSUM ");
+
+	if (einfo->operation & XT_CHECKSUM_OP_FILL)
+		printf("fill ");
+}
+
+static void CHECKSUM_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_CHECKSUM_info *einfo =
+		(const struct xt_CHECKSUM_info *)target->data;
+
+	if (einfo->operation & XT_CHECKSUM_OP_FILL)
+		printf("--checksum-fill ");
+}
+
+static struct xtables_target checksum_tg_reg = {
+	.name		= "CHECKSUM",
+	.version	= XTABLES_VERSION,
+	.family		= NFPROTO_UNSPEC,
+	.size		= XT_ALIGN(sizeof(struct xt_CHECKSUM_info)),
+	.userspacesize	= XT_ALIGN(sizeof(struct xt_CHECKSUM_info)),
+	.help		= CHECKSUM_help,
+	.parse		= CHECKSUM_parse,
+	.final_check	= CHECKSUM_check,
+	.print		= CHECKSUM_print,
+	.save		= CHECKSUM_save,
+	.extra_opts	= CHECKSUM_opts,
+};
+
+void _init(void)
+{
+	xtables_register_target(&checksum_tg_reg);
+}
diff --git a/extensions/libxt_CHECKSUM.man b/extensions/libxt_CHECKSUM.man
new file mode 100644
index 0000000..92ae700
--- /dev/null
+++ b/extensions/libxt_CHECKSUM.man
@@ -0,0 +1,8 @@
+This target allows to selectively work around broken/old applications.
+It can only be used in the mangle table.
+.TP
+\fB\-\-checksum\-fill\fP
+Compute and fill in the checksum in a packet that lacks a checksum.
+This is particularly useful, if you need to work around old applications
+such as dhcp clients, that do not work well with checksum offloads,
+but don't want to disable checksum offload in your device.
-- 
1.7.2.rc0.14.g41c1c

^ permalink raw reply related

* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15
From: Lennart Schulte @ 2010-07-15 11:58 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org,
	Fehrmann, Henning, Carsten Aulbert, Eric Dumazet
In-Reply-To: <alpine.DEB.2.00.1007111825510.15736@melkinpaasi.cs.helsinki.fi>

I'm testing new reordering algorithms in a virtual testbed, that is the 
nodes are emulated with xen and all the network parameters can be tuned 
with queues.
With one of the algorithms I also got tracebacks which include 
tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel 
version however is 2.6.31.
When I read this thread I tried the debug patch and got the following:

[ 2754.413150] NULL head, pkts 0
[ 2754.413156] Errors caught so far 1

Hope that is of any help.

^ permalink raw reply

* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15
From: Eric Dumazet @ 2010-07-15 12:05 UTC (permalink / raw)
  To: Lennart Schulte
  Cc: Ilpo Järvinen, Tejun Heo, David S. Miller, lkml,
	netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert
In-Reply-To: <4C3EF7EA.2040900@nets.rwth-aachen.de>

Le jeudi 15 juillet 2010 à 13:58 +0200, Lennart Schulte a écrit :
> I'm testing new reordering algorithms in a virtual testbed, that is the 
> nodes are emulated with xen and all the network parameters can be tuned 
> with queues.
> With one of the algorithms I also got tracebacks which include 
> tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel 
> version however is 2.6.31.
> When I read this thread I tried the debug patch and got the following:
> 
> [ 2754.413150] NULL head, pkts 0
> [ 2754.413156] Errors caught so far 1
> 
> Hope that is of any help.

Not sure I understand. 

Are you saying you reproduce same tcp_xmit_retransmit_queue()  bug, with
a special tc qdisc/class droppping some ACK frames ?

Could it be some sched problem and incorrect return codes in case of
congestion ?

^ permalink raw reply

* [PATCH] vhost-net: avoid flush under lock
From: Michael S. Tsirkin @ 2010-07-15 12:19 UTC (permalink / raw)
  Cc: Michael S. Tsirkin, Sridhar Samudrala, David S. Miller,
	Arnd Bergmann, Paul E. McKenney, kvm, virtualization, netdev,
	linux-kernel

We flush under vq mutex when changing backends.
This creates a deadlock as workqueue being flushed
needs this lock as well.

https://bugzilla.redhat.com/show_bug.cgi?id=612421

Drop the vq mutex before flush: we have the device mutex
which is sufficient to prevent another ioctl from touching
the vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/net.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 28d7786..50df58e6 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -534,11 +534,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	rcu_assign_pointer(vq->private_data, sock);
 	vhost_net_enable_vq(n, vq);
 done:
+	mutex_unlock(&vq->mutex);
+
 	if (oldsock) {
 		vhost_net_flush_vq(n, index);
 		fput(oldsock->file);
 	}
 
+	mutex_unlock(&n->dev.mutex);
+	return 0;
+
 err_vq:
 	mutex_unlock(&vq->mutex);
 err:
-- 
1.7.2.rc0.14.g41c1c

^ permalink raw reply related

* [PATCH 1/3] drivers: isdn: use kernel macros to convert hex digit
From: Andy Shevchenko @ 2010-07-15 12:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andy Shevchenko, Karsten Keil, Tilman Schmidt, netdev

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: netdev@vger.kernel.org
---
 drivers/isdn/capi/capidrv.c |    7 ++-----
 drivers/isdn/hisax/q931.c   |   13 ++-----------
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/isdn/capi/capidrv.c b/drivers/isdn/capi/capidrv.c
index bf55ed5..2978bda 100644
--- a/drivers/isdn/capi/capidrv.c
+++ b/drivers/isdn/capi/capidrv.c
@@ -1450,12 +1450,9 @@ static void handle_dtrace_data(capidrv_contr *card,
     	}
 
 	for (p = data, end = data+len; p < end; p++) {
-		u8 w;
 		PUTBYTE_TO_STATUS(card, ' ');
-		w = (*p >> 4) & 0xf;
-		PUTBYTE_TO_STATUS(card, (w < 10) ? '0'+w : 'A'-10+w);
-		w = *p & 0xf;
-		PUTBYTE_TO_STATUS(card, (w < 10) ? '0'+w : 'A'-10+w);
+		PUTBYTE_TO_STATUS(card, hex_asc_hi(*p));
+		PUTBYTE_TO_STATUS(card, hex_asc_lo(*p));
 	}
 	PUTBYTE_TO_STATUS(card, '\n');
 
diff --git a/drivers/isdn/hisax/q931.c b/drivers/isdn/hisax/q931.c
index 8b853d5..c0771f9 100644
--- a/drivers/isdn/hisax/q931.c
+++ b/drivers/isdn/hisax/q931.c
@@ -1152,20 +1152,11 @@ QuickHex(char *txt, u_char * p, int cnt)
 {
 	register int i;
 	register char *t = txt;
-	register u_char w;
 
 	for (i = 0; i < cnt; i++) {
 		*t++ = ' ';
-		w = (p[i] >> 4) & 0x0f;
-		if (w < 10)
-			*t++ = '0' + w;
-		else
-			*t++ = 'A' - 10 + w;
-		w = p[i] & 0x0f;
-		if (w < 10)
-			*t++ = '0' + w;
-		else
-			*t++ = 'A' - 10 + w;
+		*t++ = hex_asc_hi(p[i]);
+		*t++ = hex_asc_lo(p[i]);
 	}
 	*t++ = 0;
 	return (t - txt);
-- 
1.7.1.1


^ permalink raw reply related

* Re: [PATCH -mmotm 05/30] mm: sl[au]b: add knowledge of reserve pages
From: Xiaotian Feng @ 2010-07-15 12:37 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-mm, linux-nfs, netdev, riel, cl, a.p.zijlstra, linux-kernel,
	lwang, akpm, davem
In-Reply-To: <AANLkTilj5GrhbRJZfSsfXP1v9cQSRlARFmxpys1vUelr@mail.gmail.com>

On 07/14/2010 04:33 AM, Pekka Enberg wrote:
> Hi Xiaotian!
>
> I would actually prefer that the SLAB, SLOB, and SLUB changes were in
> separate patches to make reviewing easier.
>
> Looking at SLUB:
>
> On Tue, Jul 13, 2010 at 1:17 PM, Xiaotian Feng<dfeng@redhat.com>  wrote:
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 7bb7940..7a5d6dc 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -27,6 +27,8 @@
>>   #include<linux/memory.h>
>>   #include<linux/math64.h>
>>   #include<linux/fault-inject.h>
>> +#include "internal.h"
>> +
>>
>>   /*
>>   * Lock order:
>> @@ -1139,7 +1141,8 @@ static void setup_object(struct kmem_cache *s, struct page *page,
>>                 s->ctor(object);
>>   }
>>
>> -static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
>> +static
>> +struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node, int *reserve)
>>   {
>>         struct page *page;
>>         void *start;
>> @@ -1153,6 +1156,8 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
>>         if (!page)
>>                 goto out;
>>
>> +       *reserve = page->reserve;
>> +
>>         inc_slabs_node(s, page_to_nid(page), page->objects);
>>         page->slab = s;
>>         page->flags |= 1<<  PG_slab;
>> @@ -1606,10 +1611,20 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   {
>>         void **object;
>>         struct page *new;
>> +       int reserve;
>>
>>         /* We handle __GFP_ZERO in the caller */
>>         gfpflags&= ~__GFP_ZERO;
>>
>> +       if (unlikely(c->reserve)) {
>> +               /*
>> +                * If the current slab is a reserve slab and the current
>> +                * allocation context does not allow access to the reserves we
>> +                * must force an allocation to test the current levels.
>> +                */
>> +               if (!(gfp_to_alloc_flags(gfpflags)&  ALLOC_NO_WATERMARKS))
>> +                       goto grow_slab;
>
> OK, so assume that:
>
>    (1) c->reserve is set to one
>
>    (2) GFP flags don't allow dipping into the reserves
>
>    (3) we've managed to free enough pages so normal
>         allocations are fine
>
>    (4) the page from reserves is not yet empty
>
> we will call flush_slab() and put the "emergency page" on partial list
> and clear c->reserve. This effectively means that now some other
> allocation can fetch the partial page and start to use it. Is this OK?
> Who makes sure the emergency reserves are large enough for the next
> out-of-memory condition where we swap over NFS?
>

Good catch. I'm just wondering if above check is necessary. For
"emergency page", we don't set c->freelist. How can we get a
reserved slab, if GPF flags don't allow dipping into reserves?

>> +       }
>>         if (!c->page)
>>                 goto new_slab;
>>
>> @@ -1623,8 +1638,8 @@ load_freelist:
>>         object = c->page->freelist;
>>         if (unlikely(!object))
>>                 goto another_slab;
>> -       if (unlikely(SLABDEBUG&&  PageSlubDebug(c->page)))
>> -               goto debug;
>> +       if (unlikely(SLABDEBUG&&  PageSlubDebug(c->page) || c->reserve))
>> +               goto slow_path;
>>
>>         c->freelist = get_freepointer(s, object);
>>         c->page->inuse = c->page->objects;
>> @@ -1646,16 +1661,18 @@ new_slab:
>>                 goto load_freelist;
>>         }
>>
>> +grow_slab:
>>         if (gfpflags&  __GFP_WAIT)
>>                 local_irq_enable();
>>
>> -       new = new_slab(s, gfpflags, node);
>> +       new = new_slab(s, gfpflags, node,&reserve);
>>
>>         if (gfpflags&  __GFP_WAIT)
>>                 local_irq_disable();
>>
>>         if (new) {
>>                 c = __this_cpu_ptr(s->cpu_slab);
>> +               c->reserve = reserve;
>>                 stat(s, ALLOC_SLAB);
>>                 if (c->page)
>>                         flush_slab(s, c);
>> @@ -1667,10 +1684,20 @@ new_slab:
>>         if (!(gfpflags&  __GFP_NOWARN)&&  printk_ratelimit())
>>                 slab_out_of_memory(s, gfpflags, node);
>>         return NULL;
>> -debug:
>> -       if (!alloc_debug_processing(s, c->page, object, addr))
>> +
>> +slow_path:
>> +       if (!c->reserve&&  !alloc_debug_processing(s, c->page, object, addr))
>>                 goto another_slab;
>>
>> +       /*
>> +        * Avoid the slub fast path in slab_alloc() by not setting
>> +        * c->freelist and the fast path in slab_free() by making
>> +        * node_match() fail by setting c->node to -1.
>> +        *
>> +        * We use this for for debug and reserve checks which need
>> +        * to be done for each allocation.
>> +        */
>> +
>>         c->page->inuse++;
>>         c->page->freelist = get_freepointer(s, object);
>>         c->node = -1;
>> @@ -2095,10 +2122,11 @@ static void early_kmem_cache_node_alloc(gfp_t gfpflags, int node)
>>         struct page *page;
>>         struct kmem_cache_node *n;
>>         unsigned long flags;
>> +       int reserve;
>>
>>         BUG_ON(kmalloc_caches->size<  sizeof(struct kmem_cache_node));
>>
>> -       page = new_slab(kmalloc_caches, gfpflags, node);
>> +       page = new_slab(kmalloc_caches, gfpflags, node,&reserve);
>>
>>         BUG_ON(!page);
>>         if (page_to_nid(page) != node) {
>> --
>> 1.7.1.1
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 2/3] drivers: isdn: remove custom strtoul()
From: Andy Shevchenko @ 2010-07-15 12:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andy Shevchenko, Karsten Keil, netdev
In-Reply-To: <1279197440-6585-1-git-send-email-andy.shevchenko@gmail.com>

In this case we safe to use strict_strtoul().

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
---
 drivers/isdn/hysdn/hysdn_proclog.c |   36 +++++++-----------------------------
 1 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/drivers/isdn/hysdn/hysdn_proclog.c b/drivers/isdn/hysdn/hysdn_proclog.c
index 7003698..2ee93d0 100644
--- a/drivers/isdn/hysdn/hysdn_proclog.c
+++ b/drivers/isdn/hysdn/hysdn_proclog.c
@@ -16,6 +16,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/mutex.h>
+#include <linux/kernel.h>
 
 #include "hysdn_defs.h"
 
@@ -155,9 +156,8 @@ static ssize_t
 hysdn_log_write(struct file *file, const char __user *buf, size_t count, loff_t * off)
 {
 	unsigned long u = 0;
-	int found = 0;
-	unsigned char *cp, valbuf[128];
-	long base = 10;
+	int rc;
+	unsigned char valbuf[128];
 	hysdn_card *card = file->private_data;
 
 	if (count > (sizeof(valbuf) - 1))
@@ -166,32 +166,10 @@ hysdn_log_write(struct file *file, const char __user *buf, size_t count, loff_t
 		return (-EFAULT);	/* copy failed */
 
 	valbuf[count] = 0;	/* terminating 0 */
-	cp = valbuf;
-	if ((count > 2) && (valbuf[0] == '0') && (valbuf[1] == 'x')) {
-		cp += 2;	/* pointer after hex modifier */
-		base = 16;
-	}
-	/* scan the input for debug flags */
-	while (*cp) {
-		if ((*cp >= '0') && (*cp <= '9')) {
-			found = 1;
-			u *= base;	/* adjust to next digit */
-			u += *cp++ - '0';
-			continue;
-		}
-		if (base != 16)
-			break;	/* end of number */
-
-		if ((*cp >= 'a') && (*cp <= 'f')) {
-			found = 1;
-			u *= base;	/* adjust to next digit */
-			u += *cp++ - 'a' + 10;
-			continue;
-		}
-		break;		/* terminated */
-	}
 
-	if (found) {
+	rc = strict_strtoul(valbuf, 0, &u);
+
+	if (rc == 0) {
 		card->debug_flags = u;	/* remember debug flags */
 		hysdn_addlog(card, "debug set to 0x%lx", card->debug_flags);
 	}
-- 
1.7.1.1


^ permalink raw reply related

* [PATCH 3/3] drivers: isdn: get rid of custom strtoul()
From: Andy Shevchenko @ 2010-07-15 12:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andy Shevchenko, Hansjoerg Lipp, Tilman Schmidt, Karsten Keil,
	gigaset307x-common, netdev
In-Reply-To: <1279197440-6585-1-git-send-email-andy.shevchenko@gmail.com>

There were two methods isdn_gethex() and isdn_getnum() which are custom
implementations of strtoul(). Get rid of them in regard to
strict_strtoul() kernel's function.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Hansjoerg Lipp <hjlipp@web.de>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: gigaset307x-common@lists.sourceforge.net
Cc: netdev@vger.kernel.org
---
 drivers/isdn/gigaset/ev-layer.c |   80 ++++++++++-----------------------------
 1 files changed, 20 insertions(+), 60 deletions(-)

diff --git a/drivers/isdn/gigaset/ev-layer.c b/drivers/isdn/gigaset/ev-layer.c
index a230ba7..a141876 100644
--- a/drivers/isdn/gigaset/ev-layer.c
+++ b/drivers/isdn/gigaset/ev-layer.c
@@ -385,64 +385,18 @@ static const struct zsau_resp_t {
 	{NULL,				ZSAU_UNKNOWN}
 };
 
-/*
- * Get integer from char-pointer
- */
-static int isdn_getnum(char *p)
-{
-	int v = -1;
-
-	gig_dbg(DEBUG_EVENT, "string: %s", p);
-
-	while (*p >= '0' && *p <= '9')
-		v = ((v < 0) ? 0 : (v * 10)) + (int) ((*p++) - '0');
-	if (*p)
-		v = -1; /* invalid Character */
-	return v;
-}
-
-/*
- * Get integer from char-pointer
- */
-static int isdn_gethex(char *p)
-{
-	int v = 0;
-	int c;
-
-	gig_dbg(DEBUG_EVENT, "string: %s", p);
-
-	if (!*p)
-		return -1;
-
-	do {
-		if (v > (INT_MAX - 15) / 16)
-			return -1;
-		c = *p;
-		if (c >= '0' && c <= '9')
-			c -= '0';
-		else if (c >= 'a' && c <= 'f')
-			c -= 'a' - 10;
-		else if (c >= 'A' && c <= 'F')
-			c -= 'A' - 10;
-		else
-			return -1;
-		v = v * 16 + c;
-	} while (*++p);
-
-	return v;
-}
-
 /* retrieve CID from parsed response
  * returns 0 if no CID, -1 if invalid CID, or CID value 1..65535
  */
 static int cid_of_response(char *s)
 {
-	int cid;
+	unsigned long cid;
+	int rc;
 
 	if (s[-1] != ';')
 		return 0;	/* no CID separator */
-	cid = isdn_getnum(s);
-	if (cid < 0)
+	rc = strict_strtoul(s, 10, &cid);
+	if (rc)
 		return 0;	/* CID not numeric */
 	if (cid < 1 || cid > 65535)
 		return -1;	/* CID out of range */
@@ -612,21 +566,27 @@ void gigaset_handle_modem_response(struct cardstate *cs)
 		case RT_ZCAU:
 			event->parameter = -1;
 			if (curarg + 1 < params) {
-				i = isdn_gethex(argv[curarg]);
-				j = isdn_gethex(argv[curarg + 1]);
-				if (i >= 0 && i < 256 && j >= 0 && j < 256)
-					event->parameter = (unsigned) i << 8
-							   | j;
-				curarg += 2;
+				unsigned long type, value;
+
+				i = strict_strtoul(argv[curarg++], 16, &type);
+				j = strict_strtoul(argv[curarg++], 16, &value);
+
+				if (i == 0 && type < 256 &&
+				    j == 0 && value < 256)
+					event->parameter = (type << 8) | value;
 			} else
 				curarg = params - 1;
 			break;
 		case RT_NUMBER:
+			event->parameter = -1;
 			if (curarg < params) {
-				event->parameter = isdn_getnum(argv[curarg]);
-				++curarg;
-			} else
-				event->parameter = -1;
+				unsigned long res;
+				int rc;
+
+				rc = strict_strtoul(argv[curarg++], 10, &res);
+				if (rc == 0)
+					event->parameter = res;
+			}
 			gig_dbg(DEBUG_EVENT, "parameter==%d", event->parameter);
 			break;
 		}
-- 
1.7.1.1


^ permalink raw reply related

* Re: [PATCH] vhost-net: avoid flush under lock
From: Michael S. Tsirkin @ 2010-07-15 12:37 UTC (permalink / raw)
  To: Sridhar Samudrala, David S. Miller, Arnd Bergmann,
	Paul E. McKenney, kvm
In-Reply-To: <20100715121912.GA7176@redhat.com>

On Thu, Jul 15, 2010 at 03:19:12PM +0300, Michael S. Tsirkin wrote:
> We flush under vq mutex when changing backends.
> This creates a deadlock as workqueue being flushed
> needs this lock as well.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=612421
> 
> Drop the vq mutex before flush: we have the device mutex
> which is sufficient to prevent another ioctl from touching
> the vq.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Dave, just to clarify, I'll send pull request to merge it through my tree,
there's no need for you to bother with this.

> ---
>  drivers/vhost/net.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 28d7786..50df58e6 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -534,11 +534,16 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>  	rcu_assign_pointer(vq->private_data, sock);
>  	vhost_net_enable_vq(n, vq);
>  done:
> +	mutex_unlock(&vq->mutex);
> +
>  	if (oldsock) {
>  		vhost_net_flush_vq(n, index);
>  		fput(oldsock->file);
>  	}
>  
> +	mutex_unlock(&n->dev.mutex);
> +	return 0;
> +
>  err_vq:
>  	mutex_unlock(&vq->mutex);
>  err:
> -- 
> 1.7.2.rc0.14.g41c1c

^ permalink raw reply

* Re: [PATCH RFC] act_cpu: packet distributing
From: jamal @ 2010-07-15 12:48 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, David S. Miller, Patrick McHardy, Tom Herbert,
	netdev
In-Reply-To: <AANLkTimPmXWyAspJqnya7aM_vDRNzDdJJf2bl7CYrAXN@mail.gmail.com>

On Wed, 2010-07-14 at 12:17 +0800, Changli Gao wrote:

> Thanks, I'll try. It is a write critical section, and for me it is
> difficult to convert this lock to RCU. Could you show me some
> examples?

RCU maybe a little trickier here Eric. Actions could be shared i.e. 
example, it is possible to have a policer action restricting rates for a
group of flows  across multiple netdevices etc. Since action stats get
written to by different CPUs concurrently. It could be probably done if
one was to implement per-cpu stats which get summed-up when user space
asks.

cheers,
jamal


^ permalink raw reply

* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15
From: Lennart Schulte @ 2010-07-15 12:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ilpo Järvinen, Tejun Heo, David S. Miller, lkml,
	netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert
In-Reply-To: <1279195528.2496.2.camel@edumazet-laptop>

Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it 
is the same bug.
Up to now I only experienced the problem with ACK loss (without ACK loss 
the test ran about 30min without problems, with ACK loss it had paniced 
within 10min).
The data sender only has a HTB queue for traffic shaping (set to 20 
Mbit/s). The ACK loss is done by another router.
The setup looks like this. This way it seems to be the most realistic.

o sender with HTB
|
|
o netem queue for forward path delay
|
o netem queue for a queue limit
|
o netem queue for backward path delay
|
o netem queue for ACK loss
|
|
o receiver with HTB

Perhaps now it is a little big clearer.


On 15.07.2010 14:05, Eric Dumazet wrote:
> Le jeudi 15 juillet 2010 à 13:58 +0200, Lennart Schulte a écrit :
>    
>> I'm testing new reordering algorithms in a virtual testbed, that is the
>> nodes are emulated with xen and all the network parameters can be tuned
>> with queues.
>> With one of the algorithms I also got tracebacks which include
>> tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel
>> version however is 2.6.31.
>> When I read this thread I tried the debug patch and got the following:
>>
>> [ 2754.413150] NULL head, pkts 0
>> [ 2754.413156] Errors caught so far 1
>>
>> Hope that is of any help.
>>      
> Not sure I understand.
>
> Are you saying you reproduce same tcp_xmit_retransmit_queue()  bug, with
> a special tc qdisc/class droppping some ACK frames ?
>
> Could it be some sched problem and incorrect return codes in case of
> congestion ?
>    

^ permalink raw reply

* [RFC] Question about tcp_sendmsg()
From: Eric Dumazet @ 2010-07-15 13:14 UTC (permalink / raw)
  To: David Miller, Ilpo Järvinen, Krishna Kumar; +Cc: netdev

While investigating for various bug reports in tcp stack, I looked at
commit def87cf42069a (tcp: Slightly optimize tcp_sendmsg)

One question I have is that the 

sg = sk->sk_route_caps & NETIF_F_SG;

is now done at the beginning of tcp_sendmsg(), and kept in sg variable
for the whole tcp_sendmsg() duration, even if task has to wait for
space.

Previously sk->sk_route_caps & NETIF_F_SG was done in select_size()
itself.

I am wondering if this can have a side effect, if SG capability changes
while a thread has to wait in sk_stream_wait_memory(), and socket route
changes (sk_route_caps flips NETIF_F_SG bit)

Thanks

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9fce8a8..8a4d9bd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1093,6 +1093,7 @@ wait_for_memory:
 				goto do_error;
 
 			mss_now = tcp_send_mss(sk, &size_goal, flags);
+			sg = sk->sk_route_caps & NETIF_F_SG;
 		}
 	}
 



^ permalink raw reply related

* Question about way that NICs deliver packets to the kernel
From: Junchang Wang @ 2010-07-15 14:24 UTC (permalink / raw)
  To: romieu, netdev

Hi list,
My understand of the way that NICs deliver packets to the kernel is
as follows. Correct me if any of this is wrong. Thanks.

1) The device buffer is fixed. When the kernel is acknowledged arrival of a 
new packet, it dynamically allocate a new skb and copy the packet into it. 
For example, 8139too.

2) The device buffer is mapped by streaming DMA. When the kernel is 
acknowledged arrival of a new packet, it unmaps the region previously mapped. 
Obviously, there is NO memcpy operation. Additional cost is streaming DMA 
map/unmap operations. For example, e100 and e1000.

Here comes my question:
1) Is there a principle indicating which one is better? Is streaming DMA
map/unmap operations more expensive than memcpy operation?


2) Why does r8169 bias towards the first approach even if it support both? I 
convert r8169 to the second one and get a 5% performance boost. Below is result
running netperf TCP_STREAM test with 1.6K byte packet length.
        scheme 1    scheme 2    Imp.
r8169     683M        718M       5%

The following patch shows what I did:

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 239d7ef..707876f 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4556,15 +4556,9 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 
 			rtl8169_rx_csum(skb, desc);
 
-			if (rtl8169_try_rx_copy(&skb, tp, pkt_size, addr)) {
-				pci_dma_sync_single_for_device(pdev, addr,
-					pkt_size, PCI_DMA_FROMDEVICE);
-				rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
-			} else {
-				pci_unmap_single(pdev, addr, tp->rx_buf_sz,
-						 PCI_DMA_FROMDEVICE);
-				tp->Rx_skbuff[entry] = NULL;
-			}
+			pci_unmap_single(pdev, addr, tp->rx_buf_sz,
+					 PCI_DMA_FROMDEVICE);
+			tp->Rx_skbuff[entry] = NULL;
 
 			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);

Thanks in advance.

--Junchang

^ permalink raw reply related

* [PATCH -next 0/2] bnx2: allow sleep during allocation
From: Stanislaw Gruszka @ 2010-07-15 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Michael Chan

We have Fedora bug report about memory allocation failure in bnx2_open
(https://bugzilla.redhat.com/show_bug.cgi?id=612861). To prevent
failure we can allow allocator to sleep. Both patches add
GFP_KERNEL flag where possible, first patch in alloc API, second
in DMA API (after conversion from pci_dma_*).

Stanislaw

^ permalink raw reply

* [PATCH 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Stanislaw Gruszka @ 2010-07-15 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Michael Chan
In-Reply-To: <20100715142530.12504.80404.send-patch@dhcp-lab-109.englab.brq.redhat.com>

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/bnx2.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a203f39..6de4cb7 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2664,13 +2664,13 @@ bnx2_set_mac_addr(struct bnx2 *bp, u8 *mac_addr, u32 pos)
 }
 
 static inline int
-bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
+bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
 {
 	dma_addr_t mapping;
 	struct sw_pg *rx_pg = &rxr->rx_pg_ring[index];
 	struct rx_bd *rxbd =
 		&rxr->rx_pg_desc_ring[RX_RING(index)][RX_IDX(index)];
-	struct page *page = alloc_page(GFP_ATOMIC);
+	struct page *page = alloc_page(gfp);
 
 	if (!page)
 		return -ENOMEM;
@@ -2705,7 +2705,7 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 }
 
 static inline int
-bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
+bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	struct sw_bd *rx_buf = &rxr->rx_buf_ring[index];
@@ -2713,7 +2713,7 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 	struct rx_bd *rxbd = &rxr->rx_desc_ring[RX_RING(index)][RX_IDX(index)];
 	unsigned long align;
 
-	skb = netdev_alloc_skb(bp->dev, bp->rx_buf_size);
+	skb = __netdev_alloc_skb(bp->dev, bp->rx_buf_size, gfp);
 	if (skb == NULL) {
 		return -ENOMEM;
 	}
@@ -2974,7 +2974,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 	int err;
 	u16 prod = ring_idx & 0xffff;
 
-	err = bnx2_alloc_rx_skb(bp, rxr, prod);
+	err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_KERNEL);
 	if (unlikely(err)) {
 		bnx2_reuse_rx_skb(bp, rxr, skb, (u16) (ring_idx >> 16), prod);
 		if (hdr_len) {
@@ -3039,7 +3039,8 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 			rx_pg->page = NULL;
 
 			err = bnx2_alloc_rx_page(bp, rxr,
-						 RX_PG_RING_IDX(pg_prod));
+						 RX_PG_RING_IDX(pg_prod),
+						 GFP_ATOMIC);
 			if (unlikely(err)) {
 				rxr->rx_pg_cons = pg_cons;
 				rxr->rx_pg_prod = pg_prod;
@@ -5179,7 +5180,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
 
 	ring_prod = prod = rxr->rx_pg_prod;
 	for (i = 0; i < bp->rx_pg_ring_size; i++) {
-		if (bnx2_alloc_rx_page(bp, rxr, ring_prod) < 0) {
+		if (bnx2_alloc_rx_page(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
 			netdev_warn(bp->dev, "init'ed rx page ring %d with %d/%d pages only\n",
 				    ring_num, i, bp->rx_pg_ring_size);
 			break;
@@ -5191,7 +5192,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
 
 	ring_prod = prod = rxr->rx_prod;
 	for (i = 0; i < bp->rx_ring_size; i++) {
-		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod) < 0) {
+		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
 			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d skbs only\n",
 				    ring_num, i, bp->rx_ring_size);
 			break;
-- 
1.7.1


^ permalink raw reply related

* [PATCH 2/2] bnx2: use device model DMA API
From: Stanislaw Gruszka @ 2010-07-15 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Michael Chan
In-Reply-To: <20100715142530.12504.80404.send-patch@dhcp-lab-109.englab.brq.redhat.com>

Use DMA API as PCI equivalents will be deprecated. This change also allow
to allocate with GFP_KERNEL in some places.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/bnx2.c |  111 +++++++++++++++++++++++++++-------------------------
 1 files changed, 58 insertions(+), 53 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 6de4cb7..98aed05 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -692,9 +692,9 @@ bnx2_free_tx_mem(struct bnx2 *bp)
 		struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
 
 		if (txr->tx_desc_ring) {
-			pci_free_consistent(bp->pdev, TXBD_RING_SIZE,
-					    txr->tx_desc_ring,
-					    txr->tx_desc_mapping);
+			dma_free_coherent(&bp->pdev->dev, TXBD_RING_SIZE,
+					  txr->tx_desc_ring,
+					  txr->tx_desc_mapping);
 			txr->tx_desc_ring = NULL;
 		}
 		kfree(txr->tx_buf_ring);
@@ -714,9 +714,9 @@ bnx2_free_rx_mem(struct bnx2 *bp)
 
 		for (j = 0; j < bp->rx_max_ring; j++) {
 			if (rxr->rx_desc_ring[j])
-				pci_free_consistent(bp->pdev, RXBD_RING_SIZE,
-						    rxr->rx_desc_ring[j],
-						    rxr->rx_desc_mapping[j]);
+				dma_free_coherent(&bp->pdev->dev, RXBD_RING_SIZE,
+						  rxr->rx_desc_ring[j],
+						  rxr->rx_desc_mapping[j]);
 			rxr->rx_desc_ring[j] = NULL;
 		}
 		vfree(rxr->rx_buf_ring);
@@ -724,9 +724,9 @@ bnx2_free_rx_mem(struct bnx2 *bp)
 
 		for (j = 0; j < bp->rx_max_pg_ring; j++) {
 			if (rxr->rx_pg_desc_ring[j])
-				pci_free_consistent(bp->pdev, RXBD_RING_SIZE,
-						    rxr->rx_pg_desc_ring[j],
-						    rxr->rx_pg_desc_mapping[j]);
+				dma_free_coherent(&bp->pdev->dev, RXBD_RING_SIZE,
+						  rxr->rx_pg_desc_ring[j],
+						  rxr->rx_pg_desc_mapping[j]);
 			rxr->rx_pg_desc_ring[j] = NULL;
 		}
 		vfree(rxr->rx_pg_ring);
@@ -748,8 +748,8 @@ bnx2_alloc_tx_mem(struct bnx2 *bp)
 			return -ENOMEM;
 
 		txr->tx_desc_ring =
-			pci_alloc_consistent(bp->pdev, TXBD_RING_SIZE,
-					     &txr->tx_desc_mapping);
+			dma_alloc_coherent(&bp->pdev->dev, TXBD_RING_SIZE,
+					   &txr->tx_desc_mapping, GFP_KERNEL);
 		if (txr->tx_desc_ring == NULL)
 			return -ENOMEM;
 	}
@@ -776,8 +776,10 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
 
 		for (j = 0; j < bp->rx_max_ring; j++) {
 			rxr->rx_desc_ring[j] =
-				pci_alloc_consistent(bp->pdev, RXBD_RING_SIZE,
-						     &rxr->rx_desc_mapping[j]);
+				dma_alloc_coherent(&bp->pdev->dev,
+						   RXBD_RING_SIZE,
+						   &rxr->rx_desc_mapping[j],
+						   GFP_KERNEL);
 			if (rxr->rx_desc_ring[j] == NULL)
 				return -ENOMEM;
 
@@ -795,8 +797,10 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
 
 		for (j = 0; j < bp->rx_max_pg_ring; j++) {
 			rxr->rx_pg_desc_ring[j] =
-				pci_alloc_consistent(bp->pdev, RXBD_RING_SIZE,
-						&rxr->rx_pg_desc_mapping[j]);
+				dma_alloc_coherent(&bp->pdev->dev,
+						   RXBD_RING_SIZE,
+						   &rxr->rx_pg_desc_mapping[j],
+						   GFP_KERNEL);
 			if (rxr->rx_pg_desc_ring[j] == NULL)
 				return -ENOMEM;
 
@@ -816,16 +820,16 @@ bnx2_free_mem(struct bnx2 *bp)
 
 	for (i = 0; i < bp->ctx_pages; i++) {
 		if (bp->ctx_blk[i]) {
-			pci_free_consistent(bp->pdev, BCM_PAGE_SIZE,
-					    bp->ctx_blk[i],
-					    bp->ctx_blk_mapping[i]);
+			dma_free_coherent(&bp->pdev->dev, BCM_PAGE_SIZE,
+					  bp->ctx_blk[i],
+					  bp->ctx_blk_mapping[i]);
 			bp->ctx_blk[i] = NULL;
 		}
 	}
 	if (bnapi->status_blk.msi) {
-		pci_free_consistent(bp->pdev, bp->status_stats_size,
-				    bnapi->status_blk.msi,
-				    bp->status_blk_mapping);
+		dma_free_coherent(&bp->pdev->dev, bp->status_stats_size,
+				  bnapi->status_blk.msi,
+				  bp->status_blk_mapping);
 		bnapi->status_blk.msi = NULL;
 		bp->stats_blk = NULL;
 	}
@@ -846,8 +850,8 @@ bnx2_alloc_mem(struct bnx2 *bp)
 	bp->status_stats_size = status_blk_size +
 				sizeof(struct statistics_block);
 
-	status_blk = pci_alloc_consistent(bp->pdev, bp->status_stats_size,
-					  &bp->status_blk_mapping);
+	status_blk = dma_alloc_coherent(&bp->pdev->dev, bp->status_stats_size,
+					&bp->status_blk_mapping, GFP_KERNEL);
 	if (status_blk == NULL)
 		goto alloc_mem_err;
 
@@ -885,9 +889,10 @@ bnx2_alloc_mem(struct bnx2 *bp)
 		if (bp->ctx_pages == 0)
 			bp->ctx_pages = 1;
 		for (i = 0; i < bp->ctx_pages; i++) {
-			bp->ctx_blk[i] = pci_alloc_consistent(bp->pdev,
+			bp->ctx_blk[i] = dma_alloc_coherent(&bp->pdev->dev,
 						BCM_PAGE_SIZE,
-						&bp->ctx_blk_mapping[i]);
+						&bp->ctx_blk_mapping[i],
+						GFP_KERNEL);
 			if (bp->ctx_blk[i] == NULL)
 				goto alloc_mem_err;
 		}
@@ -2674,9 +2679,9 @@ bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gf
 
 	if (!page)
 		return -ENOMEM;
-	mapping = pci_map_page(bp->pdev, page, 0, PAGE_SIZE,
+	mapping = dma_map_page(&bp->pdev->dev, page, 0, PAGE_SIZE,
 			       PCI_DMA_FROMDEVICE);
-	if (pci_dma_mapping_error(bp->pdev, mapping)) {
+	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
 		__free_page(page);
 		return -EIO;
 	}
@@ -2697,8 +2702,8 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 	if (!page)
 		return;
 
-	pci_unmap_page(bp->pdev, dma_unmap_addr(rx_pg, mapping), PAGE_SIZE,
-		       PCI_DMA_FROMDEVICE);
+	dma_unmap_page(&bp->pdev->dev, dma_unmap_addr(rx_pg, mapping),
+		       PAGE_SIZE, PCI_DMA_FROMDEVICE);
 
 	__free_page(page);
 	rx_pg->page = NULL;
@@ -2721,9 +2726,9 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp
 	if (unlikely((align = (unsigned long) skb->data & (BNX2_RX_ALIGN - 1))))
 		skb_reserve(skb, BNX2_RX_ALIGN - align);
 
-	mapping = pci_map_single(bp->pdev, skb->data, bp->rx_buf_use_size,
-		PCI_DMA_FROMDEVICE);
-	if (pci_dma_mapping_error(bp->pdev, mapping)) {
+	mapping = dma_map_single(&bp->pdev->dev, skb->data, bp->rx_buf_use_size,
+				 PCI_DMA_FROMDEVICE);
+	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
 		dev_kfree_skb(skb);
 		return -EIO;
 	}
@@ -2829,7 +2834,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 			}
 		}
 
-		pci_unmap_single(bp->pdev, dma_unmap_addr(tx_buf, mapping),
+		dma_unmap_single(&bp->pdev->dev, dma_unmap_addr(tx_buf, mapping),
 			skb_headlen(skb), PCI_DMA_TODEVICE);
 
 		tx_buf->skb = NULL;
@@ -2838,7 +2843,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		for (i = 0; i < last; i++) {
 			sw_cons = NEXT_TX_BD(sw_cons);
 
-			pci_unmap_page(bp->pdev,
+			dma_unmap_page(&bp->pdev->dev,
 				dma_unmap_addr(
 					&txr->tx_buf_ring[TX_RING_IDX(sw_cons)],
 					mapping),
@@ -2945,7 +2950,7 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
 	cons_rx_buf = &rxr->rx_buf_ring[cons];
 	prod_rx_buf = &rxr->rx_buf_ring[prod];
 
-	pci_dma_sync_single_for_device(bp->pdev,
+	dma_sync_single_for_device(&bp->pdev->dev,
 		dma_unmap_addr(cons_rx_buf, mapping),
 		BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH, PCI_DMA_FROMDEVICE);
 
@@ -2987,7 +2992,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 	}
 
 	skb_reserve(skb, BNX2_RX_OFFSET);
-	pci_unmap_single(bp->pdev, dma_addr, bp->rx_buf_use_size,
+	dma_unmap_single(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
 			 PCI_DMA_FROMDEVICE);
 
 	if (hdr_len == 0) {
@@ -3049,7 +3054,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 				return err;
 			}
 
-			pci_unmap_page(bp->pdev, mapping_old,
+			dma_unmap_page(&bp->pdev->dev, mapping_old,
 				       PAGE_SIZE, PCI_DMA_FROMDEVICE);
 
 			frag_size -= frag_len;
@@ -3120,7 +3125,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 		dma_addr = dma_unmap_addr(rx_buf, mapping);
 
-		pci_dma_sync_single_for_cpu(bp->pdev, dma_addr,
+		dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr,
 			BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH,
 			PCI_DMA_FROMDEVICE);
 
@@ -5338,7 +5343,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
 				continue;
 			}
 
-			pci_unmap_single(bp->pdev,
+			dma_unmap_single(&bp->pdev->dev,
 					 dma_unmap_addr(tx_buf, mapping),
 					 skb_headlen(skb),
 					 PCI_DMA_TODEVICE);
@@ -5349,7 +5354,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
 			j++;
 			for (k = 0; k < last; k++, j++) {
 				tx_buf = &txr->tx_buf_ring[TX_RING_IDX(j)];
-				pci_unmap_page(bp->pdev,
+				dma_unmap_page(&bp->pdev->dev,
 					dma_unmap_addr(tx_buf, mapping),
 					skb_shinfo(skb)->frags[k].size,
 					PCI_DMA_TODEVICE);
@@ -5379,7 +5384,7 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
 			if (skb == NULL)
 				continue;
 
-			pci_unmap_single(bp->pdev,
+			dma_unmap_single(&bp->pdev->dev,
 					 dma_unmap_addr(rx_buf, mapping),
 					 bp->rx_buf_use_size,
 					 PCI_DMA_FROMDEVICE);
@@ -5732,9 +5737,9 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 	for (i = 14; i < pkt_size; i++)
 		packet[i] = (unsigned char) (i & 0xff);
 
-	map = pci_map_single(bp->pdev, skb->data, pkt_size,
-		PCI_DMA_TODEVICE);
-	if (pci_dma_mapping_error(bp->pdev, map)) {
+	map = dma_map_single(&bp->pdev->dev, skb->data, pkt_size,
+			     PCI_DMA_TODEVICE);
+	if (dma_mapping_error(&bp->pdev->dev, map)) {
 		dev_kfree_skb(skb);
 		return -EIO;
 	}
@@ -5772,7 +5777,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 
 	udelay(5);
 
-	pci_unmap_single(bp->pdev, map, pkt_size, PCI_DMA_TODEVICE);
+	dma_unmap_single(&bp->pdev->dev, map, pkt_size, PCI_DMA_TODEVICE);
 	dev_kfree_skb(skb);
 
 	if (bnx2_get_hw_tx_cons(tx_napi) != txr->tx_prod)
@@ -5789,7 +5794,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 	rx_hdr = rx_buf->desc;
 	skb_reserve(rx_skb, BNX2_RX_OFFSET);
 
-	pci_dma_sync_single_for_cpu(bp->pdev,
+	dma_sync_single_for_cpu(&bp->pdev->dev,
 		dma_unmap_addr(rx_buf, mapping),
 		bp->rx_buf_size, PCI_DMA_FROMDEVICE);
 
@@ -6457,8 +6462,8 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	} else
 		mss = 0;
 
-	mapping = pci_map_single(bp->pdev, skb->data, len, PCI_DMA_TODEVICE);
-	if (pci_dma_mapping_error(bp->pdev, mapping)) {
+	mapping = dma_map_single(&bp->pdev->dev, skb->data, len, PCI_DMA_TODEVICE);
+	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
 		dev_kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
@@ -6486,9 +6491,9 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		txbd = &txr->tx_desc_ring[ring_prod];
 
 		len = frag->size;
-		mapping = pci_map_page(bp->pdev, frag->page, frag->page_offset,
-			len, PCI_DMA_TODEVICE);
-		if (pci_dma_mapping_error(bp->pdev, mapping))
+		mapping = dma_map_page(&bp->pdev->dev, frag->page, frag->page_offset,
+				       len, PCI_DMA_TODEVICE);
+		if (dma_mapping_error(&bp->pdev->dev, mapping))
 			goto dma_error;
 		dma_unmap_addr_set(&txr->tx_buf_ring[ring_prod], mapping,
 				   mapping);
@@ -6527,7 +6532,7 @@ dma_error:
 	ring_prod = TX_RING_IDX(prod);
 	tx_buf = &txr->tx_buf_ring[ring_prod];
 	tx_buf->skb = NULL;
-	pci_unmap_single(bp->pdev, dma_unmap_addr(tx_buf, mapping),
+	dma_unmap_single(&bp->pdev->dev, dma_unmap_addr(tx_buf, mapping),
 			 skb_headlen(skb), PCI_DMA_TODEVICE);
 
 	/* unmap remaining mapped pages */
@@ -6535,7 +6540,7 @@ dma_error:
 		prod = NEXT_TX_BD(prod);
 		ring_prod = TX_RING_IDX(prod);
 		tx_buf = &txr->tx_buf_ring[ring_prod];
-		pci_unmap_page(bp->pdev, dma_unmap_addr(tx_buf, mapping),
+		dma_unmap_page(&bp->pdev->dev, dma_unmap_addr(tx_buf, mapping),
 			       skb_shinfo(skb)->frags[i].size,
 			       PCI_DMA_TODEVICE);
 	}
-- 
1.7.1


^ permalink raw reply related

* Re: Question about way that NICs deliver packets to the kernel
From: Ben Hutchings @ 2010-07-15 14:33 UTC (permalink / raw)
  To: Junchang Wang; +Cc: romieu, netdev
In-Reply-To: <20100715142418.GA26491@host-a-229.ustcsz.edu.cn>

On Thu, 2010-07-15 at 22:24 +0800, Junchang Wang wrote:
> Hi list,
> My understand of the way that NICs deliver packets to the kernel is
> as follows. Correct me if any of this is wrong. Thanks.
> 
> 1) The device buffer is fixed. When the kernel is acknowledged arrival of a 
> new packet, it dynamically allocate a new skb and copy the packet into it. 
> For example, 8139too.
> 
> 2) The device buffer is mapped by streaming DMA. When the kernel is 
> acknowledged arrival of a new packet, it unmaps the region previously mapped. 
> Obviously, there is NO memcpy operation. Additional cost is streaming DMA 
> map/unmap operations. For example, e100 and e1000.
> 
> Here comes my question:
> 1) Is there a principle indicating which one is better? Is streaming DMA
> map/unmap operations more expensive than memcpy operation?

DMA should result in lower CPU usage and higher maximum performance.

> 2) Why does r8169 bias towards the first approach even if it support both? I 
> convert r8169 to the second one and get a 5% performance boost. Below is result
> running netperf TCP_STREAM test with 1.6K byte packet length.
>         scheme 1    scheme 2    Imp.
> r8169     683M        718M       5%
[...]

You should also compare the CPU usage.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Michael Chan @ 2010-07-15 14:48 UTC (permalink / raw)
  To: 'Stanislaw Gruszka', netdev@vger.kernel.org
In-Reply-To: <20100715142537.12504.60051.send-patch@dhcp-lab-109.englab.brq.redhat.com>

Stanislaw Gruszka wrote:

> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---
> @@ -2974,7 +2974,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct
> bnx2_rx_ring_info *rxr, struct sk_buff *skb,
>       int err;
>       u16 prod = ring_idx & 0xffff;
>
> -     err = bnx2_alloc_rx_skb(bp, rxr, prod);
> +     err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_KERNEL);

This should be GFP_ATOMIC since it is called from NAPI softirq
context.



^ permalink raw reply

* Re: [PATCH 2/2] netfilter: nfnetlink_queue: add definition of aligned_be64 for user-space apps
From: Patrick McHardy @ 2010-07-15 15:07 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, NetDev
In-Reply-To: <4C3EF819.8080801@netfilter.org>

Am 15.07.2010 13:59, schrieb Pablo Neira Ayuso:
> On 15/07/10 11:19, Patrick McHardy wrote:
>> Am 12.07.2010 19:00, schrieb Pablo Neira Ayuso:
>>> Currently, libnl and libnetfilter_queue include in one of their
>>> user-space header files an ad-hoc definition of aligned_be64.
>>> However, applications that use the BSD socket API to communicate
>>> via Netlink sockets (ie. those that do not use these libraries)
>>> would need to define this type by hand if they include the
>>> kernel-space header nfnetlink_queue.h.
>>>
>>> This patch adds the definition of aligned_bed64 for user-space
>>> applications in the kernel header. Otherwise, they have to define
>>> it to avoid the following compilation problem:
>>>
>>> /usr/include/linux/netfilter/nfnetlink_queue.h:28: error: expected specifier-qualifier-list before ‘aligned_be64’
>>
>> Why can't these applications simply include linux/types.h?
> 
> Including it doesn't fix the problem here:
> 
> #include <linux/types.h>
> #include <linux/netfilter/nfnetlink_queue.h>
> 
> I still get:
> 
> /usr/include/linux/netfilter/nfnetlink_queue.h:28: error: expected
> specifier-qualifier-list before ‘aligned_be64’
> 
> aligned_be64 is only define in the kernel (it's included under the
> __KERNEL__ definition).
>

In that case I think we should export a __aligned_be64 in types.h
and use that instead of aligned_be64.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] netfilter: add CHECKSUM target
From: Patrick McHardy @ 2010-07-15 15:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, linux-kernel, netfilter-devel,
	netfilter, coreteam, netdev, herbert.xu, kvm
In-Reply-To: <20100711104705.GA18017@redhat.com>

Am 11.07.2010 12:47, schrieb Michael S. Tsirkin:
> On Fri, Jul 09, 2010 at 05:17:36PM +0200, Patrick McHardy wrote:
>> Am 09.07.2010 00:29, schrieb Michael S. Tsirkin:
>>> This adds a `CHECKSUM' target, which can be used in the iptables mangle
>>> table.
>>>
>>> You can use this target to compute and fill in the checksum in
>>> an IP packet that lacks a checksum.  This is particularly useful,
>>> if you need to work around old applications such as dhcp clients,
>>> that do not work well with checksum offloads, but don't want to
>>> disable checksum offload in your device.
>>>
>>> The problem happens in the field with virtualized applications.
>>> For reference, see Red Hat bz 605555, as well as
>>> http://www.spinics.net/lists/kvm/msg37660.html
>>>
>>> Typical expected use (helps old dhclient binary running in a VM):
>>> iptables -A POSTROUTING -t mangle -p udp --dport 68 -j CHECKSUM
>>> --checksum-fill
>>
>> I'm not sure this is something we want to merge upstream and
>> support indefinitely. Dave suggested this as a temporary
>> out-of-tree workaround until the majority of guest dhcp clients
>> are fixed. Has anything changed that makes this course of
>> action impractical?
> 
> If I understand what Dave said correctly, it's up to you ...
> 
> The arguments for putting this upstream are:
> 
> Given the track record, I wouldn't hope for quick fix in the majority of
> guest dhcp clients, unfortunately :(.  We are talking years here.
> Even after that, one of the uses of virtualization is
> to keep old guests running. So yes, I think we'll
> keep using work-arounds for this for a very long time.
> 
> Further, since we have to add the module and we have to teach management
> to program it, it will be much less painful for everyone
> involved if we can put the code upstream, rather than forking
> management code.

Fair enough, its simple enough that I don't expect much maintenance
overhead.

^ permalink raw reply

* Re: [PATCHv4] netfilter: add CHECKSUM target
From: Patrick McHardy @ 2010-07-15 15:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrange, Jes Sorensen, David S. Miller, Jan Engelhardt,
	Randy Dunlap, netfilter-devel, netfilter, coreteam, linux-kernel,
	netdev
In-Reply-To: <20100715115217.GA6737@redhat.com>

Am 15.07.2010 13:52, schrieb Michael S. Tsirkin:
> This adds a `CHECKSUM' target, which can be used in the iptables mangle
> table.
> 
> You can use this target to compute and fill in the checksum in
> a packet that lacks a checksum.  This is particularly useful,
> if you need to work around old applications such as dhcp clients,
> that do not work well with checksum offloads, but don't want to
> disable checksum offload in your device.
> 
> The problem happens in the field with virtualized applications.
> For reference, see Red Hat bz 605555, as well as
> http://www.spinics.net/lists/kvm/msg37660.html
> 
> Typical expected use (helps old dhclient binary running in a VM):
> iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
> 	-j CHECKSUM --checksum-fill
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Includes fixes by Jan Engelhardt <jengelh@medozas.de>

Applied, thanks Michael.

^ permalink raw reply

* Re: [PATCHv4] extensions: libxt_CHECKSUM extension
From: Patrick McHardy @ 2010-07-15 15:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrange, Jes Sorensen, David S. Miller, Jan Engelhardt,
	Randy Dunlap, netfilter-devel, netfilter, coreteam, linux-kernel,
	netdev
In-Reply-To: <20100715115229.GB6737@redhat.com>

Am 15.07.2010 13:52, schrieb Michael S. Tsirkin:
> This adds a `CHECKSUM' target, which can be used in the iptables mangle
> table.
> 
> You can use this target to compute and fill in the checksum in
> a packet that lacks a checksum.  This is particularly useful,
> if you need to work around old applications such as dhcp clients,
> that do not work well with checksum offloads, but don't want to disable
> checksum offload in your device.
> 
> The problem happens in the field with virtualized applications.
> For reference, see Red Hat bz 605555, as well as
> http://www.spinics.net/lists/kvm/msg37660.html
> 
> Typical expected use (helps old dhclient binary running in a VM):
> iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
> -j CHECKSUM --checksum-fill

Applied to iptables-next, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox