Netdev List


* [Patch] atl1c: Add missing PCI device ID
From: Chuck Ebbert @ 2011-02-02 15:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jie Yang


Commit 8f574b35f22fbb9b5e5f1d11ad6b55b6f35f4533 added support for a new
adapter but failed to add it to the PCI device table.

Signed-Off-By: Chuck Ebbert <cebbert@redhat.com>

--- vanilla-2.6.38-rc3.orig/drivers/net/atl1c/atl1c_main.c
+++ vanilla-2.6.38-rc3/drivers/net/atl1c/atl1c_main.c
@@ -48,6 +48,7 @@ static DEFINE_PCI_DEVICE_TABLE(atl1c_pci
 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B)},
 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B2)},
 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D)},
+	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D_2_0)},
 	/* required last entry */
 	{ 0 }
 };

^ permalink raw reply
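
The effect of a missing table entry can be illustrated with a simplified userspace model of a sentinel-terminated ID table. This is only a sketch: `struct id_entry`, `match_id()`, `DEMO_VENDOR` and the numeric IDs are hypothetical placeholders, not the kernel's `struct pci_device_id` or the real Atheros IDs. It shows why a device that the driver code supports is still never bound when its ID is absent from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PCI ID table entry. */
struct id_entry {
	uint16_t vendor;
	uint16_t device;
};

#define DEMO_VENDOR 0x1234	/* placeholder vendor ID, not a real one */

static const struct id_entry demo_table[] = {
	{ DEMO_VENDOR, 0x0001 },
	{ DEMO_VENDOR, 0x0002 },
	{ 0, 0 },		/* required last entry */
};

/* Walk the table until the all-zero sentinel; NULL means "not our device". */
static const struct id_entry *match_id(const struct id_entry *table,
				       uint16_t vendor, uint16_t device)
{
	assert(table != NULL);
	for (; table->vendor != 0; table++)
		if (table->vendor == vendor && table->device == device)
			return table;
	return NULL;
}
```

In this model, looking up device 0x0003 returns NULL until a matching entry is added before the sentinel, which mirrors what the one-line patch does for the new adapter.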

* [patch 0/6] [resend] s390: network patches for net-next
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

Hi Dave,

I have removed the NET_SKB_PAD patch from the patch set.
We will define it in an arch specific header. The new
version of the patch will come via the s390 tree. Thx!

shortlog:
Ursula Braun (3)
qeth: show new mac-address if its setting fails
qeth: allow HiperSockets framesize change in suspend
qeth: allow OSA CHPARM change in suspend state

Frank Blaschka (1)
qeth: add more strict MTU checking

Stefan Weil (2)
s390: Fix wrong size in memcmp (netiucv)
s390: Fix possibly wrong size in strncmp (smsgiucv)

Thanks,
        Frank


^ permalink raw reply

* [patch 2/6] [PATCH] qeth: add more strict MTU checking
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390
In-Reply-To: <20110202160430.554090958@de.ibm.com>

[-- Attachment #1: 605-qeth-mtu-checking.diff --]
[-- Type: text/plain, Size: 2285 bytes --]

From: Frank Blaschka <frank.blaschka@de.ibm.com>

HiperSockets and OSA hardware report a maximum MTU size. Add checking
to reject larger MTUs than allowed by hardware.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/qeth_core_main.c |   35 ++++-------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1832,33 +1832,6 @@ static inline int qeth_get_initial_mtu_f
 	}
 }
 
-static inline int qeth_get_max_mtu_for_card(int cardtype)
-{
-	switch (cardtype) {
-
-	case QETH_CARD_TYPE_UNKNOWN:
-	case QETH_CARD_TYPE_OSD:
-	case QETH_CARD_TYPE_OSN:
-	case QETH_CARD_TYPE_OSM:
-	case QETH_CARD_TYPE_OSX:
-		return 61440;
-	case QETH_CARD_TYPE_IQD:
-		return 57344;
-	default:
-		return 1500;
-	}
-}
-
-static inline int qeth_get_mtu_out_of_mpc(int cardtype)
-{
-	switch (cardtype) {
-	case QETH_CARD_TYPE_IQD:
-		return 1;
-	default:
-		return 0;
-	}
-}
-
 static inline int qeth_get_mtu_outof_framesize(int framesize)
 {
 	switch (framesize) {
@@ -1881,10 +1854,9 @@ static inline int qeth_mtu_is_valid(stru
 	case QETH_CARD_TYPE_OSD:
 	case QETH_CARD_TYPE_OSM:
 	case QETH_CARD_TYPE_OSX:
-		return ((mtu >= 576) && (mtu <= 61440));
 	case QETH_CARD_TYPE_IQD:
 		return ((mtu >= 576) &&
-			(mtu <= card->info.max_mtu + 4096 - 32));
+			(mtu <= card->info.max_mtu));
 	case QETH_CARD_TYPE_OSN:
 	case QETH_CARD_TYPE_UNKNOWN:
 	default:
@@ -1907,7 +1879,7 @@ static int qeth_ulp_enable_cb(struct qet
 	memcpy(&card->token.ulp_filter_r,
 	       QETH_ULP_ENABLE_RESP_FILTER_TOKEN(iob->data),
 	       QETH_MPC_TOKEN_LENGTH);
-	if (qeth_get_mtu_out_of_mpc(card->info.type)) {
+	if (card->info.type == QETH_CARD_TYPE_IQD) {
 		memcpy(&framesize, QETH_ULP_ENABLE_RESP_MAX_MTU(iob->data), 2);
 		mtu = qeth_get_mtu_outof_framesize(framesize);
 		if (!mtu) {
@@ -1920,7 +1892,8 @@ static int qeth_ulp_enable_cb(struct qet
 		card->qdio.in_buf_size = mtu + 2 * PAGE_SIZE;
 	} else {
 		card->info.initial_mtu = qeth_get_initial_mtu_for_card(card);
-		card->info.max_mtu = qeth_get_max_mtu_for_card(card->info.type);
+		card->info.max_mtu = *(__u16 *)QETH_ULP_ENABLE_RESP_MAX_MTU(
+			iob->data);
 		card->qdio.in_buf_size = QETH_IN_BUF_SIZE_DEFAULT;
 	}
 


^ permalink raw reply
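
The stricter check can be sketched outside the driver as a plain range test against the maximum reported by the hardware. This is a minimal model, assuming the hardware limit is already known; `mtu_is_valid()` and `MIN_MTU` are hypothetical names, and only the lower bound of 576 is taken from the patch.

```c
#include <assert.h>

#define MIN_MTU 576	/* lower bound used by qeth_mtu_is_valid() in the patch */

/*
 * Accept an MTU only if it lies between the fixed lower bound and the
 * maximum the hardware reported (card->info.max_mtu in the driver).
 */
static int mtu_is_valid(int mtu, int hw_max_mtu)
{
	assert(hw_max_mtu >= MIN_MTU);	/* precondition of this model only */
	return mtu >= MIN_MTU && mtu <= hw_max_mtu;
}
```

With an example hardware limit of 8992, a request for 9000 is rejected, whereas the old fixed upper bound of 61440 would have let it through.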

* [patch 1/6] [PATCH] qeth: show new mac-address if its setting fails
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20110202160430.554090958@de.ibm.com>

[-- Attachment #1: 604-qeth-mac-address.diff --]
[-- Type: text/plain, Size: 1152 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

Setting a MAC address may fail because the MAC address to be set is
already in use or because of authorization problems. In those cases
qeth issues a message, but the MAC address mentioned is not the
new MAC address to be set but the current one. This patch makes the
error messages show the new MAC address to be set instead.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/qeth_l2_main.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -573,13 +573,13 @@ static int qeth_l2_send_setmac_cb(struct
 		case IPA_RC_L2_DUP_LAYER3_MAC:
 			dev_warn(&card->gdev->dev,
 				"MAC address %pM already exists\n",
-				card->dev->dev_addr);
+				cmd->data.setdelmac.mac);
 			break;
 		case IPA_RC_L2_MAC_NOT_AUTH_BY_HYP:
 		case IPA_RC_L2_MAC_NOT_AUTH_BY_ADP:
 			dev_warn(&card->gdev->dev,
 				"MAC address %pM is not authorized\n",
-				card->dev->dev_addr);
+				cmd->data.setdelmac.mac);
 			break;
 		default:
 			break;


^ permalink raw reply

* [patch 3/6] [PATCH] qeth: allow HiperSockets framesize change in suspend
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20110202160430.554090958@de.ibm.com>

[-- Attachment #1: 607-qeth-chparm-change.diff --]
[-- Type: text/plain, Size: 1268 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

For HiperSockets the framesize-definition determines the selected
mtu-size and the size of the allocated qdio buffers.
A framesize-change may occur while a Linux system with probed
HiperSockets device is in suspend state. This patch enables proper
resuming of a HiperSockets device in this case.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/qeth_core_main.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1887,8 +1887,16 @@ static int qeth_ulp_enable_cb(struct qet
 			QETH_DBF_TEXT_(SETUP, 2, "  rc%d", iob->rc);
 			return 0;
 		}
-		card->info.max_mtu = mtu;
+		if (card->info.initial_mtu && (card->info.initial_mtu != mtu)) {
+			/* frame size has changed */
+			if (card->dev &&
+			    ((card->dev->mtu == card->info.initial_mtu) ||
+			     (card->dev->mtu > mtu)))
+				card->dev->mtu = mtu;
+			qeth_free_qdio_buffers(card);
+		}
 		card->info.initial_mtu = mtu;
+		card->info.max_mtu = mtu;
 		card->qdio.in_buf_size = mtu + 2 * PAGE_SIZE;
 	} else {
 		card->info.initial_mtu = qeth_get_initial_mtu_for_card(card);


^ permalink raw reply
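
The resume logic in the hunk above can be modelled in userspace as follows. This is a sketch under stated assumptions: `struct card_model` and `apply_frame_mtu()` are hypothetical names, the `card->dev` NULL check of the real code is left out, and freeing the qdio buffers is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the qeth card fields the patch touches. */
struct card_model {
	int initial_mtu;	/* MTU derived from the frame size at last setup, 0 if never set */
	int max_mtu;
	int dev_mtu;		/* MTU currently configured on the net device */
	int buffers_freed;	/* set when the qdio buffers would be released */
};

static void apply_frame_mtu(struct card_model *c, int mtu)
{
	assert(c != NULL && mtu > 0);
	if (c->initial_mtu && c->initial_mtu != mtu) {
		/* frame size has changed */
		if (c->dev_mtu == c->initial_mtu || c->dev_mtu > mtu)
			c->dev_mtu = mtu;
		c->buffers_freed = 1;
	}
	c->initial_mtu = mtu;
	c->max_mtu = mtu;
}
```

The device MTU follows the new frame size only if it was still at the old default or would now exceed the new maximum; an MTU that the administrator lowered by hand and that still fits is left alone.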

* [patch 5/6] [PATCH] s390: Fix wrong size in memcmp (netiucv)
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Stefan Weil
In-Reply-To: <20110202160430.554090958@de.ibm.com>

[-- Attachment #1: 609-netiucv-memcmp-size.diff --]
[-- Type: text/plain, Size: 850 bytes --]

From: Stefan Weil <weil@mail.berlios.de>

This error was reported by cppcheck:
drivers/s390/net/netiucv.c:568: error: Using sizeof for array given
as function argument returns the size of pointer.

sizeof(ipuser) did not result in 16 (as many programmers would have
expected) but sizeof(u8 *), so it is 4 or 8, too small here.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/netiucv.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/net/netiucv.c
+++ b/drivers/s390/net/netiucv.c
@@ -565,7 +565,7 @@ static int netiucv_callback_connreq(stru
 	struct iucv_event ev;
 	int rc;
 
-	if (memcmp(iucvMagic, ipuser, sizeof(ipuser)))
+	if (memcmp(iucvMagic, ipuser, 16))
 		/* ipuser must match iucvMagic. */
 		return -EINVAL;
 	rc = -EINVAL;


^ permalink raw reply
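
The pitfall reported by cppcheck can be reproduced in a few lines of standard C. In this sketch, `size_seen_by_callee()`, `ipuser_matches()` and `IPUSER_LEN` are hypothetical names; the patch itself simply uses the literal 16. A parameter declared as `u8 ipuser[16]` is adjusted by the compiler to `u8 *ipuser`, so the pointer form is written out here to make that explicit (and to avoid the compiler warning that the array form would trigger).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IPUSER_LEN 16	/* hypothetical name for the literal 16 in the patch */

/* sizeof on the adjusted parameter yields the pointer size, not 16. */
static size_t size_seen_by_callee(const unsigned char *ipuser)
{
	return sizeof(ipuser);
}

/* Compare all 16 bytes by passing the length explicitly, as the fix does. */
static int ipuser_matches(const unsigned char *magic,
			  const unsigned char *ipuser)
{
	assert(magic != NULL && ipuser != NULL);
	return memcmp(magic, ipuser, IPUSER_LEN) == 0;
}
```

With the old code, two buffers that differ only in their last bytes would have compared equal, because only the first 4 or 8 bytes were examined.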

* [patch 4/6] [PATCH] qeth: allow OSA CHPARM change in suspend state
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula braun
In-Reply-To: <20110202160430.554090958@de.ibm.com>

[-- Attachment #1: 608-qeth-osa-chparm-change.diff --]
[-- Type: text/plain, Size: 4525 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

For OSA the CHPARM-definition determines the number of available
outbound queues.
A CHPARM-change may occur while a Linux system with probed
OSA device is in suspend state. This patch enables proper
resuming of an OSA device in this case.

Signed-off-by: Ursula braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/qeth_core_main.c |  104 +++++++++++++++++++++++---------------
 1 file changed, 63 insertions(+), 41 deletions(-)
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -988,16 +988,30 @@ static void qeth_get_channel_path_desc(s
 	chp_dsc = (struct channelPath_dsc *)ccw_device_get_chp_desc(ccwdev, 0);
 	if (chp_dsc != NULL) {
 		/* CHPP field bit 6 == 1 -> single queue */
-		if ((chp_dsc->chpp & 0x02) == 0x02)
+		if ((chp_dsc->chpp & 0x02) == 0x02) {
+			if ((atomic_read(&card->qdio.state) !=
+				QETH_QDIO_UNINITIALIZED) &&
+			    (card->qdio.no_out_queues == 4))
+				/* change from 4 to 1 outbound queues */
+				qeth_free_qdio_buffers(card);
 			card->qdio.no_out_queues = 1;
+			if (card->qdio.default_out_queue != 0)
+				dev_info(&card->gdev->dev,
+					"Priority Queueing not supported\n");
+			card->qdio.default_out_queue = 0;
+		} else {
+			if ((atomic_read(&card->qdio.state) !=
+				QETH_QDIO_UNINITIALIZED) &&
+			    (card->qdio.no_out_queues == 1)) {
+				/* change from 1 to 4 outbound queues */
+				qeth_free_qdio_buffers(card);
+				card->qdio.default_out_queue = 2;
+			}
+			card->qdio.no_out_queues = 4;
+		}
 		card->info.func_level = 0x4100 + chp_dsc->desc;
 		kfree(chp_dsc);
 	}
-	if (card->qdio.no_out_queues == 1) {
-		card->qdio.default_out_queue = 0;
-		dev_info(&card->gdev->dev,
-			"Priority Queueing not supported\n");
-	}
 	QETH_DBF_TEXT_(SETUP, 2, "nr:%x", card->qdio.no_out_queues);
 	QETH_DBF_TEXT_(SETUP, 2, "lvl:%02x", card->info.func_level);
 	return;
@@ -3756,6 +3770,47 @@ static inline int qeth_get_qdio_q_format
 	}
 }
 
+static void qeth_determine_capabilities(struct qeth_card *card)
+{
+	int rc;
+	int length;
+	char *prcd;
+	struct ccw_device *ddev;
+	int ddev_offline = 0;
+
+	QETH_DBF_TEXT(SETUP, 2, "detcapab");
+	ddev = CARD_DDEV(card);
+	if (!ddev->online) {
+		ddev_offline = 1;
+		rc = ccw_device_set_online(ddev);
+		if (rc) {
+			QETH_DBF_TEXT_(SETUP, 2, "3err%d", rc);
+			goto out;
+		}
+	}
+
+	rc = qeth_read_conf_data(card, (void **) &prcd, &length);
+	if (rc) {
+		QETH_DBF_MESSAGE(2, "%s qeth_read_conf_data returned %i\n",
+			dev_name(&card->gdev->dev), rc);
+		QETH_DBF_TEXT_(SETUP, 2, "5err%d", rc);
+		goto out_offline;
+	}
+	qeth_configure_unitaddr(card, prcd);
+	qeth_configure_blkt_default(card, prcd);
+	kfree(prcd);
+
+	rc = qdio_get_ssqd_desc(ddev, &card->ssqd);
+	if (rc)
+		QETH_DBF_TEXT_(SETUP, 2, "6err%d", rc);
+
+out_offline:
+	if (ddev_offline == 1)
+		ccw_device_set_offline(ddev);
+out:
+	return;
+}
+
 static int qeth_qdio_establish(struct qeth_card *card)
 {
 	struct qdio_initialize init_data;
@@ -3886,6 +3941,7 @@ int qeth_core_hardsetup_card(struct qeth
 
 	QETH_DBF_TEXT(SETUP, 2, "hrdsetup");
 	atomic_set(&card->force_alloc_skb, 0);
+	qeth_get_channel_path_desc(card);
 retry:
 	if (retries)
 		QETH_DBF_MESSAGE(2, "%s Retrying to do IDX activates.\n",
@@ -3914,6 +3970,7 @@ retriable:
 		else
 			goto retry;
 	}
+	qeth_determine_capabilities(card);
 	qeth_init_tokens(card);
 	qeth_init_func_level(card);
 	rc = qeth_idx_activate_channel(&card->read, qeth_idx_read_cb);
@@ -4183,41 +4240,6 @@ void qeth_core_free_discipline(struct qe
 	card->discipline.ccwgdriver = NULL;
 }
 
-static void qeth_determine_capabilities(struct qeth_card *card)
-{
-	int rc;
-	int length;
-	char *prcd;
-
-	QETH_DBF_TEXT(SETUP, 2, "detcapab");
-	rc = ccw_device_set_online(CARD_DDEV(card));
-	if (rc) {
-		QETH_DBF_TEXT_(SETUP, 2, "3err%d", rc);
-		goto out;
-	}
-
-
-	rc = qeth_read_conf_data(card, (void **) &prcd, &length);
-	if (rc) {
-		QETH_DBF_MESSAGE(2, "%s qeth_read_conf_data returned %i\n",
-			dev_name(&card->gdev->dev), rc);
-		QETH_DBF_TEXT_(SETUP, 2, "5err%d", rc);
-		goto out_offline;
-	}
-	qeth_configure_unitaddr(card, prcd);
-	qeth_configure_blkt_default(card, prcd);
-	kfree(prcd);
-
-	rc = qdio_get_ssqd_desc(CARD_DDEV(card), &card->ssqd);
-	if (rc)
-		QETH_DBF_TEXT_(SETUP, 2, "6err%d", rc);
-
-out_offline:
-	ccw_device_set_offline(CARD_DDEV(card));
-out:
-	return;
-}
-
 static int qeth_core_probe_device(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card;


^ permalink raw reply
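
The queue-count handling in qeth_get_channel_path_desc() can be modelled as a small state update. This is a sketch only: `struct qdio_model` and `apply_chpp()` are hypothetical names, the atomic qdio state is reduced to an `initialized` flag, and freeing the buffers is reduced to setting `buffers_freed`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the qdio fields the patch touches. */
struct qdio_model {
	int initialized;	/* nonzero once qdio buffers have been set up */
	int no_out_queues;
	int default_out_queue;
	int buffers_freed;	/* set when existing buffers no longer fit */
};

static void apply_chpp(struct qdio_model *q, unsigned char chpp)
{
	assert(q != NULL);
	if ((chpp & 0x02) == 0x02) {
		/* CHPP bit set: single outbound queue */
		if (q->initialized && q->no_out_queues == 4)
			q->buffers_freed = 1;	/* change from 4 to 1 queues */
		q->no_out_queues = 1;
		q->default_out_queue = 0;
	} else {
		if (q->initialized && q->no_out_queues == 1) {
			q->buffers_freed = 1;	/* change from 1 to 4 queues */
			q->default_out_queue = 2;
		}
		q->no_out_queues = 4;
	}
}
```

Buffers are released only when the device was already initialized and the queue count actually changes, which is the case a CHPARM change during suspend produces.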

* [patch 6/6] [PATCH] s390: Fix possibly wrong size in strncmp (smsgiucv)
From: frank.blaschka @ 2011-02-02 16:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Stefan Weil
In-Reply-To: <20110202160430.554090958@de.ibm.com>

[-- Attachment #1: 610-smsgiucv-strncmp-size.diff --]
[-- Type: text/plain, Size: 951 bytes --]

From: Stefan Weil <weil@mail.berlios.de>

This error was reported by cppcheck:
drivers/s390/net/smsgiucv.c:63: error: Using sizeof for array given as
function argument returns the size of pointer.

Although there is no runtime problem as long as sizeof(u8 *) == 8,
this misleading code should get fixed.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/smsgiucv.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/net/smsgiucv.c
+++ b/drivers/s390/net/smsgiucv.c
@@ -60,7 +60,7 @@ static struct iucv_handler smsg_handler
 static int smsg_path_pending(struct iucv_path *path, u8 ipvmid[8],
 			     u8 ipuser[16])
 {
-	if (strncmp(ipvmid, "*MSG    ", sizeof(ipvmid)) != 0)
+	if (strncmp(ipvmid, "*MSG    ", 8) != 0)
 		return -EINVAL;
 	/* Path pending from *MSG. */
 	return iucv_path_accept(path, &smsg_handler, "SMSGIUCV        ", NULL);


^ permalink raw reply

* [PATCH] ipsec: allow to align IPv4 AH on 32 bits
From: Nicolas Dichtel @ 2011-02-02 16:29 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev, christophe.gouault
In-Reply-To: <20110128.114610.193716130.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]

On 28/01/2011 20:46, David Miller wrote:
> From: Nicolas Dichtel<nicolas.dichtel@6wind.com>
> Date: Fri, 28 Jan 2011 09:51:40 +0100
>
>> On 28/01/2011 05:51, Herbert Xu wrote:
>>> So perhaps an SA configuration flag is needed?
>> I agree. If David is ok, I will update the patch.
>
> Sounds good to me.

Here is the new patch.

The patch for iproute2 in the next email.

Regards,
Nicolas

[-- Attachment #2: 0001-ipsec-allow-to-align-IPv4-AH-on-32-bits.patch --]
[-- Type: text/x-patch, Size: 3736 bytes --]

From 1772aa5401b24f4cb4cc10b038becdeb2c687531 Mon Sep 17 00:00:00 2001
From: Dang Hongwu <hongwu.dang@6wind.com>
Date: Wed, 22 Dec 2010 11:38:47 -0500
Subject: [PATCH] ipsec: allow to align IPv4 AH on 32 bits

The Linux IPv4 AH stack aligns the AH header on a 64 bit boundary
(like in IPv6). This is not RFC compliant (see RFC4302, Section
3.3.3.2.1); it should be aligned on 32 bits.

For most of the authentication algorithms, the ICV size is 96 bits,
so aligning the AH header on 32 or 64 bits gives the same result.

However, for SHA-256-128 for instance, the wrong 64 bit alignment results
in adding useless padding in IPv4 AH, which is forbidden by the RFC.

To avoid breaking backward compatibility, we use a new flag
(XFRM_STATE_ALIGN4) to change the original behavior.

Initial patch from Dang Hongwu <hongwu.dang@6wind.com> and
Christophe Gouault <christophe.gouault@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/xfrm.h |    1 +
 include/net/xfrm.h   |    1 +
 net/ipv4/ah4.c       |   25 +++++++++++++++++++------
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index 930fdd2..b93d6f5 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -350,6 +350,7 @@ struct xfrm_usersa_info {
 #define XFRM_STATE_WILDRECV	8
 #define XFRM_STATE_ICMP		16
 #define XFRM_STATE_AF_UNSPEC	32
+#define XFRM_STATE_ALIGN4	64
 };
 
 struct xfrm_usersa_id {
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b9f385d..1f6e8a0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -36,6 +36,7 @@
 #define XFRM_PROTO_ROUTING	IPPROTO_ROUTING
 #define XFRM_PROTO_DSTOPTS	IPPROTO_DSTOPTS
 
+#define XFRM_ALIGN4(len)	(((len) + 3) & ~3)
 #define XFRM_ALIGN8(len)	(((len) + 7) & ~7)
 #define MODULE_ALIAS_XFRM_MODE(family, encap) \
 	MODULE_ALIAS("xfrm-mode-" __stringify(family) "-" __stringify(encap))
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 86961be..325053d 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -201,7 +201,10 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	top_iph->ttl = 0;
 	top_iph->check = 0;
 
-	ah->hdrlen  = (XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
+	if (x->props.flags & XFRM_STATE_ALIGN4)
+		ah->hdrlen  = (XFRM_ALIGN4(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
+	else
+		ah->hdrlen  = (XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
 
 	ah->reserved = 0;
 	ah->spi = x->id.spi;
@@ -299,9 +302,15 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 	nexthdr = ah->nexthdr;
 	ah_hlen = (ah->hdrlen + 2) << 2;
 
-	if (ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_full_len) &&
-	    ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len))
-		goto out;
+	if (x->props.flags & XFRM_STATE_ALIGN4) {
+		if (ah_hlen != XFRM_ALIGN4(sizeof(*ah) + ahp->icv_full_len) &&
+		    ah_hlen != XFRM_ALIGN4(sizeof(*ah) + ahp->icv_trunc_len))
+			goto out;
+	} else {
+		if (ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_full_len) &&
+		    ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len))
+			goto out;
+	}
 
 	if (!pskb_may_pull(skb, ah_hlen))
 		goto out;
@@ -450,8 +459,12 @@ static int ah_init_state(struct xfrm_state *x)
 
 	BUG_ON(ahp->icv_trunc_len > MAX_AH_AUTH_LEN);
 
-	x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) +
-					  ahp->icv_trunc_len);
+	if (x->props.flags & XFRM_STATE_ALIGN4)
+		x->props.header_len = XFRM_ALIGN4(sizeof(struct ip_auth_hdr) +
+						  ahp->icv_trunc_len);
+	else
+		x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) +
+						  ahp->icv_trunc_len);
 	if (x->props.mode == XFRM_MODE_TUNNEL)
 		x->props.header_len += sizeof(struct iphdr);
 	x->data = ahp;
-- 
1.5.6.5


^ permalink raw reply related
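
The arithmetic behind the commit message can be checked with a standalone sketch. It assumes the fixed part of the AH header is 12 bytes (next header, payload length, reserved, SPI and sequence number, per RFC 4302); `ah_total_len()`, `ah_hdrlen_field()` and `AH_FIXED_LEN` are hypothetical names, while the two alignment macros follow the XFRM_ALIGN4/XFRM_ALIGN8 definitions in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN4(len)	(((len) + 3) & ~(size_t)3)
#define ALIGN8(len)	(((len) + 7) & ~(size_t)7)
#define AH_FIXED_LEN	12	/* AH header bytes before the ICV */

/* Total AH length in bytes for a given ICV size and alignment rule. */
static size_t ah_total_len(size_t icv_len, int align4)
{
	size_t len = AH_FIXED_LEN + icv_len;

	return align4 ? ALIGN4(len) : ALIGN8(len);
}

/* Value of the hdrlen field: length in 32-bit words, minus 2. */
static unsigned int ah_hdrlen_field(size_t total_len)
{
	assert(total_len >= 8 && total_len % 4 == 0);
	return (unsigned int)(total_len >> 2) - 2;
}
```

For a 96-bit ICV (12 bytes) both rules give 24 bytes. For a 128-bit ICV (16 bytes), 32-bit alignment gives 28 bytes (hdrlen 5), while 64-bit alignment pads to 32 bytes (hdrlen 6), which is the extra padding the patch removes when the flag is set.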

* Re: [PATCH] ipv4: Remove fib_hash.
From: Stephen Hemminger @ 2011-02-02 16:29 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20110201.181542.193701016.davem@davemloft.net>

On Tue, 01 Feb 2011 18:15:42 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 02 Feb 2011 01:48:13 +0100
> 
> > Hmm... I know having to maintain two implementations is time consuming,
> > but I know fib_trie is bigger :
> > 
> > # size net/ipv4/fib_*.o
> >    text	   data	    bss	    dec	    hex	filename
> >    7252	    120	      0	   7372	   1ccc	net/ipv4/fib_frontend.o
> >    7279	     16	      4	   7299	   1c83	net/ipv4/fib_hash.o
> >    1479	      0	      0	   1479	    5c7	net/ipv4/fib_rules.o
> >    7885	      0	   2080	   9965	   26ed	net/ipv4/fib_semantics.o
> >   16222	     16	     16	  16254	   3f7e	net/ipv4/fib_trie.o
> > 
> > In my tests, I know that fib_trie is more expensive for typical routing
> > tables for hosts (no more than a dozen or so entries), in latency
> > results, mostly because of icache misses, but also dcache ones.
> 
> It's mostly the rebalancing code that takes up the space.
> 
> The lookup path is on the same order of magnitude as the fib hash
> stuff was.
> 
> In any event, we have several months to hash out any regressions and I
> think if I didn't do this removal nobody would work on it so... :-)

For the case of small devices, what about keeping fib_hash as an option
under CONFIG_EMBEDDED, and removing all the magic resizing of the hash
table? That would get it back to something with a really small size.

-- 

^ permalink raw reply

* [PATCH] iproute2: allow to specify truncation bits on auth algo
From: Nicolas Dichtel @ 2011-02-02 16:30 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev, christophe.gouault
In-Reply-To: <20110128.114610.193716130.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 350 bytes --]

On 28/01/2011 20:46, David Miller wrote:
> From: Nicolas Dichtel<nicolas.dichtel@6wind.com>
> Date: Fri, 28 Jan 2011 09:51:40 +0100
>
>> On 28/01/2011 05:51, Herbert Xu wrote:
>>> So perhaps an SA configuration flag is needed?
>> I agree. If David is ok, I will update the patch.
>
> Sounds good to me.
And the patch for iproute2.


Regards,
Nicolas

[-- Attachment #2: 0001-iproute2-allow-to-specify-truncation-bits-on-auth-a.patch --]
[-- Type: text/x-patch, Size: 5896 bytes --]

From e0d84548d363cfa46a03719bc318ef52fa5ca98f Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 23 Dec 2010 06:48:12 -0500
Subject: [PATCH] iproute2: allow to specify truncation bits on auth algo

Attribute XFRMA_ALG_AUTH_TRUNC can be used to specify
truncation bits, so we add a new algo type: auth-trunc.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/ipxfrm.c     |   28 +++++++++++++++++++++++++++-
 ip/xfrm_state.c |   50 +++++++++++++++++++++++++++++++++-----------------
 2 files changed, 60 insertions(+), 18 deletions(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index ba7360f..591f7bf 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -155,6 +155,7 @@ const char *strxf_xfrmproto(__u8 proto)
 static const struct typeent algo_types[]= {
 	{ "enc", XFRMA_ALG_CRYPT }, { "auth", XFRMA_ALG_AUTH },
 	{ "comp", XFRMA_ALG_COMP }, { "aead", XFRMA_ALG_AEAD },
+	{ "auth-trunc", XFRMA_ALG_AUTH_TRUNC },
 	{ NULL, -1 }
 };
 
@@ -570,6 +571,25 @@ static void xfrm_aead_print(struct xfrm_algo_aead *algo, int len,
 	fprintf(fp, "%s", _SL_);
 }
 
+static void xfrm_auth_trunc_print(struct xfrm_algo_auth *algo, int len,
+				  FILE *fp, const char *prefix)
+{
+	struct {
+		struct xfrm_algo algo;
+		char key[algo->alg_key_len / 8];
+	} base;
+
+	memcpy(base.algo.alg_name, algo->alg_name, sizeof(base.algo.alg_name));
+	base.algo.alg_key_len = algo->alg_key_len;
+	memcpy(base.algo.alg_key, algo->alg_key, algo->alg_key_len / 8);
+
+	__xfrm_algo_print(&base.algo, XFRMA_ALG_AUTH_TRUNC, len, fp, prefix, 0);
+
+	fprintf(fp, " %d", algo->alg_trunc_len);
+
+	fprintf(fp, "%s", _SL_);
+}
+
 static void xfrm_tmpl_print(struct xfrm_user_tmpl *tmpls, int len,
 			    __u16 family, FILE *fp, const char *prefix)
 {
@@ -677,12 +697,18 @@ void xfrm_xfrma_print(struct rtattr *tb[], __u16 family,
 		fprintf(fp, "\tmark %d/0x%x\n", m->v, m->m);
 	}
 
-	if (tb[XFRMA_ALG_AUTH]) {
+	if (tb[XFRMA_ALG_AUTH] && !tb[XFRMA_ALG_AUTH_TRUNC]) {
 		struct rtattr *rta = tb[XFRMA_ALG_AUTH];
 		xfrm_algo_print((struct xfrm_algo *) RTA_DATA(rta),
 				XFRMA_ALG_AUTH, RTA_PAYLOAD(rta), fp, prefix);
 	}
 
+	if (tb[XFRMA_ALG_AUTH_TRUNC]) {
+		struct rtattr *rta = tb[XFRMA_ALG_AUTH_TRUNC];
+		xfrm_auth_trunc_print((struct xfrm_algo_auth *) RTA_DATA(rta),
+				      RTA_PAYLOAD(rta), fp, prefix);
+	}
+
 	if (tb[XFRMA_ALG_AEAD]) {
 		struct rtattr *rta = tb[XFRMA_ALG_AEAD];
 		xfrm_aead_print((struct xfrm_algo_aead *)RTA_DATA(rta),
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 70f0a0b..5260d85 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -83,18 +83,19 @@ static void usage(void)
  	//fprintf(stderr, "REQID - number(default=0)\n");
 
 	fprintf(stderr, "FLAG-LIST := [ FLAG-LIST ] FLAG\n");
-	fprintf(stderr, "FLAG := [ noecn | decap-dscp | nopmtudisc | wildrecv | icmp | af-unspec ]\n");
+	fprintf(stderr, "FLAG := [ noecn | decap-dscp | nopmtudisc | wildrecv | icmp | af-unspec | align4 ]\n");
 
         fprintf(stderr, "ENCAP := ENCAP-TYPE SPORT DPORT OADDR\n");
         fprintf(stderr, "ENCAP-TYPE := espinudp | espinudp-nonike\n");
 
 	fprintf(stderr, "ALGO-LIST := [ ALGO-LIST ] | [ ALGO ]\n");
 	fprintf(stderr, "ALGO := ALGO_TYPE ALGO_NAME ALGO_KEY "
-			"[ ALGO_ICV_LEN ]\n");
+			"[ ALGO_ICV_LEN | ALGO_TRUNC_LEN ]\n");
 	fprintf(stderr, "ALGO_TYPE := [ ");
 	fprintf(stderr, "%s | ", strxf_algotype(XFRMA_ALG_AEAD));
 	fprintf(stderr, "%s | ", strxf_algotype(XFRMA_ALG_CRYPT));
 	fprintf(stderr, "%s | ", strxf_algotype(XFRMA_ALG_AUTH));
+	fprintf(stderr, "%s | ", strxf_algotype(XFRMA_ALG_AUTH_TRUNC));
 	fprintf(stderr, "%s ", strxf_algotype(XFRMA_ALG_COMP));
 	fprintf(stderr, "]\n");
 
@@ -342,6 +343,7 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 			case XFRMA_ALG_AEAD:
 			case XFRMA_ALG_CRYPT:
 			case XFRMA_ALG_AUTH:
+			case XFRMA_ALG_AUTH_TRUNC:
 			case XFRMA_ALG_COMP:
 			{
 				/* ALGO */
@@ -349,11 +351,12 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 					union {
 						struct xfrm_algo alg;
 						struct xfrm_algo_aead aead;
+						struct xfrm_algo_auth auth;
 					} u;
 					char buf[XFRM_ALGO_KEY_BUF_SIZE];
 				} alg = {};
 				int len;
-				__u32 icvlen;
+				__u32 icvlen, trunclen;
 				char *name;
 				char *key;
 				char *buf;
@@ -370,6 +373,7 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 					ealgop = *argv;
 					break;
 				case XFRMA_ALG_AUTH:
+				case XFRMA_ALG_AUTH_TRUNC:
 					if (aalgop)
 						duparg("ALGOTYPE", *argv);
 					aalgop = *argv;
@@ -397,21 +401,33 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 				buf = alg.u.alg.alg_key;
 				len = sizeof(alg.u.alg);
 
-				if (type != XFRMA_ALG_AEAD)
-					goto parse_algo;
-
-				if (!NEXT_ARG_OK())
-					missarg("ALGOICVLEN");
-				NEXT_ARG();
-				if (get_u32(&icvlen, *argv, 0))
-					invarg("\"aead\" ICV length is invalid",
-					       *argv);
-				alg.u.aead.alg_icv_len = icvlen;
-
-				buf = alg.u.aead.alg_key;
-				len = sizeof(alg.u.aead);
+				switch (type) {
+				case XFRMA_ALG_AEAD:
+					if (!NEXT_ARG_OK())
+						missarg("ALGOICVLEN");
+					NEXT_ARG();
+					if (get_u32(&icvlen, *argv, 0))
+						invarg("\"aead\" ICV length is invalid",
+						       *argv);
+					alg.u.aead.alg_icv_len = icvlen;
+
+					buf = alg.u.aead.alg_key;
+					len = sizeof(alg.u.aead);
+					break;
+				case XFRMA_ALG_AUTH_TRUNC:
+					if (!NEXT_ARG_OK())
+						missarg("ALGOTRUNCLEN");
+					NEXT_ARG();
+					if (get_u32(&trunclen, *argv, 0))
+						invarg("\"auth\" trunc length is invalid",
+						       *argv);
+					alg.u.auth.alg_trunc_len = trunclen;
+
+					buf = alg.u.auth.alg_key;
+					len = sizeof(alg.u.auth);
+					break;
+				}
 
-parse_algo:
 				xfrm_algo_parse((void *)&alg, type, name, key,
 						buf, sizeof(alg.buf));
 				len += alg.u.alg.alg_key_len;
-- 
1.5.6.5


^ permalink raw reply related

* Re: [PATCH] iproute2: allow to specify truncation bits on auth algo
From: Nicolas Dichtel @ 2011-02-02 16:34 UTC (permalink / raw)
  Cc: David Miller, herbert, netdev, christophe.gouault
In-Reply-To: <4D498693.6020909@6wind.com>

[-- Attachment #1: Type: text/plain, Size: 463 bytes --]

On 02/02/2011 17:30, Nicolas Dichtel wrote:
> On 28/01/2011 20:46, David Miller wrote:
>> From: Nicolas Dichtel<nicolas.dichtel@6wind.com>
>> Date: Fri, 28 Jan 2011 09:51:40 +0100
>>
>>> On 28/01/2011 05:51, Herbert Xu wrote:
>>>> So perhaps an SA configuration flag is needed?
>>> I agree. If David is ok, I will update the patch.
>>
>> Sounds good to me.
> And the patch for iproute2.
Sorry, two patches were mixed :(

Here is the right one.


Regards,
Nicolas

[-- Attachment #2: 0001-iproute2-add-support-of-flag-XFRM_STATE_ALIGN4.patch --]
[-- Type: text/x-patch, Size: 2232 bytes --]

From fe61b9c3564b2f9504bca652cdea80ff3b5a2743 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue, 1 Feb 2011 07:29:54 -0500
Subject: [PATCH] iproute2: add support of flag XFRM_STATE_ALIGN4

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/xfrm.h |    1 +
 ip/ipxfrm.c          |    1 +
 ip/xfrm_state.c      |    4 +++-
 3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index 07f2b63..8b2d220 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -349,6 +349,7 @@ struct xfrm_usersa_info {
 #define XFRM_STATE_WILDRECV	8
 #define XFRM_STATE_ICMP		16
 #define XFRM_STATE_AF_UNSPEC	32
+#define XFRM_STATE_ALIGN4	64
 };
 
 struct xfrm_usersa_id {
diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 9753822..ba7360f 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -828,6 +828,7 @@ void xfrm_state_info_print(struct xfrm_usersa_info *xsinfo,
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_WILDRECV, "wildrecv");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_ICMP, "icmp");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_AF_UNSPEC, "af-unspec");
+		XFRM_FLAG_PRINT(fp, flags, XFRM_STATE_ALIGN4, "align4");
 		if (flags)
 			fprintf(fp, "%x", flags);
 	}
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 38d4039..e8c7c96 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -83,7 +83,7 @@ static void usage(void)
  	//fprintf(stderr, "REQID - number(default=0)\n");
 
 	fprintf(stderr, "FLAG-LIST := [ FLAG-LIST ] FLAG\n");
-	fprintf(stderr, "FLAG := [ noecn | decap-dscp | nopmtudisc | wildrecv | icmp | af-unspec ]\n");
+	fprintf(stderr, "FLAG := [ noecn | decap-dscp | nopmtudisc | wildrecv | icmp | af-unspec | align4 ]\n");
 
         fprintf(stderr, "ENCAP := ENCAP-TYPE SPORT DPORT OADDR\n");
         fprintf(stderr, "ENCAP-TYPE := espinudp | espinudp-nonike\n");
@@ -214,6 +214,8 @@ static int xfrm_state_flag_parse(__u8 *flags, int *argcp, char ***argvp)
 				*flags |= XFRM_STATE_ICMP;
 			else if (strcmp(*argv, "af-unspec") == 0)
 				*flags |= XFRM_STATE_AF_UNSPEC;
+			else if (strcmp(*argv, "align4") == 0)
+				*flags |= XFRM_STATE_ALIGN4;
 			else {
 				PREV_ARG(); /* back track */
 				break;
-- 
1.5.6.5


^ permalink raw reply related

* Re: Network performance with small packets
From: Shirley Ma @ 2011-02-02 17:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev, netdev-owner,
	Sridhar Samudrala, Steve Dobbelstein
In-Reply-To: <20110202154706.GA12738@redhat.com>

On Wed, 2011-02-02 at 17:47 +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 02, 2011 at 07:39:45AM -0800, Shirley Ma wrote:
> > On Wed, 2011-02-02 at 12:48 +0200, Michael S. Tsirkin wrote:
> > > Yes, I think doing this in the host is much simpler,
> > > just send an interrupt after there's a decent amount
> > > of space in the queue.
> > > 
> > > Having said that the simple heuristic that I coded
> > > might be a bit too simple.
> > 
> > From the debugging output I have seen so far (a single small message
> > TCP_STREAM test), I think the right approach is to patch both guest and
> > vhost.
> 
> One problem is slowing down the guest helps here.
> So there's a chance that just by adding complexity
> in guest driver we get a small improvement :(
> 
> We can't rely on a patched guest anyway, so
> I think it is best to test guest and host changes separately.
> 
> And I do agree something needs to be done in guest too,
> for example when vqs share an interrupt, we
> might invoke a callback when we see vq is not empty
> even though it's not requested. Probably should
> check interrupts enabled here?

Yes, I modified xmit callback something like below:

static void skb_xmit_done(struct virtqueue *svq)
{
        struct virtnet_info *vi = svq->vdev->priv;

        /* Suppress further interrupts. */
        virtqueue_disable_cb(svq);

        /* We were probably waiting for more output buffers. */
        if (netif_queue_stopped(vi->dev)) {
                free_old_xmit_skbs(vi);
                if (virtqueue_free_size(svq) <= svq->vring.num / 2) {
                        virtqueue_enable_cb(svq);
                        return;
                }
        }
        netif_wake_queue(vi->dev);
}

> > The problem I have found is a regression for the single small
> > message TCP_STREAM test. The old kernel works well for TCP_STREAM; only
> > the new kernel has the problem.
> 
> Likely new kernel is faster :)

> > For Steven's problem, it's a multiple-stream TCP_RR issue: the old guest
> > doesn't perform well, and neither does the new guest kernel. We tested the
> > reduced vhost signaling patch before; it didn't help performance at all.
> > 
> > Thanks
> > Shirley
> 
> Yes, it seems unrelated to tx interrupts. 

The issue is more likely related to latency. Do you have anything in
mind on how to reduce vhost latency?

Thanks
Shirley
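
A minimal userspace sketch of the wake-up threshold in the modified
callback above, assuming a simplified model of the TX ring. The struct
and function names here (tx_ring_model, xmit_done_model) are
hypothetical stand-ins, not kernel APIs; the comments name the kernel
calls each step is meant to mirror.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the TX virtqueue state; not the kernel's struct. */
struct tx_ring_model {
        unsigned int num;        /* total ring size */
        unsigned int free_slots; /* currently free descriptors */
        bool cb_enabled;         /* are TX-done interrupts enabled? */
        bool queue_stopped;      /* is the netdev TX queue stopped? */
};

/*
 * Model of skb_xmit_done(): 'reclaimed' is the number of slots that
 * free_old_xmit_skbs() would hand back. Returns true if the queue is woken.
 */
bool xmit_done_model(struct tx_ring_model *r, unsigned int reclaimed)
{
        r->cb_enabled = false;                  /* virtqueue_disable_cb() */

        if (r->queue_stopped) {
                r->free_slots += reclaimed;     /* free_old_xmit_skbs() */
                if (r->free_slots > r->num)
                        r->free_slots = r->num;
                assert(r->free_slots <= r->num);

                /* Not more than half the ring is free: keep waiting. */
                if (r->free_slots <= r->num / 2) {
                        r->cb_enabled = true;   /* virtqueue_enable_cb() */
                        return false;
                }
        }
        r->queue_stopped = false;               /* netif_wake_queue() */
        return true;
}
```

With a 256-entry ring, the model keeps the queue stopped until more
than 128 slots are free, and only then wakes it with callbacks left
disabled.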


^ permalink raw reply

* Re: Network performance with small packets
From: Shirley Ma @ 2011-02-02 17:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm, mashirle,
	netdev
In-Reply-To: <20110202154841.GB12738@redhat.com>

On Wed, 2011-02-02 at 17:48 +0200, Michael S. Tsirkin wrote:
> And this is with sndbuf=0 in host, yes?
> And do you see a lot of tx interrupts?
> How many packets per interrupt?

Nope, sndbuf doesn't matter since I never hit the sock wmem limit
in vhost. I am still playing around; let me know what data you
would like to collect.

Thanks
Shirley


^ permalink raw reply

* Re: [PATCH v5] Gemini: Gigabit ethernet driver
From: Michał Mirosław @ 2011-02-02 17:18 UTC (permalink / raw)
  To: David Miller; +Cc: ulli.kroll, gemini-board-dev, netdev, linux-kernel.bfrz
In-Reply-To: <20110128.143159.48506419.davem@davemloft.net>

On Fri, Jan 28, 2011 at 02:31:59PM -0800, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Thu, 27 Jan 2011 00:24:19 +0100 (CET)
> > +#define NETIF_TSO_FEATURES	\
> > +	(NETIF_F_TSO|NETIF_F_TSO_ECN|NETIF_F_TSO6)
> > +#define GMAC_TX_OFFLOAD_FEATURES	\
> > +	(NETIF_TSO_FEATURES|NETIF_F_ALL_CSUM)
> 
> Please, when defining macros locally for your driver, do not name
> them with prefixes that match those defined generically by the
> network stack.  Otherwise it is confusing for people reading the
> driver.
> 
> One should be able to see "NETIF_XXX" somewhere and expect to find
> its definition somewhere in the generic networking driver interfaces,
> not in the driver itself.

Sure. Renamed to GMAC_TSO_FEATURES for next versions.

> > +static struct toe_private *netdev_to_toe(struct net_device *dev)
> > +{
> > +	return dev->ml_priv;
> > +}
> 
> There is no reason to use ->ml_priv just to have a common backpointer
> to a structure shared between multiple interfaces.
> 
> Simply add a "struct toe_private *" to your "struct gmac_private" and
> stick it there.
> 
> The cost of the dereference is identical in both cases, so there is not
> even a performance incentive to use ->ml_priv.

What is the correct use of ml_priv? I saw that a few drivers use it for
the same or a similar purpose (pointer to state shared between network
devices).

> > +static void __iomem *gmac_ctl_reg(struct net_device *dev, unsigned int reg)
> > +{
> > +	return (void __iomem *)dev->base_addr + reg;
> > +}
> Please do not abuse dev->base_addr in this way, simply define another
> "void __iomem *" pointer in your gmac_private and use that.

The field is unused for memory-mapped devices. Still, that's why I wrap
this kind of thing in one-liners - I can easily change it.

> > +	page = pfn_to_page(dma_to_pfn(toe->dev, rx->word2.buf_adr));
> Please do not use non-portable routines such as dma_to_pfn() unless it
> is absolutely unavoidable.  Instead, use schemes for page struct
> lookup like those used by drivers such as drivers/net/niu.c, which uses
> a hash table to find pages based upon DMA address.
> 
> I'd like you to be able to enable this driver on as many platforms as
> possible, not just ARM, so we can be build testing your driver as we
> make changes to various network driver APIs, and we can't do that if
> you put ARM specific stuff in here.

To make it build I need to add a bunch of #ifdefs to compile out other
platform-specific code. I see no reason to make it inefficient on the
only platform it's supposed to work on. I have isolated the code into
further one-liner functions in case this turns out to be a bigger problem.

If I make it buildable (NOT working) on other archs, is it enough to add
a Kconfig dependency on (ARCH_GEMINI || BROKEN) to allow it to be
build-tested there?

> > +		dev_err(&dev->dev, "Unsupported MII interface\n");
> Please use "netdev_err(dev, ..."
> Please use netdev_*() when possible elsewhere in this driver too.

Fixed. This is called before the netdev is registered, though.

> > +	writel(
> > +		(GMAC0_SWTQ00_EOF_INT_BIT|GMAC0_SWTQ00_FIN_INT_BIT)
> > +			<< (6 * dev->dev_id + txq_num),
> > +		toe_reg(toe, GLOBAL_INTERRUPT_STATUS_0_REG));
> Please format this more reasonably, this looks awful.

Fixed.

> > +	txq->ring[w].word0.bits32 = skb_headlen(skb);
> > +	txq->ring[w].word1.bits32 = skb->len | tss_flags;
> > +	txq->ring[w].word2.bits32 = mapping;
> > +	txq->ring[w].word3.bits32 = tss_pkt_len(skb) | SOF_BIT;
> What is the endianness of the RX and TX descriptors of this chipset?
> Please use "__be32", "__le32", and the endianness conversion interfaces
> as needed.

The StorLink people who wrote the original spaghetti code probably didn't
know either.  The SoC has been used only in little-endian mode, so it isn't
even possible for me to test how it would work in big-endian mode.

Best Regards,
Michał Mirosław
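
A minimal sketch of endian-safe descriptor access, assuming (the thread
does not confirm this) that the hardware descriptor words are
little-endian. In-kernel code would declare the fields as __le32 and
use cpu_to_le32()/le32_to_cpu(); the helper names put_le32/get_le32
below are hypothetical userspace stand-ins showing the same idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at dst in little-endian byte order, whatever the host order is. */
void put_le32(uint8_t *dst, uint32_t v)
{
        assert(dst != NULL);
        dst[0] = (uint8_t)(v & 0xff);
        dst[1] = (uint8_t)((v >> 8) & 0xff);
        dst[2] = (uint8_t)((v >> 16) & 0xff);
        dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a little-endian 32-bit word from src into host byte order. */
uint32_t get_le32(const uint8_t *src)
{
        assert(src != NULL);
        return (uint32_t)src[0] |
               ((uint32_t)src[1] << 8) |
               ((uint32_t)src[2] << 16) |
               ((uint32_t)src[3] << 24);
}
```

Writing each descriptor word through such a helper keeps the in-memory
layout fixed, so the same driver code would behave identically on a
big-endian host.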


^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Jay Vosburgh @ 2011-02-02 17:30 UTC (permalink / raw)
  To: Oleg V. Ukhno
  Cc: Nicolas de Pesloüan, John Fastabend, netdev@vger.kernel.org
In-Reply-To: <4D4833EA.50407@yandex-team.ru>

Oleg V. Ukhno <olegu@yandex-team.ru> wrote:

>On 01/29/2011 05:28 AM, Jay Vosburgh wrote:
>> Oleg V. Ukhno<olegu@yandex-team.ru>  wrote:
>>
>> 	I've thought about this whole thing, and here's what I view as
>> the proper way to do this.
>>
>> 	In my mind, this proposal is two separate pieces:
>>
>> 	First, a piece to make round-robin a selectable hash for
>> xmit_hash_policy.  The documentation for this should follow the pattern
>> of the "layer3+4" hash policy, in particular noting that the new
>> algorithm violates the 802.3ad standard in exciting ways, will result in
>> out of order delivery, and that other 802.3ad implementations may or may
>> not tolerate this.
>>
>> 	Second, a piece to make certain transmitted packets use the
>> source MAC of the sending slave instead of the bond's MAC.  This should
>> be a separate option from the round-robin hash policy.  I'd call it
>> something like "mac_select" with two values: "default" (what we do now)
>> and "slave_src_mac" to use the slave's real MAC for certain types of
>> traffic (I'm open to better names; that's just what I came up with while
>> writing this).  I believe that "certain types" means "everything but
>> ARP," but might be "only IP and IPv6."  Structuring the option in this
>> manner leaves the option open for additional selections in the future,
>> which a simple "on/off" option wouldn't.  This option should probably
>> only affect a subset of modes; I'm thinking anything except balance-tlb
>> or -alb (because they do funky MAC things already) and active-backup (it
>> doesn't balance traffic, and already uses fail_over_mac to control
>> this).  I think this option also needs a whole new section down in the
>> bottom explaining how to exploit it (the "pick special MACs on slaves to
>> trick switch hash" business).
>>
>> 	Comments?
>>
>> 	-J
>>
>Jay,
>As for me, splitting my initial proposal into two logically different pieces
>is OK; this will provide more flexible configuration.
>Do I understand correctly that after I rewrite the patch in split form,
>as you described above, and enhance the documentation, it will be / can be
>applied to the kernel?

	Yes, although the patches may have to go through a few
revisions.


>Then what should I do: rewrite the patch and resubmit it as a new one?

	Yes.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


^ permalink raw reply

* Re: Network performance with small packets
From: Michael S. Tsirkin @ 2011-02-02 17:32 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev, netdev-owner,
	Sridhar Samudrala, Steve Dobbelstein
In-Reply-To: <1296666635.25430.35.camel@localhost.localdomain>

On Wed, Feb 02, 2011 at 09:10:35AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 17:47 +0200, Michael S. Tsirkin wrote:
> > On Wed, Feb 02, 2011 at 07:39:45AM -0800, Shirley Ma wrote:
> > > On Wed, 2011-02-02 at 12:48 +0200, Michael S. Tsirkin wrote:
> > > > Yes, I think doing this in the host is much simpler,
> > > > just send an interrupt after there's a decent amount
> > > > of space in the queue.
> > > > 
> > > > Having said that the simple heuristic that I coded
> > > > might be a bit too simple.
> > > 
> > > From the debugging output I have seen so far (a single small message
> > > TCP_STREAM test), I think the right approach is to patch both guest and
> > > vhost.
> > 
> > One problem is slowing down the guest helps here.
> > So there's a chance that just by adding complexity
> > in guest driver we get a small improvement :(
> > 
> > We can't rely on a patched guest anyway, so
> > I think it is best to test guest and host changes separately.
> > 
> > And I do agree something needs to be done in guest too,
> > for example when vqs share an interrupt, we
> > might invoke a callback when we see vq is not empty
> > even though it's not requested. Probably should
> > check interrupts enabled here?
> 
> Yes, I modified xmit callback something like below:
> 
> static void skb_xmit_done(struct virtqueue *svq)
> {
>         struct virtnet_info *vi = svq->vdev->priv;
> 
>         /* Suppress further interrupts. */
>         virtqueue_disable_cb(svq);
> 
>         /* We were probably waiting for more output buffers. */
>         if (netif_queue_stopped(vi->dev)) {
>                 free_old_xmit_skbs(vi);
>                 if (virtqueue_free_size(svq)  <= svq->vring.num / 2) {
>                         virtqueue_enable_cb(svq);
> 			return;
> 		}
>         }
> 	netif_wake_queue(vi->dev);
> }

OK, but this should have no effect with a vhost patch
which should ensure that we don't get an interrupt
until the queue is at least half empty.
Right?

> > > The problem I have found is a regression for the single small
> > > message TCP_STREAM test. The old kernel works well for TCP_STREAM; only
> > > the new kernel has the problem.
> > 
> > Likely new kernel is faster :)
> 
> > > For Steven's problem, it's a multiple-stream TCP_RR issue: the old guest
> > > doesn't perform well, and neither does the new guest kernel. We tested the
> > > reduced vhost signaling patch before; it didn't help performance at all.
> > > 
> > > Thanks
> > > Shirley
> > 
> > Yes, it seems unrelated to tx interrupts. 
> 
> The issue is more likely related to latency.

Could be. Why do you think so?

> Do you have anything in
> mind on how to reduce vhost latency?
> 
> Thanks
> Shirley

Hmm, bypassing the bridge might help a bit.
Are you using tap+bridge or macvtap?

^ permalink raw reply

* non-symmetric Unix dgram sockets and poll
From: Alban Crequy @ 2011-02-02 17:36 UTC (permalink / raw)
  To: netdev

Hi,

I have 3 Unix dgram sockets (sockA, sockB, sockC):
- sockA is connected to sockB.
- sockB is connected to sockC.
- sockC is not connected.

SockA cannot send any message to sockB because net/unix/af_unix.c::unix_may_send()
prevents it. Is there any reason for that restriction?

If it was possible to send that message, unix_dgram_poll() on sockB could be in a
situation where it needs to add the current process in 2 different wait queues:
- sk_sleep(sk) for POLLIN (messages coming from sockA)
- &unix_sk(other)->peer_wait for POLLOUT (to send messages to sockC)

Is it correct?

Thanks,
Alban
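
A minimal sketch reproducing the three-socket setup above, assuming
Linux (it relies on abstract-address autobind, where bind() with
addrlen == sizeof(sa_family_t) lets the kernel pick an address). The
helper names autobind and send_a_to_b_errno are hypothetical. The
function reports the errno of sockA's send; with the unix_may_send()
check in place this is expected to be EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Autobind fd to a kernel-chosen abstract address and fetch that address. */
static int autobind(int fd, struct sockaddr_un *addr, socklen_t *len)
{
        memset(addr, 0, sizeof(*addr));
        addr->sun_family = AF_UNIX;
        if (bind(fd, (struct sockaddr *)addr, sizeof(sa_family_t)) < 0)
                return -1;
        *len = sizeof(*addr);
        if (getsockname(fd, (struct sockaddr *)addr, len) < 0)
                return -1;
        assert(*len <= sizeof(*addr));
        return 0;
}

/*
 * sockA -> sockB -> sockC, then sockA sends one byte.
 * Returns 0 if the send succeeded, the errno of the failed send,
 * or -1 if the setup itself failed.
 */
int send_a_to_b_errno(void)
{
        struct sockaddr_un baddr, caddr;
        socklen_t blen, clen;
        int result = -1;
        int a = socket(AF_UNIX, SOCK_DGRAM, 0);
        int b = socket(AF_UNIX, SOCK_DGRAM, 0);
        int c = socket(AF_UNIX, SOCK_DGRAM, 0);

        if (a < 0 || b < 0 || c < 0)
                goto out;
        if (autobind(b, &baddr, &blen) < 0 || autobind(c, &caddr, &clen) < 0)
                goto out;
        /* sockA connects to sockB while sockB is still unconnected. */
        if (connect(a, (struct sockaddr *)&baddr, blen) < 0)
                goto out;
        /* sockB then connects to sockC, so its peer is no longer "anyone". */
        if (connect(b, (struct sockaddr *)&caddr, clen) < 0)
                goto out;

        result = (send(a, "x", 1, 0) < 0) ? errno : 0;
out:
        if (a >= 0)
                close(a);
        if (b >= 0)
                close(b);
        if (c >= 0)
                close(c);
        return result;
}
```

The order of the two connect() calls matters in this sketch: connecting
sockA after sockB already has a peer would be refused at connect() time
instead of at send time.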

^ permalink raw reply

* af_unix unix_getname: return size for unnamed sockets too small?
From: Marcus Meissner @ 2011-02-02 17:40 UTC (permalink / raw)
  To: davem, eric.dumazet, ebiederm, netdev, linux-kernel, gorcunov

Hi,

In net/unix/af_unix.c::unix_getname() there is a small problem:

        if (!u->addr) {
                sunaddr->sun_family = AF_UNIX;
                sunaddr->sun_path[0] = 0;		// not copied out
                *uaddr_len = sizeof(short);
        } else {
                struct unix_address *addr = u->addr;

                *uaddr_len = addr->len;
                memcpy(sunaddr, addr->name, *uaddr_len);
        }

The if (!u->addr) case will not copy out the \0 in the sun_path, as
uaddr_len is just the size of sun_family.

(Shown by socat crashing after decoding the getsockname()/getpeername()
return and expecting sun_path to be a valid string (and not seeing the \0)).

Should it perhaps be *uaddr_len = sizeof(short)+sizeof(char)?

Ciao, Marcus
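
A minimal userspace sketch of how the behaviour above can be observed,
assuming Linux. The function name unnamed_getsockname_len is
hypothetical. It pre-fills the buffer with 0xff so the caller can see
whether sun_path[0] was written; with the code quoted above the
reported length is sizeof(sa_family_t), so the sentinel byte would be
expected to stay untouched.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Calls getsockname() on an unnamed AF_UNIX socket.
 * Returns the reported address length, or -1 on error.
 * If first_path_byte is non-NULL it receives sun_path[0] after the call.
 */
long unnamed_getsockname_len(unsigned char *first_path_byte)
{
        struct sockaddr_un addr;
        socklen_t len = sizeof(addr);
        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

        if (fd < 0)
                return -1;

        /* Sentinel fill: any byte the kernel does not copy out stays 0xff. */
        memset(&addr, 0xff, sizeof(addr));
        if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
                close(fd);
                return -1;
        }
        close(fd);
        assert(len <= sizeof(addr));

        if (first_path_byte)
                *first_path_byte = (unsigned char)addr.sun_path[0];
        return (long)len;
}
```

A caller that treats sun_path as a C string without checking the
returned length would read the sentinel bytes here, which matches the
kind of failure described for socat.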

^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Jay Vosburgh @ 2011-02-02 17:57 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Oleg V. Ukhno, John Fastabend, netdev@vger.kernel.org
In-Reply-To: <4D4929BB.2000403@gmail.com>

Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:

>Le 29/01/2011 03:28, Jay Vosburgh a écrit :
>> 	I've thought about this whole thing, and here's what I view as
>> the proper way to do this.
>>
>> 	In my mind, this proposal is two separate pieces:
>>
>> 	First, a piece to make round-robin a selectable hash for
>> xmit_hash_policy.  The documentation for this should follow the pattern
>> of the "layer3+4" hash policy, in particular noting that the new
>> algorithm violates the 802.3ad standard in exciting ways, will result in
>> out of order delivery, and that other 802.3ad implementations may or may
>> not tolerate this.
>>
>> 	Second, a piece to make certain transmitted packets use the
>> source MAC of the sending slave instead of the bond's MAC.  This should
>> be a separate option from the round-robin hash policy.  I'd call it
>> something like "mac_select" with two values: "default" (what we do now)
>> and "slave_src_mac" to use the slave's real MAC for certain types of
>> traffic (I'm open to better names; that's just what I came up with while
>> writing this).  I believe that "certain types" means "everything but
>> ARP," but might be "only IP and IPv6."  Structuring the option in this
>> manner leaves the option open for additional selections in the future,
>> which a simple "on/off" option wouldn't.  This option should probably
>> only affect a subset of modes; I'm thinking anything except balance-tlb
>> or -alb (because they do funky MAC things already) and active-backup (it
>> doesn't balance traffic, and already uses fail_over_mac to control
>> this).  I think this option also needs a whole new section down in the
>> bottom explaining how to exploit it (the "pick special MACs on slaves to
>> trick switch hash" business).
>>
>> 	Comments?
>
>Looks really sensible to me.
>
>I just propose the following option and option values : "src_mac_select"
>(instead of mac_select), with "default" and "slave_mac" (instead of
>slave_src_mac) as possible values. In the future, we might need a
>"dst_mac_select" option... :-)

	I originally thought of using the nomenclature you propose; my
thinking for doing it the way I ended up with is to minimize the number
of tunable knobs that bonding has (so, the dst_mac would be a setting
for mac_select).  That works as long as there aren't a lot of settings
that would be turned on simultaneously, since each combination would
have to be a separate option, or the options parser would have to handle
multiple settings (e.g., mac_select=src+dst or something like that).

	Anyway, after thinking about it some more, in the long run it's
probably safer to separate these two, so, Oleg, use the above naming
("src_mac_select" with "default" and "slave_mac").

>Also, are there any risks that this kind of session load-balancing won't
>properly cooperate with multiqueue (as explained in "Overriding
>Configuration for Special Cases" in Documentation/networking/bonding.txt)?
>I think it is important to ensure we keep the ability to fine tune the
>egress path selection

	I think the logic for the mac_select (or src_mac_select or
whatever) just has to be done last, after the slave selection is done by
the multiqueue stuff.  That's probably a good tidbit to put in the
documentation as well.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: kernel 2.6.37 : oops in cleanup_once
From: Yann Dupont @ 2011-02-02 17:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev
In-Reply-To: <1296659336.20445.22.camel@edumazet-laptop>

Le 02/02/2011 16:08, Eric Dumazet a écrit :
> Le mercredi 02 février 2011 à 16:04 +0100, Yann Dupont a écrit :
>> Ok, will do it at 18:30 CET (to minimize impact)
> >> Is the suspected bug SLUB related ?
>>
> no : It can be a corruption from another part of kernel.
>
>> The 2.6.34.2 kernel previously used on that server used SLAB.
>>
>>
>> 2 questions :
>> -How can I be sure slub_nomerge is active ? Boot message ?
>
> # ls -l /sys/kernel/slab/
>
> If you have symlinks : merge is on (default)
>
> If you don't have symlinks : nomerge is in action
>
>> -Is there a very severe impact on performance ?
>>
> not at all
>
>> Regards,
>>
>
well. The server had the good taste to oops at 18H05, 25 minutes before 
the planned reboot :)

here is the oops (I think it's quite the same) :


Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128042] 
BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128097] 
IP: [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128146] PGD 0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128173] 
Oops: 0002 [#1] SMP
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128200] 
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128250] CPU 7
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128260] 
Modules linked in: dell_rbu acpi_cpufreq freq_table mperf nls_utf8 
nls_cp437 btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix 
ntfs vfat msdos fat jfs rei
serfs ext4 jbd2 crc16 ext3 jbd tun ipt_MASQUERADE iptable_nat nf_nat 
ipt_REJECT kvm_intel kvm xt_physdev ip6t_LOG nf_conntrack_ipv6 
nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_multiport xt_limit 
xt_tcpudp xt_state iptable_filter
  ip_tables x_tables nf_conntrack_tftp nf_conntrack_ftp 
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 8021q bridge stp ext2 
mbcache fuse snd_pcm snd_timer ghes hed button snd soundcore i5000_edac 
edac_core processor shpchp tpm_tis pc
i_hotplug tpm rng_core snd_page_alloc i5k_amb dcdbas tpm_bios joydev 
evdev psmouse pcspkr serio_raw thermal_sys xfs exportfs dm_mod sg sr_mod 
cdrom sd_mod usbhid hid usb_storage qla2xxx scsi_transport_fc scsi_tgt 
uhci_hcd mptsas mptscsih
  mptbase bnx2 scsi_transport_sas scsi_mod ehci_hcd [last unloaded: 
scsi_wait_scan]
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128834]
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128855] 
Pid: 0, comm: kworker/0:1 Not tainted 2.6.37-dsiun-110105 #17 
0MY736/PowerEdge M600
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128901] 
RIP: 0010:[<ffffffff8130e6bf>]  [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128948] 
RSP: 0018:ffff8800cfdc3e20  EFLAGS: 00010206
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128974] 
RAX: ffff8803a7e0ea18 RBX: ffff8803a7e0ea00 RCX: 0000000000000005
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129003] 
RDX: adde806c0d860b00 RSI: 0000000000000096 RDI: ffffffff8152a970
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129032] 
RBP: 00000000000248f6 R08: 00000000003d0900 R09: 0000000000000000
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129062] 
R10: dead000000200200 R11: 0000000000000000 R12: ffff8800cfdc3ea0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129091] 
R13: 0000000000000100 R14: ffff88040fd29fd8 R15: 0000000000000000
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129121] 
FS:  0000000000000000(0000) GS:ffff8800cfdc0000(0000) knlGS:0000000000000000
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129166] 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129193] 
CR2: 000000000000000d CR3: 00000000014f1000 CR4: 00000000000026e0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129223] 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129252] 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129282] 
Process kworker/0:1 (pid: 0, threadinfo ffff88040fd28000, task 
ffff88040fce6450)
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129327] Stack:
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129347]  
0000000000000082 00000001008d3b66 00000000000248f6 ffffffff8130e988
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129397]  
ffff88040fd24000 ffff88040fd24000 ffffffff8152a9a0 ffffffff8105e95f
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129446]  
ffff8800cfdc3e58 ffff88040fd25020 ffffffff8130e950 ffff88040fd29fd8
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129496] 
Call Trace:
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129523] <IRQ>
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129551]  
[<ffffffff8130e988>] ? peer_check_expire+0x38/0x110
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129581]  
[<ffffffff8105e95f>] ? run_timer_softirq+0x16f/0x350
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129609]  
[<ffffffff8130e950>] ? peer_check_expire+0x0/0x110
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129638]  
[<ffffffff81079c6b>] ? ktime_get+0x5b/0xe0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129666]  
[<ffffffff8105685a>] ? __do_softirq+0xaa/0x1e0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129694]  
[<ffffffff81003ddc>] ? call_softirq+0x1c/0x30
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129722]  
[<ffffffff81005f75>] ? do_softirq+0x65/0xa0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129748]  
[<ffffffff81056745>] ? irq_exit+0x85/0x90
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129776]  
[<ffffffff8102137a>] ? smp_apic_timer_interrupt+0x6a/0xa0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129806]  
[<ffffffff81003893>] ? apic_timer_interrupt+0x13/0x20
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129833] <EOI>
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129857]  
[<ffffffff8123f5ce>] ? acpi_hw_register_read+0x54/0xe2
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129890]  
[<ffffffffa01c52b8>] ? acpi_idle_enter_simple+0xf4/0x126 [processor]
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129936]  
[<ffffffffa01c52b1>] ? acpi_idle_enter_simple+0xed/0x126 [processor]
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131555]  
[<ffffffffa01c5034>] ? acpi_idle_enter_bm+0xeb/0x27b [processor]
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131591]  
[<ffffffff812c0deb>] ? cpuidle_idle_call+0x8b/0x140
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131619]  
[<ffffffff8100208a>] ? cpu_idle+0x6a/0xf0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131645] 
Code: 00 48 8b 05 c4 c2 21 00 48 3d 60 a9 52 81 74 5c 48 8d 58 e8 48 8b 
15 11 02 24 00 2b 53 28 48 39 ea 72 49 48 8b 4b 18 48 8b 53 20 <48> 89 
51 08 48 89 0a 48 89 43 18 48 89 43 20 f0 ff 40 14 48 c7
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131847] 
RIP  [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131876]  
RSP <ffff8800cfdc3e20>
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131898] 
CR2: 000000000000000d
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132280] 
---[ end trace a9f45436c3b7c143 ]---
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132350] 
Kernel panic - not syncing: Fatal exception in interrupt
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132422] 
Pid: 0, comm: kworker/0:1 Tainted: G      D     2.6.37-dsiun-110105 #17
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132510] 
Call Trace:
Feb  2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132574] 
<IRQ>  [<ffffffff8137c75e>] ? panic+0x92/0x1a2

and I also have a screenshot with more details. I'll send it in a 
private message.



Since 18H30, the server runs with slub_nomerge.

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr

^ permalink raw reply

* [PATCH] be2net: use device model DMA API
From: Ivan Vecera @ 2011-02-02 18:05 UTC (permalink / raw)
  To: netdev; +Cc: sathyap, subbus, sarveshwarb, ajitk

Use DMA API as PCI equivalents will be deprecated.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/benet/be_ethtool.c |   25 +++++-----
 drivers/net/benet/be_main.c    |   98 +++++++++++++++++++++-------------------
 2 files changed, 64 insertions(+), 59 deletions(-)

diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index b4be027..0c99314 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -376,8 +376,9 @@ static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 		}
 
 		phy_cmd.size = sizeof(struct be_cmd_req_get_phy_info);
-		phy_cmd.va = pci_alloc_consistent(adapter->pdev, phy_cmd.size,
-					&phy_cmd.dma);
+		phy_cmd.va = dma_alloc_coherent(&adapter->pdev->dev,
+						phy_cmd.size, &phy_cmd.dma,
+						GFP_KERNEL);
 		if (!phy_cmd.va) {
 			dev_err(&adapter->pdev->dev, "Memory alloc failure\n");
 			return -ENOMEM;
@@ -416,8 +417,8 @@ static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 		adapter->port_type = ecmd->port;
 		adapter->transceiver = ecmd->transceiver;
 		adapter->autoneg = ecmd->autoneg;
-		pci_free_consistent(adapter->pdev, phy_cmd.size,
-					phy_cmd.va, phy_cmd.dma);
+		dma_free_coherent(&adapter->pdev->dev, phy_cmd.size, phy_cmd.va,
+				  phy_cmd.dma);
 	} else {
 		ecmd->speed = adapter->link_speed;
 		ecmd->port = adapter->port_type;
@@ -554,8 +555,8 @@ be_test_ddr_dma(struct be_adapter *adapter)
 	};
 
 	ddrdma_cmd.size = sizeof(struct be_cmd_req_ddrdma_test);
-	ddrdma_cmd.va = pci_alloc_consistent(adapter->pdev, ddrdma_cmd.size,
-					&ddrdma_cmd.dma);
+	ddrdma_cmd.va = dma_alloc_coherent(&adapter->pdev->dev, ddrdma_cmd.size,
+					   &ddrdma_cmd.dma, GFP_KERNEL);
 	if (!ddrdma_cmd.va) {
 		dev_err(&adapter->pdev->dev, "Memory allocation failure\n");
 		return -ENOMEM;
@@ -569,8 +570,8 @@ be_test_ddr_dma(struct be_adapter *adapter)
 	}
 
 err:
-	pci_free_consistent(adapter->pdev, ddrdma_cmd.size,
-			ddrdma_cmd.va, ddrdma_cmd.dma);
+	dma_free_coherent(&adapter->pdev->dev, ddrdma_cmd.size, ddrdma_cmd.va,
+			  ddrdma_cmd.dma);
 	return ret;
 }
 
@@ -662,8 +663,8 @@ be_read_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom,
 
 	memset(&eeprom_cmd, 0, sizeof(struct be_dma_mem));
 	eeprom_cmd.size = sizeof(struct be_cmd_req_seeprom_read);
-	eeprom_cmd.va = pci_alloc_consistent(adapter->pdev, eeprom_cmd.size,
-				&eeprom_cmd.dma);
+	eeprom_cmd.va = dma_alloc_coherent(&adapter->pdev->dev, eeprom_cmd.size,
+					   &eeprom_cmd.dma, GFP_KERNEL);
 
 	if (!eeprom_cmd.va) {
 		dev_err(&adapter->pdev->dev,
@@ -677,8 +678,8 @@ be_read_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom,
 		resp = (struct be_cmd_resp_seeprom_read *) eeprom_cmd.va;
 		memcpy(data, resp->seeprom_data + eeprom->offset, eeprom->len);
 	}
-	pci_free_consistent(adapter->pdev, eeprom_cmd.size, eeprom_cmd.va,
-			eeprom_cmd.dma);
+	dma_free_coherent(&adapter->pdev->dev, eeprom_cmd.size, eeprom_cmd.va,
+			  eeprom_cmd.dma);
 
 	return status;
 }
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 28a32a6..82b2df8 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -125,8 +125,8 @@ static void be_queue_free(struct be_adapter *adapter, struct be_queue_info *q)
 {
 	struct be_dma_mem *mem = &q->dma_mem;
 	if (mem->va)
-		pci_free_consistent(adapter->pdev, mem->size,
-			mem->va, mem->dma);
+		dma_free_coherent(&adapter->pdev->dev, mem->size, mem->va,
+				  mem->dma);
 }
 
 static int be_queue_alloc(struct be_adapter *adapter, struct be_queue_info *q,
@@ -138,7 +138,8 @@ static int be_queue_alloc(struct be_adapter *adapter, struct be_queue_info *q,
 	q->len = len;
 	q->entry_size = entry_size;
 	mem->size = len * entry_size;
-	mem->va = pci_alloc_consistent(adapter->pdev, mem->size, &mem->dma);
+	mem->va = dma_alloc_coherent(&adapter->pdev->dev, mem->size, &mem->dma,
+				     GFP_KERNEL);
 	if (!mem->va)
 		return -1;
 	memset(mem->va, 0, mem->size);
@@ -484,7 +485,7 @@ static void wrb_fill_hdr(struct be_adapter *adapter, struct be_eth_hdr_wrb *hdr,
 	AMAP_SET_BITS(struct amap_eth_hdr_wrb, len, hdr, len);
 }
 
-static void unmap_tx_frag(struct pci_dev *pdev, struct be_eth_wrb *wrb,
+static void unmap_tx_frag(struct device *dev, struct be_eth_wrb *wrb,
 		bool unmap_single)
 {
 	dma_addr_t dma;
@@ -494,11 +495,10 @@ static void unmap_tx_frag(struct pci_dev *pdev, struct be_eth_wrb *wrb,
 	dma = (u64)wrb->frag_pa_hi << 32 | (u64)wrb->frag_pa_lo;
 	if (wrb->frag_len) {
 		if (unmap_single)
-			pci_unmap_single(pdev, dma, wrb->frag_len,
-				PCI_DMA_TODEVICE);
+			dma_unmap_single(dev, dma, wrb->frag_len,
+					 DMA_TO_DEVICE);
 		else
-			pci_unmap_page(pdev, dma, wrb->frag_len,
-				PCI_DMA_TODEVICE);
+			dma_unmap_page(dev, dma, wrb->frag_len, DMA_TO_DEVICE);
 	}
 }
 
@@ -507,7 +507,7 @@ static int make_tx_wrbs(struct be_adapter *adapter,
 {
 	dma_addr_t busaddr;
 	int i, copied = 0;
-	struct pci_dev *pdev = adapter->pdev;
+	struct device *dev = &adapter->pdev->dev;
 	struct sk_buff *first_skb = skb;
 	struct be_queue_info *txq = &adapter->tx_obj.q;
 	struct be_eth_wrb *wrb;
@@ -521,9 +521,8 @@ static int make_tx_wrbs(struct be_adapter *adapter,
 
 	if (skb->len > skb->data_len) {
 		int len = skb_headlen(skb);
-		busaddr = pci_map_single(pdev, skb->data, len,
-					 PCI_DMA_TODEVICE);
-		if (pci_dma_mapping_error(pdev, busaddr))
+		busaddr = dma_map_single(dev, skb->data, len, DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, busaddr))
 			goto dma_err;
 		map_single = true;
 		wrb = queue_head_node(txq);
@@ -536,10 +535,9 @@ static int make_tx_wrbs(struct be_adapter *adapter,
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		struct skb_frag_struct *frag =
 			&skb_shinfo(skb)->frags[i];
-		busaddr = pci_map_page(pdev, frag->page,
-				       frag->page_offset,
-				       frag->size, PCI_DMA_TODEVICE);
-		if (pci_dma_mapping_error(pdev, busaddr))
+		busaddr = dma_map_page(dev, frag->page, frag->page_offset,
+				       frag->size, DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, busaddr))
 			goto dma_err;
 		wrb = queue_head_node(txq);
 		wrb_fill(wrb, busaddr, frag->size);
@@ -563,7 +561,7 @@ dma_err:
 	txq->head = map_head;
 	while (copied) {
 		wrb = queue_head_node(txq);
-		unmap_tx_frag(pdev, wrb, map_single);
+		unmap_tx_frag(dev, wrb, map_single);
 		map_single = false;
 		copied -= wrb->frag_len;
 		queue_head_inc(txq);
@@ -888,8 +886,9 @@ get_rx_page_info(struct be_adapter *adapter,
 	BUG_ON(!rx_page_info->page);
 
 	if (rx_page_info->last_page_user) {
-		pci_unmap_page(adapter->pdev, dma_unmap_addr(rx_page_info, bus),
-			adapter->big_page_size, PCI_DMA_FROMDEVICE);
+		dma_unmap_page(&adapter->pdev->dev,
+			       dma_unmap_addr(rx_page_info, bus),
+			       adapter->big_page_size, DMA_FROM_DEVICE);
 		rx_page_info->last_page_user = false;
 	}
 
@@ -1195,9 +1194,9 @@ static void be_post_rx_frags(struct be_rx_obj *rxo)
 				rxo->stats.rx_post_fail++;
 				break;
 			}
-			page_dmaaddr = pci_map_page(adapter->pdev, pagep, 0,
-						adapter->big_page_size,
-						PCI_DMA_FROMDEVICE);
+			page_dmaaddr = dma_map_page(&adapter->pdev->dev, pagep,
+						    0, adapter->big_page_size,
+						    DMA_FROM_DEVICE);
 			page_info->page_offset = 0;
 		} else {
 			get_page(pagep);
@@ -1270,8 +1269,8 @@ static void be_tx_compl_process(struct be_adapter *adapter, u16 last_index)
 	do {
 		cur_index = txq->tail;
 		wrb = queue_tail_node(txq);
-		unmap_tx_frag(adapter->pdev, wrb, (unmap_skb_hdr &&
-					skb_headlen(sent_skb)));
+		unmap_tx_frag(&adapter->pdev->dev, wrb,
+			      (unmap_skb_hdr && skb_headlen(sent_skb)));
 		unmap_skb_hdr = false;
 
 		num_wrbs++;
@@ -2179,7 +2178,8 @@ static int be_setup_wol(struct be_adapter *adapter, bool enable)
 	memset(mac, 0, ETH_ALEN);
 
 	cmd.size = sizeof(struct be_cmd_req_acpi_wol_magic_config);
-	cmd.va = pci_alloc_consistent(adapter->pdev, cmd.size, &cmd.dma);
+	cmd.va = dma_alloc_coherent(&adapter->pdev->dev, cmd.size, &cmd.dma,
+				    GFP_KERNEL);
 	if (cmd.va == NULL)
 		return -1;
 	memset(cmd.va, 0, cmd.size);
@@ -2190,8 +2190,8 @@ static int be_setup_wol(struct be_adapter *adapter, bool enable)
 		if (status) {
 			dev_err(&adapter->pdev->dev,
 				"Could not enable Wake-on-lan\n");
-			pci_free_consistent(adapter->pdev, cmd.size, cmd.va,
-					cmd.dma);
+			dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va,
+					  cmd.dma);
 			return status;
 		}
 		status = be_cmd_enable_magic_wol(adapter,
@@ -2204,7 +2204,7 @@ static int be_setup_wol(struct be_adapter *adapter, bool enable)
 		pci_enable_wake(adapter->pdev, PCI_D3cold, 0);
 	}
 
-	pci_free_consistent(adapter->pdev, cmd.size, cmd.va, cmd.dma);
+	dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma);
 	return status;
 }
 
@@ -2528,8 +2528,8 @@ int be_load_fw(struct be_adapter *adapter, u8 *func)
 	dev_info(&adapter->pdev->dev, "Flashing firmware file %s\n", fw_file);
 
 	flash_cmd.size = sizeof(struct be_cmd_write_flashrom) + 32*1024;
-	flash_cmd.va = pci_alloc_consistent(adapter->pdev, flash_cmd.size,
-					&flash_cmd.dma);
+	flash_cmd.va = dma_alloc_coherent(&adapter->pdev->dev, flash_cmd.size,
+					  &flash_cmd.dma, GFP_KERNEL);
 	if (!flash_cmd.va) {
 		status = -ENOMEM;
 		dev_err(&adapter->pdev->dev,
@@ -2558,8 +2558,8 @@ int be_load_fw(struct be_adapter *adapter, u8 *func)
 		status = -1;
 	}
 
-	pci_free_consistent(adapter->pdev, flash_cmd.size, flash_cmd.va,
-				flash_cmd.dma);
+	dma_free_coherent(&adapter->pdev->dev, flash_cmd.size, flash_cmd.va,
+			  flash_cmd.dma);
 	if (status) {
 		dev_err(&adapter->pdev->dev, "Firmware load error\n");
 		goto fw_exit;
@@ -2700,13 +2700,13 @@ static void be_ctrl_cleanup(struct be_adapter *adapter)
 	be_unmap_pci_bars(adapter);
 
 	if (mem->va)
-		pci_free_consistent(adapter->pdev, mem->size,
-			mem->va, mem->dma);
+		dma_free_coherent(&adapter->pdev->dev, mem->size, mem->va,
+				  mem->dma);
 
 	mem = &adapter->mc_cmd_mem;
 	if (mem->va)
-		pci_free_consistent(adapter->pdev, mem->size,
-			mem->va, mem->dma);
+		dma_free_coherent(&adapter->pdev->dev, mem->size, mem->va,
+				  mem->dma);
 }
 
 static int be_ctrl_init(struct be_adapter *adapter)
@@ -2721,8 +2721,10 @@ static int be_ctrl_init(struct be_adapter *adapter)
 		goto done;
 
 	mbox_mem_alloc->size = sizeof(struct be_mcc_mailbox) + 16;
-	mbox_mem_alloc->va = pci_alloc_consistent(adapter->pdev,
-				mbox_mem_alloc->size, &mbox_mem_alloc->dma);
+	mbox_mem_alloc->va = dma_alloc_coherent(&adapter->pdev->dev,
+						mbox_mem_alloc->size,
+						&mbox_mem_alloc->dma,
+						GFP_KERNEL);
 	if (!mbox_mem_alloc->va) {
 		status = -ENOMEM;
 		goto unmap_pci_bars;
@@ -2734,8 +2736,9 @@ static int be_ctrl_init(struct be_adapter *adapter)
 	memset(mbox_mem_align->va, 0, sizeof(struct be_mcc_mailbox));
 
 	mc_cmd_mem->size = sizeof(struct be_cmd_req_mcast_mac_config);
-	mc_cmd_mem->va = pci_alloc_consistent(adapter->pdev, mc_cmd_mem->size,
-			&mc_cmd_mem->dma);
+	mc_cmd_mem->va = dma_alloc_coherent(&adapter->pdev->dev,
+					    mc_cmd_mem->size, &mc_cmd_mem->dma,
+					    GFP_KERNEL);
 	if (mc_cmd_mem->va == NULL) {
 		status = -ENOMEM;
 		goto free_mbox;
@@ -2751,8 +2754,8 @@ static int be_ctrl_init(struct be_adapter *adapter)
 	return 0;
 
 free_mbox:
-	pci_free_consistent(adapter->pdev, mbox_mem_alloc->size,
-		mbox_mem_alloc->va, mbox_mem_alloc->dma);
+	dma_free_coherent(&adapter->pdev->dev, mbox_mem_alloc->size,
+			  mbox_mem_alloc->va, mbox_mem_alloc->dma);
 
 unmap_pci_bars:
 	be_unmap_pci_bars(adapter);
@@ -2766,8 +2769,8 @@ static void be_stats_cleanup(struct be_adapter *adapter)
 	struct be_dma_mem *cmd = &adapter->stats_cmd;
 
 	if (cmd->va)
-		pci_free_consistent(adapter->pdev, cmd->size,
-			cmd->va, cmd->dma);
+		dma_free_coherent(&adapter->pdev->dev, cmd->size,
+				  cmd->va, cmd->dma);
 }
 
 static int be_stats_init(struct be_adapter *adapter)
@@ -2775,7 +2778,8 @@ static int be_stats_init(struct be_adapter *adapter)
 	struct be_dma_mem *cmd = &adapter->stats_cmd;
 
 	cmd->size = sizeof(struct be_cmd_req_get_stats);
-	cmd->va = pci_alloc_consistent(adapter->pdev, cmd->size, &cmd->dma);
+	cmd->va = dma_alloc_coherent(&adapter->pdev->dev, cmd->size, &cmd->dma,
+				     GFP_KERNEL);
 	if (cmd->va == NULL)
 		return -1;
 	memset(cmd->va, 0, cmd->size);
@@ -2918,11 +2922,11 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	adapter->netdev = netdev;
 	SET_NETDEV_DEV(netdev, &pdev->dev);
 
-	status = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	status = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
 	if (!status) {
 		netdev->features |= NETIF_F_HIGHDMA;
 	} else {
-		status = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		status = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
 		if (status) {
 			dev_err(&pdev->dev, "Could not set PCI DMA Mask\n");
 			goto free_netdev;
-- 
1.7.3.4


^ permalink raw reply related

* Re: Network performance with small packets
From: Shirley Ma @ 2011-02-02 18:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev, netdev-owner,
	Sridhar Samudrala, Steve Dobbelstein
In-Reply-To: <20110202173213.GA13907@redhat.com>

On Wed, 2011-02-02 at 19:32 +0200, Michael S. Tsirkin wrote:
> OK, but this should have no effect with a vhost patch
> which should ensure that we don't get an interrupt
> until the queue is at least half empty.
> Right?

There should be some coordination between the guest and vhost. We shouldn't
count the TX packets while the netif queue is enabled, since the next guest
TX xmit will free any used buffers in vhost. We need to be careful here in
case we miss the interrupts when the netif queue has stopped.

However, we can't change old guests, so we can test the patches separately
for guest only, vhost only, and the combination.

> > > 
> > > Yes, it seems unrelated to tx interrupts. 
> > 
> > The issue is more likely related to latency.
> 
> Could be. Why do you think so?

When I experimented with a latency hack, I saw the performance change as
the latency changed.

> > Do you have anything in
> > mind on how to reduce vhost latency?
> > 
> > Thanks
> > Shirley
> 
> Hmm, bypassing the bridge might help a bit.
> Are you using tap+bridge or macvtap? 

I am using tap+bridge for the TCP_RR test. I think Steven tested macvtap
before; he might have some data from his workload performance
measurements.

Shirley

^ permalink raw reply

* Re: Network performance with small packets
From: Michael S. Tsirkin @ 2011-02-02 18:20 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm, mashirle,
	netdev
In-Reply-To: <1296661371.25430.13.camel@localhost.localdomain>

On Wed, Feb 02, 2011 at 07:42:51AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 12:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 01, 2011 at 11:33:49PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 23:14 -0800, Shirley Ma wrote:
> > > > w/i guest change, I played around the parameters, for example: I could
> > > > get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message size,
> > > > w/i dropping packet, I was able to get up to 6.2Gb/s with similar CPU
> > > > usage.
> > > 
> > > I meant w/o guest change, only vhost changes. Sorry about that.
> > > 
> > > Shirley
> > 
> > Ah, excellent. What were the parameters? 
> 
> I used half of the ring size 129 for packet counters, but the
> performance is still not as good as dropping packets on guest, 3.7 Gb/s
> vs. 6.2Gb/s.
> 
> Shirley

How many packets and bytes per interrupt are sent?
Also, what about other values for the counters and other counters?

What does your patch do? Just drop packets instead of
stopping the interface?

To understand when we should drop packets in the guest, we need to know
*why* it helps. Otherwise, how do we know it will work for others?
Note that qdisc will drop packets when it overruns, so what is different?
Also, are we overrunning some other queue somewhere?

-- 
MST

^ permalink raw reply
