Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next-2.6] net: convert bonding to use rx_handler
From: Patrick McHardy @ 2011-02-18 14:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, netdev, davem, shemminger, fubar, nicolas.2p.debian,
	andy
In-Reply-To: <1298039252.6201.66.camel@edumazet-laptop>

Am 18.02.2011 15:27, schrieb Eric Dumazet:
> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
> 
>> Do not know how to do it better. As for percpu variable, not only
>> origdev would have to be remembered but also probably skb pointer to
>> know if it's the first run on the skb or not. Can't really figure out a
>> better solution. Can you?
> 
> I'll try and let you know.

Why not simply do a lookup on skb->iif?

^ permalink raw reply

* Re: [PATCH v6 0/9] net: Unified offload configuration
From: Ben Hutchings @ 2011-02-18 14:29 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: David Miller, netdev
In-Reply-To: <20110218142238.GA18099@rere.qmqm.pl>

On Fri, 2011-02-18 at 15:22 +0100, Michał Mirosław wrote:
> On Thu, Feb 17, 2011 at 02:56:11PM -0800, David Miller wrote:
[...]
> > Please get rid of that annoying message spit out by netif_features_change(),
> > it's just spam.  If we want notifications for stuff like this, use a
> > non-unicast netlink message so those who want to hear it can do so.
> 
> You mean netdev_update_features() "Features changed" message? Is it ok
> to just demote it to DEBUG level or you want to remove it altogether?
> What about netdev_fix_features() messages?

I think you need to emit these messages at 'error' severity when fixing
up features for a newly-added device, but at 'debug' later on.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [patch net-next-2.6] net: convert bonding to use rx_handler
From: Eric Dumazet @ 2011-02-18 14:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, shemminger, kaber, fubar, nicolas.2p.debian, andy
In-Reply-To: <20110218141456.GD2939@psychotron.redhat.com>

Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :

> Do not know how to do it better. As for percpu variable, not only
> origdev would have to be remembered but also probably skb pointer to
> know if it's the first run on the skb or not. Can't really figure out a
> better solution. Can you?

I'll try and let you know.

Thanks



^ permalink raw reply

* [patch 1/1] [PATCH] qeth: remove needless IPA-commands in offline
From: frank.blaschka @ 2011-02-18 14:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20110218142258.791988211@de.ibm.com>

[-- Attachment #1: 602-needless-ipa-commands.diff --]
[-- Type: text/plain, Size: 12373 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

If a qeth device is set offline, data and control subchannels are
cleared, which means removal of all IP Assist Primitive settings
implicitly. There is no need to delete those settings explicitly.
This patch removes all IP Assist invocations from offline.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---


 drivers/s390/net/qeth_core.h      |    1 
 drivers/s390/net/qeth_core_main.c |   52 ++++++++++-------------------------
 drivers/s390/net/qeth_l2_main.c   |   45 +++++++++----------------------
 drivers/s390/net/qeth_l3_main.c   |   55 ++------------------------------------
 4 files changed, 33 insertions(+), 120 deletions(-)
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -741,7 +741,6 @@ struct qeth_card {
 	/* QDIO buffer handling */
 	struct qeth_qdio_info qdio;
 	struct qeth_perf_stats perf_stats;
-	int use_hard_stop;
 	int read_or_write_problem;
 	struct qeth_osn_info osn_info;
 	struct qeth_discipline discipline;
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -302,12 +302,15 @@ static void qeth_issue_ipa_msg(struct qe
 	int com = cmd->hdr.command;
 	ipa_name = qeth_get_ipa_cmd_name(com);
 	if (rc)
-		QETH_DBF_MESSAGE(2, "IPA: %s(x%X) for %s returned x%X \"%s\"\n",
-				ipa_name, com, QETH_CARD_IFNAME(card),
-					rc, qeth_get_ipa_msg(rc));
+		QETH_DBF_MESSAGE(2, "IPA: %s(x%X) for %s/%s returned "
+				"x%X \"%s\"\n",
+				ipa_name, com, dev_name(&card->gdev->dev),
+				QETH_CARD_IFNAME(card), rc,
+				qeth_get_ipa_msg(rc));
 	else
-		QETH_DBF_MESSAGE(5, "IPA: %s(x%X) for %s succeeded\n",
-				ipa_name, com, QETH_CARD_IFNAME(card));
+		QETH_DBF_MESSAGE(5, "IPA: %s(x%X) for %s/%s succeeded\n",
+				ipa_name, com, dev_name(&card->gdev->dev),
+				QETH_CARD_IFNAME(card));
 }
 
 static struct qeth_ipa_cmd *qeth_check_ipa_data(struct qeth_card *card,
@@ -1083,7 +1086,6 @@ static int qeth_setup_card(struct qeth_c
 	card->data.state  = CH_STATE_DOWN;
 	card->state = CARD_STATE_DOWN;
 	card->lan_online = 0;
-	card->use_hard_stop = 0;
 	card->read_or_write_problem = 0;
 	card->dev = NULL;
 	spin_lock_init(&card->vlanlock);
@@ -1732,20 +1734,22 @@ int qeth_send_control_data(struct qeth_c
 		};
 	}
 
+	if (reply->rc == -EIO)
+		goto error;
 	rc = reply->rc;
 	qeth_put_reply(reply);
 	return rc;
 
 time_err:
+	reply->rc = -ETIME;
 	spin_lock_irqsave(&reply->card->lock, flags);
 	list_del_init(&reply->list);
 	spin_unlock_irqrestore(&reply->card->lock, flags);
-	reply->rc = -ETIME;
 	atomic_inc(&reply->received);
+error:
 	atomic_set(&card->write.irq_pending, 0);
 	qeth_release_buffer(iob->channel, iob);
 	card->write.buf_no = (card->write.buf_no + 1) % QETH_CMD_BUFFER_NO;
-	wake_up(&reply->wait_q);
 	rc = reply->rc;
 	qeth_put_reply(reply);
 	return rc;
@@ -2490,45 +2494,19 @@ int qeth_send_ipa_cmd(struct qeth_card *
 }
 EXPORT_SYMBOL_GPL(qeth_send_ipa_cmd);
 
-static int qeth_send_startstoplan(struct qeth_card *card,
-		enum qeth_ipa_cmds ipacmd, enum qeth_prot_versions prot)
-{
-	int rc;
-	struct qeth_cmd_buffer *iob;
-
-	iob = qeth_get_ipacmd_buffer(card, ipacmd, prot);
-	rc = qeth_send_ipa_cmd(card, iob, NULL, NULL);
-
-	return rc;
-}
-
 int qeth_send_startlan(struct qeth_card *card)
 {
 	int rc;
+	struct qeth_cmd_buffer *iob;
 
 	QETH_DBF_TEXT(SETUP, 2, "strtlan");
 
-	rc = qeth_send_startstoplan(card, IPA_CMD_STARTLAN, 0);
+	iob = qeth_get_ipacmd_buffer(card, IPA_CMD_STARTLAN, 0);
+	rc = qeth_send_ipa_cmd(card, iob, NULL, NULL);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(qeth_send_startlan);
 
-int qeth_send_stoplan(struct qeth_card *card)
-{
-	int rc = 0;
-
-	/*
-	 * TODO: according to the IPA format document page 14,
-	 * TCP/IP (we!) never issue a STOPLAN
-	 * is this right ?!?
-	 */
-	QETH_DBF_TEXT(SETUP, 2, "stoplan");
-
-	rc = qeth_send_startstoplan(card, IPA_CMD_STOPLAN, 0);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(qeth_send_stoplan);
-
 int qeth_default_setadapterparms_cb(struct qeth_card *card,
 		struct qeth_reply *reply, unsigned long data)
 {
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -202,17 +202,19 @@ static void qeth_l2_add_mc(struct qeth_c
 		kfree(mc);
 }
 
-static void qeth_l2_del_all_mc(struct qeth_card *card)
+static void qeth_l2_del_all_mc(struct qeth_card *card, int del)
 {
 	struct qeth_mc_mac *mc, *tmp;
 
 	spin_lock_bh(&card->mclock);
 	list_for_each_entry_safe(mc, tmp, &card->mc_list, list) {
-		if (mc->is_vmac)
-			qeth_l2_send_setdelmac(card, mc->mc_addr,
+		if (del) {
+			if (mc->is_vmac)
+				qeth_l2_send_setdelmac(card, mc->mc_addr,
 					IPA_CMD_DELVMAC, NULL);
-		else
-			qeth_l2_send_delgroupmac(card, mc->mc_addr);
+			else
+				qeth_l2_send_delgroupmac(card, mc->mc_addr);
+		}
 		list_del(&mc->list);
 		kfree(mc);
 	}
@@ -288,18 +290,13 @@ static int qeth_l2_send_setdelvlan(struc
 				 qeth_l2_send_setdelvlan_cb, NULL);
 }
 
-static void qeth_l2_process_vlans(struct qeth_card *card, int clear)
+static void qeth_l2_process_vlans(struct qeth_card *card)
 {
 	struct qeth_vlan_vid *id;
 	QETH_CARD_TEXT(card, 3, "L2prcvln");
 	spin_lock_bh(&card->vlanlock);
 	list_for_each_entry(id, &card->vid_list, list) {
-		if (clear)
-			qeth_l2_send_setdelvlan(card, id->vid,
-				IPA_CMD_DELVLAN);
-		else
-			qeth_l2_send_setdelvlan(card, id->vid,
-				IPA_CMD_SETVLAN);
+		qeth_l2_send_setdelvlan(card, id->vid, IPA_CMD_SETVLAN);
 	}
 	spin_unlock_bh(&card->vlanlock);
 }
@@ -379,19 +376,11 @@ static int qeth_l2_stop_card(struct qeth
 			dev_close(card->dev);
 			rtnl_unlock();
 		}
-		if (!card->use_hard_stop ||
-			recovery_mode) {
-			__u8 *mac = &card->dev->dev_addr[0];
-			rc = qeth_l2_send_delmac(card, mac);
-			QETH_DBF_TEXT_(SETUP, 2, "Lerr%d", rc);
-		}
+		card->info.mac_bits &= ~QETH_LAYER2_MAC_REGISTERED;
 		card->state = CARD_STATE_SOFTSETUP;
 	}
 	if (card->state == CARD_STATE_SOFTSETUP) {
-		qeth_l2_process_vlans(card, 1);
-		if (!card->use_hard_stop ||
-			recovery_mode)
-			qeth_l2_del_all_mc(card);
+		qeth_l2_del_all_mc(card, 0);
 		qeth_clear_ipacmd_list(card);
 		card->state = CARD_STATE_HARDSETUP;
 	}
@@ -405,7 +394,6 @@ static int qeth_l2_stop_card(struct qeth
 		qeth_clear_cmd_buffers(&card->read);
 		qeth_clear_cmd_buffers(&card->write);
 	}
-	card->use_hard_stop = 0;
 	return rc;
 }
 
@@ -705,7 +693,7 @@ static void qeth_l2_set_multicast_list(s
 	if (qeth_threads_running(card, QETH_RECOVER_THREAD) &&
 	    (card->state != CARD_STATE_UP))
 		return;
-	qeth_l2_del_all_mc(card);
+	qeth_l2_del_all_mc(card, 1);
 	spin_lock_bh(&card->mclock);
 	netdev_for_each_mc_addr(ha, dev)
 		qeth_l2_add_mc(card, ha->addr, 0);
@@ -907,10 +895,8 @@ static void qeth_l2_remove_device(struct
 	qeth_set_allowed_threads(card, 0, 1);
 	wait_event(card->wait_q, qeth_threads_running(card, 0xffffffff) == 0);
 
-	if (cgdev->state == CCWGROUP_ONLINE) {
-		card->use_hard_stop = 1;
+	if (cgdev->state == CCWGROUP_ONLINE)
 		qeth_l2_set_offline(cgdev);
-	}
 
 	if (card->dev) {
 		unregister_netdev(card->dev);
@@ -1040,7 +1026,7 @@ contin:
 
 	if (card->info.type != QETH_CARD_TYPE_OSN &&
 	    card->info.type != QETH_CARD_TYPE_OSM)
-		qeth_l2_process_vlans(card, 0);
+		qeth_l2_process_vlans(card);
 
 	netif_tx_disable(card->dev);
 
@@ -1076,7 +1062,6 @@ contin:
 	return 0;
 
 out_remove:
-	card->use_hard_stop = 1;
 	qeth_l2_stop_card(card, 0);
 	ccw_device_set_offline(CARD_DDEV(card));
 	ccw_device_set_offline(CARD_WDEV(card));
@@ -1144,7 +1129,6 @@ static int qeth_l2_recover(void *ptr)
 	QETH_CARD_TEXT(card, 2, "recover2");
 	dev_warn(&card->gdev->dev,
 		"A recovery process has been started for the device\n");
-	card->use_hard_stop = 1;
 	__qeth_l2_set_offline(card->gdev, 1);
 	rc = __qeth_l2_set_online(card->gdev, 1);
 	if (!rc)
@@ -1191,7 +1175,6 @@ static int qeth_l2_pm_suspend(struct ccw
 	if (gdev->state == CCWGROUP_OFFLINE)
 		return 0;
 	if (card->state == CARD_STATE_UP) {
-		card->use_hard_stop = 1;
 		__qeth_l2_set_offline(card->gdev, 1);
 	} else
 		__qeth_l2_set_offline(card->gdev, 0);
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -510,8 +510,7 @@ static void qeth_l3_set_ip_addr_list(str
 	kfree(tbd_list);
 }
 
-static void qeth_l3_clear_ip_list(struct qeth_card *card, int clean,
-					int recover)
+static void qeth_l3_clear_ip_list(struct qeth_card *card, int recover)
 {
 	struct qeth_ipaddr *addr, *tmp;
 	unsigned long flags;
@@ -530,11 +529,6 @@ static void qeth_l3_clear_ip_list(struct
 		addr = list_entry(card->ip_list.next,
 				  struct qeth_ipaddr, entry);
 		list_del_init(&addr->entry);
-		if (clean) {
-			spin_unlock_irqrestore(&card->ip_lock, flags);
-			qeth_l3_deregister_addr_entry(card, addr);
-			spin_lock_irqsave(&card->ip_lock, flags);
-		}
 		if (!recover || addr->is_multicast) {
 			kfree(addr);
 			continue;
@@ -1611,29 +1605,6 @@ static int qeth_l3_start_ipassists(struc
 	return 0;
 }
 
-static int qeth_l3_put_unique_id(struct qeth_card *card)
-{
-
-	int rc = 0;
-	struct qeth_cmd_buffer *iob;
-	struct qeth_ipa_cmd *cmd;
-
-	QETH_CARD_TEXT(card, 2, "puniqeid");
-
-	if ((card->info.unique_id & UNIQUE_ID_NOT_BY_CARD) ==
-		UNIQUE_ID_NOT_BY_CARD)
-		return -1;
-	iob = qeth_get_ipacmd_buffer(card, IPA_CMD_DESTROY_ADDR,
-				     QETH_PROT_IPV6);
-	cmd = (struct qeth_ipa_cmd *)(iob->data+IPA_PDU_HEADER_SIZE);
-	*((__u16 *) &cmd->data.create_destroy_addr.unique_id[6]) =
-				card->info.unique_id;
-	memcpy(&cmd->data.create_destroy_addr.unique_id[0],
-	       card->dev->dev_addr, OSA_ADDR_LEN);
-	rc = qeth_send_ipa_cmd(card, iob, NULL, NULL);
-	return rc;
-}
-
 static int qeth_l3_iqd_read_initial_mac_cb(struct qeth_card *card,
 		struct qeth_reply *reply, unsigned long data)
 {
@@ -2324,25 +2295,14 @@ static int qeth_l3_stop_card(struct qeth
 			dev_close(card->dev);
 			rtnl_unlock();
 		}
-		if (!card->use_hard_stop) {
-			rc = qeth_send_stoplan(card);
-			if (rc)
-				QETH_DBF_TEXT_(SETUP, 2, "1err%d", rc);
-		}
 		card->state = CARD_STATE_SOFTSETUP;
 	}
 	if (card->state == CARD_STATE_SOFTSETUP) {
-		qeth_l3_clear_ip_list(card, !card->use_hard_stop, 1);
+		qeth_l3_clear_ip_list(card, 1);
 		qeth_clear_ipacmd_list(card);
 		card->state = CARD_STATE_HARDSETUP;
 	}
 	if (card->state == CARD_STATE_HARDSETUP) {
-		if (!card->use_hard_stop &&
-		    (card->info.type != QETH_CARD_TYPE_IQD)) {
-			rc = qeth_l3_put_unique_id(card);
-			if (rc)
-				QETH_DBF_TEXT_(SETUP, 2, "2err%d", rc);
-		}
 		qeth_qdio_clear_card(card, 0);
 		qeth_clear_qdio_buffers(card);
 		qeth_clear_working_pool_list(card);
@@ -2352,7 +2312,6 @@ static int qeth_l3_stop_card(struct qeth
 		qeth_clear_cmd_buffers(&card->read);
 		qeth_clear_cmd_buffers(&card->write);
 	}
-	card->use_hard_stop = 0;
 	return rc;
 }
 
@@ -3483,17 +3442,15 @@ static void qeth_l3_remove_device(struct
 	qeth_set_allowed_threads(card, 0, 1);
 	wait_event(card->wait_q, qeth_threads_running(card, 0xffffffff) == 0);
 
-	if (cgdev->state == CCWGROUP_ONLINE) {
-		card->use_hard_stop = 1;
+	if (cgdev->state == CCWGROUP_ONLINE)
 		qeth_l3_set_offline(cgdev);
-	}
 
 	if (card->dev) {
 		unregister_netdev(card->dev);
 		card->dev = NULL;
 	}
 
-	qeth_l3_clear_ip_list(card, 0, 0);
+	qeth_l3_clear_ip_list(card, 0);
 	qeth_l3_clear_ipato_list(card);
 	return;
 }
@@ -3594,7 +3551,6 @@ contin:
 	mutex_unlock(&card->discipline_mutex);
 	return 0;
 out_remove:
-	card->use_hard_stop = 1;
 	qeth_l3_stop_card(card, 0);
 	ccw_device_set_offline(CARD_DDEV(card));
 	ccw_device_set_offline(CARD_WDEV(card));
@@ -3663,7 +3619,6 @@ static int qeth_l3_recover(void *ptr)
 	QETH_CARD_TEXT(card, 2, "recover2");
 	dev_warn(&card->gdev->dev,
 		"A recovery process has been started for the device\n");
-	card->use_hard_stop = 1;
 	__qeth_l3_set_offline(card->gdev, 1);
 	rc = __qeth_l3_set_online(card->gdev, 1);
 	if (!rc)
@@ -3684,7 +3639,6 @@ static int qeth_l3_recover(void *ptr)
 static void qeth_l3_shutdown(struct ccwgroup_device *gdev)
 {
 	struct qeth_card *card = dev_get_drvdata(&gdev->dev);
-	qeth_l3_clear_ip_list(card, 0, 0);
 	qeth_qdio_clear_card(card, 0);
 	qeth_clear_qdio_buffers(card);
 }
@@ -3700,7 +3654,6 @@ static int qeth_l3_pm_suspend(struct ccw
 	if (gdev->state == CCWGROUP_OFFLINE)
 		return 0;
 	if (card->state == CARD_STATE_UP) {
-		card->use_hard_stop = 1;
 		__qeth_l3_set_offline(card->gdev, 1);
 	} else
 		__qeth_l3_set_offline(card->gdev, 0);


^ permalink raw reply

* [patch 0/1] s390: one more qeth patches for net-next
From: frank.blaschka @ 2011-02-18 14:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

Hi Dave,

one more qeth patch for net-next

shortlog:
Ursula Braun (1)
qeth: remove needless IPA-commands in offline

Thanks,
        Frank

^ permalink raw reply

* Re: [PATCH v6 0/9] net: Unified offload configuration
From: Michał Mirosław @ 2011-02-18 14:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, bhutchings
In-Reply-To: <20110217.145611.229745070.davem@davemloft.net>

On Thu, Feb 17, 2011 at 02:56:11PM -0800, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Wed, 16 Feb 2011 03:59:16 +0100 (CET)
> > Here's a v6 of the ethtool unification patch series.
> > 
> > What's in it?
> >  1..4:
> > 	cleanups for the core patches
> >  5:
> > 	the patch - implement unified ethtool setting ops
> >  6..7:
> > 	implement interoperation between old and new ethtool ops
> >  8:
> > 	include RX checksum in features and plug it into new framework
> >  9:
> > 	convert loopback device to new framework
> > 
> > What is it good for?
> >  - unifies driver behaviour wrt hardware offloads
> >  - removes a lot of boilerplate code from drivers
> >  - allows better fine-grained control over used offloads
> Applied to net-next-2.6, please send any bug fixes relative to this.
> 
> Please get rid of that annoying message spit out by netif_features_change(),
> it's just spam.  If we want notifications for stuff like this, use a
> non-unicast netlink message so those who want to hear it can do so.

You mean netdev_update_features() "Features changed" message? Is it ok
to just demote it to DEBUG level or you want to remove it altogether?
What about netdev_fix_features() messages?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [patch net-next-2.6] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-18 14:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, davem, shemminger, kaber, fubar, nicolas.2p.debian, andy
In-Reply-To: <1298035791.6201.56.camel@edumazet-laptop>

Fri, Feb 18, 2011 at 02:29:51PM CET, eric.dumazet@gmail.com wrote:
>Le vendredi 18 février 2011 à 14:25 +0100, Jiri Pirko a écrit :
>> This patch converts bonding to use rx_handler. Results in cleaner
>> __netif_receive_skb() with much less exceptions needed. Also bond-specific
>> work is moved into bond code.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> ---
>>  drivers/net/bonding/bond_main.c |   75 ++++++++++++++++++++-
>>  include/linux/skbuff.h          |    2 +
>>  net/core/dev.c                  |  144 +++++++++++---------------------------
>>  net/core/skbuff.c               |    1 +
>>  4 files changed, 119 insertions(+), 103 deletions(-)
>> 
>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 31f02d0..9f3af5d 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -267,6 +267,7 @@ typedef unsigned char *sk_buff_data_t;
>>   *	@sk: Socket we are owned by
>>   *	@tstamp: Time we arrived
>>   *	@dev: Device we arrived on/are leaving by
>> + *	@input_dev: Original device on which we arrived
>>   *	@transport_header: Transport layer header
>>   *	@network_header: Network layer header
>>   *	@mac_header: Link layer header
>> @@ -325,6 +326,7 @@ struct sk_buff {
>>  
>>  	struct sock		*sk;
>>  	struct net_device	*dev;
>> +	struct net_device	*input_dev;
>>  
>
>Your patch looks fine, but adding 8 bytes to sk_buff for a "cleanup" is
>really a show stopper for me.

Do not know how to do it better. As for percpu variable, not only
origdev would have to be remembered but also probably skb pointer to
know if it's the first run on the skb or not. Can't really figure out a
better solution. Can you?

Thanks.

Jirka
>
>
>

^ permalink raw reply

* [PATCH v2] net: provide default_advmss() methods to blackhole dst_ops
From: Eric Dumazet @ 2011-02-18 13:44 UTC (permalink / raw)
  To: Changli Gao
  Cc: George Spelvin, David Miller, linux-kernel, netdev, Roland Dreier,
	Daniel Baluta
In-Reply-To: <1298036005.6201.58.camel@edumazet-laptop>

Le vendredi 18 février 2011 à 14:33 +0100, Eric Dumazet a écrit :

> I had this exact idea but found we need struct net pointer to get this
> value, not provided in parameters, so I falled back to the 256 value.
> 
> 

Hmm, reading again this stuff, maybe we can just use
ipv4_default_advmss() instead of a custom one.

dst->dev should be available

[PATCH] net: provide default_advmss() methods to blackhole dst_ops

Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Roland Dreier <roland@purestorage.com>
CC: Changli Gao <xiaosuo@gmail.com>
CC: Daniel Baluta <dbaluta@ixiacom.com>
---
 net/ipv4/route.c |    1 +
 net/ipv6/route.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 788a3e7..6ed6603 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2722,6 +2722,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
 	.destroy		=	ipv4_dst_destroy,
 	.check			=	ipv4_blackhole_dst_check,
 	.default_mtu		=	ipv4_blackhole_default_mtu,
+	.default_advmss		=	ipv4_default_advmss,
 	.update_pmtu		=	ipv4_rt_blackhole_update_pmtu,
 };
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1c29f95..a998db6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -128,6 +128,7 @@ static struct dst_ops ip6_dst_blackhole_ops = {
 	.destroy		=	ip6_dst_destroy,
 	.check			=	ip6_dst_check,
 	.default_mtu		=	ip6_blackhole_default_mtu,
+	.default_advmss		=	ip6_default_advmss,
 	.update_pmtu		=	ip6_rt_blackhole_update_pmtu,
 };
 



^ permalink raw reply related

* Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops
From: Daniel Baluta @ 2011-02-18 13:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, George Spelvin, David Miller, linux-kernel, netdev,
	Roland Dreier
In-Reply-To: <1298036005.6201.58.camel@edumazet-laptop>

On Fri, Feb 18, 2011 at 3:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 18 février 2011 à 21:29 +0800, Changli Gao a écrit :
>> On Fri, Feb 18, 2011 at 9:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Le vendredi 18 février 2011 à 21:16 +0800, Changli Gao a écrit :
>> >
>> >> I am wondering why magic number 256 is used here. Is there a special
>> >> reason? Thanks.
>> >>
>> >
>> > It really doesnt matter. SYN message will be dropped anyway.
>> >
>> > 256 happens to be the default value
>> > of /proc/sys/net/ipv4/route/min_adv_mss

Perhaps a comment would make sense then.

thanks,
Daniel.

^ permalink raw reply

* Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops
From: Eric Dumazet @ 2011-02-18 13:33 UTC (permalink / raw)
  To: Changli Gao
  Cc: George Spelvin, David Miller, linux-kernel, netdev, Roland Dreier
In-Reply-To: <AANLkTi=O1LZzSc8=qMX=R3rJpCVL8eWr1eo6NkiZXS0m@mail.gmail.com>

Le vendredi 18 février 2011 à 21:29 +0800, Changli Gao a écrit :
> On Fri, Feb 18, 2011 at 9:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le vendredi 18 février 2011 à 21:16 +0800, Changli Gao a écrit :
> >
> >> I am wondering why magic number 256 is used here. Is there a special
> >> reason? Thanks.
> >>
> >
> > It really doesnt matter. SYN message will be dropped anyway.
> >
> > 256 happens to be the default value
> > of /proc/sys/net/ipv4/route/min_adv_mss
> >
> 
> Thanks for your explaining. IMHO, ip_rt_min_advmss is better than a
> magic number, here.
> 

I had this exact idea but found we need struct net pointer to get this
value, not provided in parameters, so I falled back to the 256 value.

^ permalink raw reply

* Re: KIND
From: Mr.David Gurupatham @ 2011-02-18  9:07 UTC (permalink / raw)
  To: netdev@vger.kernel.org

My name is David Gurupatham a legal practitioner with David Gurupatham  & Associates

in Kuala Lumpur Msia.

I found your contact/profile some where over the Internet and it gave me the

greatest joy, that you are the one I have been looking for. Whom I strongly believe could

execute this project with me. Kindly get back to me for more information

Best regards,

Mr David Gurupatham{Esq}

^ permalink raw reply

* Re: [patch net-next-2.6] net: convert bonding to use rx_handler
From: Eric Dumazet @ 2011-02-18 13:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, shemminger, kaber, fubar, nicolas.2p.debian, andy
In-Reply-To: <20110218132524.GC2939@psychotron.redhat.com>

Le vendredi 18 février 2011 à 14:25 +0100, Jiri Pirko a écrit :
> This patch converts bonding to use rx_handler. Results in cleaner
> __netif_receive_skb() with much less exceptions needed. Also bond-specific
> work is moved into bond code.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  drivers/net/bonding/bond_main.c |   75 ++++++++++++++++++++-
>  include/linux/skbuff.h          |    2 +
>  net/core/dev.c                  |  144 +++++++++++---------------------------
>  net/core/skbuff.c               |    1 +
>  4 files changed, 119 insertions(+), 103 deletions(-)
> 

> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 31f02d0..9f3af5d 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -267,6 +267,7 @@ typedef unsigned char *sk_buff_data_t;
>   *	@sk: Socket we are owned by
>   *	@tstamp: Time we arrived
>   *	@dev: Device we arrived on/are leaving by
> + *	@input_dev: Original device on which we arrived
>   *	@transport_header: Transport layer header
>   *	@network_header: Network layer header
>   *	@mac_header: Link layer header
> @@ -325,6 +326,7 @@ struct sk_buff {
>  
>  	struct sock		*sk;
>  	struct net_device	*dev;
> +	struct net_device	*input_dev;
>  

Your patch looks fine, but adding 8 bytes to sk_buff for a "cleanup" is
really a show stopper for me.




^ permalink raw reply

* Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops
From: Changli Gao @ 2011-02-18 13:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: George Spelvin, David Miller, linux-kernel, netdev, Roland Dreier
In-Reply-To: <1298035443.6201.51.camel@edumazet-laptop>

On Fri, Feb 18, 2011 at 9:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 18 février 2011 à 21:16 +0800, Changli Gao a écrit :
>
>> I am wondering why magic number 256 is used here. Is there a special
>> reason? Thanks.
>>
>
> It really doesnt matter. SYN message will be dropped anyway.
>
> 256 happens to be the default value
> of /proc/sys/net/ipv4/route/min_adv_mss
>

Thanks for your explaining. IMHO, ip_rt_min_advmss is better than a
magic number, here.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [patch net-next-2.6] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-18 13:25 UTC (permalink / raw)
  To: netdev
  Cc: davem, shemminger, kaber, fubar, eric.dumazet, nicolas.2p.debian,
	andy

This patch converts bonding to use rx_handler. Results in cleaner
__netif_receive_skb() with much less exceptions needed. Also bond-specific
work is moved into bond code.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c |   75 ++++++++++++++++++++-
 include/linux/skbuff.h          |    2 +
 net/core/dev.c                  |  144 +++++++++++---------------------------
 net/core/skbuff.c               |    1 +
 4 files changed, 119 insertions(+), 103 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77e3c6a..a856a11 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1423,6 +1423,68 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
 	bond->setup_by_slave = 1;
 }
 
+/* On bonding slaves other than the currently active slave, suppress
+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
+ * ARP on active-backup slaves with arp_validate enabled.
+ */
+static bool bond_should_deliver_exact_match(struct sk_buff *skb,
+					    struct net_device *slave_dev,
+					    struct net_device *bond_dev)
+{
+	if (slave_dev->priv_flags & IFF_SLAVE_INACTIVE) {
+		if (slave_dev->priv_flags & IFF_SLAVE_NEEDARP &&
+		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
+			return false;
+
+		if (bond_dev->priv_flags & IFF_MASTER_ALB &&
+		    skb->pkt_type != PACKET_BROADCAST &&
+		    skb->pkt_type != PACKET_MULTICAST)
+				return false;
+
+		if (bond_dev->priv_flags & IFF_MASTER_8023AD &&
+		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
+			return false;
+
+		return true;
+	}
+	return false;
+}
+
+static struct sk_buff *bond_handle_frame(struct sk_buff *skb)
+{
+	struct net_device *slave_dev;
+	struct net_device *bond_dev;
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return NULL;
+	slave_dev = skb->dev;
+	bond_dev = ACCESS_ONCE(slave_dev->master);
+	if (unlikely(!bond_dev))
+		return skb;
+
+	if (bond_dev->priv_flags & IFF_MASTER_ARPMON)
+		slave_dev->last_rx = jiffies;
+
+	if (bond_should_deliver_exact_match(skb, slave_dev, bond_dev)) {
+		skb->deliver_no_wcard = 1;
+		return skb;
+	}
+
+	skb->dev = bond_dev;
+
+	if (bond_dev->priv_flags & IFF_MASTER_ALB &&
+	    bond_dev->priv_flags & IFF_BRIDGE_PORT &&
+	    skb->pkt_type == PACKET_HOST) {
+		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
+
+		memcpy(dest, bond_dev->dev_addr, ETH_ALEN);
+	}
+
+	netif_rx(skb);
+	return NULL;
+}
+
 /* enslave device <slave> to bond device <master> */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1599,11 +1661,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		pr_debug("Error %d calling netdev_set_bond_master\n", res);
 		goto err_restore_mac;
 	}
+	res = netdev_rx_handler_register(slave_dev, bond_handle_frame, NULL);
+	if (res) {
+		pr_debug("Error %d calling netdev_rx_handler_register\n", res);
+		goto err_unset_master;
+	}
+
 	/* open the slave since the application closed it */
 	res = dev_open(slave_dev);
 	if (res) {
 		pr_debug("Opening slave %s failed\n", slave_dev->name);
-		goto err_unset_master;
+		goto err_unreg_rxhandler;
 	}
 
 	new_slave->dev = slave_dev;
@@ -1811,6 +1879,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 err_close:
 	dev_close(slave_dev);
 
+err_unreg_rxhandler:
+	netdev_rx_handler_unregister(slave_dev);
+
 err_unset_master:
 	netdev_set_bond_master(slave_dev, NULL);
 
@@ -1992,6 +2063,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 		netif_addr_unlock_bh(bond_dev);
 	}
 
+	netdev_rx_handler_unregister(slave_dev);
 	netdev_set_bond_master(slave_dev, NULL);
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -2114,6 +2186,7 @@ static int bond_release_all(struct net_device *bond_dev)
 			netif_addr_unlock_bh(bond_dev);
 		}
 
+		netdev_rx_handler_unregister(slave_dev);
 		netdev_set_bond_master(slave_dev, NULL);
 
 		/* close slave before restoring its mac address */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 31f02d0..9f3af5d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -267,6 +267,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@sk: Socket we are owned by
  *	@tstamp: Time we arrived
  *	@dev: Device we arrived on/are leaving by
+ *	@input_dev: Original device on which we arrived
  *	@transport_header: Transport layer header
  *	@network_header: Network layer header
  *	@mac_header: Link layer header
@@ -325,6 +326,7 @@ struct sk_buff {
 
 	struct sock		*sk;
 	struct net_device	*dev;
+	struct net_device	*input_dev;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/net/core/dev.c b/net/core/dev.c
index 4f69439..9d2f485 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1530,12 +1530,17 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
+static inline int __deliver_skb(struct sk_buff *skb,
+				struct packet_type *pt_prev)
+{
+	return pt_prev->func(skb, skb->dev, pt_prev, skb->input_dev);
+}
+
 static inline int deliver_skb(struct sk_buff *skb,
-			      struct packet_type *pt_prev,
-			      struct net_device *orig_dev)
+			      struct packet_type *pt_prev)
 {
 	atomic_inc(&skb->users);
-	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+	return __deliver_skb(skb, pt_prev);
 }
 
 /*
@@ -1558,7 +1563,7 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		    (ptype->af_packet_priv == NULL ||
 		     (struct sock *)ptype->af_packet_priv != skb->sk)) {
 			if (pt_prev) {
-				deliver_skb(skb2, pt_prev, skb->dev);
+				deliver_skb(skb2, pt_prev);
 				pt_prev = ptype;
 				continue;
 			}
@@ -1591,7 +1596,7 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 	if (pt_prev)
-		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
+		__deliver_skb(skb2, pt_prev);
 	rcu_read_unlock();
 }
 
@@ -3021,8 +3026,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 }
 
 static inline struct sk_buff *handle_ing(struct sk_buff *skb,
-					 struct packet_type **pt_prev,
-					 int *ret, struct net_device *orig_dev)
+					 struct packet_type **pt_prev, int *ret)
 {
 	struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue);
 
@@ -3030,7 +3034,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		goto out;
 
 	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
+		*ret = deliver_skb(skb, *pt_prev);
 		*pt_prev = NULL;
 	}
 
@@ -3092,63 +3096,30 @@ void netdev_rx_handler_unregister(struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
 
-static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
-					      struct net_device *master)
-{
-	if (skb->pkt_type == PACKET_HOST) {
-		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
-
-		memcpy(dest, master->dev_addr, ETH_ALEN);
-	}
-}
-
-/* On bonding slaves other than the currently active slave, suppress
- * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
- * ARP on active-backup slaves with arp_validate enabled.
- */
-static int __skb_bond_should_drop(struct sk_buff *skb,
-				  struct net_device *master)
+static void vlan_on_bond_hook(struct sk_buff *skb)
 {
-	struct net_device *dev = skb->dev;
-
-	if (master->priv_flags & IFF_MASTER_ARPMON)
-		dev->last_rx = jiffies;
-
-	if ((master->priv_flags & IFF_MASTER_ALB) &&
-	    (master->priv_flags & IFF_BRIDGE_PORT)) {
-		/* Do address unmangle. The local destination address
-		 * will be always the one master has. Provides the right
-		 * functionality in a bridge.
-		 */
-		skb_bond_set_mac_by_master(skb, master);
-	}
-
-	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
-		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
-		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
-			return 0;
-
-		if (master->priv_flags & IFF_MASTER_ALB) {
-			if (skb->pkt_type != PACKET_BROADCAST &&
-			    skb->pkt_type != PACKET_MULTICAST)
-				return 0;
-		}
-		if (master->priv_flags & IFF_MASTER_8023AD &&
-		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
-			return 0;
+	/*
+	 * Make sure ARP frames received on VLAN interfaces stacked on
+	 * bonding interfaces still make their way to any base bonding
+	 * device that may have registered for a specific ptype.
+	 */
+	if (skb->dev->priv_flags & IFF_802_1Q_VLAN &&
+	    vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING &&
+	    skb->protocol == htons(ETH_P_ARP)) {
+		struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 
-		return 1;
+		if (!skb2)
+			return;
+		skb2->dev = vlan_dev_real_dev(skb->dev);
+		netif_rx(skb2);
 	}
-	return 0;
 }
 
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
-	struct net_device *orig_dev;
-	struct net_device *null_or_orig;
-	struct net_device *orig_or_bond;
+	struct net_device *null_or_dev;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -3164,29 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	if (!skb->skb_iif)
 		skb->skb_iif = skb->dev->ifindex;
 
-	/*
-	 * bonding note: skbs received on inactive slaves should only
-	 * be delivered to pkt handlers that are exact matches.  Also
-	 * the deliver_no_wcard flag will be set.  If packet handlers
-	 * are sensitive to duplicate packets these skbs will need to
-	 * be dropped at the handler.
-	 */
-	null_or_orig = NULL;
-	orig_dev = skb->dev;
-	if (skb->deliver_no_wcard)
-		null_or_orig = orig_dev;
-	else if (netif_is_bond_slave(orig_dev)) {
-		struct net_device *bond_master = ACCESS_ONCE(orig_dev->master);
-
-		if (likely(bond_master)) {
-			if (__skb_bond_should_drop(skb, bond_master)) {
-				skb->deliver_no_wcard = 1;
-				/* deliver only exact match */
-				null_or_orig = orig_dev;
-			} else
-				skb->dev = bond_master;
-		}
-	}
+	if (!skb->input_dev)
+		skb->input_dev = skb->dev;
 
 	__this_cpu_inc(softnet_data.processed);
 	skb_reset_network_header(skb);
@@ -3205,26 +3155,24 @@ static int __netif_receive_skb(struct sk_buff *skb)
 #endif
 
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
-		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
-		    ptype->dev == orig_dev) {
+		if (!ptype->dev || ptype->dev == skb->dev) {
 			if (pt_prev)
-				ret = deliver_skb(skb, pt_prev, orig_dev);
+				ret = deliver_skb(skb, pt_prev);
 			pt_prev = ptype;
 		}
 	}
 
 #ifdef CONFIG_NET_CLS_ACT
-	skb = handle_ing(skb, &pt_prev, &ret, orig_dev);
+	skb = handle_ing(skb, &pt_prev, &ret);
 	if (!skb)
 		goto out;
 ncls:
 #endif
 
-	/* Handle special case of bridge or macvlan */
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
 		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(skb, pt_prev);
 			pt_prev = NULL;
 		}
 		skb = rx_handler(skb);
@@ -3234,7 +3182,7 @@ ncls:
 
 	if (vlan_tx_tag_present(skb)) {
 		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(skb, pt_prev);
 			pt_prev = NULL;
 		}
 		if (vlan_hwaccel_do_receive(&skb)) {
@@ -3244,32 +3192,24 @@ ncls:
 			goto out;
 	}
 
-	/*
-	 * Make sure frames received on VLAN interfaces stacked on
-	 * bonding interfaces still make their way to any base bonding
-	 * device that may have registered for a specific ptype.  The
-	 * handler may have to adjust skb->dev and orig_dev.
-	 */
-	orig_or_bond = orig_dev;
-	if ((skb->dev->priv_flags & IFF_802_1Q_VLAN) &&
-	    (vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING)) {
-		orig_or_bond = vlan_dev_real_dev(skb->dev);
-	}
+	vlan_on_bond_hook(skb);
+
+	/* deliver only exact match when indicated */
+	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
 
 	type = skb->protocol;
 	list_for_each_entry_rcu(ptype,
 			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
-		if (ptype->type == type && (ptype->dev == null_or_orig ||
-		     ptype->dev == skb->dev || ptype->dev == orig_dev ||
-		     ptype->dev == orig_or_bond)) {
+		if (ptype->type == type &&
+		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
 			if (pt_prev)
-				ret = deliver_skb(skb, pt_prev, orig_dev);
+				ret = deliver_skb(skb, pt_prev);
 			pt_prev = ptype;
 		}
 	}
 
 	if (pt_prev) {
-		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+		ret = __deliver_skb(skb, pt_prev);
 	} else {
 		atomic_long_inc(&skb->dev->rx_dropped);
 		kfree_skb(skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 14cf560..e19eabe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -508,6 +508,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 {
 	new->tstamp		= old->tstamp;
 	new->dev		= old->dev;
+	new->input_dev		= old->input_dev;
 	new->transport_header	= old->transport_header;
 	new->network_header	= old->network_header;
 	new->mac_header		= old->mac_header;
-- 
1.7.3.4


^ permalink raw reply related

* [PATCH net-next-2.6] net: add __rcu annotations to sk_wq and wq
From: Eric Dumazet @ 2011-02-18 13:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/net.h  |    3 ++-
 include/net/sock.h   |    7 ++++---
 net/sctp/socket.c    |    9 +++++----
 net/socket.c         |   23 ++++++++++++++---------
 net/sunrpc/svcsock.c |   32 ++++++++++++++++++++------------
 net/unix/af_unix.c   |    2 +-
 6 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 16faa13..94de83c 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -118,6 +118,7 @@ enum sock_shutdown_cmd {
 };
 
 struct socket_wq {
+	/* Note: wait MUST be first field of socket_wq */
 	wait_queue_head_t	wait;
 	struct fasync_struct	*fasync_list;
 	struct rcu_head		rcu;
@@ -142,7 +143,7 @@ struct socket {
 
 	unsigned long		flags;
 
-	struct socket_wq	*wq;
+	struct socket_wq __rcu	*wq;
 
 	struct file		*file;
 	struct sock		*sk;
diff --git a/include/net/sock.h b/include/net/sock.h
index e3893a2..da0534d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -281,7 +281,7 @@ struct sock {
 	int			sk_rcvbuf;
 
 	struct sk_filter __rcu	*sk_filter;
-	struct socket_wq	*sk_wq;
+	struct socket_wq __rcu	*sk_wq;
 
 #ifdef CONFIG_NET_DMA
 	struct sk_buff_head	sk_async_wait_queue;
@@ -1266,7 +1266,8 @@ static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 
 static inline wait_queue_head_t *sk_sleep(struct sock *sk)
 {
-	return &sk->sk_wq->wait;
+	BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
+	return &rcu_dereference_raw(sk->sk_wq)->wait;
 }
 /* Detach socket from process context.
  * Announce socket dead, detach it from wait queue and inode.
@@ -1287,7 +1288,7 @@ static inline void sock_orphan(struct sock *sk)
 static inline void sock_graft(struct sock *sk, struct socket *parent)
 {
 	write_lock_bh(&sk->sk_callback_lock);
-	rcu_assign_pointer(sk->sk_wq, parent->wq);
+	sk->sk_wq = parent->wq;
 	parent->sk = sk;
 	sk_set_socket(sk, parent);
 	security_sock_graft(sk, parent);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 8e02550..b53b2eb 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6102,15 +6102,16 @@ static void __sctp_write_space(struct sctp_association *asoc)
 			wake_up_interruptible(&asoc->wait);
 
 		if (sctp_writeable(sk)) {
-			if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
-				wake_up_interruptible(sk_sleep(sk));
+			wait_queue_head_t *wq = sk_sleep(sk);
+
+			if (wq && waitqueue_active(wq))
+				wake_up_interruptible(wq);
 
 			/* Note that we try to include the Async I/O support
 			 * here by modeling from the current TCP/UDP code.
 			 * We have not tested with it yet.
 			 */
-			if (sock->wq->fasync_list &&
-			    !(sk->sk_shutdown & SEND_SHUTDOWN))
+			if (!(sk->sk_shutdown & SEND_SHUTDOWN))
 				sock_wake_async(sock,
 						SOCK_WAKE_SPACE, POLL_OUT);
 		}
diff --git a/net/socket.c b/net/socket.c
index ac2219f..9fa1e3b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -240,17 +240,19 @@ static struct kmem_cache *sock_inode_cachep __read_mostly;
 static struct inode *sock_alloc_inode(struct super_block *sb)
 {
 	struct socket_alloc *ei;
+	struct socket_wq *wq;
 
 	ei = kmem_cache_alloc(sock_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
-	ei->socket.wq = kmalloc(sizeof(struct socket_wq), GFP_KERNEL);
-	if (!ei->socket.wq) {
+	wq = kmalloc(sizeof(*wq), GFP_KERNEL);
+	if (!wq) {
 		kmem_cache_free(sock_inode_cachep, ei);
 		return NULL;
 	}
-	init_waitqueue_head(&ei->socket.wq->wait);
-	ei->socket.wq->fasync_list = NULL;
+	init_waitqueue_head(&wq->wait);
+	wq->fasync_list = NULL;
+	RCU_INIT_POINTER(ei->socket.wq, wq);
 
 	ei->socket.state = SS_UNCONNECTED;
 	ei->socket.flags = 0;
@@ -273,9 +275,11 @@ static void wq_free_rcu(struct rcu_head *head)
 static void sock_destroy_inode(struct inode *inode)
 {
 	struct socket_alloc *ei;
+	struct socket_wq *wq;
 
 	ei = container_of(inode, struct socket_alloc, vfs_inode);
-	call_rcu(&ei->socket.wq->rcu, wq_free_rcu);
+	wq = rcu_dereference_protected(ei->socket.wq, 1);
+	call_rcu(&wq->rcu, wq_free_rcu);
 	kmem_cache_free(sock_inode_cachep, ei);
 }
 
@@ -524,7 +528,7 @@ void sock_release(struct socket *sock)
 		module_put(owner);
 	}
 
-	if (sock->wq->fasync_list)
+	if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
 		printk(KERN_ERR "sock_release: fasync list not empty!\n");
 
 	percpu_sub(sockets_in_use, 1);
@@ -1108,15 +1112,16 @@ static int sock_fasync(int fd, struct file *filp, int on)
 {
 	struct socket *sock = filp->private_data;
 	struct sock *sk = sock->sk;
+	struct socket_wq *wq;
 
 	if (sk == NULL)
 		return -EINVAL;
 
 	lock_sock(sk);
+	wq = rcu_dereference_protected(sock->wq, sock_owned_by_user(sk));
+	fasync_helper(fd, filp, on, &wq->fasync_list);
 
-	fasync_helper(fd, filp, on, &sock->wq->fasync_list);
-
-	if (!sock->wq->fasync_list)
+	if (!wq->fasync_list)
 		sock_reset_flag(sk, SOCK_FASYNC);
 	else
 		sock_set_flag(sk, SOCK_FASYNC);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 7bd3bbb..76425d2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -420,6 +420,7 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
 static void svc_udp_data_ready(struct sock *sk, int count)
 {
 	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq = sk_sleep(sk);
 
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), count=%d, busy=%d\n",
@@ -428,8 +429,8 @@ static void svc_udp_data_ready(struct sock *sk, int count)
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
-	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
-		wake_up_interruptible(sk_sleep(sk));
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible(wq);
 }
 
 /*
@@ -438,6 +439,7 @@ static void svc_udp_data_ready(struct sock *sk, int count)
 static void svc_write_space(struct sock *sk)
 {
 	struct svc_sock	*svsk = (struct svc_sock *)(sk->sk_user_data);
+	wait_queue_head_t *wq = sk_sleep(sk);
 
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
@@ -445,10 +447,10 @@ static void svc_write_space(struct sock *sk)
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 
-	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk))) {
+	if (wq && waitqueue_active(wq)) {
 		dprintk("RPC svc_write_space: someone sleeping on %p\n",
 		       svsk);
-		wake_up_interruptible(sk_sleep(sk));
+		wake_up_interruptible(wq);
 	}
 }
 
@@ -739,6 +741,7 @@ static void svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
 static void svc_tcp_listen_data_ready(struct sock *sk, int count_unused)
 {
 	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq;
 
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
@@ -761,8 +764,9 @@ static void svc_tcp_listen_data_ready(struct sock *sk, int count_unused)
 			printk("svc: socket %p: no user data\n", sk);
 	}
 
-	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
-		wake_up_interruptible_all(sk_sleep(sk));
+	wq = sk_sleep(sk);
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible_all(wq);
 }
 
 /*
@@ -771,6 +775,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk, int count_unused)
 static void svc_tcp_state_change(struct sock *sk)
 {
 	struct svc_sock	*svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq = sk_sleep(sk);
 
 	dprintk("svc: socket %p TCP (connected) state change %d (svsk %p)\n",
 		sk, sk->sk_state, sk->sk_user_data);
@@ -781,13 +786,14 @@ static void svc_tcp_state_change(struct sock *sk)
 		set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
-	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
-		wake_up_interruptible_all(sk_sleep(sk));
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible_all(wq);
 }
 
 static void svc_tcp_data_ready(struct sock *sk, int count)
 {
 	struct svc_sock *svsk = (struct svc_sock *)sk->sk_user_data;
+	wait_queue_head_t *wq = sk_sleep(sk);
 
 	dprintk("svc: socket %p TCP data ready (svsk %p)\n",
 		sk, sk->sk_user_data);
@@ -795,8 +801,8 @@ static void svc_tcp_data_ready(struct sock *sk, int count)
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
-	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
-		wake_up_interruptible(sk_sleep(sk));
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible(wq);
 }
 
 /*
@@ -1531,6 +1537,7 @@ static void svc_sock_detach(struct svc_xprt *xprt)
 {
 	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
 	struct sock *sk = svsk->sk_sk;
+	wait_queue_head_t *wq;
 
 	dprintk("svc: svc_sock_detach(%p)\n", svsk);
 
@@ -1539,8 +1546,9 @@ static void svc_sock_detach(struct svc_xprt *xprt)
 	sk->sk_data_ready = svsk->sk_odata;
 	sk->sk_write_space = svsk->sk_owspace;
 
-	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
-		wake_up_interruptible(sk_sleep(sk));
+	wq = sk_sleep(sk);
+	if (wq && waitqueue_active(wq))
+		wake_up_interruptible(wq);
 }
 
 /*
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d8d98d5..217fb7f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1171,7 +1171,7 @@ restart:
 	newsk->sk_type		= sk->sk_type;
 	init_peercred(newsk);
 	newu = unix_sk(newsk);
-	newsk->sk_wq		= &newu->peer_wq;
+	RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq);
 	otheru = unix_sk(other);
 
 	/* copy address information from listening to new sock*/



^ permalink raw reply related

* Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops
From: Eric Dumazet @ 2011-02-18 13:24 UTC (permalink / raw)
  To: Changli Gao
  Cc: George Spelvin, David Miller, linux-kernel, netdev, Roland Dreier
In-Reply-To: <AANLkTi=GRPWNWZuORx1zoJJURE-drqEoMdTuKB9LZ8Ux@mail.gmail.com>

Le vendredi 18 février 2011 à 21:16 +0800, Changli Gao a écrit :

> I am wondering why magic number 256 is used here. Is there a special
> reason? Thanks.
> 

It really doesnt matter. SYN message will be dropped anyway.

256 happens to be the default value
of /proc/sys/net/ipv4/route/min_adv_mss

^ permalink raw reply

* Re: [PATCH] net: Add default_advmss() methods to blackhole dst_ops
From: Changli Gao @ 2011-02-18 13:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: George Spelvin, David Miller, linux-kernel, netdev, Roland Dreier
In-Reply-To: <1298023023.2595.170.camel@edumazet-laptop>

On Fri, Feb 18, 2011 at 5:57 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> I suspect following patch is needed
>
> CC Roland Dreier <roland@purestorage.com> because he fixed the
> default_mtu problem in commit ec831ea72ee5d7d47
> (net: Add default_mtu() methods to blackhole dst_ops)
>
> Thanks !
>
> [PATCH] net: Add default_advmss() methods to blackhole dst_ops
>
> Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
> accessor.) introduced a possible crash in tcp_connect_init(), when
> dst->default_advmss() is called from dst_metric_advmss()
>
> Reported-by: George Spelvin <linux@horizon.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Roland Dreier <roland@purestorage.com>
> ---
>  net/ipv4/route.c |    7 +++++++
>  net/ipv6/route.c |    6 ++++++
>  2 files changed, 13 insertions(+)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 788a3e7..5edb605 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2712,6 +2712,12 @@ static unsigned int ipv4_blackhole_default_mtu(const struct dst_entry *dst)
>        return 0;
>  }
>
> +static unsigned int ipv4_blackhole_default_advmss(const struct dst_entry *dst)
> +{
> +       return 256;
> +}
> +
> +

I am wondering why magic number 256 is used here. Is there a special
reason? Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: KIND
From: Mr.David Gurupatham @ 2011-02-18  9:07 UTC (permalink / raw)
  To: netdev@vger.kernel.org

My name is David Gurupatham a legal practitioner with David Gurupatham  & Associates

in Kuala Lumpur Msia.

I found your contact/profile some where over the Internet and it gave me the

greatest joy, that you are the one I have been looking for. Whom I strongly believe could

execute this project with me. Kindly get back to me for more information

Best regards,

Mr David Gurupatham{Esq}

^ permalink raw reply

* You have reached the limit of your email quota.
From: WED MAIL HELP DESK @ 2011-02-18 11:52 UTC (permalink / raw)
  To: 111

You have reached the limit of your email quota.
You will not be able to send or receive new mail until you boost 
your mailbox size.

Click the below link and fill the form to upgrade your account.
http://www.psanthony.com/contact/use/accountupgrading/form1.html

Technical Support
192.168.0.1 

______________ ______________ ______________ ______________
Sent via the KillerWebMail system at cyril.k12.ok.us

^ permalink raw reply

* [PATCH] stmmac: fixed dma lib build when turn-on the debug option
From: Peppe CAVALLARO @ 2011-02-18 10:21 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Peppe CAVALLARO

This patch fixes a compilation error when build the
dwmac_lib with the DEBUG option enabled.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/stmmac/dwmac_lib.c |   28 ++++++++++++++--------------
 1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/stmmac/dwmac_lib.c b/drivers/net/stmmac/dwmac_lib.c
index d65fab1..e250935 100644
--- a/drivers/net/stmmac/dwmac_lib.c
+++ b/drivers/net/stmmac/dwmac_lib.c
@@ -26,9 +26,9 @@
 
 #undef DWMAC_DMA_DEBUG
 #ifdef DWMAC_DMA_DEBUG
-#define DBG(fmt, args...)  printk(fmt, ## args)
+#define DWMAC_LIB_DBG(fmt, args...)  printk(fmt, ## args)
 #else
-#define DBG(fmt, args...)  do { } while (0)
+#define DWMAC_LIB_DBG(fmt, args...)  do { } while (0)
 #endif
 
 /* CSR1 enables the transmit DMA to check for new descriptor */
@@ -152,7 +152,7 @@ int dwmac_dma_interrupt(void __iomem *ioaddr,
 	/* read the status register (CSR5) */
 	u32 intr_status = readl(ioaddr + DMA_STATUS);
 
-	DBG(INFO, "%s: [CSR5: 0x%08x]\n", __func__, intr_status);
+	DWMAC_LIB_DBG(KERN_INFO "%s: [CSR5: 0x%08x]\n", __func__, intr_status);
 #ifdef DWMAC_DMA_DEBUG
 	/* It displays the DMA process states (CSR5 register) */
 	show_tx_process_state(intr_status);
@@ -160,43 +160,43 @@ int dwmac_dma_interrupt(void __iomem *ioaddr,
 #endif
 	/* ABNORMAL interrupts */
 	if (unlikely(intr_status & DMA_STATUS_AIS)) {
-		DBG(INFO, "CSR5[15] DMA ABNORMAL IRQ: ");
+		DWMAC_LIB_DBG(KERN_INFO "CSR5[15] DMA ABNORMAL IRQ: ");
 		if (unlikely(intr_status & DMA_STATUS_UNF)) {
-			DBG(INFO, "transmit underflow\n");
+			DWMAC_LIB_DBG(KERN_INFO "transmit underflow\n");
 			ret = tx_hard_error_bump_tc;
 			x->tx_undeflow_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_TJT)) {
-			DBG(INFO, "transmit jabber\n");
+			DWMAC_LIB_DBG(KERN_INFO "transmit jabber\n");
 			x->tx_jabber_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_OVF)) {
-			DBG(INFO, "recv overflow\n");
+			DWMAC_LIB_DBG(KERN_INFO "recv overflow\n");
 			x->rx_overflow_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_RU)) {
-			DBG(INFO, "receive buffer unavailable\n");
+			DWMAC_LIB_DBG(KERN_INFO "receive buffer unavailable\n");
 			x->rx_buf_unav_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_RPS)) {
-			DBG(INFO, "receive process stopped\n");
+			DWMAC_LIB_DBG(KERN_INFO "receive process stopped\n");
 			x->rx_process_stopped_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_RWT)) {
-			DBG(INFO, "receive watchdog\n");
+			DWMAC_LIB_DBG(KERN_INFO "receive watchdog\n");
 			x->rx_watchdog_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_ETI)) {
-			DBG(INFO, "transmit early interrupt\n");
+			DWMAC_LIB_DBG(KERN_INFO "transmit early interrupt\n");
 			x->tx_early_irq++;
 		}
 		if (unlikely(intr_status & DMA_STATUS_TPS)) {
-			DBG(INFO, "transmit process stopped\n");
+			DWMAC_LIB_DBG(KERN_INFO "transmit process stopped\n");
 			x->tx_process_stopped_irq++;
 			ret = tx_hard_error;
 		}
 		if (unlikely(intr_status & DMA_STATUS_FBI)) {
-			DBG(INFO, "fatal bus error\n");
+			DWMAC_LIB_DBG(KERN_INFO "fatal bus error\n");
 			x->fatal_bus_error_irq++;
 			ret = tx_hard_error;
 		}
@@ -215,7 +215,7 @@ int dwmac_dma_interrupt(void __iomem *ioaddr,
 	/* Clear the interrupt by writing a logic 1 to the CSR5[15-0] */
 	writel((intr_status & 0x1ffff), ioaddr + DMA_STATUS);
 
-	DBG(INFO, "\n\n");
+	DWMAC_LIB_DBG(KERN_INFO "\n\n");
 	return ret;
 }
 
-- 
1.7.4

^ permalink raw reply related

* [PATCH] net: Add default_advmss() methods to blackhole dst_ops
From: Eric Dumazet @ 2011-02-18  9:57 UTC (permalink / raw)
  To: George Spelvin, David Miller; +Cc: linux-kernel, netdev, Roland Dreier
In-Reply-To: <1298010735.2642.9.camel@edumazet-laptop>

Le vendredi 18 février 2011 à 07:32 +0100, Eric Dumazet a écrit :
> Le vendredi 18 février 2011 à 00:03 -0500, George Spelvin a écrit :
> > But wonder of wonders, kernel mode switching worked so I got to see the
> > oops on the text-mode console.  It happened just as I tried to ssh out.
> > 
> 
> CC netdev (removed linux-netdev)
> 
> I'll take a look this morning, thanks for the report.
> 
> > It's a Core 2 duo laptop (Dell E1405), 2 GB RAM, running a 32-bit kernel.
> > It's worth noting that I was using wired internet (b44 driver) and
> > not wireless.
> > 
> > I've been having a lot of weird lockups with 2.6.38-rcX, quite a change
> > from the very stable 2.6.36, but this is the first time I booted -rc5.
> > Also, the symptoms are very different; before the lockup did not seen
> > correlated with any particular activity, but the "lockup" was more like
> > something getting wedged in the kernel that more and more tasks would
> > get stuck on until everything stopped responding.
> > 
> > I should mention that this is transcribed by hand from the screen.
> > Oh, and also, it is far from the first time I ran ssh this boot.
> > (I re-tested it ater rebooting, just to be sure.  Not a consistent
> > crash.)
> > 
> > Anyway, jumping to address 0 looks "interesting", so it seems worth reporting.
> > 
> > BUG: unable to handle kernel NULL pointer dereference at   (null)
> > IP: [<  (null)>]   (null)
> > *pde = 00000000
> > Oops: 0000 [#1] SMP
> > last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
> > Modules linked in: rfcomm btusb sco l2cap crc16 bluetooth b43 mac80211 cfg80211 [last unloaded: sha256_generic]
> > 
> > Pid: 12178, comm: ssh Not tainted 2.6.38-rc5 #227 Dell Inc. MXC061                          /0MG532
> > EIP: 0060:[<00000000>] EFLAGS: 0021246 CPU: 0
> > EIP is at 0x0
> > EAX: f5947e00 EBX: f5982f80 ECX: 00000024 EDX: c148fe00
> > ESI: f5947e00 EDI: ebf29e54 EBP: ebf29ee0 ESP: ebf29dd0
> >  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > Process ssh (pid: 12178, ti=ebf28000 task=ebefc6c0 task.ti = ebef28000)
> > Stack:
> >  c12c83ea 07e2427d 02000000 7b018f79 c117b827 0b996609 7b018f79 5f5f6644
> >  f5982f80 00000000 ebf29e58 ebf29ee0 c12cc0e0 00001600 00000000 00000001
> >  03641600 00000000 00000000 00000000 00000000 036423c0 3e6423c0 00000000
> > Call Trace:
> >  [<c12c83ea>] ? tcp_connect+0xdd/0x3fd
> >  [<c117b827>] ? secure_tcp_sequence_number+0x4f/0x65
> >  [<c12cc0e0>] ? tcp_v4_connect+0x3c1/0x417
> >  [<c12d6726>] ? inet_stream_connect+0x88/0x1fc
> >  [<c1110705>] ? _copy_from_user+0x2b/0x10e
> >  [<c129307d>] ? sys_connect+0x70/0x98
> >  [<c108df0f>] ? get_empty_filp+0x9f/0x121
> >  [<c108dfa0>] ? alloc_file+0xf/0x85
> >  [<c1293287>] ? sock_alloc_file+0x97/0xeb
> >  [<c108b758>] ? fd_install+0x1b/0x38
> >  [<c12932f6>] ? sock_map_fd+0x1b/0x20
> >  [<c1293bdf>] ? sys_socketcall+0x9d/0x291
> >  [<c1002750>] ? sysenter_do_call+0x12/0x26
> > Code:  Bad EIP value
> > EIP: [<00000000>] 0x0 SS:ESP 0068:ebf29dd0
> > CR2: 0000000000000000


I suspect following patch is needed

CC Roland Dreier <roland@purestorage.com> because he fixed the
default_mtu problem in commit ec831ea72ee5d7d47
(net: Add default_mtu() methods to blackhole dst_ops)

Thanks !

[PATCH] net: Add default_advmss() methods to blackhole dst_ops

Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Roland Dreier <roland@purestorage.com>
---
 net/ipv4/route.c |    7 +++++++
 net/ipv6/route.c |    6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 788a3e7..5edb605 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2712,6 +2712,12 @@ static unsigned int ipv4_blackhole_default_mtu(const struct dst_entry *dst)
 	return 0;
 }
 
+static unsigned int ipv4_blackhole_default_advmss(const struct dst_entry *dst)
+{
+	return 256;
+}
+
+
 static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 }
@@ -2722,6 +2728,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
 	.destroy		=	ipv4_dst_destroy,
 	.check			=	ipv4_blackhole_dst_check,
 	.default_mtu		=	ipv4_blackhole_default_mtu,
+	.default_advmss		=	ipv4_blackhole_default_advmss,
 	.update_pmtu		=	ipv4_rt_blackhole_update_pmtu,
 };
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1c29f95..2eeeabb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -118,6 +118,11 @@ static unsigned int ip6_blackhole_default_mtu(const struct dst_entry *dst)
 	return 0;
 }
 
+static unsigned int ip6_blackhole_default_advmss(const struct dst_entry *dst)
+{
+	return 256;
+}
+
 static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 }
@@ -128,6 +133,7 @@ static struct dst_ops ip6_dst_blackhole_ops = {
 	.destroy		=	ip6_dst_destroy,
 	.check			=	ip6_dst_check,
 	.default_mtu		=	ip6_blackhole_default_mtu,
+	.default_advmss		=	ip6_blackhole_default_advmss,
 	.update_pmtu		=	ip6_rt_blackhole_update_pmtu,
 };
 

^ permalink raw reply related

* [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Amerigo Wang @ 2011-02-18  9:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Neil Horman, Herbert Xu, Amerigo Wang, David S. Miller,
	Neil Horman, Eric Dumazet, netdev
In-Reply-To: <1298022215-21059-1-git-send-email-amwang@redhat.com>

Neil pointed out that we can't send ARP reply on behalf of slaves,
we need to move the arp queue to their bond device.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>

---
 net/core/netpoll.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index f68e694..013e04a 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -193,6 +193,15 @@ void netpoll_poll_dev(struct net_device *dev)
 
 	poll_napi(dev);
 
+	if (dev->priv_flags & IFF_SLAVE) {
+		if (dev->npinfo) {
+			struct net_device *bond_dev = dev->master;
+			struct sk_buff *skb;
+			while ((skb = skb_dequeue(&dev->npinfo->arp_tx)))
+				skb_queue_tail(&bond_dev->npinfo->arp_tx, skb);
+		}
+	}
+
 	service_arp_queue(dev->npinfo);
 
 	zap_completion_queue();
-- 
1.7.1

^ permalink raw reply related

* [V4 PATCH 2/3] netpoll: remove IFF_IN_NETPOLL flag
From: Amerigo Wang @ 2011-02-18  9:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Neil Horman, Herbert Xu, Amerigo Wang, Jay Vosburgh,
	David S. Miller, Simon Horman, Jiri Pirko, Patrick McHardy,
	Neil Horman, Eric Dumazet, netdev
In-Reply-To: <1298022215-21059-1-git-send-email-amwang@redhat.com>

V4: rebase to net-next-2.6

This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
we have netpoll_tx_running() now.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>

---
 drivers/net/bonding/bond_main.c |    6 ++----
 drivers/net/bonding/bonding.h   |    2 +-
 include/linux/if.h              |    9 ++++-----
 net/core/netpoll.c              |    2 --
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2ed6624..c75126d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -423,11 +423,9 @@ int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
 {
 	skb->dev = slave_dev;
 	skb->priority = 1;
-	if (unlikely(netpoll_tx_running(slave_dev))) {
-		slave_dev->priv_flags |= IFF_IN_NETPOLL;
+	if (unlikely(netpoll_tx_running(slave_dev)))
 		bond_netpoll_send_skb(bond_get_slave_by_dev(bond, slave_dev), skb);
-		slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
-	} else
+	else
 		dev_queue_xmit(skb);
 
 	return 0;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 0a3e00b..a401b8d 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -133,7 +133,7 @@ static inline void unblock_netpoll_tx(void)
 
 static inline int is_netpoll_tx_blocked(struct net_device *dev)
 {
-	if (unlikely(dev->priv_flags & IFF_IN_NETPOLL))
+	if (unlikely(netpoll_tx_running(dev)))
 		return atomic_read(&netpoll_block_tx);
 	return 0;
 }
diff --git a/include/linux/if.h b/include/linux/if.h
index 1239599..3bc63e6 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -71,11 +71,10 @@
 					 * release skb->dst
 					 */
 #define IFF_DONT_BRIDGE 0x800		/* disallow bridging this ether dev */
-#define IFF_IN_NETPOLL	0x1000		/* whether we are processing netpoll */
-#define IFF_DISABLE_NETPOLL	0x2000	/* disable netpoll at run-time */
-#define IFF_MACVLAN_PORT	0x4000	/* device used as macvlan port */
-#define IFF_BRIDGE_PORT	0x8000		/* device used as bridge port */
-#define IFF_OVS_DATAPATH	0x10000	/* device used as Open vSwitch
+#define IFF_DISABLE_NETPOLL	0x1000	/* disable netpoll at run-time */
+#define IFF_MACVLAN_PORT	0x2000	/* device used as macvlan port */
+#define IFF_BRIDGE_PORT	0x4000		/* device used as bridge port */
+#define IFF_OVS_DATAPATH	0x8000	/* device used as Open vSwitch
 					 * datapath port */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 02dc2cb..f68e694 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -313,9 +313,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 		     tries > 0; --tries) {
 			if (__netif_tx_trylock(txq)) {
 				if (!netif_tx_queue_stopped(txq)) {
-					dev->priv_flags |= IFF_IN_NETPOLL;
 					status = ops->ndo_start_xmit(skb, dev);
-					dev->priv_flags &= ~IFF_IN_NETPOLL;
 					if (status == NETDEV_TX_OK)
 						txq_trans_update(txq);
 				}
-- 
1.7.1

^ permalink raw reply related

* [V4 PATCH 1/3] bonding: sync netpoll code with bridge
From: Amerigo Wang @ 2011-02-18  9:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Neil Horman, Herbert Xu, Amerigo Wang, Jay Vosburgh, netdev

V4: rebase to net-next-2.6
V3: remove an useless #ifdef.

This patch unifies the netpoll code in bonding with netpoll code in bridge,
thanks to Herbert that code is much cleaner now.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>


---
 drivers/net/bonding/bond_main.c |  155 ++++++++++++++++++++++++--------------
 drivers/net/bonding/bonding.h   |   20 +++++
 2 files changed, 118 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77e3c6a..2ed6624 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -59,7 +59,6 @@
 #include <linux/uaccess.h>
 #include <linux/errno.h>
 #include <linux/netdevice.h>
-#include <linux/netpoll.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
 #include <linux/etherdevice.h>
@@ -424,15 +423,11 @@ int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
 {
 	skb->dev = slave_dev;
 	skb->priority = 1;
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	if (unlikely(bond->dev->priv_flags & IFF_IN_NETPOLL)) {
-		struct netpoll *np = bond->dev->npinfo->netpoll;
-		slave_dev->npinfo = bond->dev->npinfo;
+	if (unlikely(netpoll_tx_running(slave_dev))) {
 		slave_dev->priv_flags |= IFF_IN_NETPOLL;
-		netpoll_send_skb_on_dev(np, skb, slave_dev);
+		bond_netpoll_send_skb(bond_get_slave_by_dev(bond, slave_dev), skb);
 		slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
 	} else
-#endif
 		dev_queue_xmit(skb);
 
 	return 0;
@@ -1288,63 +1283,113 @@ static void bond_detach_slave(struct bonding *bond, struct slave *slave)
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-/*
- * You must hold read lock on bond->lock before calling this.
- */
-static bool slaves_support_netpoll(struct net_device *bond_dev)
+static inline int slave_enable_netpoll(struct slave *slave)
 {
-	struct bonding *bond = netdev_priv(bond_dev);
-	struct slave *slave;
-	int i = 0;
-	bool ret = true;
+	struct netpoll *np;
+	int err = 0;
 
-	bond_for_each_slave(bond, slave, i) {
-		if ((slave->dev->priv_flags & IFF_DISABLE_NETPOLL) ||
-		    !slave->dev->netdev_ops->ndo_poll_controller)
-			ret = false;
+	np = kzalloc(sizeof(*np), GFP_KERNEL);
+	err = -ENOMEM;
+	if (!np)
+		goto out;
+
+	np->dev = slave->dev;
+	err = __netpoll_setup(np);
+	if (err) {
+		kfree(np);
+		goto out;
 	}
-	return i != 0 && ret;
+	slave->np = np;
+out:
+	return err;
+}
+static inline void slave_disable_netpoll(struct slave *slave)
+{
+	struct netpoll *np = slave->np;
+
+	if (!np)
+		return;
+
+	slave->np = NULL;
+	synchronize_rcu_bh();
+	__netpoll_cleanup(np);
+	kfree(np);
+}
+static inline bool slave_dev_support_netpoll(struct net_device *slave_dev)
+{
+	if (slave_dev->priv_flags & IFF_DISABLE_NETPOLL)
+		return false;
+	if (!slave_dev->netdev_ops->ndo_poll_controller)
+		return false;
+	return true;
 }
 
 static void bond_poll_controller(struct net_device *bond_dev)
 {
-	struct bonding *bond = netdev_priv(bond_dev);
+}
+
+static void __bond_netpoll_cleanup(struct bonding *bond)
+{
 	struct slave *slave;
 	int i;
 
-	bond_for_each_slave(bond, slave, i) {
-		if (slave->dev && IS_UP(slave->dev))
-			netpoll_poll_dev(slave->dev);
-	}
+	bond_for_each_slave(bond, slave, i)
+		if (IS_UP(slave->dev))
+			slave_disable_netpoll(slave);
 }
-
 static void bond_netpoll_cleanup(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+
+	read_lock(&bond->lock);
+	__bond_netpoll_cleanup(bond);
+	read_unlock(&bond->lock);
+}
+
+static int bond_netpoll_setup(struct net_device *dev, struct netpoll_info *ni)
+{
+	struct bonding *bond = netdev_priv(dev);
 	struct slave *slave;
-	const struct net_device_ops *ops;
-	int i;
+	int i, err = 0;
 
 	read_lock(&bond->lock);
-	bond_dev->npinfo = NULL;
 	bond_for_each_slave(bond, slave, i) {
-		if (slave->dev) {
-			ops = slave->dev->netdev_ops;
-			if (ops->ndo_netpoll_cleanup)
-				ops->ndo_netpoll_cleanup(slave->dev);
-			else
-				slave->dev->npinfo = NULL;
+		if (!IS_UP(slave->dev))
+			continue;
+		err = slave_enable_netpoll(slave);
+		if (err) {
+			__bond_netpoll_cleanup(bond);
+			break;
 		}
 	}
 	read_unlock(&bond->lock);
+	return err;
 }
 
-#else
+static struct netpoll_info *bond_netpoll_info(struct bonding *bond)
+{
+	return bond->dev->npinfo;
+}
 
+#else
+static inline int slave_enable_netpoll(struct slave *slave)
+{
+	return 0;
+}
+static inline void slave_disable_netpoll(struct slave *slave)
+{
+}
 static void bond_netpoll_cleanup(struct net_device *bond_dev)
 {
 }
-
+static int bond_netpoll_setup(struct net_device *dev, struct netpoll_info *ni)
+{
+	return 0;
+}
+static struct netpoll_info *bond_netpoll_info(struct bonding *bond)
+{
+	return NULL;
+}
 #endif
 
 /*---------------------------------- IOCTL ----------------------------------*/
@@ -1782,17 +1827,19 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	bond_set_carrier(bond);
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-	if (slaves_support_netpoll(bond_dev)) {
-		bond_dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
-		if (bond_dev->npinfo)
-			slave_dev->npinfo = bond_dev->npinfo;
-	} else if (!(bond_dev->priv_flags & IFF_DISABLE_NETPOLL)) {
-		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
-		pr_info("New slave device %s does not support netpoll\n",
-			slave_dev->name);
-		pr_info("Disabling netpoll support for %s\n", bond_dev->name);
+	slave_dev->npinfo = bond_netpoll_info(bond);
+	if (slave_dev->npinfo) {
+		if (slave_enable_netpoll(new_slave)) {
+			read_unlock(&bond->lock);
+			pr_info("Error, %s: master_dev is using netpoll, "
+				 "but new slave device does not support netpoll.\n",
+				 bond_dev->name);
+			res = -EBUSY;
+			goto err_close;
+		}
 	}
 #endif
+
 	read_unlock(&bond->lock);
 
 	res = bond_create_slave_symlinks(bond_dev, slave_dev);
@@ -1994,17 +2041,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 
 	netdev_set_bond_master(slave_dev, NULL);
 
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	read_lock_bh(&bond->lock);
-
-	if (slaves_support_netpoll(bond_dev))
-		bond_dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
-	read_unlock_bh(&bond->lock);
-	if (slave_dev->netdev_ops->ndo_netpoll_cleanup)
-		slave_dev->netdev_ops->ndo_netpoll_cleanup(slave_dev);
-	else
-		slave_dev->npinfo = NULL;
-#endif
+	slave_disable_netpoll(slave);
 
 	/* close slave before restoring its mac address */
 	dev_close(slave_dev);
@@ -2039,6 +2076,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
 
 	ret = bond_release(bond_dev, slave_dev);
 	if ((ret == 0) && (bond->slave_cnt == 0)) {
+		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
 		pr_info("%s: destroying bond %s.\n",
 			bond_dev->name, bond_dev->name);
 		unregister_netdevice(bond_dev);
@@ -2116,6 +2154,8 @@ static int bond_release_all(struct net_device *bond_dev)
 
 		netdev_set_bond_master(slave_dev, NULL);
 
+		slave_disable_netpoll(slave);
+
 		/* close slave before restoring its mac address */
 		dev_close(slave_dev);
 
@@ -4654,6 +4694,7 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_vlan_rx_add_vid 	= bond_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= bond_vlan_rx_kill_vid,
 #ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_netpoll_setup	= bond_netpoll_setup,
 	.ndo_netpoll_cleanup	= bond_netpoll_cleanup,
 	.ndo_poll_controller	= bond_poll_controller,
 #endif
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 31fe980..0a3e00b 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -20,6 +20,7 @@
 #include <linux/if_bonding.h>
 #include <linux/cpumask.h>
 #include <linux/in6.h>
+#include <linux/netpoll.h>
 #include "bond_3ad.h"
 #include "bond_alb.h"
 
@@ -198,6 +199,9 @@ struct slave {
 	u16    queue_id;
 	struct ad_slave_info ad_info; /* HUGE - better to dynamically alloc */
 	struct tlb_slave_info tlb_info;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	struct netpoll *np;
+#endif
 };
 
 /*
@@ -323,6 +327,22 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
 	return slave->dev->last_rx;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static inline void bond_netpoll_send_skb(const struct slave *slave,
+					 struct sk_buff *skb)
+{
+	struct netpoll *np = slave->np;
+
+	if (np)
+		netpoll_send_skb(np, skb);
+}
+#else
+static inline void bond_netpoll_send_skb(const struct slave *slave,
+					 struct sk_buff *skb)
+{
+}
+#endif
+
 static inline void bond_set_slave_inactive_flags(struct slave *slave)
 {
 	struct bonding *bond = netdev_priv(slave->dev->master);
-- 
1.7.1

^ permalink raw reply related

* Re: Mass udp flow reboot linux with RealTek RTL-8169 Gigabit
From: Francois Romieu @ 2011-02-18  9:30 UTC (permalink / raw)
  To: Seblu; +Cc: Eric Dumazet, lkml, netdev, Ivan Vecera
In-Reply-To: <AANLkTi=PrOXMD54hSbEwUtmzXiwsSwiSijE3AJ9n=WDe@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

Seblu <seblu@seblu.net> :
[...]
> I've applyed your patch on 2.6.38-rc5. Host have rebooted 2mn after udp start.
> After this reboot, host is still on after 2 hour under a 1Gbit/s udp flow.

Thanks for testing.

> I attached a dmesg output before reboot. Do you need anything else?

Mostly :
1. .config
2. the size of the udp packets and the mtu

As an option :
3. a few seconds of 'vmstat 1' from the host under test
4. an 'ethtool -s eth0' from the host under test
5. /proc/interrupts from the host under test
6. lspci -tv 

Can you apply the two attached patches on top of the previous ones and
give it a try ? The debug should not be too verbose if things are stationary
enough.

Do you have a serial cable and a second computer at hand by chance (don't
go for netconsole with the r8169 driver) ?

-- 
Ueimor

[-- Attachment #2: 0001-r8169-TX-stop-queue-race-fix.patch --]
[-- Type: text/plain, Size: 1129 bytes --]

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 59ccf0c..712231f 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4361,13 +4361,13 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	wmb();
-
 	RTL_W8(TxPoll, NPQ);	/* set polling bit */
 
+	mmiowb();
+
 	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
 		netif_stop_queue(dev);
-		smp_rmb();
+		smp_mb();
 		if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
 			netif_wake_queue(dev);
 	}
@@ -4468,10 +4468,14 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 
 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
-		smp_wmb();
-		if (netif_queue_stopped(dev) &&
-		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
-			netif_wake_queue(dev);
+		smp_mb();
+		if (unlikely(netif_queue_stopped(dev) &&
+		    (TX_BUFFS_AVAIL(tp) >= (NUM_TX_DESC / 4)))) {
+			netif_tx_lock(dev);
+			if (netif_queue_stopped(dev) &&
+			    (TX_BUFFS_AVAIL(tp) >= (NUM_TX_DESC / 4)))
+				netif_wake_queue(dev);
+			netif_tx_unlock(dev);
 		}
 		/*
 		 * 8168 hack: TxPoll requests are lost when the Tx packets are
-- 
1.7.3.4


[-- Attachment #3: 0002-r8169-debuginator.patch --]
[-- Type: text/plain, Size: 856 bytes --]

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 712231f..5eaccbb 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4622,6 +4622,11 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 	return count;
 }
 
+static struct {
+	u16 status[4];
+	u16 idx;
+} x = { { 0 }, 0 };
+
 static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct net_device *dev = dev_instance;
@@ -4637,6 +4642,12 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	while (status && status != 0xffff) {
 		handled = 1;
 
+		x.status[x.idx++ % 4] = status;
+		if (net_ratelimit()) {
+			printk(KERN_INFO "%04x %04x %04x %04x\n",
+				x.status[0], x.status[1],
+				x.status[2], x.status[3]);
+		}
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */
-- 
1.7.3.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox