Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [RFC 2/2] ethtool: Add support for DMA Coalescing feature config to ethtool.
From: Wyborny, Carolyn @ 2011-06-30 17:52 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1308677893.2743.24.camel@bwh-desktop>



>-----Original Message-----
>From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>Sent: Tuesday, June 21, 2011 10:38 AM
>To: Wyborny, Carolyn
>Cc: David Miller; netdev@vger.kernel.org
>Subject: RE: [RFC 2/2] ethtool: Add support for DMA Coalescing feature
>config to ethtool.
>
>On Tue, 2011-06-21 at 10:23 -0700, Wyborny, Carolyn wrote:
>>
>> >-----Original Message-----
>> >From: David Miller [mailto:davem@davemloft.net]
>> >Sent: Friday, June 17, 2011 11:54 AM
>> >To: Wyborny, Carolyn
>> >Cc: netdev@vger.kernel.org; bhutchings@solarflare.com
>> >Subject: Re: [RFC 2/2] ethtool: Add support for DMA Coalescing
>feature
>> >config to ethtool.
>> >
>> >From: "Wyborny, Carolyn" <carolyn.wyborny@intel.com>
>> >Date: Fri, 17 Jun 2011 08:50:11 -0700
>> >
>> >> I will add a fuller description of the feature in my updated patch.
>> >> I thought the feature was more well known. Quick description is
>that
>> >> it's a power saving feature that causes the adapter to coalesce its
>> >> DMA writes at low traffic times to save power on the platform by
>> >> reducing wakeups.  The parameter is intended as a simple u32 value,
>> >> not just an on or off, but also to allow a variety of configuration
>> >> by adapter vendors, with validation of the input on the driver
>side.
>> >> Since I left out the implementation in my patch, this wasn't clear.
>> >> I will also fix this in my next submission.
>> >
>> >The value cannot have adapter specific meaning, you must define it
>> >precisely and in a generic manner, such that the user can specify the
>> >same setting across different card types.
>>
>> Ok, good point.  I will refine the definition of the parameter in the
>> next submission, once the dust clears on the major revisions in
>> progress.
>
>You may wish to propose a new command structure that covers both IRQ and
>DMA moderation.  They seem to be related, since DMA cannot be delayed
>longer than the corresponding IRQ.  We are currently lacking a way to
>specify different IRQ moderation for multiqueue devices where the queues
>are not all used in the same way.
>
>Ben.
>
>--
>Ben Hutchings, Senior Software Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.

I will try to do this.  Confirming you are suggesting a replacement or an alternate for the current -c/-C coalesce settings with perhaps some room reserved for some future coalescing type features and enable settings per queue?  I agree they are related and initially considered trying to add DMAC to the current coalescing settings, but they are full.

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




^ permalink raw reply

* IPv6 /127 address
From: Stephen Hemminger @ 2011-06-30 17:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

There is a new RFC out for supporting /127 addresses for
router interconnect and point-to-point links.

Right now the kernel gets confused by the /127 and doesn't
disable the anycast address. Any ideas?

 http://tools.ietf.org/html/rfc6164
 http://lists.debian.org/debian-ipv6/2011/05/msg00018.html

^ permalink raw reply

* Re: [RFC patch net-next-2.6] net: allow multiple rx_handler registration
From: Jiri Pirko @ 2011-06-30 17:32 UTC (permalink / raw)
  To: Ben Greear
  Cc: Stephen Hemminger, netdev, davem, kaber, fubar, eric.dumazet,
	nicolas.2p.debian, andy
In-Reply-To: <4E0CB26C.9070305@candelatech.com>

Thu, Jun 30, 2011 at 07:29:16PM CEST, greearb@candelatech.com wrote:
>On 06/30/2011 10:22 AM, Jiri Pirko wrote:
>>Thu, Jun 30, 2011 at 06:27:12PM CEST, shemminger@vyatta.com wrote:
>>>On Thu, 30 Jun 2011 17:16:49 +0200
>>>Jiri Pirko<jpirko@redhat.com>  wrote:
>>>
>>>>For some net topos it is necessary to have multiple "soft-net-devices"
>>>>hooked on one netdev. For example very common is to have
>>>>eth<->(br+vlan). Vlan is not using rh_handler (yet) but also for example
>>>>macvlan would be useful to have hooked on same netdev as br.
>>>>
>>>>This patch introduces rx_handler list. size struct net_device stays
>>>>intact. Measured performance regression on eth-br topo is ~1% (on received
>>>>pkts generated by pktgen) and on eth-bond topo it is ~0.25%
>>>>
>>>>On br I think that the performance can be brought back maybe by using per-cpu
>>>>variables to store port in rx_path (I must check this)
>>>>
>>>>Please comment.
>>>>
>>>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>
>>>I am ok with the infrastructure, but why should Vlan use rh_handle.
>>
>>Well why it shoudln't. It would fit into what rx_handler is here for - the
>>code would be more unified. Also net_device struct would lose struct
>>vlan_group __rcu *vlgrp pointer (and reducing net_device size is always
>>good thing).
>>
>>>It is wrong to allow macvlan and bridge to share same device.
>>>Right now the code blocks users from doing lots of stupid things.
>>
>>Right, this is since rx_handler was introduced. Before that all these
>>stupid configs were allowed. It's possible easily to forbid unwanted
>>configs by checking priv flags.
>
>What sorts of stupid things?  I didn't look at your patch, but does it handle
>ordering?  In other words, is a bridge logic always handled before VLAN logic?
>
>The old hard-coded stuff in dev.c inherently determined ordering.  For dynamic
>handlers, we may need to enforce ordering to give the user any chance of doing
>things right (it would be very confusing to have the behaviour change completely
>if you added bridge module before vlan module v/s vlan before bridge).

You should read the patch first :) Ordering is handled there.

>
>Thanks,
>Ben
>>
>>>
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>-- 
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com
>

^ permalink raw reply

* pull request: wireless-2.6 2011-06-30
From: John W. Linville @ 2011-06-30 17:32 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

Dave,

Here is a batch of fixes intended for 3.0.  Arik gives us a fix for a
potential NULL dereference in mac80211.  Emmanuel gives us a fix for a
regression introduced by "iwlagn: support multiple TBs per command" that
can corrupt memory.  Eugene (and Bob) gives a memory leak fix for ath5k.
Evgeni gives us a preprocessor-related fix that makes modinfo output
make more sense for iwlagn.  Johannes gives us a trio of fixes, all
isolated to the bowels of iwlagn.  I overlayed a fixup on top of one of
Johannes's patches, since there was some confusion between DMA and PCI
API usage.  Finally, Rajkumar gives us an ath9k fix to ensure the chip
is properly awakened even if there is no active interface when the
resume occurs.

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 16adf5d07987d93675945f3cecf0e33706566005:

  usbnet: Remove over-broad module alias from zaurus. (2011-06-29 06:09:17 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git for-davem

Arik Nemtsov (1):
      mac80211: fix rx->key NULL dereference during mic failure

Emmanuel Grumbach (1):
      iwlagn: Fix a bug introduced by the HUGE command removal

Eugene A. Shatokhin (1):
      ath5k: fix memory leak when fewer than N_PD_CURVES are in use

Evgeni Golov (1):
      iwlagn: fix *_UCODE_API_MAX output in the firmware field

Johannes Berg (3):
      iwlagn: fix change_interface for P2P types
      iwlagn: fix cmd queue unmap
      iwlagn: map command buffers BIDI

John W. Linville (3):
      Merge branch 'wireless-2.6' of git://git.kernel.org/.../iwlwifi/iwlwifi-2.6
      iwlagn: use PCI_DMA_* for pci_* operations
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6 into for-davem

Rajkumar Manoharan (1):
      ath9k: Fix suspend/resume when no interface is UP

 drivers/net/wireless/ath/ath5k/eeprom.c |    8 +++-----
 drivers/net/wireless/ath/ath9k/pci.c    |    6 ++++++
 drivers/net/wireless/iwlwifi/iwl-1000.c |    5 +++--
 drivers/net/wireless/iwlwifi/iwl-2000.c |    7 ++++---
 drivers/net/wireless/iwlwifi/iwl-5000.c |    5 +++--
 drivers/net/wireless/iwlwifi/iwl-6000.c |    9 +++++----
 drivers/net/wireless/iwlwifi/iwl-core.c |    3 ++-
 drivers/net/wireless/iwlwifi/iwl-tx.c   |   25 ++++++++++---------------
 include/net/cfg80211.h                  |    2 +-
 net/mac80211/wpa.c                      |    8 +++++++-
 net/wireless/nl80211.c                  |    3 ++-
 11 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c b/drivers/net/wireless/ath/ath5k/eeprom.c
index 1fef84f..392771f 100644
--- a/drivers/net/wireless/ath/ath5k/eeprom.c
+++ b/drivers/net/wireless/ath/ath5k/eeprom.c
@@ -691,14 +691,12 @@ ath5k_eeprom_free_pcal_info(struct ath5k_hw *ah, int mode)
 		if (!chinfo[pier].pd_curves)
 			continue;
 
-		for (pdg = 0; pdg < ee->ee_pd_gains[mode]; pdg++) {
+		for (pdg = 0; pdg < AR5K_EEPROM_N_PD_CURVES; pdg++) {
 			struct ath5k_pdgain_info *pd =
 					&chinfo[pier].pd_curves[pdg];
 
-			if (pd != NULL) {
-				kfree(pd->pd_step);
-				kfree(pd->pd_pwr);
-			}
+			kfree(pd->pd_step);
+			kfree(pd->pd_pwr);
 		}
 
 		kfree(chinfo[pier].pd_curves);
diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
index b8cbfc7..3bad0b2 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -278,6 +278,12 @@ static int ath_pci_suspend(struct device *device)
 
 	ath9k_hw_set_gpio(sc->sc_ah, sc->sc_ah->led_pin, 1);
 
+	/* The device has to be moved to FULLSLEEP forcibly.
+	 * Otherwise the chip never moved to full sleep,
+	 * when no interface is up.
+	 */
+	ath9k_hw_setpower(sc->sc_ah, ATH9K_PM_FULL_SLEEP);
+
 	return 0;
 }
 
diff --git a/drivers/net/wireless/iwlwifi/iwl-1000.c b/drivers/net/wireless/iwlwifi/iwl-1000.c
index 61d4a11..2a88e73 100644
--- a/drivers/net/wireless/iwlwifi/iwl-1000.c
+++ b/drivers/net/wireless/iwlwifi/iwl-1000.c
@@ -36,6 +36,7 @@
 #include <net/mac80211.h>
 #include <linux/etherdevice.h>
 #include <asm/unaligned.h>
+#include <linux/stringify.h>
 
 #include "iwl-eeprom.h"
 #include "iwl-dev.h"
@@ -55,10 +56,10 @@
 #define IWL100_UCODE_API_MIN 5
 
 #define IWL1000_FW_PRE "iwlwifi-1000-"
-#define IWL1000_MODULE_FIRMWARE(api) IWL1000_FW_PRE #api ".ucode"
+#define IWL1000_MODULE_FIRMWARE(api) IWL1000_FW_PRE __stringify(api) ".ucode"
 
 #define IWL100_FW_PRE "iwlwifi-100-"
-#define IWL100_MODULE_FIRMWARE(api) IWL100_FW_PRE #api ".ucode"
+#define IWL100_MODULE_FIRMWARE(api) IWL100_FW_PRE __stringify(api) ".ucode"
 
 
 /*
diff --git a/drivers/net/wireless/iwlwifi/iwl-2000.c b/drivers/net/wireless/iwlwifi/iwl-2000.c
index 2282279..3df76f5 100644
--- a/drivers/net/wireless/iwlwifi/iwl-2000.c
+++ b/drivers/net/wireless/iwlwifi/iwl-2000.c
@@ -36,6 +36,7 @@
 #include <net/mac80211.h>
 #include <linux/etherdevice.h>
 #include <asm/unaligned.h>
+#include <linux/stringify.h>
 
 #include "iwl-eeprom.h"
 #include "iwl-dev.h"
@@ -58,13 +59,13 @@
 #define IWL105_UCODE_API_MIN 5
 
 #define IWL2030_FW_PRE "iwlwifi-2030-"
-#define IWL2030_MODULE_FIRMWARE(api) IWL2030_FW_PRE #api ".ucode"
+#define IWL2030_MODULE_FIRMWARE(api) IWL2030_FW_PRE __stringify(api) ".ucode"
 
 #define IWL2000_FW_PRE "iwlwifi-2000-"
-#define IWL2000_MODULE_FIRMWARE(api) IWL2000_FW_PRE #api ".ucode"
+#define IWL2000_MODULE_FIRMWARE(api) IWL2000_FW_PRE __stringify(api) ".ucode"
 
 #define IWL105_FW_PRE "iwlwifi-105-"
-#define IWL105_MODULE_FIRMWARE(api) IWL105_FW_PRE #api ".ucode"
+#define IWL105_MODULE_FIRMWARE(api) IWL105_FW_PRE __stringify(api) ".ucode"
 
 static void iwl2000_set_ct_threshold(struct iwl_priv *priv)
 {
diff --git a/drivers/net/wireless/iwlwifi/iwl-5000.c b/drivers/net/wireless/iwlwifi/iwl-5000.c
index f99f9c1..e816c27 100644
--- a/drivers/net/wireless/iwlwifi/iwl-5000.c
+++ b/drivers/net/wireless/iwlwifi/iwl-5000.c
@@ -37,6 +37,7 @@
 #include <net/mac80211.h>
 #include <linux/etherdevice.h>
 #include <asm/unaligned.h>
+#include <linux/stringify.h>
 
 #include "iwl-eeprom.h"
 #include "iwl-dev.h"
@@ -57,10 +58,10 @@
 #define IWL5150_UCODE_API_MIN 1
 
 #define IWL5000_FW_PRE "iwlwifi-5000-"
-#define IWL5000_MODULE_FIRMWARE(api) IWL5000_FW_PRE #api ".ucode"
+#define IWL5000_MODULE_FIRMWARE(api) IWL5000_FW_PRE __stringify(api) ".ucode"
 
 #define IWL5150_FW_PRE "iwlwifi-5150-"
-#define IWL5150_MODULE_FIRMWARE(api) IWL5150_FW_PRE #api ".ucode"
+#define IWL5150_MODULE_FIRMWARE(api) IWL5150_FW_PRE __stringify(api) ".ucode"
 
 /* NIC configuration for 5000 series */
 static void iwl5000_nic_config(struct iwl_priv *priv)
diff --git a/drivers/net/wireless/iwlwifi/iwl-6000.c b/drivers/net/wireless/iwlwifi/iwl-6000.c
index fbe565c..5b150bc 100644
--- a/drivers/net/wireless/iwlwifi/iwl-6000.c
+++ b/drivers/net/wireless/iwlwifi/iwl-6000.c
@@ -36,6 +36,7 @@
 #include <net/mac80211.h>
 #include <linux/etherdevice.h>
 #include <asm/unaligned.h>
+#include <linux/stringify.h>
 
 #include "iwl-eeprom.h"
 #include "iwl-dev.h"
@@ -58,16 +59,16 @@
 #define IWL6000G2_UCODE_API_MIN 4
 
 #define IWL6000_FW_PRE "iwlwifi-6000-"
-#define IWL6000_MODULE_FIRMWARE(api) IWL6000_FW_PRE #api ".ucode"
+#define IWL6000_MODULE_FIRMWARE(api) IWL6000_FW_PRE __stringify(api) ".ucode"
 
 #define IWL6050_FW_PRE "iwlwifi-6050-"
-#define IWL6050_MODULE_FIRMWARE(api) IWL6050_FW_PRE #api ".ucode"
+#define IWL6050_MODULE_FIRMWARE(api) IWL6050_FW_PRE __stringify(api) ".ucode"
 
 #define IWL6005_FW_PRE "iwlwifi-6000g2a-"
-#define IWL6005_MODULE_FIRMWARE(api) IWL6005_FW_PRE #api ".ucode"
+#define IWL6005_MODULE_FIRMWARE(api) IWL6005_FW_PRE __stringify(api) ".ucode"
 
 #define IWL6030_FW_PRE "iwlwifi-6000g2b-"
-#define IWL6030_MODULE_FIRMWARE(api) IWL6030_FW_PRE #api ".ucode"
+#define IWL6030_MODULE_FIRMWARE(api) IWL6030_FW_PRE __stringify(api) ".ucode"
 
 static void iwl6000_set_ct_threshold(struct iwl_priv *priv)
 {
diff --git a/drivers/net/wireless/iwlwifi/iwl-core.c b/drivers/net/wireless/iwlwifi/iwl-core.c
index 213c80c..45cc51c 100644
--- a/drivers/net/wireless/iwlwifi/iwl-core.c
+++ b/drivers/net/wireless/iwlwifi/iwl-core.c
@@ -1763,6 +1763,7 @@ int iwl_mac_change_interface(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
 	struct iwl_rxon_context *ctx = iwl_rxon_ctx_from_vif(vif);
 	struct iwl_rxon_context *bss_ctx = &priv->contexts[IWL_RXON_CTX_BSS];
 	struct iwl_rxon_context *tmp;
+	enum nl80211_iftype newviftype = newtype;
 	u32 interface_modes;
 	int err;
 
@@ -1818,7 +1819,7 @@ int iwl_mac_change_interface(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
 
 	/* success */
 	iwl_teardown_interface(priv, vif, true);
-	vif->type = newtype;
+	vif->type = newviftype;
 	vif->p2p = newp2p;
 	err = iwl_setup_interface(priv, ctx);
 	WARN_ON(err);
diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c
index 686e176..137dba9 100644
--- a/drivers/net/wireless/iwlwifi/iwl-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-tx.c
@@ -126,7 +126,7 @@ static inline u8 iwl_tfd_get_num_tbs(struct iwl_tfd *tfd)
 }
 
 static void iwlagn_unmap_tfd(struct iwl_priv *priv, struct iwl_cmd_meta *meta,
-			     struct iwl_tfd *tfd)
+			     struct iwl_tfd *tfd, int dma_dir)
 {
 	struct pci_dev *dev = priv->pci_dev;
 	int i;
@@ -151,7 +151,7 @@ static void iwlagn_unmap_tfd(struct iwl_priv *priv, struct iwl_cmd_meta *meta,
 	/* Unmap chunks, if any. */
 	for (i = 1; i < num_tbs; i++)
 		pci_unmap_single(dev, iwl_tfd_tb_get_addr(tfd, i),
-				iwl_tfd_tb_get_len(tfd, i), PCI_DMA_TODEVICE);
+				iwl_tfd_tb_get_len(tfd, i), dma_dir);
 }
 
 /**
@@ -167,7 +167,8 @@ void iwlagn_txq_free_tfd(struct iwl_priv *priv, struct iwl_tx_queue *txq)
 	struct iwl_tfd *tfd_tmp = txq->tfds;
 	int index = txq->q.read_ptr;
 
-	iwlagn_unmap_tfd(priv, &txq->meta[index], &tfd_tmp[index]);
+	iwlagn_unmap_tfd(priv, &txq->meta[index], &tfd_tmp[index],
+			 PCI_DMA_TODEVICE);
 
 	/* free SKB */
 	if (txq->txb) {
@@ -310,9 +311,7 @@ void iwl_cmd_queue_unmap(struct iwl_priv *priv)
 		i = get_cmd_index(q, q->read_ptr);
 
 		if (txq->meta[i].flags & CMD_MAPPED) {
-			pci_unmap_single(priv->pci_dev,
-					 dma_unmap_addr(&txq->meta[i], mapping),
-					 dma_unmap_len(&txq->meta[i], len),
+			iwlagn_unmap_tfd(priv, &txq->meta[i], &txq->tfds[i],
 					 PCI_DMA_BIDIRECTIONAL);
 			txq->meta[i].flags = 0;
 		}
@@ -535,12 +534,7 @@ out_free_arrays:
 void iwl_tx_queue_reset(struct iwl_priv *priv, struct iwl_tx_queue *txq,
 			int slots_num, u32 txq_id)
 {
-	int actual_slots = slots_num;
-
-	if (txq_id == priv->cmd_queue)
-		actual_slots++;
-
-	memset(txq->meta, 0, sizeof(struct iwl_cmd_meta) * actual_slots);
+	memset(txq->meta, 0, sizeof(struct iwl_cmd_meta) * slots_num);
 
 	txq->need_update = 0;
 
@@ -700,10 +694,11 @@ int iwl_enqueue_hcmd(struct iwl_priv *priv, struct iwl_host_cmd *cmd)
 		if (!(cmd->dataflags[i] & IWL_HCMD_DFL_NOCOPY))
 			continue;
 		phys_addr = pci_map_single(priv->pci_dev, (void *)cmd->data[i],
-					   cmd->len[i], PCI_DMA_TODEVICE);
+					   cmd->len[i], PCI_DMA_BIDIRECTIONAL);
 		if (pci_dma_mapping_error(priv->pci_dev, phys_addr)) {
 			iwlagn_unmap_tfd(priv, out_meta,
-					 &txq->tfds[q->write_ptr]);
+					 &txq->tfds[q->write_ptr],
+					 PCI_DMA_BIDIRECTIONAL);
 			idx = -ENOMEM;
 			goto out;
 		}
@@ -807,7 +802,7 @@ void iwl_tx_cmd_complete(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb)
 	cmd = txq->cmd[cmd_index];
 	meta = &txq->meta[cmd_index];
 
-	iwlagn_unmap_tfd(priv, meta, &txq->tfds[index]);
+	iwlagn_unmap_tfd(priv, meta, &txq->tfds[index], PCI_DMA_BIDIRECTIONAL);
 
 	/* Input error checking is done when commands are added to queue. */
 	if (meta->flags & CMD_WANT_SKB) {
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 0589f55..396e8fc 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2688,7 +2688,7 @@ void cfg80211_send_unprot_disassoc(struct net_device *dev, const u8 *buf,
  * @dev: network device
  * @addr: The source MAC address of the frame
  * @key_type: The key type that the received frame used
- * @key_id: Key identifier (0..3)
+ * @key_id: Key identifier (0..3). Can be -1 if missing.
  * @tsc: The TSC value of the frame that generated the MIC failure (6 octets)
  * @gfp: allocation flags
  *
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index 9dc3b5f..d91c1a2 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -154,7 +154,13 @@ update_iv:
 	return RX_CONTINUE;
 
 mic_fail:
-	mac80211_ev_michael_mic_failure(rx->sdata, rx->key->conf.keyidx,
+	/*
+	 * In some cases the key can be unset - e.g. a multicast packet, in
+	 * a driver that supports HW encryption. Send up the key idx only if
+	 * the key is set.
+	 */
+	mac80211_ev_michael_mic_failure(rx->sdata,
+					rx->key ? rx->key->conf.keyidx : -1,
 					(void *) skb->data, NULL, GFP_ATOMIC);
 	return RX_DROP_UNUSABLE;
 }
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 98fa8eb..f07602d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -6463,7 +6463,8 @@ void nl80211_michael_mic_failure(struct cfg80211_registered_device *rdev,
 	if (addr)
 		NLA_PUT(msg, NL80211_ATTR_MAC, ETH_ALEN, addr);
 	NLA_PUT_U32(msg, NL80211_ATTR_KEY_TYPE, key_type);
-	NLA_PUT_U8(msg, NL80211_ATTR_KEY_IDX, key_id);
+	if (key_id != -1)
+		NLA_PUT_U8(msg, NL80211_ATTR_KEY_IDX, key_id);
 	if (tsc)
 		NLA_PUT(msg, NL80211_ATTR_KEY_SEQ, 6, tsc);
 
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply related

* Re: [RFC patch net-next-2.6] net: allow multiple rx_handler registration
From: Ben Greear @ 2011-06-30 17:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Stephen Hemminger, netdev, davem, kaber, fubar, eric.dumazet,
	nicolas.2p.debian, andy
In-Reply-To: <20110630172257.GB2056@minipsycho>

On 06/30/2011 10:22 AM, Jiri Pirko wrote:
> Thu, Jun 30, 2011 at 06:27:12PM CEST, shemminger@vyatta.com wrote:
>> On Thu, 30 Jun 2011 17:16:49 +0200
>> Jiri Pirko<jpirko@redhat.com>  wrote:
>>
>>> For some net topos it is necessary to have multiple "soft-net-devices"
>>> hooked on one netdev. For example very common is to have
>>> eth<->(br+vlan). Vlan is not using rh_handler (yet) but also for example
>>> macvlan would be useful to have hooked on same netdev as br.
>>>
>>> This patch introduces rx_handler list. size struct net_device stays
>>> intact. Measured performance regression on eth-br topo is ~1% (on received
>>> pkts generated by pktgen) and on eth-bond topo it is ~0.25%
>>>
>>> On br I think that the performance can be brought back maybe by using per-cpu
>>> variables to store port in rx_path (I must check this)
>>>
>>> Please comment.
>>>
>>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>
>> I am ok with the infrastructure, but why should Vlan use rh_handle.
>
> Well why it shoudln't. It would fit into what rx_handler is here for - the
> code would be more unified. Also net_device struct would lose struct
> vlan_group __rcu *vlgrp pointer (and reducing net_device size is always
> good thing).
>
>> It is wrong to allow macvlan and bridge to share same device.
>> Right now the code blocks users from doing lots of stupid things.
>
> Right, this is since rx_handler was introduced. Before that all these
> stupid configs were allowed. It's possible easily to forbid unwanted
> configs by checking priv flags.

What sorts of stupid things?  I didn't look at your patch, but does it handle
ordering?  In other words, is a bridge logic always handled before VLAN logic?

The old hard-coded stuff in dev.c inherently determined ordering.  For dynamic
handlers, we may need to enforce ordering to give the user any chance of doing
things right (it would be very confusing to have the behaviour change completely
if you added bridge module before vlan module v/s vlan before bridge).

Thanks,
Ben
>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [RFC patch net-next-2.6] net: allow multiple rx_handler registration
From: Jiri Pirko @ 2011-06-30 17:22 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, davem, kaber, fubar, eric.dumazet, nicolas.2p.debian,
	andy
In-Reply-To: <20110630092712.17eb292f@nehalam.ftrdhcpuser.net>

Thu, Jun 30, 2011 at 06:27:12PM CEST, shemminger@vyatta.com wrote:
>On Thu, 30 Jun 2011 17:16:49 +0200
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>> For some net topos it is necessary to have multiple "soft-net-devices"
>> hooked on one netdev. For example very common is to have
>> eth<->(br+vlan). Vlan is not using rh_handler (yet) but also for example
>> macvlan would be useful to have hooked on same netdev as br.
>> 
>> This patch introduces rx_handler list. size struct net_device stays
>> intact. Measured performance regression on eth-br topo is ~1% (on received
>> pkts generated by pktgen) and on eth-bond topo it is ~0.25%
>> 
>> On br I think that the performance can be brought back maybe by using per-cpu
>> variables to store port in rx_path (I must check this)
>> 
>> Please comment.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>I am ok with the infrastructure, but why should Vlan use rh_handle.

Well why it shoudln't. It would fit into what rx_handler is here for - the
code would be more unified. Also net_device struct would lose struct
vlan_group __rcu *vlgrp pointer (and reducing net_device size is always
good thing).

>It is wrong to allow macvlan and bridge to share same device.
>Right now the code blocks users from doing lots of stupid things.

Right, this is since rx_handler was introduced. Before that all these
stupid configs were allowed. It's possible easily to forbid unwanted
configs by checking priv flags.

>

^ permalink raw reply

* Re: [PATCH] net/core: Convert to current logging forms
From: Joe Perches @ 2011-06-30 17:10 UTC (permalink / raw)
  To: WANG Cong; +Cc: netdev, LKML
In-Reply-To: <iuhh76$fid$1@dough.gmane.org>

On Thu, 2011-06-30 at 09:55 +0000, WANG Cong wrote:
> On Tue, 28 Jun 2011 12:40:10 -0700, Joe Perches wrote:
> > Use pr_fmt, pr_<level>, and netdev_<level> as appropriate.
> > Coalesce long formats.
> > +		np->name, np->local_port);
> > +	pr_info("%s: local IP %pI4\n",
> > +		np->name, &np->local_ip);
> > +	pr_info("%s: interface '%s'\n",
> > +		np->name, np->dev_name);
> > +	pr_info("%s: remote port %d\n",
> > +		np->name, np->remote_port);
> > +	pr_info("%s: remote IP %pI4\n",
> > +		np->name, &np->remote_ip);
> > +	pr_info("%s: remote ethernet address %pM\n",
> > +		np->name, np->remote_mac);
> >  }
> This doesn't have much value, because the name of the netpoll
> user (np->name) is already logged. If we changed it,
> we would see "netconsole: netconsole: blah blah...".

Thanks.

Don't just reply to the lists.
Remember to include the patch author in your replies.

cheers, Joe

^ permalink raw reply

* Re: possible bridge regression in "bridge: implement [add/del]_slave ops"?
From: Stephen Hemminger @ 2011-06-30 17:08 UTC (permalink / raw)
  To: Alexander Stein; +Cc: David S. Miller, bridge, netdev
In-Reply-To: <201106301033.23997.alexander.stein@systec-electronic.com>

On Thu, 30 Jun 2011 10:33:23 +0200
Alexander Stein <alexander.stein@systec-electronic.com> wrote:

> * echo $(pgrep rstpd) > /var/run/rstpd.pid
> * brctl addbr br1
> * echo 1 > /sys/class/net/br1/bridge/stp_state

This bogus. You are running both kernel and spanning
tree daemon at the same time!

Doing the echo of 1 to stp_state forces kernel spanning
tree. You want 2 which is what is supposed to be use for user
mode spanning tree.

Note: dropping LKML off the thread to save time/space/noise.

^ permalink raw reply

* Re: possible bridge regression in "bridge: implement [add/del]_slave ops"?
From: Stephen Hemminger @ 2011-06-30 17:03 UTC (permalink / raw)
  To: Alexander Stein; +Cc: David S. Miller, bridge, netdev, linux-kernel
In-Reply-To: <201106301527.19539.alexander.stein@systec-electronic.com>

On Thu, 30 Jun 2011 15:27:19 +0200
Alexander Stein <alexander.stein@systec-electronic.com> wrote:

> On Thursday 30 June 2011 10:33:23 Alexander Stein wrote:
> > BTW: I noticed that in 2.6.39.2 independently from this patch revert this
> > bridge didn't show up RUNNING ifconfg. Is this intended? Another bridge I
> > have, which doesn't use (R)STP, is shown as RUNNING like before.
> 
> This change was caused by commit 1faa4356a3bd89ea11fb92752d897cff3a20ec0e 
> "bridge: control carrier based on ports online". It prevents the bridge from 
> actually receiving/sending packets. Reverting restores the old behavior.

It is really a bug in RSTP, I will fix it there.

^ permalink raw reply

* Re: [PATCH 4/4] xen/netback: Add module alias for autoloading
From: Konrad Rzeszutek Wilk @ 2011-06-30 16:39 UTC (permalink / raw)
  To: Bastian Blank, xen-devel, virtualization, Jens Axboe,
	linux-kernel, netdev
In-Reply-To: <20110629124132.GD31038@wavehammer.waldi.eu.org>

On Wed, Jun 29, 2011 at 02:41:32PM +0200, Bastian Blank wrote:
> Add xen-backend:vif module alias to the xen-netback module. This allows
> automatic loading of the module.

Dave,

Could you queue this up for 3.1 please? I've the other two patches in my
tree for 3.1 and the block patch ready for Jens.

> 
> Signed-off-by: Bastian Blank <waldi@debian.org>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/net/xen-netback/netback.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 0e4851b..fd00f25 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1743,3 +1743,4 @@ failed_init:
>  module_init(netback_init);
>  
>  MODULE_LICENSE("Dual BSD/GPL");
> +MODULE_ALIAS("xen-backend:vif");
> -- 
> 1.7.5.4
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [RFC patch net-next-2.6] net: allow multiple rx_handler registration
From: Stephen Hemminger @ 2011-06-30 16:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, kaber, fubar, eric.dumazet, nicolas.2p.debian,
	andy
In-Reply-To: <1309447009-8898-1-git-send-email-jpirko@redhat.com>

On Thu, 30 Jun 2011 17:16:49 +0200
Jiri Pirko <jpirko@redhat.com> wrote:

> For some net topos it is necessary to have multiple "soft-net-devices"
> hooked on one netdev. For example very common is to have
> eth<->(br+vlan). Vlan is not using rh_handler (yet) but also for example
> macvlan would be useful to have hooked on same netdev as br.
> 
> This patch introduces rx_handler list. size struct net_device stays
> intact. Measured performance regression on eth-br topo is ~1% (on received
> pkts generated by pktgen) and on eth-bond topo it is ~0.25%
> 
> On br I think that the performance can be brought back maybe by using per-cpu
> variables to store port in rx_path (I must check this)
> 
> Please comment.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

I am ok with the infrastructure, but why should Vlan use rh_handle.
It is wrong to allow macvlan and bridge to share same device.
Right now the code blocks users from doing lots of stupid things.


^ permalink raw reply

* Re: [PATCH] sctp: ABORT if receive queue is not empty while closing socket
From: Vladislav Yasevich @ 2011-06-30 16:27 UTC (permalink / raw)
  To: netdev, davem, Wei Yongjun, Sridhar Samudrala, linux-sctp
In-Reply-To: <20110630161938.GD24074@canuck.infradead.org>

On 06/30/2011 12:19 PM, Thomas Graf wrote:
> On Thu, Jun 30, 2011 at 10:11:06AM -0400, Vladislav Yasevich wrote:
>> On 06/30/2011 09:31 AM, Thomas Graf wrote:
>>> On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
>>>> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
>>>> that instead of modified the sender of data to send the ABORT, you modify the receiver
>>>> to send the ABORT when it is being closed while having data queued.
>>>
>>> Is this what you had in mind?
>>
>> Almost.  It could really be a simple true/false condition about recvqueue or inqueue
>> being non-empty.  If that's the case, trigger abort.
> 
> What would be the advantage of that?
> 

Wrt to true/false, it's simpler to test for non-empty then it is to go through and count
the data (but I perfectly ok with either way).  WRT to testing the inqueue, as you stated,
not everything may be in receive queue.

-vlad

^ permalink raw reply

* Re: [PATCH] sctp: ABORT if receive queue is not empty while closing socket
From: Thomas Graf @ 2011-06-30 16:19 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, davem, Wei Yongjun, Sridhar Samudrala, linux-sctp
In-Reply-To: <4E0C83FA.2090909@hp.com>

On Thu, Jun 30, 2011 at 10:11:06AM -0400, Vladislav Yasevich wrote:
> On 06/30/2011 09:31 AM, Thomas Graf wrote:
> > On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
> >> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
> >> that instead of modified the sender of data to send the ABORT, you modify the receiver
> >> to send the ABORT when it is being closed while having data queued.
> > 
> > Is this what you had in mind?
> 
> Almost.  It could really be a simple true/false condition about recvqueue or inqueue
> being non-empty.  If that's the case, trigger abort.

What would be the advantage of that?

^ permalink raw reply

* Re: [PATCH] sctp: Enforce maximum retransmissions during shutdown
From: Thomas Graf @ 2011-06-30 16:17 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, davem, Wei Yongjun, Sridhar Samudrala, linux-sctp
In-Reply-To: <4E0C8368.5090502@hp.com>

On Thu, Jun 30, 2011 at 10:08:40AM -0400, Vladislav Yasevich wrote:
> How about this.  If we in SHUTDOWN_PENDING state, let the errors accumulate upto
> max_retrans.  After that, start SHUTDOWN_GUARD timer to let the association live a
> bit longer just on the off-chance the receive comes back.  When SHUTDOWN_GUARD
> expires it will abort the association.
> 
> When we are in this state, SACK processing will have to reset SHUTDOWN_GUARD when
> the SACK is actually acknowledging something.

Good idea. I'll update my patch.

> > 
> > What sideeffects are you worried about resulting from my proposal?
> > 
> 
> There is a potential that the sender may abort prematurely.  The issue is that
> the sender has no way of knowing if the remote process somehow terminated and
> will never consume data, or if it is just extremely busy with something else and
> will come back.  Since this is a reliable protocol, we given the receive the benefit
> of the doubt and try our hardest to get the data across.

Understood although we are talking 10 * RTO here without an actual SACK.

> My suggestion above is still a bit of a hack that one could argue still violates the
> protocol, but the time period tries to remove as much doubt from the sender as possible
> the the receiver is really out-to-lunch.

Assuming that by 'shutdown sequence' the spec is only referring to the
SHUTDOWN / SHUTDOWN ACK exchange it would still violate the protocol
but I don't see how to avoid having association hang around forever without
violating the spec. This really looks like a hole in the spec to me.

^ permalink raw reply

* Re: [PATCH V7 4/4 net-next] vhost: vhost TX zero-copy support
From: Shirley Ma @ 2011-06-30 16:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <20110629091300.GC14627@redhat.com>

On Wed, 2011-06-29 at 12:13 +0300, Michael S. Tsirkin wrote:
> Assuming you mean vhost_zerocopy_signal_used, here's how I would do
> it:
> add a kref and a completion, signal completion in kref_put
> callback, when backend is set - kref_get, on cleanup,
> kref_put and then wait_for_completion_interruptible.
> Where's the need for another thread coming from?
> 
> If you like, post a patch with busywait + a FIXME comment,
> and I can write up a patch on top.
> 
> (BTW, ideally the function that does the signalling should be
> in core networking bits so that it's still around
> even if the vhost module gets removed). 

OK, I will modify the patch.

Thanks
Shirley


^ permalink raw reply

* [RFC patch net-next-2.6] net: allow multiple rx_handler registration
From: Jiri Pirko @ 2011-06-30 15:16 UTC (permalink / raw)
  To: netdev
  Cc: davem, shemminger, kaber, fubar, eric.dumazet, nicolas.2p.debian,
	andy

For some net topos it is necessary to have multiple "soft-net-devices"
hooked on one netdev. For example very common is to have
eth<->(br+vlan). Vlan is not using rh_handler (yet) but also for example
macvlan would be useful to have hooked on same netdev as br.

This patch introduces rx_handler list. size struct net_device stays
intact. Measured performance regression on eth-br topo is ~1% (on received
pkts generated by pktgen) and on eth-bond topo it is ~0.25%

On br I think that the performance can be brought back maybe by using per-cpu
variables to store port in rx_path (I must check this)

Please comment.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c |   14 ++++---
 drivers/net/bonding/bonding.h   |    9 +++-
 drivers/net/macvlan.c           |   35 +++++++++++-----
 include/linux/netdevice.h       |   63 +++++++++++++++++++++++++---
 net/bridge/br_if.c              |    5 +-
 net/bridge/br_input.c           |    5 +-
 net/bridge/br_private.h         |   28 ++++++++++---
 net/core/dev.c                  |   87 +++++++++++++++++++++++++++++++--------
 8 files changed, 193 insertions(+), 53 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 61265f7..f18af47 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1482,7 +1482,8 @@ static bool bond_should_deliver_exact_match(struct sk_buff *skb,
 	return false;
 }
 
-static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
+static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb,
+					     struct rx_handler *rx_handler)
 {
 	struct sk_buff *skb = *pskb;
 	struct slave *slave;
@@ -1494,7 +1495,7 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 
 	*pskb = skb;
 
-	slave = bond_slave_get_rcu(skb->dev);
+	slave = bond_slave_get(rx_handler);
 	bond = slave->bond;
 
 	if (bond->params.arp_interval)
@@ -1897,8 +1898,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	if (res)
 		goto err_close;
 
-	res = netdev_rx_handler_register(slave_dev, bond_handle_frame,
-					 new_slave);
+	res = netdev_rx_handler_register(slave_dev, &new_slave->rx_handler,
+					 bond_handle_frame,
+					 RX_HANDLER_PRIO_BOND);
 	if (res) {
 		pr_debug("Error %d calling netdev_rx_handler_register\n", res);
 		goto err_dest_symlinks;
@@ -1988,7 +1990,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
 	 */
-	netdev_rx_handler_unregister(slave_dev);
+	netdev_rx_handler_unregister(slave_dev, &slave->rx_handler);
 	write_unlock_bh(&bond->lock);
 	synchronize_net();
 	write_lock_bh(&bond->lock);
@@ -2189,7 +2191,7 @@ static int bond_release_all(struct net_device *bond_dev)
 		/* unregister rx_handler early so bond_handle_frame wouldn't
 		 * be called for this slave anymore.
 		 */
-		netdev_rx_handler_unregister(slave_dev);
+		netdev_rx_handler_unregister(slave_dev, &slave->rx_handler);
 		synchronize_net();
 
 		if (bond_is_lb(bond)) {
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 2936171..e732e16 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -172,6 +172,7 @@ struct vlan_entry {
 
 struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
+	struct rx_handler rx_handler;
 	struct slave *next;
 	struct slave *prev;
 	struct bonding *bond; /* our master */
@@ -196,6 +197,11 @@ struct slave {
 #endif
 };
 
+#define bond_slave_get(rx_handler)			\
+	netdev_rx_handler_get_priv(rx_handler,		\
+				   struct slave,	\
+				   rx_handler)
+
 /*
  * Link pseudo-state only used internally by monitors
  */
@@ -253,9 +259,6 @@ struct bonding {
 #endif /* CONFIG_DEBUG_FS */
 };
 
-#define bond_slave_get_rcu(dev) \
-	((struct slave *) rcu_dereference(dev->rx_handler_data))
-
 /**
  * Returns NULL if the net_device does not belong to any of the bond's slaves
  *
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cc67cbe..49ca58b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -34,19 +34,28 @@
 #define MACVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
 
 struct macvlan_port {
+	struct rx_handler	rx_handler;
 	struct net_device	*dev;
 	struct hlist_head	vlan_hash[MACVLAN_HASH_SIZE];
 	struct list_head	vlans;
 	struct rcu_head		rcu;
-	bool 			passthru;
+	bool			passthru;
 	int			count;
 };
 
+#define macvlan_port_get(rx_handler)				\
+	netdev_rx_handler_get_priv(rx_handler,			\
+				   struct macvlan_port,		\
+				   rx_handler)
+
+#define macvlan_port_get_by_dev(dev)					\
+	netdev_rx_handler_get_priv_by_prio(dev,				\
+					   RX_HANDLER_PRIO_MACVLAN,	\
+					   struct macvlan_port,		\
+					   rx_handler)
+
 static void macvlan_port_destroy(struct net_device *dev);
 
-#define macvlan_port_get_rcu(dev) \
-	((struct macvlan_port *) rcu_dereference(dev->rx_handler_data))
-#define macvlan_port_get(dev) ((struct macvlan_port *) dev->rx_handler_data)
 #define macvlan_port_exists(dev) (dev->priv_flags & IFF_MACVLAN_PORT)
 
 static struct macvlan_dev *macvlan_hash_lookup(const struct macvlan_port *port,
@@ -156,7 +165,8 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
+static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb,
+						struct rx_handler *rx_handler)
 {
 	struct macvlan_port *port;
 	struct sk_buff *skb = *pskb;
@@ -167,7 +177,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
 	unsigned int len = 0;
 	int ret = NET_RX_DROP;
 
-	port = macvlan_port_get_rcu(skb->dev);
+	port = macvlan_port_get(rx_handler);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
@@ -617,7 +627,9 @@ static int macvlan_port_create(struct net_device *dev)
 	for (i = 0; i < MACVLAN_HASH_SIZE; i++)
 		INIT_HLIST_HEAD(&port->vlan_hash[i]);
 
-	err = netdev_rx_handler_register(dev, macvlan_handle_frame, port);
+	err = netdev_rx_handler_register(dev, &port->rx_handler,
+					 macvlan_handle_frame,
+					 RX_HANDLER_PRIO_MACVLAN);
 	if (err)
 		kfree(port);
 	else
@@ -627,10 +639,11 @@ static int macvlan_port_create(struct net_device *dev)
 
 static void macvlan_port_destroy(struct net_device *dev)
 {
-	struct macvlan_port *port = macvlan_port_get(dev);
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct macvlan_port *port = vlan->port;
 
 	dev->priv_flags &= ~IFF_MACVLAN_PORT;
-	netdev_rx_handler_unregister(dev);
+	netdev_rx_handler_unregister(dev, &port->rx_handler);
 	kfree_rcu(port, rcu);
 }
 
@@ -696,7 +709,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 		if (err < 0)
 			return err;
 	}
-	port = macvlan_port_get(lowerdev);
+	port = macvlan_port_get_by_dev(lowerdev);
 
 	/* Only 1 macvlan device can be created in passthru mode */
 	if (port->passthru)
@@ -818,7 +831,7 @@ static int macvlan_device_event(struct notifier_block *unused,
 	if (!macvlan_port_exists(dev))
 		return NOTIFY_DONE;
 
-	port = macvlan_port_get(dev);
+	port = macvlan_port_get_by_dev(dev);
 
 	switch (event) {
 	case NETDEV_CHANGE:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 011eb89..126cd07 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -437,7 +437,51 @@ enum rx_handler_result {
 	RX_HANDLER_PASS,
 };
 typedef enum rx_handler_result rx_handler_result_t;
-typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb);
+
+struct rx_handler;
+typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb,
+					      struct rx_handler *rx_handler);
+
+enum rx_handler_prio {
+	RX_HANDLER_PRIO_BRIDGE,
+	RX_HANDLER_PRIO_BOND,
+	RX_HANDLER_PRIO_MACVLAN,
+};
+
+/*
+ * struct rx_handler should be embedded into
+ * private struct used by rx_handler
+ */
+struct rx_handler {
+	struct list_head	list;
+	rx_handler_func_t	*callback;
+	unsigned int		prio;
+};
+
+/**
+ * netdev_rx_handler_get_priv - get containing private structure of given
+ *				receive handler
+ * @rx_handler: receive_handler
+ * @type: the type of the container struct this is embedded in
+ * @member: the name of the member within the struct
+ */
+#define netdev_rx_handler_get_priv(rx_handler, type, member) \
+	container_of(rx_handler, type, member)
+
+/**
+ * netdev_rx_handler_get_priv_by_prio, netdev_rx_handler_get_priv_by_prio_rcu
+ *	- get containing private structure of given receive handler priority
+ * @dev: netdevice
+ * @type: the type of the container struct this is embedded in
+ * @member: the name of the member within the struct
+ */
+#define netdev_rx_handler_get_priv_by_prio(dev, prio, type, member)		\
+	netdev_rx_handler_get_priv(netdev_rx_handler_get_by_prio(dev, prio),	\
+				   type, member)
+
+#define netdev_rx_handler_get_priv_by_prio_rcu(dev, prio, type, member)		\
+	netdev_rx_handler_get_priv(netdev_rx_handler_get_by_prio_rcu(dev, prio),\
+				   type, member)
 
 extern void __napi_schedule(struct napi_struct *n);
 
@@ -1238,8 +1282,7 @@ struct net_device {
 #endif
 #endif
 
-	rx_handler_func_t __rcu	*rx_handler;
-	void __rcu		*rx_handler_data;
+	struct list_head	rx_handler_list;
 
 	struct netdev_queue __rcu *ingress_queue;
 
@@ -2082,10 +2125,18 @@ static inline void napi_free_frags(struct napi_struct *napi)
 	napi->skb = NULL;
 }
 
+extern struct rx_handler *
+netdev_rx_handler_get_by_prio(const struct net_device *dev,
+			      unsigned int prio);
+extern struct rx_handler *
+netdev_rx_handler_get_by_prio_rcu(const struct net_device *dev,
+				  unsigned int prio);
 extern int netdev_rx_handler_register(struct net_device *dev,
-				      rx_handler_func_t *rx_handler,
-				      void *rx_handler_data);
-extern void netdev_rx_handler_unregister(struct net_device *dev);
+				      struct rx_handler *rx_handler,
+			              rx_handler_func_t *callback,
+				      unsigned int prio);
+extern void netdev_rx_handler_unregister(struct net_device *dev,
+					 struct rx_handler *rx_handler);
 
 extern int		dev_valid_name(const char *name);
 extern int		dev_ioctl(struct net *net, unsigned int cmd, void __user *);
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 1bacca4..4ee5d78 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -146,7 +146,7 @@ static void del_nbp(struct net_bridge_port *p)
 
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
 
-	netdev_rx_handler_unregister(dev);
+	netdev_rx_handler_unregister(dev, &p->rx_handler);
 	synchronize_net();
 
 	netdev_set_master(dev, NULL);
@@ -365,7 +365,8 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	if (err)
 		goto err3;
 
-	err = netdev_rx_handler_register(dev, br_handle_frame, p);
+	err = netdev_rx_handler_register(dev, &p->rx_handler, br_handle_frame,
+					 RX_HANDLER_PRIO_BRIDGE);
 	if (err)
 		goto err4;
 
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index f3ac1e8..5f396d8 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -140,7 +140,8 @@ static inline int is_link_local(const unsigned char *dest)
  * Return NULL if skb is handled
  * note: already called with rcu_read_lock
  */
-rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
+rx_handler_result_t br_handle_frame(struct sk_buff **pskb,
+				    struct rx_handler *rx_handler)
 {
 	struct net_bridge_port *p;
 	struct sk_buff *skb = *pskb;
@@ -157,7 +158,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 	if (!skb)
 		return RX_HANDLER_CONSUMED;
 
-	p = br_port_get_rcu(skb->dev);
+	p = br_port_get(rx_handler);
 
 	if (unlikely(is_link_local(dest))) {
 		/* Pause frames shouldn't be passed up by driver anyway */
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 54578f2..1a1ea40 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -108,6 +108,7 @@ struct net_bridge_mdb_htable
 
 struct net_bridge_port
 {
+	struct rx_handler		rx_handler;
 	struct net_bridge		*br;
 	struct net_device		*dev;
 	struct list_head		list;
@@ -152,18 +153,32 @@ struct net_bridge_port
 #endif
 };
 
+#define br_port_get(rx_handler)					\
+	netdev_rx_handler_get_priv(rx_handler,			\
+				   struct net_bridge_port,	\
+				   rx_handler)
+
 #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
 
-static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
+static inline struct net_bridge_port *
+br_port_get_rcu(const struct net_device *dev)
 {
-	struct net_bridge_port *port = rcu_dereference(dev->rx_handler_data);
-	return br_port_exists(dev) ? port : NULL;
+	if (unlikely(!br_port_exists(dev)))
+		return NULL;
+	return netdev_rx_handler_get_priv_by_prio_rcu(dev,
+						      RX_HANDLER_PRIO_BRIDGE,
+						      struct net_bridge_port,
+						      rx_handler);
 }
 
 static inline struct net_bridge_port *br_port_get_rtnl(struct net_device *dev)
 {
-	return br_port_exists(dev) ?
-		rtnl_dereference(dev->rx_handler_data) : NULL;
+	if (unlikely(!br_port_exists(dev)))
+		return NULL;
+	return netdev_rx_handler_get_priv_by_prio(dev,
+						  RX_HANDLER_PRIO_BRIDGE,
+						  struct net_bridge_port,
+						  rx_handler);
 }
 
 struct br_cpu_netstats {
@@ -382,7 +397,8 @@ extern u32 br_features_recompute(struct net_bridge *br, u32 features);
 
 /* br_input.c */
 extern int br_handle_frame_finish(struct sk_buff *skb);
-extern rx_handler_result_t br_handle_frame(struct sk_buff **pskb);
+extern rx_handler_result_t br_handle_frame(struct sk_buff **pskb,
+					   struct rx_handler *rx_handler);
 
 /* br_ioctl.c */
 extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
diff --git a/net/core/dev.c b/net/core/dev.c
index 6b6ef14..92d9007 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3043,10 +3043,55 @@ out:
 #endif
 
 /**
+ *	netdev_rx_handler_get_by_prio - get receive handler struct by priority
+ *	@dev: net device
+ *	@prio: receive handler priority
+ *
+ *	Find and return receive handler for given priority.
+ *
+ *	The caller must hold the rtnl_mutex.
+ */
+struct rx_handler *
+netdev_rx_handler_get_by_prio(const struct net_device *dev, unsigned int prio)
+{
+	struct rx_handler *rx_handler;
+
+	ASSERT_RTNL();
+	list_for_each_entry(rx_handler, &dev->rx_handler_list, list)
+		if (rx_handler->prio == prio)
+			return rx_handler;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(netdev_rx_handler_get_by_prio);
+
+/**
+ *	netdev_rx_handler_get_by_prio_rcu - get receive handler struct by priority
+ *	@dev: net device
+ *	@prio: receive handler priority
+ *
+ *	RCU variant to find and return receive handler for given priority.
+ *
+ *	The caller must hold the rcu_read_lock.
+ */
+struct rx_handler *
+netdev_rx_handler_get_by_prio_rcu(const struct net_device *dev,
+				  unsigned int prio)
+{
+	struct rx_handler *rx_handler;
+
+	list_for_each_entry_rcu(rx_handler, &dev->rx_handler_list, list)
+		if (rx_handler->prio == prio)
+			return rx_handler;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(netdev_rx_handler_get_by_prio_rcu);
+
+/**
  *	netdev_rx_handler_register - register receive handler
  *	@dev: device to register a handler for
- *	@rx_handler: receive handler to register
- *	@rx_handler_data: data pointer that is used by rx handler
+ *	@rx_handler: receive handler structure to register
+ *	@callback: receive handler callback function to register
+ *	@prio: receive handler priority
  *
  *	Register a receive hander for a device. This handler will then be
  *	called from __netif_receive_skb. A negative errno code is returned
@@ -3057,17 +3102,24 @@ out:
  *	For a general description of rx_handler, see enum rx_handler_result.
  */
 int netdev_rx_handler_register(struct net_device *dev,
-			       rx_handler_func_t *rx_handler,
-			       void *rx_handler_data)
+			       struct rx_handler *rx_handler,
+			       rx_handler_func_t *callback, unsigned int prio)
 {
-	ASSERT_RTNL();
+	struct list_head *pos;
 
-	if (dev->rx_handler)
+	ASSERT_RTNL();
+	if (netdev_rx_handler_get_by_prio(dev, prio))
 		return -EBUSY;
+	list_for_each(pos, &dev->rx_handler_list) {
+		struct rx_handler *entry;
 
-	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
-	rcu_assign_pointer(dev->rx_handler, rx_handler);
-
+		entry = list_entry(pos, struct rx_handler, list);
+		if (prio > entry->prio)
+			break;
+	}
+	rx_handler->callback = callback;
+	rx_handler->prio = prio;
+	list_add_rcu(&rx_handler->list, pos);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_register);
@@ -3075,24 +3127,24 @@ EXPORT_SYMBOL_GPL(netdev_rx_handler_register);
 /**
  *	netdev_rx_handler_unregister - unregister receive handler
  *	@dev: device to unregister a handler from
+ *	@prio: handler priority
  *
  *	Unregister a receive hander from a device.
  *
  *	The caller must hold the rtnl_mutex.
  */
-void netdev_rx_handler_unregister(struct net_device *dev)
+void netdev_rx_handler_unregister(struct net_device *dev,
+				  struct rx_handler *rx_handler)
 {
-
 	ASSERT_RTNL();
-	rcu_assign_pointer(dev->rx_handler, NULL);
-	rcu_assign_pointer(dev->rx_handler_data, NULL);
+	list_del_rcu(&rx_handler->list);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
 
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
-	rx_handler_func_t *rx_handler;
+	struct rx_handler *rx_handler;
 	struct net_device *orig_dev;
 	struct net_device *null_or_dev;
 	bool deliver_exact = false;
@@ -3152,13 +3204,12 @@ another_round:
 ncls:
 #endif
 
-	rx_handler = rcu_dereference(skb->dev->rx_handler);
-	if (rx_handler) {
+	list_for_each_entry_rcu(rx_handler, &skb->dev->rx_handler_list, list) {
 		if (pt_prev) {
 			ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = NULL;
 		}
-		switch (rx_handler(&skb)) {
+		switch (rx_handler->callback(&skb, rx_handler)) {
 		case RX_HANDLER_CONSUMED:
 			goto out;
 		case RX_HANDLER_ANOTHER:
@@ -5870,6 +5921,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
+	INIT_LIST_HEAD(&dev->rx_handler_list);
+
 	dev->priv_flags = IFF_XMIT_DST_RELEASE;
 	setup(dev);
 
-- 
1.7.5.4


^ permalink raw reply related

* Re: Skipping past TCP lost packet in userspace
From: Neil Horman @ 2011-06-30 14:36 UTC (permalink / raw)
  To: Josh Lehan
  Cc: janardhan.iyengar, Janardhan Iyengar, rick.jones2, Yuchung Cheng,
	netdev, Bryan Ford
In-Reply-To: <4E0C35F4.6050901@krellan.com>

On Thu, Jun 30, 2011 at 01:38:12AM -0700, Josh Lehan wrote:
> On 06/24/2011 07:58 AM, Janardhan Iyengar wrote:
> > Thanks for your note.  I agree that it does seem like we're simply
> > adding to the metaphorical pile.  And my first knee-jerk response would
> > be that there's not much else one can do in the modern IPv4 Internet :-)
> 
> Thanks, I also appreciate you reviving this thread.  I was surprised at
> the hostility here, towards an idea that we both think is necessary and
> practical, given the realities of today's Internet.
> 
> TCP is at the middle of the hourglass, as you said.  Even UDP isn't
> universally allowed (it's not all that uncommon to see UDP blocked,
> except for DNS packets to whitelisted DNS servers).  At least one ISP,
> "AT&T U-Verse", no longer allows the customer their choice of Internet
> router, and the ISP's mandated router will filter all traffic in both
> directions, so if the packet isn't recognized by its simple little
> stateful firewall, into the bit bucket it goes.  Have fun trying to pass
> SCTP or DCCP through that!
> 
I'll leave the rest of this alone, since its pretty obvious that no one is going
to break TCP for you, but just so that you're aware, The only reason you have to
use the 2-Wire gateway that AT&T provides is because there are no commercially
available routers that support the uplink interface (which I expect will change
eventually).  In the time being, if you want to use a different router, place
the RG in bridge mode by selecting a host as your DMZ device.  That will assign
the wan address to that connected device via DHCP and allow you to pass whatever
traffic you want through it.  I use it to pass SCTP and IPv6 traffice all the
time, works great.
Neil


^ permalink raw reply

* Re: [PATCH] sctp: ABORT if receive queue is not empty while closing socket
From: Vladislav Yasevich @ 2011-06-30 14:11 UTC (permalink / raw)
  To: netdev, davem, Wei Yongjun, Sridhar Samudrala, linux-sctp
In-Reply-To: <20110630133122.GB24074@canuck.infradead.org>

On 06/30/2011 09:31 AM, Thomas Graf wrote:
> On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
>> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
>> that instead of modified the sender of data to send the ABORT, you modify the receiver
>> to send the ABORT when it is being closed while having data queued.
> 
> Is this what you had in mind?

Almost.  It could really be a simple true/false condition about recvqueue or inqueue
being non-empty.  If that's the case, trigger abort.

-vlad

> 
> Trigger user ABORT when a socket is closed which has skbs sitting on
> the receive queue. If data was lost, there is no point in doing a
> graceful shutdown. This is consistent with TCP behaviour.
> 
> This also resolves the situation when a receiver cannot reopen its rwnd
> and the sender continues retransmission attempts indefinitely before
> initiating the shutdown.
> 
> Signed-off-by: Thomas Graf <tgraf@infradead.org>
> 
> diff --git a/include/net/sctp/ulpevent.h b/include/net/sctp/ulpevent.h
> index 99b027b..ca4693b 100644
> --- a/include/net/sctp/ulpevent.h
> +++ b/include/net/sctp/ulpevent.h
> @@ -80,7 +80,7 @@ static inline struct sctp_ulpevent *sctp_skb2event(struct sk_buff *skb)
>  
>  void sctp_ulpevent_free(struct sctp_ulpevent *);
>  int sctp_ulpevent_is_notification(const struct sctp_ulpevent *);
> -void sctp_queue_purge_ulpevents(struct sk_buff_head *list);
> +unsigned int sctp_queue_purge_ulpevents(struct sk_buff_head *list);
>  
>  struct sctp_ulpevent *sctp_ulpevent_make_assoc_change(
>  	const struct sctp_association *asoc,
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 6766913..958253a 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1384,6 +1384,7 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
>  	struct sctp_endpoint *ep;
>  	struct sctp_association *asoc;
>  	struct list_head *pos, *temp;
> +	unsigned int data_was_unread;
>  
>  	SCTP_DEBUG_PRINTK("sctp_close(sk: 0x%p, timeout:%ld)\n", sk, timeout);
>  
> @@ -1393,6 +1394,10 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
>  
>  	ep = sctp_sk(sk)->ep;
>  
> +	/* Clean up any skbs sitting on the receive queue.  */
> +	data_was_unread = sctp_queue_purge_ulpevents(&sk->sk_receive_queue);
> +	data_was_unread += sctp_queue_purge_ulpevents(&sctp_sk(sk)->pd_lobby);
> +
>  	/* Walk all associations on an endpoint.  */
>  	list_for_each_safe(pos, temp, &ep->asocs) {
>  		asoc = list_entry(pos, struct sctp_association, asocs);
> @@ -1410,7 +1415,8 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
>  			}
>  		}
>  
> -		if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
> +		if (data_was_unread ||
> +		    (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime)) {
>  			struct sctp_chunk *chunk;
>  
>  			chunk = sctp_make_abort_user(asoc, NULL, 0);
> @@ -1420,10 +1426,6 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
>  			sctp_primitive_SHUTDOWN(asoc, NULL);
>  	}
>  
> -	/* Clean up any skbs sitting on the receive queue.  */
> -	sctp_queue_purge_ulpevents(&sk->sk_receive_queue);
> -	sctp_queue_purge_ulpevents(&sctp_sk(sk)->pd_lobby);
> -
>  	/* On a TCP-style socket, block for at most linger_time if set. */
>  	if (sctp_style(sk, TCP) && timeout)
>  		sctp_wait_for_close(sk, timeout);
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index e70e5fc..aab3184 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -1081,9 +1081,19 @@ void sctp_ulpevent_free(struct sctp_ulpevent *event)
>  }
>  
>  /* Purge the skb lists holding ulpevents. */
> -void sctp_queue_purge_ulpevents(struct sk_buff_head *list)
> +unsigned int sctp_queue_purge_ulpevents(struct sk_buff_head *list)
>  {
>  	struct sk_buff *skb;
> -	while ((skb = skb_dequeue(list)) != NULL)
> +	unsigned int data_unread = 0;
> +
> +	while ((skb = skb_dequeue(list)) != NULL) {
> +		struct sctp_ulpevent *event = sctp_skb2event(skb);
> +
> +		if (!sctp_ulpevent_is_notification(event))
> +			data_unread += skb->len;
> +
>  		sctp_ulpevent_free(sctp_skb2event(skb));
> +	}
> +
> +	return data_unread;
>  }
> 


^ permalink raw reply

* Re: [PATCH] sctp: Enforce maximum retransmissions during shutdown
From: Vladislav Yasevich @ 2011-06-30 14:08 UTC (permalink / raw)
  To: netdev, davem, Wei Yongjun, Sridhar Samudrala, linux-sctp
In-Reply-To: <20110630084933.GA24074@canuck.infradead.org>

On 06/30/2011 04:49 AM, Thomas Graf wrote:
> On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
>> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
>> that instead of modified the sender of data to send the ABORT, you modify the receiver
>> to send the ABORT when it is being closed while having data queued.
> 
> Agreed. This makes a good procedure if there is data is on
> sk_receive_queue and gets us in line with TCP although I don't see this
> in the spec at all :-)
> 
>> But we don't even get to sending the SHUTDOWN, so from the wire protocol, we
>> do not violated it.  We have bad behavior in that when both sender and receiver
>> are dead, the association is hung.
> 
> So how do we get out if ...
> 
> 1) there is nothing queued on sk_receive_queue but the window still
>    remains 0 forver?

sk_receive_queue isn't the only queue you have to check.  You'll need to check
the reassembly and ordering queues, as partial or out of order things might stuck
there.  That's would be an extremely rare condition since if we ever get here, the
first thing we do is reneg on those TSN and open the window to get the missing chunk
in and push complete packet up to sk_receive_queue.

>    
> 2) the receiver is an older Linux without the above fix or another stack
>    that does not ABORT?

crap....

How about this.  If we in SHUTDOWN_PENDING state, let the errors accumulate upto
max_retrans.  After that, start SHUTDOWN_GUARD timer to let the association live a
bit longer just on the off-chance the receive comes back.  When SHUTDOWN_GUARD
expires it will abort the association.

When we are in this state, SACK processing will have to reset SHUTDOWN_GUARD when
the SACK is actually acknowledging something.

> 
> I agree that using ABORT on the receiver is the ideal way whenver
> possible but we still need to fix this if the receiver does not do so.
> 
> What sideeffects are you worried about resulting from my proposal?
> 

There is a potential that the sender may abort prematurely.  The issue is that
the sender has no way of knowing if the remote process somehow terminated and
will never consume data, or if it is just extremely busy with something else and
will come back.  Since this is a reliable protocol, we given the receive the benefit
of the doubt and try our hardest to get the data across.

My suggestion above is still a bit of a hack that one could argue still violates the
protocol, but the time period tries to remove as much doubt from the sender as possible
the the receiver is really out-to-lunch.

-vlad

^ permalink raw reply

* [PATCH] sctp: ABORT if receive queue is not empty while closing socket
From: Thomas Graf @ 2011-06-30 13:31 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, davem, Wei Yongjun, Sridhar Samudrala, linux-sctp
In-Reply-To: <4E0B4F71.4020108@hp.com>

On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
> that instead of modified the sender of data to send the ABORT, you modify the receiver
> to send the ABORT when it is being closed while having data queued.

Is this what you had in mind?

Trigger user ABORT when a socket is closed which has skbs sitting on
the receive queue. If data was lost, there is no point in doing a
graceful shutdown. This is consistent with TCP behaviour.

This also resolves the situation when a receiver cannot reopen its rwnd
and the sender continues retransmission attempts indefinitely before
initiating the shutdown.

Signed-off-by: Thomas Graf <tgraf@infradead.org>

diff --git a/include/net/sctp/ulpevent.h b/include/net/sctp/ulpevent.h
index 99b027b..ca4693b 100644
--- a/include/net/sctp/ulpevent.h
+++ b/include/net/sctp/ulpevent.h
@@ -80,7 +80,7 @@ static inline struct sctp_ulpevent *sctp_skb2event(struct sk_buff *skb)
 
 void sctp_ulpevent_free(struct sctp_ulpevent *);
 int sctp_ulpevent_is_notification(const struct sctp_ulpevent *);
-void sctp_queue_purge_ulpevents(struct sk_buff_head *list);
+unsigned int sctp_queue_purge_ulpevents(struct sk_buff_head *list);
 
 struct sctp_ulpevent *sctp_ulpevent_make_assoc_change(
 	const struct sctp_association *asoc,
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 6766913..958253a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1384,6 +1384,7 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
 	struct sctp_endpoint *ep;
 	struct sctp_association *asoc;
 	struct list_head *pos, *temp;
+	unsigned int data_was_unread;
 
 	SCTP_DEBUG_PRINTK("sctp_close(sk: 0x%p, timeout:%ld)\n", sk, timeout);
 
@@ -1393,6 +1394,10 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
 
 	ep = sctp_sk(sk)->ep;
 
+	/* Clean up any skbs sitting on the receive queue.  */
+	data_was_unread = sctp_queue_purge_ulpevents(&sk->sk_receive_queue);
+	data_was_unread += sctp_queue_purge_ulpevents(&sctp_sk(sk)->pd_lobby);
+
 	/* Walk all associations on an endpoint.  */
 	list_for_each_safe(pos, temp, &ep->asocs) {
 		asoc = list_entry(pos, struct sctp_association, asocs);
@@ -1410,7 +1415,8 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
 			}
 		}
 
-		if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
+		if (data_was_unread ||
+		    (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime)) {
 			struct sctp_chunk *chunk;
 
 			chunk = sctp_make_abort_user(asoc, NULL, 0);
@@ -1420,10 +1426,6 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
 			sctp_primitive_SHUTDOWN(asoc, NULL);
 	}
 
-	/* Clean up any skbs sitting on the receive queue.  */
-	sctp_queue_purge_ulpevents(&sk->sk_receive_queue);
-	sctp_queue_purge_ulpevents(&sctp_sk(sk)->pd_lobby);
-
 	/* On a TCP-style socket, block for at most linger_time if set. */
 	if (sctp_style(sk, TCP) && timeout)
 		sctp_wait_for_close(sk, timeout);
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index e70e5fc..aab3184 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1081,9 +1081,19 @@ void sctp_ulpevent_free(struct sctp_ulpevent *event)
 }
 
 /* Purge the skb lists holding ulpevents. */
-void sctp_queue_purge_ulpevents(struct sk_buff_head *list)
+unsigned int sctp_queue_purge_ulpevents(struct sk_buff_head *list)
 {
 	struct sk_buff *skb;
-	while ((skb = skb_dequeue(list)) != NULL)
+	unsigned int data_unread = 0;
+
+	while ((skb = skb_dequeue(list)) != NULL) {
+		struct sctp_ulpevent *event = sctp_skb2event(skb);
+
+		if (!sctp_ulpevent_is_notification(event))
+			data_unread += skb->len;
+
 		sctp_ulpevent_free(sctp_skb2event(skb));
+	}
+
+	return data_unread;
 }

^ permalink raw reply related

* Re: possible bridge regression in "bridge: implement [add/del]_slave ops"?
From: Alexander Stein @ 2011-06-30 13:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, bridge, netdev, linux-kernel
In-Reply-To: <201106301033.23997.alexander.stein@systec-electronic.com>

On Thursday 30 June 2011 10:33:23 Alexander Stein wrote:
> BTW: I noticed that in 2.6.39.2 independently from this patch revert this
> bridge didn't show up RUNNING ifconfg. Is this intended? Another bridge I
> have, which doesn't use (R)STP, is shown as RUNNING like before.

This change was caused by commit 1faa4356a3bd89ea11fb92752d897cff3a20ec0e 
"bridge: control carrier based on ports online". It prevents the bridge from 
actually receiving/sending packets. Reverting restores the old behavior.

^ permalink raw reply

* Re: [PATCHv2 NEXT 1/2] net: add external loopback test in ethtool self test
From: Ben Hutchings @ 2011-06-30 13:25 UTC (permalink / raw)
  To: amit.salecha; +Cc: davem, netdev, ameen.rahman, sucheta.chakraborty
In-Reply-To: <1309413650-15952-2-git-send-email-amit.salecha@qlogic.com>

On Wed, 2011-06-29 at 23:00 -0700, amit.salecha@qlogic.com wrote:
> From: Amit Kumar Salecha <amit.salecha@qlogic.com>
> 
> External loopback test can be performed by application without any driver
> support on normal Ethernet cards.
> But on CNA devices, where multiple functions share same physical port.
> Here internal loopback test and external loopback test can be initiated by
> multiple functions at same time. To co exist all functions, firmware need
> to regulate what test can be run by which function. So before performing external
> loopback test, command need to send to firmware, which will quiescent other functions.
> 
> User may not want to run external loopback test always. As special cable need to be
> connected for this test.
> So adding explicit flag in ethtool self test, which will specify interface
> to perform external loopback test.
>  ETH_TEST_FL_EXTERNAL_LB: Application set to request external loopback test
>  ETH_TEST_FL_EXTERNAL_LB_DONE: Driver ack if test performed
> 
> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

Ben.

> ---
>  include/linux/ethtool.h |   16 ++++++++++++++--
>  1 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 048d0fa..c6e427a 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -310,9 +310,21 @@ struct ethtool_sset_info {
>  				   __u32's, etc. */
>  };
>  
> +/**
> + * enum ethtool_test_flags - flags definition of ethtool_test
> + * @ETH_TEST_FL_OFFLINE: if set perform online and offline tests, otherwise
> + *	only online tests.
> + * @ETH_TEST_FL_FAILED: Driver set this flag if test fails.
> + * @ETH_TEST_FL_EXTERNAL_LB: Application request to perform external loopback
> + *	test.
> + * @ETH_TEST_FL_EXTERNAL_LB_DONE: Driver performed the external loopback test
> + */
> +
>  enum ethtool_test_flags {
> -	ETH_TEST_FL_OFFLINE	= (1 << 0),	/* online / offline */
> -	ETH_TEST_FL_FAILED	= (1 << 1),	/* test passed / failed */
> +	ETH_TEST_FL_OFFLINE	= (1 << 0),
> +	ETH_TEST_FL_FAILED	= (1 << 1),
> +	ETH_TEST_FL_EXTERNAL_LB	= (1 << 2),
> +	ETH_TEST_FL_EXTERNAL_LB_DONE	= (1 << 3),
>  };
>  
>  /* for requesting NIC test and getting results*/

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH 5/5] IEEE 802.15.4: do not enable driver debugging by default
From: Dmitry Eremin-Solenikov @ 2011-06-30 12:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Werner Almesberger
In-Reply-To: <1309437468-31021-1-git-send-email-dbaryshkov@gmail.com>

From: Werner Almesberger <werner@almesberger.net>

The IEEE 802.15.4 drivers were compiled by default with debugging,
which caused them to be rather chatty and slow. This patch silences
them. People debugging drivers can still add a #define DEBUG in the
beginning of the respective file or use dynamic debug

This patch also removes the now unused option CONFIG_FFD.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 drivers/ieee802154/Makefile |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/ieee802154/Makefile b/drivers/ieee802154/Makefile
index 6899913..800a389 100644
--- a/drivers/ieee802154/Makefile
+++ b/drivers/ieee802154/Makefile
@@ -1,3 +1 @@
 obj-$(CONFIG_IEEE802154_FAKEHARD) += fakehard.o
-
-ccflags-y := -DDEBUG -DCONFIG_FFD
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 4/5] ieee802154: free skb buffer if dev isn't running
From: Dmitry Eremin-Solenikov @ 2011-06-30 12:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1309437468-31021-1-git-send-email-dbaryshkov@gmail.com>

From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 net/ieee802154/af_ieee802154.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ieee802154/af_ieee802154.c b/net/ieee802154/af_ieee802154.c
index 6df6ecf..40e606f 100644
--- a/net/ieee802154/af_ieee802154.c
+++ b/net/ieee802154/af_ieee802154.c
@@ -302,7 +302,7 @@ static int ieee802154_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct packet_type *pt, struct net_device *orig_dev)
 {
 	if (!netif_running(dev))
-		return -ENODEV;
+		goto drop;
 	pr_debug("got frame, type %d, dev %p\n", dev->type, dev);
 #ifdef DEBUG
 	print_hex_dump_bytes("ieee802154_rcv ", DUMP_PREFIX_NONE, skb->data, skb->len);
-- 
1.7.5.4


^ permalink raw reply related

* [PATCH 3/5] ieee802154: it's IEEE 802.15.4, not ZigBee
From: Dmitry Eremin-Solenikov @ 2011-06-30 12:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <1309437468-31021-1-git-send-email-dbaryshkov@gmail.com>

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 net/ieee802154/dgram.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index 1a3334c..faecf64 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -1,5 +1,5 @@
 /*
- * ZigBee socket interface
+ * IEEE 802.15.4 dgram socket interface
  *
  * Copyright 2007, 2008 Siemens AG
  *
-- 
1.7.5.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox