Netdev List


* Re: [PATCH 1/9] mm: add generic adaptive large memory allocation APIs
From: Peter Zijlstra @ 2010-05-13 13:20 UTC (permalink / raw)
  To: Changli Gao
  Cc: akpm, Hoang-Nam Nguyen, Christoph Raisch, Roland Dreier,
	Sean Hefty, Hal Rosenstock, Divy Le Ray, James E.J. Bottomley,
	Theodore Ts'o, Andreas Dilger, Alexander Viro, Paul Menage,
	Li Zefan, linux-rdma, linux-kernel, netdev, linux-scsi,
	linux-ext4, linux-fsdevel, linux-mm, containers, Eric Dumazet,
	Tetsuo Handa
In-Reply-To: <1273744285-8128-1-git-send-email-xiaosuo@gmail.com>

On Thu, 2010-05-13 at 17:51 +0800, Changli Gao wrote:
> +void *__kvmalloc(size_t size, gfp_t flags)
> +{
> +       void *ptr;
> +
> +       if (size < PAGE_SIZE)
> +               return kmalloc(size, GFP_KERNEL | flags);
> +       size = PAGE_ALIGN(size);
> +       if (is_power_of_2(size))
> +               ptr = (void *)__get_free_pages(GFP_KERNEL | flags |
> +                                              __GFP_NOWARN, get_order(size));
> +       else
> +               ptr = alloc_pages_exact(size, GFP_KERNEL | flags |
> +                                             __GFP_NOWARN);
> +       if (ptr != NULL) {
> +               virt_to_head_page(ptr)->private = size;
> +               return ptr;
> +       }
> +
> +       ptr = vmalloc(size);
> +       if (ptr != NULL && (flags & __GFP_ZERO))
> +               memset(ptr, 0, size);
> +
> +       return ptr;
> +}
> +EXPORT_SYMBOL(__kvmalloc);

So if I do kvmalloc(size, GFP_ATOMIC), I get GFP_KERNEL|GFP_ATOMIC, which
is not a recommended combination, because one should not mix __GFP_WAIT
(from GFP_KERNEL) and __GFP_HIGH (from GFP_ATOMIC).

So I would simply drop the gfp argument to avoid confusion.
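The flag mixing can be checked with plain bit arithmetic. This is a userspace sketch, not kernel code: the KGFP_* names are hypothetical stand-ins for __GFP_WAIT, __GFP_HIGH, __GFP_IO and __GFP_FS, with values assumed to match the 2.6.34-era include/linux/gfp.h (verify against your tree).

```c
/* Hypothetical stand-ins for the kernel's gfp bits; values assumed to
 * match 2.6.34-era include/linux/gfp.h. */
#define KGFP_WAIT   0x10u  /* __GFP_WAIT: the allocation may sleep */
#define KGFP_HIGH   0x20u  /* __GFP_HIGH: may dip into emergency reserves */
#define KGFP_IO     0x40u  /* __GFP_IO */
#define KGFP_FS     0x80u  /* __GFP_FS */

#define KGFP_ATOMIC (KGFP_HIGH)
#define KGFP_KERNEL (KGFP_WAIT | KGFP_IO | KGFP_FS)

/* The flags the quoted __kvmalloc() actually passes down. */
static unsigned int kvmalloc_effective_flags(unsigned int caller_flags)
{
	return KGFP_KERNEL | caller_flags;
}

/* Nonzero when a request both may sleep and asks for atomic reserves. */
static int mixes_wait_and_high(unsigned int gfp)
{
	return (gfp & KGFP_WAIT) && (gfp & KGFP_HIGH);
}
```

Here mixes_wait_and_high(kvmalloc_effective_flags(KGFP_ATOMIC)) is nonzero, while plain GFP_ATOMIC or plain GFP_KERNEL is not. Dropping the gfp argument, so the function always uses GFP_KERNEL, removes that ambiguity.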

> +void __kvfree(void *ptr, bool inatomic)
> +{
> +       if (unlikely(ZERO_OR_NULL_PTR(ptr)))
> +               return;
> +       if (is_vmalloc_addr(ptr)) {
> +               if (inatomic) {
> +                       struct work_struct *work;
> +
> +                       work = ptr;
> +                       BUILD_BUG_ON(sizeof(struct work_struct) > PAGE_SIZE);
> +                       INIT_WORK(work, kvfree_work);
> +                       schedule_work(work);
> +               } else {
> +                       vfree(ptr);
> +               }
> +       } else {
> +               struct page *page;
> +
> +               page = virt_to_head_page(ptr);
> +               if (PageSlab(page) || PageCompound(page))
> +                       kfree(ptr);
> +               else if (is_power_of_2(page->private))
> +                       free_pages((unsigned long)ptr,
> +                                  get_order(page->private));
> +               else
> +                       free_pages_exact(ptr, page->private);
> +       }
> +}
> +EXPORT_SYMBOL(__kvfree); 

NAK, I really utterly dislike that inatomic argument. The alloc side
doesn't function in atomic context either. Please keep the thing
symmetric in that regard.
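The symmetry matters because __kvfree() must reach the same conclusion about how a buffer was obtained as __kvmalloc() did. The sketch below models only the size-based choice the quoted __kvmalloc() makes before any vmalloc() fallback. It assumes 4 KiB pages, and the SKETCH_* macros, the enum and the helpers are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

enum kv_backend { KV_KMALLOC, KV_FREE_PAGES, KV_PAGES_EXACT };

static int is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest order with (PAGE_SIZE << order) >= size, like get_order(). */
static int order_for(size_t size)
{
	int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* First-choice backend picked by the quoted __kvmalloc(); the vmalloc()
 * fallback taken on allocation failure is not modelled. */
static enum kv_backend kv_backend_for(size_t size, size_t *aligned)
{
	if (size < SKETCH_PAGE_SIZE) {
		*aligned = size;
		return KV_KMALLOC;
	}
	*aligned = SKETCH_PAGE_ALIGN(size);
	return is_pow2(*aligned) ? KV_FREE_PAGES : KV_PAGES_EXACT;
}
```

For example, 5000 bytes is rounded up to 8192 (a power of two, so __get_free_pages() with order 1), while 12288 bytes stays at 12288 and goes to alloc_pages_exact(). The free side recomputes the same split from page->private, which is why both paths must stay in step.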



* Re: [PATCH net-next] net/: (netfilter related) Remove unnecessary returns from void function()s
From: Patrick McHardy @ 2010-05-13 13:20 UTC (permalink / raw)
  To: Joe Perches; +Cc: David S. Miller, netdev, netfilter-devel
In-Reply-To: <1273600346.20514.247.camel@Joe-Laptop.home>

Joe Perches wrote:
> This patch removes from net/ netfilter files
> all the unnecessary return; statements that precede the
> last closing brace of void functions.
> 
> It does not remove the returns that are immediately
> preceded by a label as gcc doesn't like that.
> 
> Done via:
> $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
>   xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
> 

Applied, but I changed the patch to keep the two return statements
below in otherwise empty function bodies since I consider that more
aesthetically pleasing.

> --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
> @@ -452,6 +452,5 @@ module_exit(nf_conntrack_l3proto_ipv4_fini);
>  
>  void need_ipv4_conntrack(void)
>  {
> -	return;
>  }
>  EXPORT_SYMBOL_GPL(need_ipv4_conntrack);
> diff --git a/net/ipv4/netfilter/nf_nat_proto_gre.c b/net/ipv4/netfilter/nf_nat_proto_gre.c
> index d7e8920..39efbe6 100644
> --- a/net/ipv4/netfilter/nf_nat_proto_gre.c
> +++ b/net/ipv4/netfilter/nf_nat_proto_gre.c
> @@ -144,6 +144,5 @@ module_exit(nf_nat_proto_gre_fini);
>  
>  void nf_nat_need_gre(void)
>  {
> -	return;
>  }
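The label restriction mentioned in the patch description can be illustrated with two small functions. This is a hypothetical example, not code from net/: in C before C23 a label must be followed by a statement, and gcc versions of that era reject "out: }", so a return that directly follows a label has to stay.

```c
/* Counts how often the "work" part of each function ran. */
static int work_done;

/* The trailing "return;" here would be redundant, so the perl
 * one-liner drops it. */
static void plain_void(void)
{
	work_done++;
}

/* Here "return;" directly follows a label. Removing it would leave
 * "out: }", which pre-C23 compilers reject, so the patch keeps it. */
static void label_then_return(int bail)
{
	if (bail)
		goto out;
	work_done++;
out:
	return;
}
```

Calling plain_void(), then label_then_return(0) and label_then_return(1), leaves work_done at 2: the early bail-out skips the increment but still needs a statement after its label.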


* iptables NEW or SYN
From: Markus Feldmann @ 2010-05-13 13:17 UTC (permalink / raw)
  To: netdev

Hi All,

I am still experimenting with iptables and the state extension. Some of
my rules should catch packets with the state NEW, but they don't catch
all of them. Sometimes packets with the SYN flag set get past my rules
that match the state NEW.

Any idea why?
I thought SYN was included in the state NEW; is that wrong?

What is the difference between SYN and NEW?

regards Markus


* Re: [PATCH] net sched: cleanup and rate limit warning
From: Patrick McHardy @ 2010-05-13 13:12 UTC (permalink / raw)
  To: hadi; +Cc: Stephen Hemminger, David Miller, netdev
In-Reply-To: <1273696903.16074.25.camel@bigi>

jamal wrote:
> On Wed, 2010-05-12 at 13:20 -0700, Stephen Hemminger wrote:
> 
>> The tool isn't generating an action (just tc filter ... police ..)
>> so it is getting the unfortunate default of reclassify.
> 
> Ah ok. 
> My advice: you should never ever depend on defaults when you
> can be explicit and say "drop". Or have the users in your tool be able
> to specify what action to take if rate is exceeded etc (actually I think
> Juniper does that)
> I think "drop" would be the sane default for over-limit - my memory is
> hazy because I assumed that was the default but there may have been some
> reservations on that default. Patrick?

I don't remember ever discussing that; the choice of "reclassify" as
default precedes TC actions and is already present in the oldest
iproute2 version I could find (2.2.4-ss000225).
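Jamal's advice to be explicit can be illustrated with a toy policer. This is a simplified byte-budget model, not the kernel's token-bucket police action; the TC_ACT_* values are copied locally and assumed to match include/linux/pkt_cls.h, and the struct and functions are hypothetical.

```c
/* Action codes assumed to match include/linux/pkt_cls.h. */
#define TC_ACT_OK         0
#define TC_ACT_RECLASSIFY 1
#define TC_ACT_SHOT       2

struct toy_policer {
	unsigned long budget;   /* bytes allowed in the current interval */
	unsigned long used;     /* bytes accepted so far */
	int exceed_action;      /* verdict once the budget is exceeded */
};

/* Mirrors the long-standing default: over-limit packets are
 * reclassified unless the configuration says otherwise. */
static void toy_policer_init(struct toy_policer *p, unsigned long budget)
{
	p->budget = budget;
	p->used = 0;
	p->exceed_action = TC_ACT_RECLASSIFY;
}

static int toy_police(struct toy_policer *p, unsigned long len)
{
	if (p->used + len > p->budget)
		return p->exceed_action;
	p->used += len;
	return TC_ACT_OK;
}
```

A tool that never sets exceed_action inherits reclassify; setting it to TC_ACT_SHOT, the equivalent of saying "drop" explicitly, removes any dependence on the default.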


* [PATCH NEXT 9/9] qlcnic: add idc debug registers
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

Whenever the driver changes the device state, it should write the
PCI function number and a timestamp to the debug registers.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    2 ++
 drivers/net/qlcnic/qlcnic_main.c |   21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 5c89f4e..2ed34cd 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -960,6 +960,8 @@ struct qlcnic_adapter {
 
 	u8 mac_addr[ETH_ALEN];
 
+	u64 dev_rst_time;
+
 	struct qlcnic_adapter_stats stats;
 
 	struct qlcnic_recv_context recv_ctx;
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index ce8118c..4bf3d62 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -84,6 +84,7 @@ static void qlcnic_remove_sysfs_entries(struct qlcnic_adapter *adapter);
 static void qlcnic_create_diag_entries(struct qlcnic_adapter *adapter);
 static void qlcnic_remove_diag_entries(struct qlcnic_adapter *adapter);
 
+static void qlcnic_idc_debug_info(struct qlcnic_adapter *adapter, u8 encoding);
 static void qlcnic_clr_all_drv_state(struct qlcnic_adapter *adapter);
 static int qlcnic_can_start_firmware(struct qlcnic_adapter *adapter);
 
@@ -621,6 +622,7 @@ wait_init:
 		goto err_out;
 
 	QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_READY);
+	qlcnic_idc_debug_info(adapter, 1);
 
 	qlcnic_check_options(adapter);
 
@@ -1057,6 +1059,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter = netdev_priv(netdev);
 	adapter->netdev  = netdev;
 	adapter->pdev    = pdev;
+	adapter->dev_rst_time = jiffies;
 	adapter->ahw.pci_func  = pci_func_id;
 
 	revision_id = pdev->revision;
@@ -1888,6 +1891,19 @@ static void qlcnic_poll_controller(struct net_device *netdev)
 }
 #endif
 
+static void
+qlcnic_idc_debug_info(struct qlcnic_adapter *adapter, u8 encoding)
+{
+	u32 val;
+
+	val = adapter->portnum & 0xf;
+	val |= encoding << 7;
+	val |= (jiffies - adapter->dev_rst_time) << 8;
+
+	QLCWR32(adapter, QLCNIC_CRB_DRV_SCRATCH, val);
+	adapter->dev_rst_time = jiffies;
+}
+
 static int
 qlcnic_set_drv_state(struct qlcnic_adapter *adapter, u8 state)
 {
@@ -2011,6 +2027,7 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 	case QLCNIC_DEV_COLD:
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_INITIALIZING);
 		QLCWR32(adapter, QLCNIC_CRB_DRV_IDC_VER, QLCNIC_DRV_IDC_VER);
+		qlcnic_idc_debug_info(adapter, 0);
 		qlcnic_api_unlock(adapter);
 		return 1;
 
@@ -2103,6 +2120,8 @@ skip_ack_check:
 			qlcnic_schedule_work(adapter, qlcnic_fwinit_work,
 						FW_POLL_DELAY * 2);
 			QLCDB(adapter, DRV, "Quiscing the driver\n");
+			qlcnic_idc_debug_info(adapter, 0);
+
 			qlcnic_api_unlock(adapter);
 			return;
 		}
@@ -2112,6 +2131,7 @@ skip_ack_check:
 						QLCNIC_DEV_INITIALIZING);
 			set_bit(__QLCNIC_START_FW, &adapter->state);
 			QLCDB(adapter, DRV, "Restarting fw\n");
+			qlcnic_idc_debug_info(adapter, 0);
 		}
 
 		qlcnic_api_unlock(adapter);
@@ -2207,6 +2227,7 @@ qlcnic_dev_request_reset(struct qlcnic_adapter *adapter)
 	if (state == QLCNIC_DEV_READY) {
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_NEED_RESET);
 		QLCDB(adapter, DRV, "NEED_RESET state set\n");
+		qlcnic_idc_debug_info(adapter, 0);
 	}
 
 	qlcnic_api_unlock(adapter);
-- 
1.6.0.2
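The word that qlcnic_idc_debug_info() writes to QLCNIC_CRB_DRV_SCRATCH has a simple bit layout. A userspace sketch of that packing, plus hypothetical decode helpers a debugging tool might use (they are not part of the patch), could look like this. Unlike the patch, the sketch masks encoding to one bit, which the patch never needs since it only passes 0 or 1.

```c
#include <stdint.h>

/* Packs the debug word the way qlcnic_idc_debug_info() does:
 *   bits 0-3  : port number
 *   bit  7    : encoding flag
 *   bits 8-31 : jiffies since the previous state change, truncated
 *               to 24 bits by the 32-bit register width
 * Bits 4-6 are left clear. */
static uint32_t idc_debug_pack(uint8_t portnum, uint8_t encoding,
			       unsigned long elapsed)
{
	uint32_t val = portnum & 0xfu;

	val |= (uint32_t)(encoding & 1u) << 7;
	val |= (uint32_t)elapsed << 8;
	return val;
}

/* Hypothetical decoders for reading the register back. */
static unsigned int idc_debug_port(uint32_t v)
{
	return v & 0xfu;
}

static unsigned int idc_debug_encoding(uint32_t v)
{
	return (v >> 7) & 1u;
}

static uint32_t idc_debug_elapsed(uint32_t v)
{
	return v >> 8;
}
```

For example, port 3, encoding 1 and 250 elapsed jiffies pack to 3 | 128 | (250 << 8) = 64131.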



* [PATCH NEXT 7/9] qlcnic: check device class
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

The PCI function class can be something other than Ethernet on a QLogic CNA device.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    2 +-
 drivers/net/qlcnic/qlcnic_init.c |   16 +++++++++++++++-
 drivers/net/qlcnic/qlcnic_main.c |    8 ++------
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 8a9412f..5c89f4e 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -1037,7 +1037,7 @@ int qlcnic_need_fw_reset(struct qlcnic_adapter *adapter);
 void qlcnic_request_firmware(struct qlcnic_adapter *adapter);
 void qlcnic_release_firmware(struct qlcnic_adapter *adapter);
 int qlcnic_pinit_from_rom(struct qlcnic_adapter *adapter);
-void qlcnic_setup_idc_param(struct qlcnic_adapter *adapter);
+int qlcnic_setup_idc_param(struct qlcnic_adapter *adapter);
 
 int qlcnic_rom_fast_read(struct qlcnic_adapter *adapter, int addr, int *valp);
 int qlcnic_rom_fast_read_words(struct qlcnic_adapter *adapter, int addr,
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index c4bef57..77bfdab 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -514,10 +514,22 @@ int qlcnic_pinit_from_rom(struct qlcnic_adapter *adapter)
 	return 0;
 }
 
-void
+int
 qlcnic_setup_idc_param(struct qlcnic_adapter *adapter) {
 
 	int timeo;
+	u32 val;
+
+	val = QLCRD32(adapter, QLCNIC_CRB_DEV_PARTITION_INFO);
+	val = (val >> (adapter->portnum * 4)) & 0xf;
+
+	if ((val & 0x3) != 1) {
+		dev_err(&adapter->pdev->dev, "Not an Ethernet NIC func=%u\n",
+									val);
+		return -EIO;
+	}
+
+	adapter->physical_port = (val >> 2);
 
 	if (qlcnic_rom_fast_read(adapter, QLCNIC_ROM_DEV_INIT_TIMEOUT, &timeo))
 		timeo = 30;
@@ -528,6 +540,8 @@ qlcnic_setup_idc_param(struct qlcnic_adapter *adapter) {
 		timeo = 10;
 
 	adapter->reset_ack_timeo = timeo;
+
+	return 0;
 }
 
 static int
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index e94a96d..ce57229 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1081,17 +1081,13 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_iounmap;
 	}
 
-	qlcnic_setup_idc_param(adapter);
+	if (qlcnic_setup_idc_param(adapter))
+		goto err_out_iounmap;
 
 	err = qlcnic_start_firmware(adapter);
 	if (err)
 		goto err_out_decr_ref;
 
-	/*
-	 * See if the firmware gave us a virtual-physical port mapping.
-	 */
-	adapter->physical_port = adapter->portnum;
-
 	qlcnic_clear_stats(adapter);
 
 	qlcnic_setup_intr(adapter);
-- 
1.6.0.2
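The nibble decoding added to qlcnic_setup_idc_param() can be modelled in userspace. The struct and helper names below are hypothetical, and the register value used in the example is made up for illustration.

```c
#include <stdint.h>

/* Fields of one 4-bit entry in the partition-info register, as
 * interpreted by qlcnic_setup_idc_param() in the patch. */
struct func_partition {
	unsigned int fclass;        /* low 2 bits; 1 means Ethernet */
	unsigned int physical_port; /* upper 2 bits */
};

/* portnum selects the nibble; it must be below 8 for a 32-bit register. */
static struct func_partition decode_partition(uint32_t reg,
					      unsigned int portnum)
{
	uint32_t nib = (reg >> (portnum * 4)) & 0xfu;
	struct func_partition p = { nib & 0x3u, nib >> 2 };

	return p;
}

static int is_ethernet_func(struct func_partition p)
{
	return p.fclass == 1;
}
```

With a made-up register value of 0xD21, function 0 decodes as Ethernet on physical port 0, function 1 as a non-Ethernet class (the case where probe now fails with -EIO), and function 2 as Ethernet on physical port 3.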



* [PATCH NEXT 8/9] qlcnic: support quisce mode
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

The device can enter a quiescent state, during which drivers
should refrain from using it.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_main.c |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index ce57229..ce8118c 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2044,8 +2044,11 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 
 	do {
 		msleep(1000);
-	} while ((QLCRD32(adapter, QLCNIC_CRB_DEV_STATE) != QLCNIC_DEV_READY)
-			&& --dev_init_timeo);
+		prev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
+
+		if (prev_state == QLCNIC_DEV_QUISCENT)
+			continue;
+	} while ((prev_state != QLCNIC_DEV_READY) && --dev_init_timeo);
 
 	if (!dev_init_timeo) {
 		dev_err(&adapter->pdev->dev,
@@ -2076,6 +2079,14 @@ qlcnic_fwinit_work(struct work_struct *work)
 	if (qlcnic_api_lock(adapter))
 		goto err_ret;
 
+	dev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
+	if (dev_state ==  QLCNIC_DEV_QUISCENT) {
+		qlcnic_api_unlock(adapter);
+		qlcnic_schedule_work(adapter, qlcnic_fwinit_work,
+						FW_POLL_DELAY * 2);
+		return;
+	}
+
 	if (adapter->fw_wait_cnt++ > adapter->reset_ack_timeo) {
 		dev_err(&adapter->pdev->dev, "Reset:Failed to get ack %d sec\n",
 					adapter->reset_ack_timeo);
@@ -2085,6 +2096,17 @@ qlcnic_fwinit_work(struct work_struct *work)
 	if (!qlcnic_check_drv_state(adapter)) {
 skip_ack_check:
 		dev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
+
+		if (dev_state == QLCNIC_DEV_NEED_QUISCENT) {
+			QLCWR32(adapter, QLCNIC_CRB_DEV_STATE,
+						QLCNIC_DEV_QUISCENT);
+			qlcnic_schedule_work(adapter, qlcnic_fwinit_work,
+						FW_POLL_DELAY * 2);
+			QLCDB(adapter, DRV, "Quiscing the driver\n");
+			qlcnic_api_unlock(adapter);
+			return;
+		}
+
 		if (dev_state == QLCNIC_DEV_NEED_RESET) {
 			QLCWR32(adapter, QLCNIC_CRB_DEV_STATE,
 						QLCNIC_DEV_INITIALIZING);
@@ -2107,6 +2129,8 @@ skip_ack_check:
 	QLCDB(adapter, HW, "Func waiting: Device state=%u\n", dev_state);
 
 	switch (dev_state) {
+	case QLCNIC_DEV_QUISCENT:
+	case QLCNIC_DEV_NEED_QUISCENT:
 	case QLCNIC_DEV_NEED_RESET:
 		qlcnic_schedule_work(adapter,
 			qlcnic_fwinit_work, FW_POLL_DELAY);
-- 
1.6.0.2
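The quiescent handling this patch adds to qlcnic_fwinit_work() boils down to a small state transition. The sketch below is a simplified userspace model: the enum is a hypothetical stand-in (the real QLCNIC_DEV_* values are not shown in this thread), locking and the ack-timeout logic are omitted, and NEED_RESET is treated only as in the final switch of the patch.

```c
/* Hypothetical stand-ins for the QLCNIC_DEV_* states. */
enum dev_state {
	DEV_COLD,
	DEV_INITIALIZING,
	DEV_READY,
	DEV_NEED_RESET,
	DEV_NEED_QUISCENT,
	DEV_QUISCENT,
	DEV_FAILED,
};

enum fw_action {
	ACT_RESCHEDULE,  /* poll again later, as qlcnic_schedule_work() does */
	ACT_PROCEED,     /* leave the wait loop */
};

/* One pass of the simplified work item. A function that sees
 * NEED_QUISCENT acknowledges it by moving the device to QUISCENT;
 * while the device is quiescent the function keeps polling instead of
 * touching the hardware. */
static enum fw_action fwinit_step(enum dev_state *state)
{
	switch (*state) {
	case DEV_NEED_QUISCENT:
		*state = DEV_QUISCENT;
		return ACT_RESCHEDULE;
	case DEV_QUISCENT:
	case DEV_NEED_RESET:
		return ACT_RESCHEDULE;
	default:
		return ACT_PROCEED;
	}
}
```

The device stays in DEV_QUISCENT across repeated passes until some other agent changes the state, for example back to DEV_READY, at which point the work item proceeds.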



* [PATCH NEXT 5/9] qlcnic: cleanup unused code
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

The LRO ring, cut-through mode and firmware-version-specific checks
do not apply to the QLogic CNA device.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h         |    6 ------
 drivers/net/qlcnic/qlcnic_ethtool.c |    1 -
 drivers/net/qlcnic/qlcnic_init.c    |   31 ++++++-------------------------
 drivers/net/qlcnic/qlcnic_main.c    |   33 ++++-----------------------------
 4 files changed, 10 insertions(+), 61 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 189453f..ed9df15 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -98,8 +98,6 @@
 #define QLCNIC_CT_DEFAULT_RX_BUF_LEN	2048
 #define QLCNIC_LRO_BUFFER_EXTRA		2048
 
-#define QLCNIC_RX_LRO_BUFFER_LENGTH		(8060)
-
 /* Opcodes to be used with the commands */
 #define TX_ETHER_PKT	0x01
 #define TX_TCP_PKT	0x02
@@ -133,7 +131,6 @@
 
 #define RCV_RING_NORMAL 0
 #define RCV_RING_JUMBO	1
-#define RCV_RING_LRO	2
 
 #define MIN_CMD_DESCRIPTORS		64
 #define MIN_RCV_DESCRIPTORS		64
@@ -144,7 +141,6 @@
 #define MAX_RCV_DESCRIPTORS_10G 	8192
 #define MAX_JUMBO_RCV_DESCRIPTORS_1G	512
 #define MAX_JUMBO_RCV_DESCRIPTORS_10G	1024
-#define MAX_LRO_RCV_DESCRIPTORS		8
 
 #define DEFAULT_RCV_DESCRIPTORS_1G	2048
 #define DEFAULT_RCV_DESCRIPTORS_10G	4096
@@ -402,7 +398,6 @@ struct qlcnic_hardware_context {
 	rwlock_t crb_lock;
 	struct mutex mem_lock;
 
-	u8 cut_through;
 	u8 revision_id;
 	u8 pci_func;
 	u8 linkup;
@@ -917,7 +912,6 @@ struct qlcnic_adapter {
 	u16 num_txd;
 	u16 num_rxd;
 	u16 num_jumbo_rxd;
-	u16 num_lro_rxd;
 
 	u8 max_rds_rings;
 	u8 max_sds_rings;
diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index e494ccc..3bd514e 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -412,7 +412,6 @@ qlcnic_get_ringparam(struct net_device *dev,
 
 	ring->rx_pending = adapter->num_rxd;
 	ring->rx_jumbo_pending = adapter->num_jumbo_rxd;
-	ring->rx_jumbo_pending += adapter->num_lro_rxd;
 	ring->tx_pending = adapter->num_txd;
 
 	if (adapter->ahw.port_type == QLCNIC_GBE) {
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 9b01042..c4bef57 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -230,17 +230,8 @@ int qlcnic_alloc_sw_resources(struct qlcnic_adapter *adapter)
 		switch (ring) {
 		case RCV_RING_NORMAL:
 			rds_ring->num_desc = adapter->num_rxd;
-			if (adapter->ahw.cut_through) {
-				rds_ring->dma_size =
-					QLCNIC_CT_DEFAULT_RX_BUF_LEN;
-				rds_ring->skb_size =
-					QLCNIC_CT_DEFAULT_RX_BUF_LEN;
-			} else {
-				rds_ring->dma_size =
-					QLCNIC_P3_RX_BUF_MAX_LEN;
-				rds_ring->skb_size =
-					rds_ring->dma_size + NET_IP_ALIGN;
-			}
+			rds_ring->dma_size = QLCNIC_P3_RX_BUF_MAX_LEN;
+			rds_ring->skb_size = rds_ring->dma_size + NET_IP_ALIGN;
 			break;
 
 		case RCV_RING_JUMBO:
@@ -254,13 +245,6 @@ int qlcnic_alloc_sw_resources(struct qlcnic_adapter *adapter)
 			rds_ring->skb_size =
 				rds_ring->dma_size + NET_IP_ALIGN;
 			break;
-
-		case RCV_RING_LRO:
-			rds_ring->num_desc = adapter->num_lro_rxd;
-			rds_ring->dma_size = QLCNIC_RX_LRO_BUFFER_LENGTH;
-			rds_ring->skb_size = rds_ring->dma_size + NET_IP_ALIGN;
-			break;
-
 		}
 		rds_ring->rx_buf_arr = (struct qlcnic_rx_buffer *)
 			vmalloc(RCV_BUFF_RINGSIZE(rds_ring));
@@ -556,12 +540,10 @@ qlcnic_has_mn(struct qlcnic_adapter *adapter)
 			QLCNIC_FW_VERSION_OFFSET, (int *)&flashed_ver);
 	flashed_ver = QLCNIC_DECODE_VERSION(flashed_ver);
 
-	if (flashed_ver >= QLCNIC_VERSION_CODE(4, 0, 220)) {
+	capability = QLCRD32(adapter, QLCNIC_PEG_TUNE_CAPABILITY);
+	if (capability & QLCNIC_PEG_TUNE_MN_PRESENT)
+		return 1;
 
-		capability = QLCRD32(adapter, QLCNIC_PEG_TUNE_CAPABILITY);
-		if (capability & QLCNIC_PEG_TUNE_MN_PRESENT)
-			return 1;
-	}
 	return 0;
 }
 
@@ -1279,8 +1261,7 @@ qlcnic_alloc_rx_skb(struct qlcnic_adapter *adapter,
 
 	skb = buffer->skb;
 
-	if (!adapter->ahw.cut_through)
-		skb_reserve(skb, 2);
+	skb_reserve(skb, 2);
 
 	dma = pci_map_single(pdev, skb->data,
 			rds_ring->dma_size, PCI_DMA_FROMDEVICE);
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index f10ea32..d86c50d 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -544,22 +544,10 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
 				brd_name, adapter->ahw.revision_id);
 	}
 
-	if (adapter->fw_version < QLCNIC_VERSION_CODE(3, 4, 216)) {
-		adapter->driver_mismatch = 1;
-		dev_warn(&pdev->dev, "firmware version %d.%d.%d unsupported\n",
-				fw_major, fw_minor, fw_build);
-		return;
-	}
-
-	i = QLCRD32(adapter, QLCNIC_SRE_MISC);
-	adapter->ahw.cut_through = (i & 0x8000) ? 1 : 0;
-
-	dev_info(&pdev->dev, "firmware v%d.%d.%d [%s]\n",
-			fw_major, fw_minor, fw_build,
-			adapter->ahw.cut_through ? "cut-through" : "legacy");
+	dev_info(&pdev->dev, "firmware v%d.%d.%d\n",
+			fw_major, fw_minor, fw_build);
 
-	if (adapter->fw_version >= QLCNIC_VERSION_CODE(4, 0, 222))
-		adapter->capabilities = QLCRD32(adapter, CRB_FW_CAPABILITIES_1);
+	adapter->capabilities = QLCRD32(adapter, CRB_FW_CAPABILITIES_1);
 
 	adapter->flags &= ~QLCNIC_LRO_ENABLED;
 
@@ -576,7 +564,6 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
 
 	adapter->num_txd = MAX_CMD_DESCRIPTORS;
 
-	adapter->num_lro_rxd = 0;
 	adapter->max_rds_rings = 2;
 }
 
@@ -2567,24 +2554,12 @@ qlcnic_remove_diag_entries(struct qlcnic_adapter *adapter)
 
 #define is_qlcnic_netdev(dev) (dev->netdev_ops == &qlcnic_netdev_ops)
 
-static int
-qlcnic_destip_supported(struct qlcnic_adapter *adapter)
-{
-	if (adapter->ahw.cut_through)
-		return 0;
-
-	return 1;
-}
-
 static void
 qlcnic_config_indev_addr(struct net_device *dev, unsigned long event)
 {
 	struct in_device *indev;
 	struct qlcnic_adapter *adapter = netdev_priv(dev);
 
-	if (!qlcnic_destip_supported(adapter))
-		return;
-
 	indev = in_dev_get(dev);
 	if (!indev)
 		return;
@@ -2664,7 +2639,7 @@ recheck:
 
 	adapter = netdev_priv(dev);
 
-	if (!adapter || !qlcnic_destip_supported(adapter))
+	if (!adapter)
 		goto done;
 
 	if (adapter->is_up != QLCNIC_ADAPTER_UP_MAGIC)
-- 
1.6.0.2
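This patch drops several firmware-version gates built with QLCNIC_VERSION_CODE(). That macro packs major, minor and build into one integer so that versions compare with a plain >=. A userspace sketch follows; the macro is copied from qlcnic.h as quoted in patch 6/9 of this thread, while the decode helpers and fw_at_least() are hypothetical (the 16-bit build field width is an assumption).

```c
#include <stdint.h>

/* Same encoding as QLCNIC_VERSION_CODE() in qlcnic.h. */
#define QLCNIC_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))

/* Hypothetical decoders for the three packed fields. */
static unsigned int ver_major(uint32_t v) { return (v >> 24) & 0xff; }
static unsigned int ver_minor(uint32_t v) { return (v >> 16) & 0xff; }
static unsigned int ver_build(uint32_t v) { return v & 0xffff; }

/* The style of check the patch removes, e.g. the old
 * "flashed_ver >= QLCNIC_VERSION_CODE(4, 0, 220)" in qlcnic_has_mn(). */
static int fw_at_least(uint32_t fw, uint32_t min)
{
	return fw >= min;
}
```

Because major sits in the top byte, 3.4.216 compares below 4.0.0 regardless of the build number, and 4.0.222 passes a 4.0.220 gate.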



* [PATCH NEXT 4/9] qlcnic: cleanup dma mask setting
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

The device supports a 64-bit DMA mask.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 -
 drivers/net/qlcnic/qlcnic_main.c |   93 ++++++++++----------------------------
 2 files changed, 24 insertions(+), 70 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index cfedcbd..189453f 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -924,7 +924,6 @@ struct qlcnic_adapter {
 	u8 driver_mismatch;
 	u8 msix_supported;
 	u8 rx_csum;
-	u8 pci_using_dac;
 	u8 portnum;
 	u8 physical_port;
 
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 4652c7b..f10ea32 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -242,66 +242,6 @@ static void qlcnic_clear_stats(struct qlcnic_adapter *adapter)
 	return;
 }
 
-static int qlcnic_set_dma_mask(struct qlcnic_adapter *adapter)
-{
-	struct pci_dev *pdev = adapter->pdev;
-	u64 mask, cmask;
-
-	adapter->pci_using_dac = 0;
-
-	mask = DMA_BIT_MASK(39);
-	cmask = mask;
-
-	if (pci_set_dma_mask(pdev, mask) == 0 &&
-			pci_set_consistent_dma_mask(pdev, cmask) == 0) {
-		adapter->pci_using_dac = 1;
-		return 0;
-	}
-
-	return -EIO;
-}
-
-/* Update addressable range if firmware supports it */
-static int
-qlcnic_update_dma_mask(struct qlcnic_adapter *adapter)
-{
-	int change, shift, err;
-	u64 mask, old_mask, old_cmask;
-	struct pci_dev *pdev = adapter->pdev;
-
-	change = 0;
-
-	shift = QLCRD32(adapter, CRB_DMA_SHIFT);
-	if (shift > 32)
-		return 0;
-
-	if (shift > 9)
-		change = 1;
-
-	if (change) {
-		old_mask = pdev->dma_mask;
-		old_cmask = pdev->dev.coherent_dma_mask;
-
-		mask = DMA_BIT_MASK(32+shift);
-
-		err = pci_set_dma_mask(pdev, mask);
-		if (err)
-			goto err_out;
-
-		err = pci_set_consistent_dma_mask(pdev, mask);
-		if (err)
-			goto err_out;
-		dev_info(&pdev->dev, "using %d-bit dma mask\n", 32+shift);
-	}
-
-	return 0;
-
-err_out:
-	pci_set_dma_mask(pdev, old_mask);
-	pci_set_consistent_dma_mask(pdev, old_cmask);
-	return err;
-}
-
 static void qlcnic_set_port_mode(struct qlcnic_adapter *adapter)
 {
 	u32 val, data;
@@ -645,10 +585,6 @@ qlcnic_start_firmware(struct qlcnic_adapter *adapter)
 {
 	int val, err, first_boot;
 
-	err = qlcnic_set_dma_mask(adapter);
-	if (err)
-		return err;
-
 	err = qlcnic_can_start_firmware(adapter);
 	if (err < 0)
 		return err;
@@ -699,8 +635,6 @@ wait_init:
 
 	QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_READY);
 
-	qlcnic_update_dma_mask(adapter);
-
 	qlcnic_check_options(adapter);
 
 	adapter->need_fw_reset = 0;
@@ -1026,7 +960,7 @@ qlcnic_reset_context(struct qlcnic_adapter *adapter)
 
 static int
 qlcnic_setup_netdev(struct qlcnic_adapter *adapter,
-		struct net_device *netdev)
+		struct net_device *netdev, u8 pci_using_dac)
 {
 	int err;
 	struct pci_dev *pdev = adapter->pdev;
@@ -1049,7 +983,7 @@ qlcnic_setup_netdev(struct qlcnic_adapter *adapter,
 	netdev->features |= (NETIF_F_IPV6_CSUM | NETIF_F_TSO6);
 	netdev->vlan_features |= (NETIF_F_IPV6_CSUM | NETIF_F_TSO6);
 
-	if (adapter->pci_using_dac) {
+	if (pci_using_dac) {
 		netdev->features |= NETIF_F_HIGHDMA;
 		netdev->vlan_features |= NETIF_F_HIGHDMA;
 	}
@@ -1079,6 +1013,22 @@ qlcnic_setup_netdev(struct qlcnic_adapter *adapter,
 	return 0;
 }
 
+static int qlcnic_set_dma_mask(struct pci_dev *pdev, u8 *pci_using_dac)
+{
+	if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
+			!pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)))
+		*pci_using_dac = 1;
+	else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) &&
+			!pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
+		*pci_using_dac = 0;
+	else {
+		dev_err(&pdev->dev, "Unable to set DMA mask, aborting\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
 static int __devinit
 qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
@@ -1087,6 +1037,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	int err;
 	int pci_func_id = PCI_FUNC(pdev->devfn);
 	uint8_t revision_id;
+	uint8_t pci_using_dac;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -1097,6 +1048,10 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_disable_pdev;
 	}
 
+	err = qlcnic_set_dma_mask(pdev, &pci_using_dac);
+	if (err)
+		goto err_out_disable_pdev;
+
 	err = pci_request_regions(pdev, qlcnic_driver_name);
 	if (err)
 		goto err_out_disable_pdev;
@@ -1154,7 +1109,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	qlcnic_setup_intr(adapter);
 
-	err = qlcnic_setup_netdev(adapter, netdev);
+	err = qlcnic_setup_netdev(adapter, netdev, pci_using_dac);
 	if (err)
 		goto err_out_disable_msi;
 
-- 
1.6.0.2
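The new qlcnic_set_dma_mask() tries a 64-bit mask first and falls back to 32 bits, requiring both the streaming and the coherent mask to succeed. The sketch below models only that ordering in userspace. DMA_BIT_MASK() uses the kernel's definition, but struct mock_platform and the mock_* setters are hypothetical stand-ins for pci_set_dma_mask() and pci_set_consistent_dma_mask(), and -1 stands in for -EIO. Real acceptance rules depend on the platform and IOMMU.

```c
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical platform model: a mask is accepted only if it does not
 * exceed what the platform can honour for that mapping type. */
struct mock_platform {
	uint64_t streaming_limit;
	uint64_t coherent_limit;
};

static int mock_set_dma_mask(const struct mock_platform *p, uint64_t m)
{
	return m <= p->streaming_limit ? 0 : -1;
}

static int mock_set_consistent_dma_mask(const struct mock_platform *p,
					uint64_t m)
{
	return m <= p->coherent_limit ? 0 : -1;
}

/* Same fallback order as the patch: 64-bit, then 32-bit, else fail. */
static int pick_dma_mask(const struct mock_platform *p, uint8_t *using_dac)
{
	if (!mock_set_dma_mask(p, DMA_BIT_MASK(64)) &&
	    !mock_set_consistent_dma_mask(p, DMA_BIT_MASK(64)))
		*using_dac = 1;
	else if (!mock_set_dma_mask(p, DMA_BIT_MASK(32)) &&
		 !mock_set_consistent_dma_mask(p, DMA_BIT_MASK(32)))
		*using_dac = 0;
	else
		return -1;
	return 0;
}
```

A platform whose coherent mask is limited to 32 bits ends up with using_dac = 0, so probe would leave NETIF_F_HIGHDMA unset even though streaming DMA could have used 64 bits.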



* [PATCH NEXT 6/9] qlcnic: check IDC version
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

Warn the user if the IDC version does not match the one used by a different class of drivers.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 +
 drivers/net/qlcnic/qlcnic_main.c |   20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index ed9df15..8a9412f 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -53,6 +53,7 @@
 #define _QLCNIC_LINUX_MINOR 0
 #define _QLCNIC_LINUX_SUBVERSION 2
 #define QLCNIC_LINUX_VERSIONID  "5.0.2"
+#define QLCNIC_DRV_IDC_VER  0x01
 
 #define QLCNIC_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))
 #define _major(v)	(((v) >> 24) & 0xff)
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index d86c50d..e94a96d 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1976,12 +1976,25 @@ qlcnic_check_drv_state(struct qlcnic_adapter *adapter)
 		return 1;
 }
 
+static int qlcnic_check_idc_ver(struct qlcnic_adapter *adapter)
+{
+	u32 val = QLCRD32(adapter, QLCNIC_CRB_DRV_IDC_VER);
+
+	if (val != QLCNIC_DRV_IDC_VER) {
+		dev_warn(&adapter->pdev->dev, "IDC Version mismatch, driver's"
+			" idc ver = %x; reqd = %x\n", QLCNIC_DRV_IDC_VER, val);
+	}
+
+	return 0;
+}
+
 static int
 qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 {
 	u32 val, prev_state;
 	u8 dev_init_timeo = adapter->dev_init_timeo;
 	u8 portnum = adapter->portnum;
+	u8 ret;
 
 	if (test_and_clear_bit(__QLCNIC_START_FW, &adapter->state))
 		return 1;
@@ -2001,12 +2014,14 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 	switch (prev_state) {
 	case QLCNIC_DEV_COLD:
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_INITIALIZING);
+		QLCWR32(adapter, QLCNIC_CRB_DRV_IDC_VER, QLCNIC_DRV_IDC_VER);
 		qlcnic_api_unlock(adapter);
 		return 1;
 
 	case QLCNIC_DEV_READY:
+		ret = qlcnic_check_idc_ver(adapter);
 		qlcnic_api_unlock(adapter);
-		return 0;
+		return ret;
 
 	case QLCNIC_DEV_NEED_RESET:
 		val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
@@ -2049,9 +2064,10 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 	QLC_DEV_CLR_RST_QSCNT(val, portnum);
 	QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 
+	ret = qlcnic_check_idc_ver(adapter);
 	qlcnic_api_unlock(adapter);
 
-	return 0;
+	return ret;
 }
 
 static void
-- 
1.6.0.2
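The IDC handshake in this patch has two halves: the function that finds the device COLD publishes its IDC version, and functions that find it READY (or finish a reset) compare against it and only warn. A userspace sketch of that flow, where struct shared_crb is a hypothetical model of the QLCNIC_CRB_DRV_IDC_VER register and the *mismatch output is an extra added for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define DRV_IDC_VER 0x01u  /* same value as QLCNIC_DRV_IDC_VER in the patch */

/* Hypothetical model of the shared IDC-version register. */
struct shared_crb {
	uint32_t idc_ver;
};

/* The function that finds the device COLD publishes its IDC version. */
static void publish_idc_ver(struct shared_crb *crb, uint32_t mine)
{
	crb->idc_ver = mine;
}

/* Like qlcnic_check_idc_ver(), this only warns and always returns 0,
 * so a mismatch never blocks bring-up. */
static int check_idc_ver(const struct shared_crb *crb, uint32_t mine,
			 int *mismatch)
{
	*mismatch = crb->idc_ver != mine;
	if (*mismatch)
		fprintf(stderr,
			"IDC Version mismatch, driver's idc ver = %x; reqd = %x\n",
			(unsigned int)mine, (unsigned int)crb->idc_ver);
	return 0;
}
```

Because the check always returns 0, the ret value propagated from qlcnic_can_start_firmware() in the patch is currently always 0 as well; the warning is purely informational.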



* [PATCH 0/9]qlcnic: cleanup
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman

Hi
  A series of 9 patches to clean up unused code and to support quiescent
  mode.

-Amit


* [PATCH NEXT 3/9] qlcnic: fix caching window register
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

o The window register is not per PCI function, so caching it can
result in unexpected results.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    3 ---
 drivers/net/qlcnic/qlcnic_hw.c   |    5 -----
 drivers/net/qlcnic/qlcnic_main.c |   10 ----------
 3 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 2fba9cd..cfedcbd 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -399,9 +399,6 @@ struct qlcnic_hardware_context {
 
 	unsigned long pci_len0;
 
-	u32 ocm_win;
-	u32 crb_win;
-
 	rwlock_t crb_lock;
 	struct mutex mem_lock;
 
diff --git a/drivers/net/qlcnic/qlcnic_hw.c b/drivers/net/qlcnic/qlcnic_hw.c
index 7a72b8d..0c2e1f0 100644
--- a/drivers/net/qlcnic/qlcnic_hw.c
+++ b/drivers/net/qlcnic/qlcnic_hw.c
@@ -776,9 +776,6 @@ qlcnic_pci_set_crbwindow_2M(struct qlcnic_adapter *adapter, ulong off)
 
 	window = CRB_HI(off);
 
-	if (adapter->ahw.crb_win == window)
-		return;
-
 	writel(window, addr);
 	if (readl(addr) != window) {
 		if (printk_ratelimit())
@@ -786,7 +783,6 @@ qlcnic_pci_set_crbwindow_2M(struct qlcnic_adapter *adapter, ulong off)
 				"failed to set CRB window to %d off 0x%lx\n",
 				window, off);
 	}
-	adapter->ahw.crb_win = window;
 }
 
 int
@@ -874,7 +870,6 @@ qlcnic_pci_set_window_2M(struct qlcnic_adapter *adapter,
 	/* read back to flush */
 	readl(adapter->ahw.ocm_win_crb);
 
-	adapter->ahw.ocm_win = window;
 	*start = QLCNIC_PCI_OCM0_2M + GET_MEM_OFFS_2M(addr);
 	return 0;
 }
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index dc791d5..4652c7b 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -518,13 +518,6 @@ qlcnic_setup_pci_map(struct qlcnic_adapter *adapter)
 	struct pci_dev *pdev = adapter->pdev;
 	int pci_func = adapter->ahw.pci_func;
 
-	/*
-	 * Set the CRB window to invalid. If any register in window 0 is
-	 * accessed it should set the window to 0 and then reset it to 1.
-	 */
-	adapter->ahw.crb_win = -1;
-	adapter->ahw.ocm_win = -1;
-
 	/* remap phys address */
 	mem_base = pci_resource_start(pdev, 0);	/* 0 is for BAR 0 */
 	mem_len = pci_resource_len(pdev, 0);
@@ -1312,9 +1305,6 @@ qlcnic_resume(struct pci_dev *pdev)
 	pci_set_master(pdev);
 	pci_restore_state(pdev);
 
-	adapter->ahw.crb_win = -1;
-	adapter->ahw.ocm_win = -1;
-
 	err = qlcnic_start_firmware(adapter);
 	if (err) {
 		dev_err(&pdev->dev, "failed to start firmware\n");
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 2/9] qlcnic: remove obsolete register
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

The MSI_MODE, CAPABILITIES_FW and SCRATCHPAD registers are obsolete.
The driver should not use them.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_ethtool.c |    9 +--------
 drivers/net/qlcnic/qlcnic_hdr.h     |    7 -------
 drivers/net/qlcnic/qlcnic_init.c    |    1 -
 3 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index 6cdc5eb..e494ccc 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -606,19 +606,12 @@ qlcnic_set_pauseparam(struct net_device *netdev,
 static int qlcnic_reg_test(struct net_device *dev)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(dev);
-	u32 data_read, data_written;
+	u32 data_read;
 
 	data_read = QLCRD32(adapter, QLCNIC_PCIX_PH_REG(0));
 	if ((data_read & 0xffff) != adapter->pdev->vendor)
 		return 1;
 
-	data_written = (u32)0xa5a5a5a5;
-
-	QLCWR32(adapter, CRB_SCRATCHPAD_TEST, data_written);
-	data_read = QLCRD32(adapter, CRB_SCRATCHPAD_TEST);
-	if (data_written != data_read)
-		return 1;
-
 	return 0;
 }
 
diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index a984cd2..1374078 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -585,15 +585,8 @@ enum {
  * for backward compability
  */
 #define CRB_NIC_CAPABILITIES_HOST	QLCNIC_REG(0x1a8)
-#define CRB_NIC_CAPABILITIES_FW 	QLCNIC_REG(0x1dc)
-#define CRB_NIC_MSI_MODE_HOST		QLCNIC_REG(0x270)
-#define CRB_NIC_MSI_MODE_FW	  	QLCNIC_REG(0x274)
 
 #define INTR_SCHEME_PERPORT	      	0x1
-#define MSI_MODE_MULTIFUNC	      	0x1
-
-/* used for ethtool tests */
-#define CRB_SCRATCHPAD_TEST	    QLCNIC_REG(0x280)
 
 /*
  * CrbPortPhanCntrHi/Lo is used to pass the address of HostPhantomIndex address
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 1b621ca..9b01042 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -1189,7 +1189,6 @@ int qlcnic_init_firmware(struct qlcnic_adapter *adapter)
 		return err;
 
 	QLCWR32(adapter, CRB_NIC_CAPABILITIES_HOST, INTR_SCHEME_PERPORT);
-	QLCWR32(adapter, CRB_NIC_MSI_MODE_HOST, MSI_MODE_MULTIFUNC);
 	QLCWR32(adapter, CRB_MPORT_MODE, MPORT_MULTI_FUNCTION_MODE);
 	QLCWR32(adapter, CRB_CMDPEG_STATE, PHAN_INITIALIZE_ACK);
 
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 1/9] qlcnic: fix context cleanup
From: Amit Kumar Salecha @ 2010-05-13 13:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

Before going for recovery, every PCI function should check the fw state,
irrespective of the device state. This avoids unnecessarily sending
the ctx destroy command.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_main.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index ee573fe..dc791d5 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2302,10 +2302,8 @@ qlcnic_check_health(struct qlcnic_adapter *adapter)
 	if (qlcnic_check_temp(adapter))
 		goto detach;
 
-	if (adapter->need_fw_reset) {
+	if (adapter->need_fw_reset)
 		qlcnic_dev_request_reset(adapter);
-		goto detach;
-	}
 
 	state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
 	if (state == QLCNIC_DEV_NEED_RESET || state == QLCNIC_DEV_NEED_QUISCENT)
-- 
1.6.0.2


^ permalink raw reply related

* Re: [BUG] crashes with kvm/nat networking and net-next
From: Patrick McHardy @ 2010-05-13 12:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Bart De Schuymer, Eric Dumazet, netdev
In-Reply-To: <20100512151540.6f09a1b8@nehalam>

Stephen Hemminger wrote:
> On Wed, 12 May 2010 16:03:54 +0200
>> Don't call nf_bridge_update_protocol() for ARP traffic as
>> skb->nf_bridge isn't used in the ARP case.
>>
>>
>> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
>> Reported-by: Stephen Hemminger <shemminger@vyatta.com>
>>
>> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
>> index 93f80fe..4442099 100644
>> --- a/net/bridge/br_netfilter.c
>> +++ b/net/bridge/br_netfilter.c
>> @@ -643,10 +643,10 @@ static int br_nf_forward_finish(struct sk_buff *skb)
>>  			skb->pkt_type = PACKET_OTHERHOST;
>>  			nf_bridge->mask ^= BRNF_PKT_TYPE;
>>  		}
>> +		nf_bridge_update_protocol(skb);
>>  	} else {
>>  		in = *((struct net_device **)(skb->cb));
>>  	}
>> -	nf_bridge_update_protocol(skb);
>>  	nf_bridge_push_encap_header(skb);
>>  
>>  	NF_HOOK_THRESH(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, in,
>>
> 
> This has worked all day for me without problem.

Applied, thanks everyone.

^ permalink raw reply

* RE: does the broadcom bnx2x support RSS/multi queue
From: Eilon Greenstein @ 2010-05-13 11:32 UTC (permalink / raw)
  To: Jon Zhou; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497DECAE8@MILEXCH2.ds.jdsu.net>

On Thu, 2010-05-13 at 02:44 -0700, Jon Zhou wrote:
> insmod ./PF_RING/drivers/broadcom/netxtreme2-5.2.50/bnx2x-1.52.12/src/bnx2x.ko multi_mode=1 num_queues=4 int_mode=3 debug=1
There is no need to set multi_mode or int_mode - you are using the
default values. However, I need more information on why you are using
INTA and not MSI-X, so please set the debug to 0x20

> but seems MSI not enabled:
> 34:         26          0          0          0          0          0          0          0          0          0          0          0   49470611   63427140          0          0   IO-APIC-fasteoi   eth5
Indeed, you are using INT# - this is why you do not have multi-queue.
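For reference, a hypothetical userspace helper that classifies one /proc/interrupts line the way it is read above: the controller column (IO-APIC-fasteoi vs. PCI-MSI-edge) tells whether a vector is a legacy INTx line or MSI/MSI-X. It assumes the device name is the last field, and that per-queue vectors, if any, are named like eth5-<suffix>; both are assumptions rather than guarantees about bnx2x naming.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum irq_kind { IRQ_NOT_THIS_DEV, IRQ_LEGACY_INTX, IRQ_MSI };

static enum irq_kind classify_irq_line(const char *line, const char *ifname)
{
	size_t end = strlen(line);
	size_t n = strlen(ifname);
	size_t start;

	/* Ignore trailing whitespace, including the newline. */
	while (end > 0 && isspace((unsigned char)line[end - 1]))
		end--;
	/* The device name is taken to be the last whitespace-separated field. */
	start = end;
	while (start > 0 && !isspace((unsigned char)line[start - 1]))
		start--;

	/* Accept "eth5" itself or "eth5-<suffix>", but not "eth50". */
	if (end - start < n || strncmp(line + start, ifname, n) != 0)
		return IRQ_NOT_THIS_DEV;
	if (end - start > n && line[start + n] != '-')
		return IRQ_NOT_THIS_DEV;

	/* "PCI-MSI-edge" means MSI/MSI-X; "IO-APIC-..." means legacy INTx. */
	return strstr(line, "MSI") != NULL ? IRQ_MSI : IRQ_LEGACY_INTX;
}
```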

Eilon



^ permalink raw reply

* [PATCH 8/9] cxgb3: use kvzalloc and kvfree
From: Changli Gao @ 2010-05-13 11:14 UTC (permalink / raw)
  To: akpm; +Cc: Divy Le Ray, netdev, linux-kernel, Changli Gao

use kvzalloc and kvfree

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 drivers/net/cxgb3/cxgb3_defs.h    |    2 --
 drivers/net/cxgb3/cxgb3_offload.c |   31 ++-----------------------------
 drivers/net/cxgb3/l2t.c           |    4 ++--
 3 files changed, 4 insertions(+), 33 deletions(-)
diff --git a/drivers/net/cxgb3/cxgb3_defs.h b/drivers/net/cxgb3/cxgb3_defs.h
index 47e5376..d3c662f 100644
--- a/drivers/net/cxgb3/cxgb3_defs.h
+++ b/drivers/net/cxgb3/cxgb3_defs.h
@@ -41,8 +41,6 @@
 
 #define VALIDATE_TID 1
 
-void *cxgb_alloc_mem(unsigned long size);
-void cxgb_free_mem(void *addr);
 void cxgb_neigh_update(struct neighbour *neigh);
 void cxgb_redirect(struct dst_entry *old, struct dst_entry *new);
 
diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
index c6485b3..799c9b7 100644
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -39,7 +39,6 @@
 #include <linux/if_vlan.h>
 #include <net/netevent.h>
 #include <linux/highmem.h>
-#include <linux/vmalloc.h>
 
 #include "common.h"
 #include "regs.h"
@@ -1156,32 +1155,6 @@ void cxgb_redirect(struct dst_entry *old, struct dst_entry *new)
 }
 
 /*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *cxgb_alloc_mem(unsigned long size)
-{
-	void *p = kmalloc(size, GFP_KERNEL);
-
-	if (!p)
-		p = vmalloc(size);
-	if (p)
-		memset(p, 0, size);
-	return p;
-}
-
-/*
- * Free memory allocated through t3_alloc_mem().
- */
-void cxgb_free_mem(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		vfree(addr);
-	else
-		kfree(addr);
-}
-
-/*
  * Allocate and initialize the TID tables.  Returns 0 on success.
  */
 static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
@@ -1191,7 +1164,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 	unsigned long size = ntids * sizeof(*t->tid_tab) +
 	    natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
 
-	t->tid_tab = cxgb_alloc_mem(size);
+	t->tid_tab = kvzalloc(size);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -1227,7 +1200,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 
 static void free_tid_maps(struct tid_info *t)
 {
-	cxgb_free_mem(t->tid_tab);
+	kvfree(t->tid_tab);
 }
 
 static inline void add_adapter(struct adapter *adap)
diff --git a/drivers/net/cxgb3/l2t.c b/drivers/net/cxgb3/l2t.c
index 2f3ee72..71e8c32 100644
--- a/drivers/net/cxgb3/l2t.c
+++ b/drivers/net/cxgb3/l2t.c
@@ -420,7 +420,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 	struct l2t_data *d;
 	int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
 
-	d = cxgb_alloc_mem(size);
+	d = kvzalloc(size);
 	if (!d)
 		return NULL;
 
@@ -441,6 +441,6 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 
 void t3_free_l2t(struct l2t_data *d)
 {
-	cxgb_free_mem(d);
+	kvfree(d);
 }
 

^ permalink raw reply related

* [PATCH 7/9] cxgb4: use kvzalloc and kvfree
From: Changli Gao @ 2010-05-13 11:11 UTC (permalink / raw)
  To: akpm; +Cc: netdev, linux-kernel, Changli Gao

use kvzalloc and kvfree

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 drivers/net/cxgb4/cxgb4.h      |    3 ---
 drivers/net/cxgb4/cxgb4_main.c |   37 +++++--------------------------------
 drivers/net/cxgb4/l2t.c        |    2 +-
 3 files changed, 6 insertions(+), 36 deletions(-)
diff --git a/drivers/net/cxgb4/cxgb4.h b/drivers/net/cxgb4/cxgb4.h
index 8856a75..2a38fad 100644
--- a/drivers/net/cxgb4/cxgb4.h
+++ b/drivers/net/cxgb4/cxgb4.h
@@ -584,9 +584,6 @@ static inline struct adapter *netdev2adap(const struct net_device *dev)
 void t4_os_portmod_changed(const struct adapter *adap, int port_id);
 void t4_os_link_changed(struct adapter *adap, int port_id, int link_stat);
 
-void *t4_alloc_mem(size_t size);
-void t4_free_mem(void *addr);
-
 void t4_free_sge_resources(struct adapter *adap);
 irq_handler_t t4_intr_handler(struct adapter *adap);
 netdev_tx_t t4_eth_xmit(struct sk_buff *skb, struct net_device *dev);
diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index 1bad500..776f72f 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -55,7 +55,6 @@
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/sockios.h>
-#include <linux/vmalloc.h>
 #include <linux/workqueue.h>
 #include <net/neighbour.h>
 #include <net/netevent.h>
@@ -742,32 +741,6 @@ out:	release_firmware(fw);
 	return ret;
 }
 
-/*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *t4_alloc_mem(size_t size)
-{
-	void *p = kmalloc(size, GFP_KERNEL);
-
-	if (!p)
-		p = vmalloc(size);
-	if (p)
-		memset(p, 0, size);
-	return p;
-}
-
-/*
- * Free memory allocated through alloc_mem().
- */
-void t4_free_mem(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		vfree(addr);
-	else
-		kfree(addr);
-}
-
 static inline int is_offload(const struct adapter *adap)
 {
 	return adap->params.offload;
@@ -2034,7 +2007,7 @@ static int tid_init(struct tid_info *t)
 	size = t->ntids * sizeof(*t->tid_tab) + natids * sizeof(*t->atid_tab) +
 	       t->nstids * sizeof(*t->stid_tab) +
 	       BITS_TO_LONGS(t->nstids) * sizeof(long);
-	t->tid_tab = t4_alloc_mem(size);
+	t->tid_tab = kvzalloc(size);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -3308,8 +3281,8 @@ sriov:
 	return 0;
 
  out_free_dev:
-	t4_free_mem(adapter->tids.tid_tab);
-	t4_free_mem(adapter->l2t);
+	kvfree(adapter->tids.tid_tab);
+	kvfree(adapter->l2t);
 	for_each_port(adapter, i)
 		if (adapter->port[i])
 			free_netdev(adapter->port[i]);
@@ -3349,8 +3322,8 @@ static void __devexit remove_one(struct pci_dev *pdev)
 
 		t4_sge_stop(adapter);
 		t4_free_sge_resources(adapter);
-		t4_free_mem(adapter->l2t);
-		t4_free_mem(adapter->tids.tid_tab);
+		kvfree(adapter->l2t);
+		kvfree(adapter->tids.tid_tab);
 		disable_msi(adapter);
 
 		for_each_port(adapter, i)
diff --git a/drivers/net/cxgb4/l2t.c b/drivers/net/cxgb4/l2t.c
index 9f96724..f975087 100644
--- a/drivers/net/cxgb4/l2t.c
+++ b/drivers/net/cxgb4/l2t.c
@@ -513,7 +513,7 @@ struct l2t_data *t4_init_l2t(void)
 	int i;
 	struct l2t_data *d;
 
-	d = t4_alloc_mem(sizeof(*d));
+	d = kvzalloc(sizeof(*d));
 	if (!d)
 		return NULL;
 

^ permalink raw reply related

* [PATCH 1/9] mm: add generic adaptive large memory allocation APIs
From: Changli Gao @ 2010-05-13  9:51 UTC (permalink / raw)
  To: akpm
  Cc: Hoang-Nam Nguyen, Christoph Raisch, Roland Dreier, Sean Hefty,
	Hal Rosenstock, Divy Le Ray, James E.J. Bottomley,
	Theodore Ts'o, Andreas Dilger, Alexander Viro, Paul Menage,
	Li Zefan, linux-rdma, linux-kernel, netdev, linux-scsi,
	linux-ext4, linux-fsdevel, linux-mm, containers, Eric Dumazet,
	Tetsuo Handa, Peter Zijlstra, Changli Gao

generic adaptive large memory allocation APIs

The kv*alloc functions allocate large contiguous memory for users that don't
care whether the memory is physically or virtually contiguous. The allocator
always tries its best to allocate physically contiguous memory first and falls
back to vmalloc() only if that fails.

This patch set introduces the following APIs: kvmalloc(), kvzalloc(),
kvcalloc(), kvrealloc(), kvfree() and kvfree_inatomic().
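A calloc-style kvcalloc(n, size) usually takes the two factors separately so that the n * size product can be checked for overflow; the kvcalloc() inline in this patch passes n * size straight to __kvmalloc(). A minimal userspace sketch of the customary guard, in the spirit of kcalloc()'s check (the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store n * size in *out and return 0, or
 * return -1 if the product would not fit in a size_t. */
static int checked_array_size(size_t n, size_t size, size_t *out)
{
	if (size != 0 && n > SIZE_MAX / size)
		return -1;
	*out = n * size;
	return 0;
}
```

With such a check, kvcalloc() could return NULL on overflow instead of silently allocating a buffer smaller than the caller expects.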

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/mm.h      |   31 ++++++++++++++
 include/linux/vmalloc.h |    1 
 mm/nommu.c              |    6 ++
 mm/util.c               |  104 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c            |   14 ++++++
 5 files changed, 156 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 462acaf..0ece978 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1467,5 +1467,36 @@ extern int soft_offline_page(struct page *page, int flags);
 
 extern void dump_page(struct page *page);
 
+void *__kvmalloc(size_t size, gfp_t flags);
+
+static inline void *kvmalloc(size_t size)
+{
+	return __kvmalloc(size, 0);
+}
+
+static inline void *kvzalloc(size_t size)
+{
+	return __kvmalloc(size, __GFP_ZERO);
+}
+
+static inline void *kvcalloc(size_t n, size_t size)
+{
+	return __kvmalloc(n * size, __GFP_ZERO);
+}
+
+void __kvfree(void *ptr, bool inatomic);
+
+static inline void kvfree(void *ptr)
+{
+	__kvfree(ptr, false);
+}
+
+static inline void kvfree_inatomic(void *ptr)
+{
+	__kvfree(ptr, true);
+}
+
+void *kvrealloc(void *ptr, size_t newsize);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 227c2a5..33ec828 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -60,6 +60,7 @@ extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot);
 extern void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask,
 				pgprot_t prot);
 extern void vfree(const void *addr);
+extern unsigned long vsize(const void *addr);
 
 extern void *vmap(struct page **pages, unsigned int count,
 			unsigned long flags, pgprot_t prot);
diff --git a/mm/nommu.c b/mm/nommu.c
index 63fa17d..1ddf3fe 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -223,6 +223,12 @@ void vfree(const void *addr)
 }
 EXPORT_SYMBOL(vfree);
 
+unsigned long vsize(const void *addr)
+{
+	return ksize(addr);
+}
+EXPORT_SYMBOL(vsize);
+
 void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 {
 	/*
diff --git a/mm/util.c b/mm/util.c
index f5712e8..7cc364a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -5,6 +5,7 @@
 #include <linux/err.h>
 #include <linux/sched.h>
 #include <asm/uaccess.h>
+#include <linux/vmalloc.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/kmem.h>
@@ -289,6 +290,109 @@ int __attribute__((weak)) get_user_pages_fast(unsigned long start,
 }
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
+void *__kvmalloc(size_t size, gfp_t flags)
+{
+	void *ptr;
+
+	if (size < PAGE_SIZE)
+		return kmalloc(size, GFP_KERNEL | flags);
+	size = PAGE_ALIGN(size);
+	if (is_power_of_2(size))
+		ptr = (void *)__get_free_pages(GFP_KERNEL | flags |
+					       __GFP_NOWARN, get_order(size));
+	else
+		ptr = alloc_pages_exact(size, GFP_KERNEL | flags |
+					      __GFP_NOWARN);
+	if (ptr != NULL) {
+		virt_to_head_page(ptr)->private = size;
+		return ptr;
+	}
+
+	ptr = vmalloc(size);
+	if (ptr != NULL && (flags & __GFP_ZERO))
+		memset(ptr, 0, size);
+
+	return ptr;
+}
+EXPORT_SYMBOL(__kvmalloc);
+
+static void kvfree_work(struct work_struct *work)
+{
+	vfree(work);
+}
+
+void __kvfree(void *ptr, bool inatomic)
+{
+	if (unlikely(ZERO_OR_NULL_PTR(ptr)))
+		return;
+	if (is_vmalloc_addr(ptr)) {
+		if (inatomic) {
+			struct work_struct *work;
+
+			work = ptr;
+			BUILD_BUG_ON(sizeof(struct work_struct) > PAGE_SIZE);
+			INIT_WORK(work, kvfree_work);
+			schedule_work(work);
+		} else {
+			vfree(ptr);
+		}
+	} else {
+		struct page *page;
+
+		page = virt_to_head_page(ptr);
+		if (PageSlab(page) || PageCompound(page))
+			kfree(ptr);
+		else if (is_power_of_2(page->private))
+			free_pages((unsigned long)ptr,
+				   get_order(page->private));
+		else
+			free_pages_exact(ptr, page->private);
+	}
+}
+EXPORT_SYMBOL(__kvfree);
+
+void *kvrealloc(void *ptr, size_t newsize)
+{
+	void *nptr;
+	size_t oldsize;
+
+	if (unlikely(!newsize)) {
+		kvfree(ptr);
+		return ZERO_SIZE_PTR;
+	}
+
+	if (unlikely(ZERO_OR_NULL_PTR(ptr)))
+		return kvmalloc(newsize);
+
+	if (is_vmalloc_addr(ptr)) {
+		oldsize = vsize(ptr);
+		if (newsize <= oldsize)
+			return ptr;
+	} else {
+		struct page *page;
+
+		page = virt_to_head_page(ptr);
+		if (PageSlab(page) || PageCompound(page)) {
+			if (newsize < PAGE_SIZE)
+				return krealloc(ptr, newsize, GFP_KERNEL);
+			oldsize = ksize(ptr);
+		} else {
+			oldsize = page->private;
+			if (newsize <= oldsize)
+				return ptr;
+		}
+	}
+
+	nptr = kvmalloc(newsize);
+	if (nptr != NULL) {
+		memcpy(nptr, ptr, oldsize);
+		kvfree(ptr);
+	}
+
+	return nptr;
+}
+EXPORT_SYMBOL(kvrealloc);
+
 /* Tracepoints definitions. */
 EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae00746..93552a8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1413,6 +1413,20 @@ void vfree(const void *addr)
 EXPORT_SYMBOL(vfree);
 
 /**
+ *	vsize  -  get the actual amount of memory allocated by vmalloc()
+ *	@addr:		memory base address
+ */
+unsigned long vsize(const void *addr)
+{
+	struct vmap_area *va;
+
+	va = find_vmap_area((unsigned long)addr);
+
+	return va->va_end - va->va_start - PAGE_SIZE;
+}
+EXPORT_SYMBOL(vsize);
+
+/**
  *	vunmap  -  release virtual mapping obtained by vmap()
  *	@addr:		memory base address
  *

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 0/9] mm: generic adaptive large memory allocation APIs
From: Changli Gao @ 2010-05-13  9:49 UTC (permalink / raw)
  To: akpm
  Cc: Hoang-Nam Nguyen, Christoph Raisch, Roland Dreier, Sean Hefty,
	Hal Rosenstock, Divy Le Ray, James E.J. Bottomley,
	Theodore Ts'o, Andreas Dilger, Alexander Viro, Paul Menage,
	Li Zefan, linux-rdma, linux-kernel, netdev, linux-scsi,
	linux-ext4, linux-fsdevel, linux-mm, containers, Changli Gao

generic adaptive large memory allocation APIs

The kv*alloc functions allocate large contiguous memory for users that don't
care whether the memory is physically or virtually contiguous. The allocator
always tries its best to allocate physically contiguous memory first and falls
back to vmalloc() only if that fails.

This patch set introduces the following APIs: kvmalloc(), kvzalloc(),
kvcalloc(), kvrealloc(), kvfree() and kvfree_inatomic().

Some existing code is converted to use the new generic APIs instead.
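kvfree_inatomic() is implemented in patch 1/9 as __kvfree(ptr, true): for a vmalloc()ed buffer it reuses the start of the buffer as a struct work_struct and calls schedule_work(), so that vfree() runs later in process context. The same reuse-the-dead-buffer idea can be modelled in userspace with a singly linked list standing in for the workqueue (names are hypothetical; unlike a workqueue, this sketch is not thread-safe):

```c
#include <assert.h>
#include <stdlib.h>

/* The node lives inside the buffer being freed, so queuing a free
 * needs no extra allocation. */
struct deferred_node {
	struct deferred_node *next;
};

static struct deferred_node *pending;	/* stands in for the workqueue */

/* The buffer must be at least sizeof(struct deferred_node) bytes; the
 * patch enforces the analogous condition with a BUILD_BUG_ON(). */
static void defer_free(void *ptr)
{
	struct deferred_node *node = ptr;	/* reuse the dead buffer */

	node->next = pending;
	pending = node;
}

/* Later, where freeing is allowed (the worker's role): release every
 * queued buffer and return how many were freed. */
static int drain_deferred(void)
{
	int n = 0;

	while (pending) {
		struct deferred_node *node = pending;

		pending = node->next;
		free(node);
		n++;
	}
	return n;
}
```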

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 drivers/infiniband/hw/ehca/ipz_pt_fn.c |   22 +-----
 drivers/net/cxgb3/cxgb3_defs.h         |    2 
 drivers/net/cxgb3/cxgb3_offload.c      |   31 ---------
 drivers/net/cxgb3/l2t.c                |    4 -
 drivers/net/cxgb4/cxgb4.h              |    3 
 drivers/net/cxgb4/cxgb4_main.c         |   37 +----------
 drivers/net/cxgb4/l2t.c                |    2 
 drivers/scsi/cxgb3i/cxgb3i_ddp.c       |   12 +--
 drivers/scsi/cxgb3i/cxgb3i_ddp.h       |   26 -------
 drivers/scsi/cxgb3i/cxgb3i_offload.c   |    6 -
 fs/ext4/super.c                        |   21 +-----
 fs/file.c                              |  109 ++++-----------------------------
 include/linux/mm.h                     |   31 +++++++++
 include/linux/vmalloc.h                |    1 
 kernel/cgroup.c                        |   47 +-------------
 kernel/relay.c                         |   35 ----------
 mm/nommu.c                             |    6 +
 mm/util.c                              |  104 +++++++++++++++++++++++++++++++
 mm/vmalloc.c                           |   14 ++++
 19 files changed, 207 insertions(+), 306 deletions(-)


^ permalink raw reply

* RE: does the broadcom bnx2x support RSS/multi queue
From: Jon Zhou @ 2010-05-13  9:44 UTC (permalink / raw)
  To: eilong@broadcom.com, Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1273655947.4491.5.camel@lb-tlvb-eilong.il.broadcom.com>

hi:

I just tried to install the 1.52.12 driver:

insmod ./PF_RING/drivers/broadcom/netxtreme2-5.2.50/bnx2x-1.52.12/src/bnx2x.ko multi_mode=1 num_queues=4 int_mode=3 debug=1
ifconfig eth5 up

but seems MSI not enabled:
34:         26          0          0          0          0          0          0          0          0          0          0          0   49470611   63427140          0          0   IO-APIC-fasteoi   eth5
4314:          2          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0      none-<NULL>
4316:          2          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0      none-<NULL>
4318:          2          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0      none-<NULL>
4321:       6510        728       1786          0         14          0         57        380          0          0          0          0        100        139          0          0   PCI-MSI-edge      eth1
4322:       1102          0       2556        788          0          5          0       1338          0          0          0          0        174         45        411          0   PCI-MSI-edge      eth1
4323:         21          0          0          0          0          0          0          0          0          0          0          0        784       1103         56         87   PCI-MSI-edge      eth1
43

Could you give me some tips?
Thanks!

May 13 03:37:59 ibm-bc-53 kernel: Broadcom NetXtreme II 5771x 10Gigabit Ethernet Driver bnx2x 1.52.12 ($DateTime: 2009/12/17 12:14:50 $)
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:15:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:15:00.0: setting latency timer to 64
May 13 03:37:59 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 03:37:59 ibm-bc-53 kernel: eth2: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem a0000000, IRQ 24, node addr ffff8801db810210
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:15:00.1: PCI INT B -> GSI 34 (level, low) -> IRQ 34
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:15:00.1: setting latency timer to 64
May 13 03:37:59 ibm-bc-53 kernel: eth2 renamed to eth3 by udevd [24569]
May 13 03:37:59 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 03:37:59 ibm-bc-53 kernel: udev: renamed network interface eth2 to eth3
May 13 03:37:59 ibm-bc-53 ifup:     eth3      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 03:37:59 ibm-bc-53 SuSEfirewall2: SuSEfirewall2 not active
May 13 03:37:59 ibm-bc-53 kernel: eth2: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem a0800000, IRQ 34, node addr ffff880200828210
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:1a:00.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:1a:00.0: setting latency timer to 64
May 13 03:37:59 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 03:37:59 ibm-bc-53 kernel: eth2 renamed to eth5 by udevd [24642]
May 13 03:37:59 ibm-bc-53 kernel: udev: renamed network interface eth2 to eth5
May 13 03:37:59 ibm-bc-53 ifup:     eth5      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 03:37:59 ibm-bc-53 ifup:     eth5      Startmode is 'off'
May 13 03:37:59 ibm-bc-53 kernel: eth2: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem 9a000000, IRQ 26, node addr ffff8802030f8210
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:1a:00.1: PCI INT B -> GSI 25 (level, low) -> IRQ 25
May 13 03:37:59 ibm-bc-53 kernel: bnx2x 0000:1a:00.1: setting latency timer to 64
May 13 03:37:59 ibm-bc-53 kernel: bnx2x: part number 394D4342-31373735-31314131-473033
May 13 03:37:59 ibm-bc-53 ifup:     eth2      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 03:37:59 ibm-bc-53 SuSEfirewall2: SuSEfirewall2 not active
May 13 03:38:00 ibm-bc-53 kernel: eth4: Broadcom NetXtreme II BCM57711 XGb (A0) PCI-E x8 5GHz (Gen2) found at mem 9a800000, IRQ 25, node addr ffff88028d5a8210
May 13 03:38:00 ibm-bc-53 ifup:     eth4      device: Broadcom Corporation NetXtreme II BCM57711 10Gigabit PCIe
May 13 03:38:00 ibm-bc-53 SuSEfirewall2: SuSEfirewall2 not active
May 13 03:38:12 ibm-bc-53 kernel: bnx2x: eth5 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON

I think there should be "using MSI-X" or something like that


-----Original Message-----
From: Eilon Greenstein [mailto:eilong@broadcom.com] 
Sent: Wednesday, May 12, 2010 5:19 PM
To: Eric Dumazet
Cc: Jon Zhou; netdev@vger.kernel.org
Subject: Re: does the broadcom bnx2x support RSS/multi queue

On Wed, 2010-05-12 at 00:41 -0700, Eric Dumazet wrote:
> On Wednesday 12 May 2010 at 00:31 -0700, Jon Zhou wrote:
> > hi there
> > 
> > I am not sure if my Broadcom 10G NIC driver (bnx2x) supports RSS/multi queue
> > 
> > ibm-bc-53:/home/ruizhou/nprobe # uname -a
> > Linux ibm-bc-53 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > ibm-bc-53:/home/ruizhou/nprobe # ethtool -S eth5
> > NIC statistics:
> >      rx_bytes: 68100170
> >      rx_error_bytes: 0
> >      tx_bytes: 0
> >      tx_error_bytes: 0
> >      rx_ucast_packets: 201654
> >      rx_mcast_packets: 0
> >      rx_bcast_packets: 0
> >      tx_packets: 0
> >      tx_mac_errors: 0
> >      tx_carrier_errors: 0
> >      rx_crc_errors: 0
> >      rx_align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      tx_deferred: 0
> >      tx_excess_collisions: 0
> >      tx_late_collisions: 0
> >      tx_total_collisions: 0
> >      rx_fragments: 0
> >      rx_jabbers: 0
> >      rx_undersize_packets: 0
> >      rx_oversize_packets: 0
> >      tx_64_byte_packets: 0
> >      tx_65_to_127_byte_packets: 0
> >      tx_128_to_255_byte_packets: 0
> >      tx_256_to_511_byte_packets: 0
> >      tx_512_to_1023_byte_packets: 0
> >      tx_1024_to_1522_byte_packets: 0
> >      tx_1523_to_9022_byte_packets: 0
> >      rx_xon_frames: 0
> >      rx_xoff_frames: 0
> >      tx_xon_frames: 0
> >      tx_xoff_frames: 0
> >      rx_mac_ctrl_frames: 0
> >      rx_filtered_packets: 0
> >      rx_discards: 0
> >      rx_fw_discards: 0
> >      brb_discard: 0
> >      brb_truncate: 0
> >      rx_phy_ip_err_discards: 0
> >      rx_skb_alloc_discard: 0
> >      rx_csum_offload_errors: 6
> > 
> > the driver ver is:
> > bnx2x_main.c
> > #define DRV_MODULE_VERSION      "1.45.26"
> > 
> > It looks like it is not supported?
> > 
> > thanks
> > jon
> 
> Per queue stats were added last year only (Thu Feb 12 08:36:33 2009)
> 
> You might check "grep eth5 /proc/interrupts"
> 
> Or upgrade to 2.6.33.x kernel :)
> 
The HW and current driver support multi-queue. However, you are using a version which is too old.






^ permalink raw reply

* [GIT PULL] last minute vhost-net fix
From: Michael S. Tsirkin @ 2010-05-13  8:44 UTC (permalink / raw)
  To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel

David, if it's not too late, please pull the following
last minute fix into 2.6.34.
Thanks!

The following changes since commit de02d72bb3cc5b3d4c873db4ca8291723dd48479:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 (2010-05-10 22:53:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git net-2.6

Michael S. Tsirkin (1):
      vhost: fix barrier pairing

 drivers/vhost/vhost.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

-- 
MST

^ permalink raw reply

* Re: [ath5k-devel] [PATCH] [ath5k][leds] Ability to disable leds support. If leds support enabled do not force mac802.11 leds layer selection.
From: Dmytro Milinevskyy @ 2010-05-13  7:36 UTC (permalink / raw)
  To: Pavel Roskin
  Cc: ath5k-devel, Kalle Valo, linux-wireless, GeunSik Lim, Jiri Slaby,
	Greg Kroah-Hartman, John W. Linville, Keng-Yu Lin, netdev,
	Jiri Kosina, Johannes Berg, Shahar Or, linux-kernel,
	Luca Verdesca
In-Reply-To: <1273679336.10823.15.camel@mj>

Hello, Pavel.

I will rework the patch considering your suggestions.
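Pavel's suggestion quoted below is the usual header pattern: declare the real functions when CONFIG_ATH5K_LEDS is set, and provide empty static inline stubs otherwise, so callers need no #ifdef. A minimal sketch using two of the prototypes from the patch; the caller example_probe() is hypothetical, and struct ath5k_softc is left opaque:

```c
#include <assert.h>
#include <stddef.h>

struct ath5k_softc;	/* opaque in this sketch */

#ifdef CONFIG_ATH5K_LEDS
/* Real implementations are built only when LED support is enabled. */
extern int ath5k_init_leds(struct ath5k_softc *sc);
extern void ath5k_led_off(struct ath5k_softc *sc);
#else
/* No-op stubs: callers compile unchanged and the calls optimize away. */
static inline int ath5k_init_leds(struct ath5k_softc *sc)
{
	(void)sc;
	return 0;
}

static inline void ath5k_led_off(struct ath5k_softc *sc)
{
	(void)sc;
}
#endif

/* A caller written without any preprocessor conditionals. */
static int example_probe(struct ath5k_softc *sc)
{
	int ret = ath5k_init_leds(sc);

	if (ret)
		return ret;
	ath5k_led_off(sc);
	return 0;
}
```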

> I'm not sure this complexity is needed.  Are you going to support LEDs
> if CONFIG_LEDS_CLASS is disabled?
In case any other place in the driver might want to use LEDs without
exporting the interface to userspace.

-- Dima Milinevskyy


On Wed, May 12, 2010 at 6:48 PM, Pavel Roskin <proski@gnu.org> wrote:
> On Wed, 2010-04-07 at 21:58 +0300, Dmytro Milinevskyy wrote:
>
>> Here is the patch to disable ath5k leds support on build stage.
>> However if the leds support was enabled do not force selection of 802.11 leds layer.
>
> The idea is good, but the implementation could be improved.
>
> There are too many preprocessor conditionals in your patch.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /*
>>   * These match net80211 definitions (not used in
>>   * mac80211).
>> @@ -939,11 +940,7 @@ enum ath5k_power_mode {
>>  #define AR5K_LED_AUTH        2 /*IEEE80211_S_AUTH*/
>>  #define AR5K_LED_ASSOC       3 /*IEEE80211_S_ASSOC*/
>>  #define AR5K_LED_RUN 4 /*IEEE80211_S_RUN*/
>
> It should be OK to leave the constants defined even if they are not
> used.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /* LED functions */
>>  extern int ath5k_init_leds(struct ath5k_softc *sc);
>>  extern void ath5k_led_enable(struct ath5k_softc *sc);
>>  extern void ath5k_led_off(struct ath5k_softc *sc);
>>  extern void ath5k_unregister_leds(struct ath5k_softc *sc);
>> +#endif
>
> You could add inline functions for the case when CONFIG_ATH5K_LEDS is
> not defined.  That would avoid many conditionals in the code.
>
>>  /* GPIO Functions */
>> +#ifdef CONFIG_ATH5K_LEDS
>>  extern void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state);
>> +#endif
>
> The same comment applies.
>
> Also, there is nothing wrong with having an external declaration that is
> not used in some particular configuration.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>       /* turn on HW LEDs */
>>       ath5k_hw_set_ledstate(ah, AR5K_LED_INIT);
>> +#endif
>
> This is avoidable by having an inline ath5k_hw_set_ledstate() that does
> nothing.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>       struct ieee80211_hw *hw = pci_get_drvdata(to_pci_dev(dev));
>>       struct ath5k_softc *sc = hw->priv;
>>
>>       ath5k_led_off(sc);
>> +#endif
>
> Even this is avoidable if ath5k_led_off() does nothing.  gcc should be
> smart enough to optimize out unneeded function calls.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /*
>>   * State for LED triggers
>>   */
>>  struct ath5k_led
>>  {
>> +#ifdef CONFIG_LEDS_CLASS
>
> I'm not sure this complexity is needed.  Are you going to support LEDs
> if CONFIG_LEDS_CLASS is disabled?
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>       unsigned int            led_pin,        /* GPIO pin for driving LED */
>>                               led_on;         /* pin setting for LED on */
>> +#endif
>>
>>       struct tasklet_struct   restq;          /* reset tasklet */
>>
>> @@ -164,7 +172,9 @@ struct ath5k_softc {
>>       spinlock_t              rxbuflock;
>>       u32                     *rxlink;        /* link ptr in last RX desc */
>>       struct tasklet_struct   rxtq;           /* rx intr tasklet */
>> +#ifdef CONFIG_ATH5K_LEDS
>>       struct ath5k_led        rx_led;         /* rx led */
>> +#endif
>
> You may want to group those fields together to make the code more
> readable.
>
>> --- a/drivers/net/wireless/ath/ath5k/led.c
>> +++ b/drivers/net/wireless/ath/ath5k/led.c
>
> I wonder if you could omit led.c completely in the Makefile.  If there
> are some parts of led.c that are needed without CONFIG_ATH5K_LEDS, maybe
> they belong elsewhere?
>
> --
> Regards,
> Pavel Roskin
>
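
The inline-stub pattern Pavel suggests can be sketched as follows. This is a minimal sketch that compiles in userspace, and it assumes the prototypes quoted from the patch. The placeholder struct definitions, the hypothetical example_attach() caller, and the value given to AR5K_LED_INIT are assumptions for illustration, not the driver's code. When CONFIG_ATH5K_LEDS is left undefined, callers compile against no-op static inline functions and need no #ifdef of their own.

```c
#include <assert.h>

/* Placeholder types so this sketch builds outside the kernel (assumption). */
struct ath5k_softc { int unused; };
struct ath5k_hw { int unused; };

/* Constants stay defined unconditionally, as suggested; value assumed here. */
#define AR5K_LED_INIT 0

#ifdef CONFIG_ATH5K_LEDS
/* Real implementations are defined elsewhere in the driver. */
extern int ath5k_init_leds(struct ath5k_softc *sc);
extern void ath5k_led_enable(struct ath5k_softc *sc);
extern void ath5k_led_off(struct ath5k_softc *sc);
extern void ath5k_unregister_leds(struct ath5k_softc *sc);
extern void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state);
#else
/* No-op stubs: callers need no #ifdef, and the compiler can drop the calls. */
static inline int ath5k_init_leds(struct ath5k_softc *sc) { (void)sc; return 0; }
static inline void ath5k_led_enable(struct ath5k_softc *sc) { (void)sc; }
static inline void ath5k_led_off(struct ath5k_softc *sc) { (void)sc; }
static inline void ath5k_unregister_leds(struct ath5k_softc *sc) { (void)sc; }
static inline void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state)
{
	(void)ah;
	(void)state;
}
#endif

/* Hypothetical caller: written once, with no preprocessor conditionals. */
static int example_attach(struct ath5k_softc *sc, struct ath5k_hw *ah)
{
	ath5k_hw_set_ledstate(ah, AR5K_LED_INIT);
	return ath5k_init_leds(sc);
}
```

With optimization enabled, calls to the empty stubs should compile away. That is the point Pavel makes about gcc.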


* Re: [PATCH net-next-2.6 2/2] bonding: allow user-controlled output slave selection
From: John Fastabend @ 2010-05-13  7:32 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Jay Vosburgh, Neil Horman, netdev@vger.kernel.org
In-Reply-To: <20100512221408.GI7497@gospo.rdu.redhat.com>

Andy Gospodarek wrote:
> On Wed, May 12, 2010 at 12:41:54PM -0700, Jay Vosburgh wrote:
>> Neil Horman <nhorman@tuxdriver.com> wrote:
>>
>>> On Tue, May 11, 2010 at 01:09:39PM -0700, Jay Vosburgh wrote:
>>>> Andy Gospodarek <andy@greyhouse.net> wrote:
>>>>
>>>>> This patch gives the user the ability to control the output slave for
>>>>> round-robin and active-backup bonding.  Similar functionality was
>>>>> discussed in the past, but Jay Vosburgh indicated he would rather see a
>>>>> feature like this added to existing modes rather than creating a
>>>>> completely new mode.  Jay's thoughts as well as Neil's input surrounding
>>>>> some of the issues with the first implementation pushed us toward a
>>>>> design that relied on the queue_mapping rather than skb marks.
>>>>> Round-robin and active-backup modes were chosen as the first users of
>>>>> this slave selection as they seemed like the most logical choices when
>>>>> considering a multi-switch environment.
>>>>>
>>>>> Round-robin mode works without any modification, but active-backup does
>>>>> require inclusion of the first patch in this series and setting
>>>>> the 'keep_all' flag.  This will allow reception of unicast traffic on
>>>>> any of the backup interfaces.
>>>>    Yes, I did think that the mark business fit better into existing
>>>> modes (I thought of it as kind of a new hash for xor and 802.3ad modes).
>>>> I also didn't expect to see so much new stuff (this, as well as the FCOE
>>>> special cases being discussed elsewhere) being shoehorned into the
>>>> active-backup mode.  I'm not so sure that adding so many special cases
>>>> to active-backup is a good thing.
>>>>
>>>>    Now, I'm starting to wonder if you were right, and it would be
>>>> better overall to have a "manual" mode that would hopefully satisfy this
>>>> case as well as the FCOE special case.  I don't think either of these is
>>>> a bad use case, I'm just not sure the right way to handle them is
>>>> another special knob in active-backup mode (either directly, or
>>>> implicitly in __netif_receive_skb), which wasn't what I expected to see.
>>>>
>>> I honestly don't think a separate mode is warranted here.  While I'm not opposed
>>> to adding a new mode, I really think doing so is no different from overloading
>>> an existing mode.  I say that because to add a new mode in which we explicitly
>>> expect traffic to be directed to various slaves requires that we implement a
>>> policy for frames which have no queue mapping determined on egress.  Any policy
>>> I can think of is really an approximation of an existing policy, so we may as
>>> well reuse the policy code that we already have in place.  About the only way a
>>> separate mode makes sense is in the 'passthrough' queue mode you document below.
>>> In this model, in which queue ids map to slaves in a 1:1 fashion, it doesn't make
>>> sense.
>>       One goal I'm hoping to achieve is something that would satisfy
>> both the queue map stuff that you're looking for, and would meet the
>> needs of the FCOE people who also want to disable the duplicate
>> suppression (i.e., permit incoming traffic on the inactive slave) for a
>> different special case.
>>
>>       The FCOE proposal revolves around, essentially, active-backup
>> mode, but permitting incoming traffic on the inactive slave.  At the
>> moment, the patches attempt to special case things such that only
>> dev_add_pack listeners directly bound to the inactive slave are checked
>> (to permit the FCOE traffic to pass on the inactive slave, but still
>> dropping IP, as ip_rcv is a wildcard bind).
>>
>>       Your keep_all patch is, by and large, the same thing, except
>> that it permits anything to come in on the "inactive" slave, and it's a
>> switch that has to be turned on.
>>
>>       This seems like needless duplication to me; I'd prefer to see a
>> single solution that handles both cases instead of two special cases
>> that each do 90% of what the other does.
>>
>>       As far as a new mode goes, one major reason I think a separate
>> mode is warranted is the semantics: with either of these changes (to
>> permit more or less regular use of the "inactive" slaves), the mode
>> isn't really an "active-backup" mode any longer; there is no "inactive"
>> or "backup" slave.  I think of this as being a major change of
>> functionality, not simply a minor option.
>>
>>       Hence my thought that "active-backup" could stay as a "true" hot
>> standby mode (backup slaves are just that: for backup, only), and this
>> new mode would be the place to do the special queue-map / FCOE /
>> whatever that isn't really a hot standby configuration any longer.
>>
>>       As far as the behavior of the new mode (your concern about its
>> policy map approximations), in the end, it would probably act pretty
>> much like active-backup with your patch applied: traffic goes out the
>> active slave, unless directed otherwise.  It's a lot less complicated
>> than I had feared.
>>
> 
> It's beginning to sound like the 'FCoE use-case' and the one Neil and I
> are proposing are quite similar.  The main goal of both is to have the
> option to have multiple slaves send and receive traffic during the
> steady-state, but in the event of a failover all traffic would run on a
> single interface.
> 

I believe they are similar, although I never considered using FCoE over a
device that is actually in the bond.  For example, the current FCoE use
case is:

bond0 ------> ethx
                |
vlan-fcoe -->  |

Here vlan-fcoe is not a slave of bond0.  With the keep_all patch this
would work, as would an additional configuration:

bond0 --> vlan-fcoe1  ---> ethx
    \                        |
     \ --- vlan-fcoe2  --->  |

Here both vlan-fcoe1 and vlan-fcoe2 are slaves of bond0.

Even with the keep_all patch it still seems a little inconsistent to 
drop a packet outright if it is received on an inactive slave and 
destined for a vlan on the bond and then to deliver the packet to 
devices that have exact matches if it is received on an inactive slave 
but destined for the bond device.  I'll post a patch in just a moment 
that hopefully illustrates what I see as an unexpected side effect.


> The implementation proposed with this patch is a bit different than the
> 'mark-mode' patch you may recall I posted a few months ago.  It created
> a new mode that essentially did exactly what you are describing --
> transmit on the primary interface unless pushed to another interface via
> info in the skb and receive on all interfaces.  We initially did not
> create a new mode based on your reservations about the previous
> mark-mode patch and went the direction of enhancing one or two modes
> initially (figuring it would be good to run before walking), with the
> idea that other modes could take care of this output slave selection
> logic in the future.
> 
> 
>>>>    I presume you're overloading active-backup because it's not
>>>> etherchannel, 802.3ad, etc, and just talks right to the switch.  For the
>>>> regular load balance modes, I still think overlay into the existing
>>>> modes is preferable (more on that later); I'm thinking of "manual"
>>>> instead of another tweak to active-backup.
>>>>
>>>>    If users want to have actual hot-standby functionality, then
>>>> active-backup would do that, and nothing else (and it can be multi-queue
>>>> aware, but only one slave active at a time).
>>>>
>>> Yes, but active backup doesn't provide preferred output path selection in and of
>>> itself.  That's the feature here.
>>       I understand that; I'm suggesting that active-backup should
>> provide no service other than hot standby, and not be overloaded into a
>> manual load balancing scheme (both for your use, and for FCOE).
>>
>>       Maybe I'm worrying too much about defending the purity of the
>> active-backup mode; I understand what you're trying to do a little
>> better now, and yes, the "manual" mode I think of (in your queue mapping
>> scheme, not the other doodads I talked about) would basically be
>> active-backup with your queue mapper, minus the duplicate suppressor.
>>
> 
> It doesn't matter terribly to me which direction is taken.  Again, a
> major reason this route was proposed was that you were not as keen on
> creating a new mode as I was at the time of that patch posting.  It's
> somewhat understandable as once a mode is added it's tough to take away,
> but when one sees how much we are really changing the way active-backup
> might behave in some cases maybe it makes sense to use a new mode?
> 
> I guess I like the idea of adding this output selection to existing
> modes because it at least gives us the option to use queue maps to
> select output interfaces for more than a mode that looks like
> present-day active-backup minus the duplicate suppression.   I'm happy to
> code-up a patch that creates a new mode, but before I go do that and
> test it, I'd like to know we have come to an agreement on the direction
> for the future.
> 
>>>>    Users who want the set of bonded slaves to look like a big
>>>> multiqueue buffet could use this "manual" mode and set things up however
>>>> they want.  One way to set it up is simply that the bond is N queues
>>>> wide, where N is the total of the queue counts of all the slaves.  If a
>>>> slave fails, N gets smaller, and the user code has to deal with that.
>>>> Since the queue count of a device can't change dynamically, the bond
>>>> would have to actually be set up with some big number of queues, and
>>>> then only a subset is actually active (or there is some sort of wrap).
>>>>
>>>>    In such an implementation, each slave would have a range of
>>>> queue IDs, not necessarily just one.  I'm a bit leery of exposing an API
>>>> where each slave is one queue ID, as it could make transitioning to real
>>>> multi-queue awareness difficult.
>>>>
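
Jay's "N queues wide" layout can be modeled roughly as follows. This is a minimal sketch that assumes each usable slave owns a contiguous range of bond-level queue IDs, sized by its own tx queue count. struct slave_model, bond_width() and bond_map_queue() are hypothetical names, not bonding driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-slave view: how many tx queues it exposes, and link state. */
struct slave_model {
	unsigned int num_tx_queues;
	int up;		/* nonzero if the slave is usable */
};

/* Width N of the bond: the sum of the queue counts of all usable slaves. */
static unsigned int bond_width(const struct slave_model *s, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		if (s[i].up)
			total += s[i].num_tx_queues;
	return total;
}

/*
 * Map a bond-level queue ID onto (slave index, slave-local queue) by walking
 * the usable slaves' ranges in order. Returns 0 on success, or -1 when qid
 * lies beyond the current width N.
 */
static int bond_map_queue(const struct slave_model *s, size_t n,
			  unsigned int qid, size_t *slave, unsigned int *local)
{
	for (size_t i = 0; i < n; i++) {
		if (!s[i].up)
			continue;
		if (qid < s[i].num_tx_queues) {
			*slave = i;
			*local = qid;
			return 0;
		}
		qid -= s[i].num_tx_queues;
	}
	return -1;
}
```

Taking a slave down shrinks N and shifts the queue IDs of every later slave. For example, with slaves of 4, 2 and 8 queues, bond queue 5 maps to slave 1, queue 1. Once slave 1 goes down, it maps to slave 2, queue 1 instead. User code would have to cope with that kind of change.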
>>> I'm sorry, what exactly do you mean when you say 'real' multi queue
>>> awareness?  How is this any less real than any other implementation?  The
>>> approach you outline above isn't any more or less valid than this one.
>>       That was my misunderstanding of how you planned to handle
>> things.  I had thought this patch was simply a scheme to use the queue
>> IDs for slave selection, without any method to further perform queue
>> selection on the slave itself (I hadn't thought of placing a tc action
>> on the slave itself, which you described later on).  I had been thinking
>> in terms of schemes to expose all of the slave queues on the bonding
>> device.
> 
> It wasn't our original intention either.  I didn't mention it in my
> original post as it wasn't really the intent of our patch, but a nice
> side-effect for the informed user. :) Obviously a bit more testing could
> take place and we could add more examples to the documentation for the
> nice side-effect feature of this patch, but since this wasn't our
> original intent and we didn't test it, we did not advertise it.
> 
>>       So, I don't see any issues with the queue mapping part.  I still
>> want to find a common solution for FCOE / your patch with regards to the
>> duplicate suppressor.
> 
> Understood.
> 
>>> While we're on the subject, Andy and I did discuss a model similar to what you
>>> describe above (what I'll refer to as a queue id passthrough model), in which
>>> you can tell the bonding driver to map a frame to a queue, and the bonding
>>> driver doesn't really do anything with the queue id other than pass it to the slave
>>> device for hardware based multiqueue tx handling.  While we could do that, it's
>>> my feeling such a model isn't the way to go for two primary reasons:
>>>
>>> 1) Inconsistent behavior.  Such an implementation makes assumptions regarding
>>> queue id specification within a driver.  For example, what if one of the slaves
>>> reserves some fixed number of low-order queues for a specific purpose, and as
>>> such its general-use queues begin at an offset from zero, while other slaves do not?
>>> While it's easy to accommodate such needs when writing the tc filters, if a slave
>>> fails over, such a bias would change output traffic behavior, as the bonding
>>> driver can't be clearly informed of such a bias.  Likewise, what if a slave
>>> driver allocates more queues than it actually supports in hardware (like the
>>> implementation you propose; ixgbe IIRC actually does this)?  If slaves handled
>>> unimplemented tx queues differently (if one wrapped queues, while the other simply
>>> dropped frames to unimplemented queues, for instance), a failover would change
>>> traffic patterns dramatically.
>>>
>>> 2) Need.  While (1) can pretty easily be managed with a few configuration
>>> guidelines (output queues on slaves have to be configured identically, lest
>>> chaos and madness befall you, etc.), there's really no reason to bind users to
>>> such a system.  We're using tc filters to set the queue id on skbs enqueued to
>>> the bonding driver; there's absolutely no reason you can't add additional filters to
>>> the slaves directly.  Since the bonding driver uses dev_queue_xmit to send a
>>> frame to a slave, it has the opportunity to pass through another set of queuing
>>> disciplines and filters that can reset and re-assign the skb's queue mapping.  So
>>> with the approach in this patch you can get direct output control without
>>> sacrificing actual hardware tx output queue control.  With a passthrough model,
>>> you save a bit of filter configuration, but at the expense of having to be much
>>> more careful about how you configure your slave nics, and such configuration
>>> errors would be rather difficult to track down, as finding them would require
>>> the generation of traffic that hit the right filter after a failover.
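
Neil's failover concern in (1) can be made concrete with a small model. This is a minimal sketch that assumes two hypothetical slave drivers, each handling a queue ID beyond its implemented hardware queues differently: one wraps it, the other drops the frame. nic_model and nic_tx_queue() are illustrative names only.

```c
#include <assert.h>

/* Hypothetical driver policies for a queue ID past the real hardware queues. */
enum overflow_policy { OVF_WRAP, OVF_DROP };

struct nic_model {
	unsigned int hw_queues;		/* queues actually implemented (> 0) */
	enum overflow_policy policy;
};

/* Return the hardware queue a frame would use, or -1 if it is dropped. */
static int nic_tx_queue(const struct nic_model *nic, unsigned int qid)
{
	if (qid < nic->hw_queues)
		return (int)qid;
	if (nic->policy == OVF_WRAP)
		return (int)(qid % nic->hw_queues);
	return -1;	/* OVF_DROP */
}
```

Suppose each NIC has 8 hardware queues and a tc filter marks a frame for queue 10. The wrapping slave sends it out queue 2. After a failover to the dropping slave, the same frame is dropped, and nothing shows this until traffic hits that filter.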
>>       I don't disagree with any of this.  One thought I do have is
>> that Eric Dumazet, I believe, has mentioned that the read lock in
>> bonding is a limiting factor on 10G performance.  In the far distant
>> future when bonding is RCU, going through the lock(s) on the tc actions
>> of the slave could have the same net effect, and in such a case, a
>> qdisc-less path may be of benefit.  Not a concern for today, I suspect.
>>
>>>>    There might also be a way to tie it in to the new RPS code on
>>>> the receive side.
>>>>
>>>>    If the slaves all have the same MAC and attach to a single
>>>> switch via etherchannel, then it all looks pretty much like a single big
>>>> honkin' multiqueue device.  The switch probably won't map the flows back
>>>> the same way, though.
>>>>
>>> I agree, they probably won't.  Receive side handling wasn't really our focus here
>>> though.  That's largely why we chose round robin and active backup as our first
>>> modes to use this with.  They are already written to expect frames on either
>>> interface.
>>>
>>>>    If the slaves are on discrete switches (without etherchannel),
>>>> things become more complicated.  If the slaves have the same MAC, then
>>>> the switches will be irritated about seeing that same MAC coming in from
>>>> multiple places.  If the slaves have different MACs, then ARP has the
>>>> same sort of issues.
>>>>
>>>>    In thinking about it, if it's linux bonding at both ends, there
>>>> could be any number of discrete switches in the path, and it wouldn't
>>>> matter as long as the linux end can work things out, e.g.,
>>>>
>>>>         -- switch 1 --
>>>> hostA  /              \  hostB
>>>> bond  ---- switch 2 ---- bond
>>>>        \              /
>>>>         -- switch 3 --
>>>>
>>>>    For something like this, the switches would never share MAC
>>>> information for the bonding slaves.  The issue here then becomes more of
>>>> detecting link failures (it would require either a "trunk failover" type
>>>> of function on the switch, or some kind of active probe between the
>>>> bonds).
>>>>
>>>>    Now, I realize that I'm babbling a bit, as from reading your
>>>> description, this isn't necessarily your target topology (which sounded
>>>> more like a case of slave A can reach only network X, and slave B can
>>>> reach anywhere, so sending to network X should use slave A
>>>> preferentially), or, as long as I'm doing ASCII-art,
>>>>
>>>>        --- switch 1 ---- network X
>>>> hostA /               /
>>>> bond  ---- switch 2 -+-- anywhere
>>>>
>>>>    Is that an accurate representation?  Or is it something a bit
>>>> different, e.g.,
>>>>
>>>>        --- switch 1 ---- network X -\
>>>> hostA /                             /
>>>> bond  ---- switch 2 ---- anywhere --
>>>>
>>>>    I.e., the "anywhere" connects back to network X from the
>>>> outside, so to speak.  Or, oh, maybe I'm missing it entirely, and you're
>>>> thinking of something like this:
>>>>
>>>>        --- switch 1 --- VPN --- web site
>>>> hostA /                          /
>>>> bond  ---- switch 2 - Internet -/
>>>>
>>>>    Where you prefer to hit "web site" via the VPN (perhaps it's a
>>>> more efficient or secure path), but can do it from the public network at
>>>> large if necessary.
>>>>
>>> Yes, this one.  I think the other models are equally interesting, but this model
>>> in which either path had universal reachability, but for some classes of traffic
>>> one path is preferred over the other is the one we had in mind.
>>>
>>>>    Now, regardless of the above, your first patch ("keep_all") is
>>>> to deal with the reverse problem, if this is a piggyback on top of
>>>> active-backup mode: how to get packets back, when both channels can be
>>>> active simultaneously.  That actually dovetails to a degree with work
>>>> I've been doing lately, but the solution there probably isn't what
>>>> you're looking for (there's a user space daemon to do path finding, and
>>>> the "bond IP" address is piggybacked on the slaves' MAC addresses, which
>>>> are not changed; the "bond IP" set exists in a separate subnet all its
>>>> own).
>>>>
>>>>    As I said, I'm not convinced that the "keep_all" option to
>>>> active-backup is really better than just a "manual" mode that lacks the
>>>> dup suppression and expects the user to set everything up.
>>>>
>>>>    As for the round-robin change in this patch, if I'm reading it
>>>> right, then the way it works is that the packets are round-robined,
>>>> unless there's a queue id passed in, in which case it's assigned to the
>>>> slave mapped to that queue id.  I'm not entirely sure why you picked
>>>> round-robin mode for that over balance-xor; it doesn't seem to fit well
>>>> with the description in the documentation.  Or is it just sort of a
>>>> demonstrator?
>>>>
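
The round-robin behavior Jay describes might look roughly like the following. Packets are round-robined unless a queue ID is supplied, in which case the mapped slave is used. This is a minimal sketch under several assumptions: a queue mapping of 0 means "no preference", each slave carries a configured queue_id, and an unmatched queue ID falls back to round-robin. rr_slave, rr_bond and rr_select() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slave record; queue_id 0 means "no queue mapped". */
struct rr_slave {
	unsigned int queue_id;
	int up;
};

struct rr_bond {
	struct rr_slave *slaves;
	size_t count;
	size_t next;	/* round-robin cursor */
};

/*
 * Choose an output slave index. A nonzero queue_mapping that matches a usable
 * slave's queue_id wins; anything else falls back to round-robin over the
 * usable slaves. Returns -1 if no slave is usable.
 */
static long rr_select(struct rr_bond *b, unsigned int queue_mapping)
{
	if (queue_mapping != 0) {
		for (size_t i = 0; i < b->count; i++)
			if (b->slaves[i].up && b->slaves[i].queue_id == queue_mapping)
				return (long)i;
	}

	for (size_t tries = 0; tries < b->count; tries++) {
		size_t i = b->next;

		b->next = (b->next + 1) % b->count;
		if (b->slaves[i].up)
			return (long)i;
	}
	return -1;
}
```

In this sketch, a queue-directed frame does not advance the round-robin cursor, so it leaves the striping of unmapped traffic undisturbed. That is a design choice of the sketch, not something stated in the thread.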
>>> It was selected because round robin allows transmits on any interface already,
>>> and expects frames on any interface, so it was a 'safe' choice.  I would think
>>> balance-xor would also work.  Ideally it would be nice to get more modes
>>> supporting this mechanism.
>>       I think that this should work for balance-xor and 802.3ad.  The
>> only limitation for 802.3ad is that the spec requires "conversations" to
>> not be striped or to skip around in a manner that could lead to out of
>> order delivery.
> 
> Agreed.  Checking would probably also have to be done to make sure that
> we were not transmitting on an inactive aggregator.
> 
>>       I'm not so sure about the alb/tlb modes; at first thought, I
>> think it could have conflicts with the internal balancing done within
>> the modes (if, e.g., the tc action put traffic for the same destination
>> on two slaves).
>>
> 
> TLB and ALB modes would certainly have to be done differently.  It
> should not be terribly difficult to move from the existing hashing
> that's done to one that relies on the queue_mapping, but it will take a
> bit to make sure it's not a complete hack.
> 
> We decided against doing that for all modes on the first pass as it
> seemed like the active-backup and round-robin were the most-likely
> users.  We also wanted to present the code early rather than spending time
> supporting this on every mode only to find out that it just wasn't rational
> to do it on some of them.
> 
>>>>    I do like one other aspect of the patch, and that's the concept
>>>> of overlaying the queue map on top of the balance algorithm.  So, e.g.,
>>>> balance-xor would do its usual thing, unless the packet is queue mapped,
>>>> in which case the packet's assignment is obeyed.  The balance-xor could
>>>> even optionally do its xor across the full set of all slaves output
>>>> queues instead of just across the slaves.  Round-robin can operate
>>>> similarly.  For those modes, a "balance by queue vs. balance by slave"
>>>> seems like a reasonable knob to have.
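
The "balance by queue vs. balance by slave" knob could be sketched like this. This is a minimal sketch that assumes a flow hash has already been computed, that there is at least one slave, and that every slave exposes at least one queue. xor_select() and the XOR_BY_* values are hypothetical names, not the bonding driver's xmit hash code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical knob: spread by slave (classic xor) or across every slave queue. */
enum xor_granularity { XOR_BY_SLAVE, XOR_BY_QUEUE };

struct xor_choice {
	size_t slave;
	unsigned int queue;	/* slave-local tx queue */
};

/*
 * Pick a slave (and queue) from a precomputed flow hash. qcount[i] is the tx
 * queue count of slave i; n > 0 and every qcount[i] >= 1 are assumed, so the
 * modulo operations below never divide by zero.
 */
static struct xor_choice xor_select(unsigned int hash, const unsigned int *qcount,
				    size_t n, enum xor_granularity g)
{
	struct xor_choice c = { 0, 0 };

	if (g == XOR_BY_SLAVE) {
		c.slave = hash % n;	/* queue choice left to the slave */
		return c;
	}

	unsigned int total = 0;
	for (size_t i = 0; i < n; i++)
		total += qcount[i];

	/* Spread over all slave queues, then find which slave owns the result. */
	unsigned int q = hash % total;
	for (size_t i = 0; i < n; i++) {
		if (q < qcount[i]) {
			c.slave = i;
			c.queue = q;
			break;
		}
		q -= qcount[i];
	}
	return c;
}
```

Balancing by slave splits flows evenly between two slaves. Balancing by queue gives a slave with 3 queues roughly three quarters of the flows when it sits next to a slave with 1 queue. Whether that is actually better is left open in the thread.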
>>> Not sure what you mean here.  In the model implemented by this patch, there is
>>> one output queue per slave, and as such, balance by queue == balance by slave.
>>> That would make sense in the model you describe earlier in this note, but not in
>>> the model presented by this patch.
>>       Yes, I was thinking about what I had described; again,
>> predicated on my misunderstanding of how it all worked.
>>
>>>>    I do understand that you're proposing something relatively
>>>> simple, and I'm thinking out loud about alternate or additional
>>>> implementation details.  Some of this is "ooh ahh what if", but we also
>>>> don't want to end up with something that's forwards incompatible, and
>>>> I'm hoping to find one solution to multiple problems.
>>>>
>>> For clarification, can you enumerate what other problems you are trying to
>>> solve with this feature, or features similar to this?  From this email, the one
>>> that I most clearly see is the desire to allow a passthrough mode of queue
>>> selection, which I think I've noted can be done already (even without this
>>> patch), by attaching additional tc filters to the slaves output queues directly.
>>> What else do you have in mind?
>>       As I said above, I hadn't thought of stacking tc actions on to
>> the slaves directly, so I was thinking on ways to expose the slave
>> queues.
>>
>>       I still find something intriguing about a round-robin or xor
>> mode that robins/xors through all of the slave queues, though, but that
>> should be something separate (I'm not sure if such a scheme is actually
>> "better", either).
>>
>>       -J

It would be best if there were a solution for the FCoE use case that
works with the current bonding modes, including 802.3ad.  There is switch
support for running MPIO FCoE while doing link aggregation on the LAN side
that we should support.  I'm not sure the keep_all patch would be good
in this case.  Jay, I think you mentioned this at some point, but I missed
the conclusion.  Maybe it would be OK, though; I'll think about it some
more tomorrow.

Thanks,
John



>>
>> ---
>>       -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* [PATCH 3/3] net: deliver skbs on inactive slaves to exact matches
From: John Fastabend @ 2010-05-13  7:31 UTC (permalink / raw)
  To: andy, fubar, nhorman, bonding-devel, netdev; +Cc: john.r.fastabend
In-Reply-To: <20100513073106.3528.45412.stgit@jf-dev2-dcblab>

Currently, the accelerated receive path for VLANs will
drop packets if the real device is an inactive slave and
the packet is not one of the special pkts tested for in
skb_bond_should_drop().  This behavior differs from that of
the non-accelerated path and of pkts over a bonded vlan.

For example,

vlanx -> bond0 -> ethx

will be dropped in the vlan path and not delivered to any
packet handlers at all.  However,

bond0 -> vlanx -> ethx

and

bond0 -> ethx

will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev, which is not a
slave, and __netif_receive_skb() doesn't drop frames but only
delivers them to exact matches.

This patch adds a sk_buff flag which is used for tagging
skbs that would previously have been dropped and allows the
skb to continue to __netif_receive_skb().  Here we add
logic to check for the bond_should_drop flag and, if it
is set, only deliver to handlers that match exactly.  This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch, in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped outright in the vlan path.

I have tested the following 4 configurations in failover modes
and load balancing modes.

# bond0 -> ethx

# vlanx -> bond0 -> ethx

# bond0 -> vlanx -> ethx

# bond0 -> ethx
            |
  vlanx -> --

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/skbuff.h |    8 +++++++-
 net/8021q/vlan_core.c  |    8 ++++++--
 net/core/dev.c         |   23 +++++++++++++++++++----
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c9525bc..5ba4fd5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -379,8 +379,14 @@ struct sk_buff {
 
 	kmemcheck_bitfield_begin(flags2);
 	__u16			queue_mapping:16;
-#ifdef CONFIG_IPV6_NDISC_NODETYPE
+#if defined(CONFIG_IPV6_NDISC_NODETYPE) && \
+    (defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE))
+	__u8			ndisc_nodetype:2,
+				bond_should_drop:1;
+#elif defined(CONFIG_IPV6_NDISC_NODETYPE)
 	__u8			ndisc_nodetype:2;
+#elif defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+	__u8			bond_should_drop:1;
 #endif
 	kmemcheck_bitfield_end(flags2);
 
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index c584a0a..57ac2d3 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -11,8 +11,10 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 	if (netpoll_rx(skb))
 		return NET_RX_DROP;
 
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
 	if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
-		goto drop;
+		skb->bond_should_drop = 1;
+#endif
 
 	skb->skb_iif = skb->dev->ifindex;
 	__vlan_hwaccel_put_tag(skb, vlan_tci);
@@ -83,8 +85,10 @@ vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
 {
 	struct sk_buff *p;
 
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
 	if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
-		goto drop;
+		skb->bond_should_drop = 1;
+#endif
 
 	skb->skb_iif = skb->dev->ifindex;
 	__vlan_hwaccel_put_tag(skb, vlan_tci);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3dc691d..92fdff4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2782,11 +2782,13 @@ static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
 	struct net_device *orig_dev;
-	struct net_device *master;
 	struct net_device *null_or_orig;
 	struct net_device *orig_or_bond;
 	int ret = NET_RX_DROP;
 	__be16 type;
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+	struct net_device *master;
+#endif
 
 	if (!skb->tstamp.tv64)
 		net_timestamp(skb);
@@ -2801,15 +2803,28 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	if (!skb->skb_iif)
 		skb->skb_iif = skb->dev->ifindex;
 
+	/*
+	 * bonding note: skbs received on inactive slaves should only
+	 * be delivered to pkt handlers that are exact matches.  Also
+	 * the bond_should_drop flag will be set.  If packet handlers
+	 * are sensitive to duplicate packets these skbs will need to
+	 * be dropped at the handler.  The vlan accel path may have
+	 * already set the bond_should_drop flag.
+	 */
 	null_or_orig = NULL;
 	orig_dev = skb->dev;
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
 	master = ACCESS_ONCE(orig_dev->master);
-	if (master) {
-		if (skb_bond_should_drop(skb, master))
+	if (skb->bond_should_drop)
+		null_or_orig = orig_dev;
+	else if (master) {
+		if (skb_bond_should_drop(skb, master)) {
+			skb->bond_should_drop = 1;
 			null_or_orig = orig_dev; /* deliver only exact match */
-		else
+		} else
 			skb->dev = master;
 	}
+#endif
 
 	__get_cpu_var(softnet_data).processed++;
 

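The "deliver only to exact matches" rule that this patch relies on can be modeled in isolation. This is a minimal sketch. It assumes the device test used by the ptype_all loop in __netif_receive_skb() in kernels of this era; the protocol-specific loop also compares against orig_or_bond, which is omitted here. The stripped-down struct net_device and struct packet_type are stand-ins, not the kernel definitions, and handler_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stripped-down stand-ins for the kernel structures (illustrative only). */
struct net_device { const char *name; };
struct packet_type { struct net_device *dev; };	/* dev == NULL: wildcard */

/*
 * Device test applied to each ptype_all handler. null_or_orig is NULL in the
 * normal case; it is set to orig_dev for frames from an inactive slave, so a
 * wildcard handler (dev == NULL) no longer matches and only handlers bound to
 * that exact device do.
 */
static bool handler_matches(const struct packet_type *pt,
			    const struct net_device *null_or_orig,
			    const struct net_device *skb_dev,
			    const struct net_device *orig_dev)
{
	return pt->dev == null_or_orig || pt->dev == skb_dev ||
	       pt->dev == orig_dev;
}
```

On an active slave, skb->dev has been moved to the master and null_or_orig is NULL, so wildcard handlers match. On an inactive slave, null_or_orig is orig_dev and skb->dev is unchanged, so only handlers bound to that exact device match. With this patch, frames that the VLAN acceleration path previously dropped are tagged instead and reach only such exact-match handlers.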

