Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH 1/1] bna: Generic Netlink Interface to collect FW trace
From: David Miller @ 2011-04-27  2:17 UTC (permalink / raw)
  To: debdut; +Cc: shemminger, rmody, netdev, huangj, amathur, ddutt
In-Reply-To: <BANLkTinUsJ2foWKBVb_9nVFA2V_vKb11rA@mail.gmail.com>

From: Debashis Dutt <debdut@gmail.com>
Date: Tue, 26 Apr 2011 19:16:29 -0700

> However, since the generic netlink is a more generic interface, we could use
> this infrastructure in the driver for commands which are not part of
> other standard tools.

You aren't the only device in the world that might want to provide
a facility to fetch firmware traces.

This isn't really a niche thing specific to your device at all.

^ permalink raw reply

* Re: [RFC PATCH 1/1] bna: Generic Netlink Interface to collect FW trace
From: Debashis Dutt @ 2011-04-27  2:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Rasesh Mody, netdev, davem, huangj, amathur, Debashis Dutt
In-Reply-To: <20110426185917.3d033636@nehalam>

On Tue, Apr 26, 2011 at 6:59 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Tue, 26 Apr 2011 18:51:57 -0700
> Rasesh Mody <rmody@brocade.com> wrote:
>
>> This is a RFC patch to Brocade BNA 10G Ethernet driver. It adds the generic
>> netlink communication interface to the BNA driver to collect firmware traces
>> using the in-kernel generic netlink infrastructure. The driver uses the
>> "dumpit" handler provided by the generic netlink layer to accomplish this. The
>> driver can extend this interface later if required.
>>
>> As of today, there seems to be no standard mechanism to collect debug
>> information such as firmware trace for a given hardware. Generic Netlinki seems
>> to provide a suitable option to do the same, without any further
>> addition/modification to the existing kernel implementation.
>>
>> This is a RFC patch inviting suggestions/opinions for improvement/modification
>> and requesting consideration for possible inclusion in net-next tree.
>>
>> Signed-off-by: Debashis Dutt <ddutt@brocade.com>
>> Signed-off-by: Rasesh Mody <rmody@brocade.com>
>
> Seems like a lot of work for a debug interface. What about debugfs?
> Or is that out of favor now, I can never keep track...

Stephen,

Well, I think debugfs isn't always available, but there could always
be a driver
configuration option for that.

However, since the generic netlink is a more generic interface, we could use
this infrastructure in the driver for commands which are not part of
other standard tools.

Thanks
--Debashis
>
> --
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4,ipv6,bonding: Restore control over number of peer notifications
From: Ben Hutchings @ 2011-04-27  2:14 UTC (permalink / raw)
  To: Brian Haley
  Cc: Jay Vosburgh, Andy Gospodarek, David Miller, Patrick McHardy,
	netdev
In-Reply-To: <4DB77AC2.5070207@hp.com>

On Tue, 2011-04-26 at 22:09 -0400, Brian Haley wrote:
> On 04/26/2011 09:25 PM, Ben Hutchings wrote:
> > For backward compatibility, we should retain the module parameters and
> > sysfs attributes to control the number of peer notifications
> > (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
> > Also, it is possible for failover to take place even though the new
> > active slave does not have link up, and in that case the peer
> > notification should be deferred until it does.
> > 
> > Change ipv4 and ipv6 so they do not automatically send peer
> > notifications on bonding failover.
> > 
> > Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
> > notifications when the link is up, as many times as requested.  Since
> > it does not directly control which protocols send notifications, make
> > num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
> > the bonding version number and update its documentation.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> 
> Signed-off-by: Brian Haley <brian.haley@hp.com>

I'm not sure what you mean by this.  You didn't write any of it and
you're not a maintainer with your own repository.  Did you mean to say
'Reviewed-by' or 'Acked-by'?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4,ipv6,bonding: Restore control over number of peer notifications
From: Brian Haley @ 2011-04-27  2:09 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jay Vosburgh, Andy Gospodarek, David Miller, Patrick McHardy,
	netdev
In-Reply-To: <1303867552.2850.39.camel@bwh-desktop>

On 04/26/2011 09:25 PM, Ben Hutchings wrote:
> For backward compatibility, we should retain the module parameters and
> sysfs attributes to control the number of peer notifications
> (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
> Also, it is possible for failover to take place even though the new
> active slave does not have link up, and in that case the peer
> notification should be deferred until it does.
> 
> Change ipv4 and ipv6 so they do not automatically send peer
> notifications on bonding failover.
> 
> Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
> notifications when the link is up, as many times as requested.  Since
> it does not directly control which protocols send notifications, make
> num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
> the bonding version number and update its documentation.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Signed-off-by: Brian Haley <brian.haley@hp.com>

^ permalink raw reply

* Re: [RFC PATCH 1/1] bna: Generic Netlink Interface to collect FW trace
From: Stephen Hemminger @ 2011-04-27  1:59 UTC (permalink / raw)
  To: Rasesh Mody; +Cc: netdev, davem, huangj, amathur, Debashis Dutt
In-Reply-To: <1303869117-780-1-git-send-email-rmody@brocade.com>

On Tue, 26 Apr 2011 18:51:57 -0700
Rasesh Mody <rmody@brocade.com> wrote:

> This is a RFC patch to Brocade BNA 10G Ethernet driver. It adds the generic
> netlink communication interface to the BNA driver to collect firmware traces
> using the in-kernel generic netlink infrastructure. The driver uses the
> "dumpit" handler provided by the generic netlink layer to accomplish this. The
> driver can extend this interface later if required. 
> 
> As of today, there seems to be no standard mechanism to collect debug
> information such as firmware trace for a given hardware. Generic Netlinki seems
> to provide a suitable option to do the same, without any further
> addition/modification to the existing kernel implementation.
> 
> This is a RFC patch inviting suggestions/opinions for improvement/modification
> and requesting consideration for possible inclusion in net-next tree.
> 
> Signed-off-by: Debashis Dutt <ddutt@brocade.com>
> Signed-off-by: Rasesh Mody <rmody@brocade.com>

Seems like a lot of work for a debug interface. What about debugfs?
Or is that out of favor now, I can never keep track...

-- 

^ permalink raw reply

* [RFC PATCH 1/1] bna: Generic Netlink Interface to collect FW trace
From: Rasesh Mody @ 2011-04-27  1:51 UTC (permalink / raw)
  To: netdev, davem; +Cc: huangj, amathur, Rasesh Mody, Debashis Dutt

This is a RFC patch to Brocade BNA 10G Ethernet driver. It adds the generic
netlink communication interface to the BNA driver to collect firmware traces
using the in-kernel generic netlink infrastructure. The driver uses the
"dumpit" handler provided by the generic netlink layer to accomplish this. The
driver can extend this interface later if required. 

As of today, there seems to be no standard mechanism to collect debug
information such as firmware trace for a given hardware. Generic Netlinki seems
to provide a suitable option to do the same, without any further
addition/modification to the existing kernel implementation.

This is a RFC patch inviting suggestions/opinions for improvement/modification
and requesting consideration for possible inclusion in net-next tree.

Signed-off-by: Debashis Dutt <ddutt@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/bna/bfa_ioc.c   |   59 +++++++++++
 drivers/net/bna/bfa_ioc.h   |    4 +
 drivers/net/bna/bfi.h       |    2 +
 drivers/net/bna/bnad.c      |   39 ++++++++
 drivers/net/bna/bnad.h      |    9 ++-
 drivers/net/bna/bnad_genl.c |  228 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/bna/bnad_genl.h |   63 ++++++++++++
 7 files changed, 402 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/bna/bnad_genl.c
 create mode 100644 drivers/net/bna/bnad_genl.h

diff --git a/drivers/net/bna/bfa_ioc.c b/drivers/net/bna/bfa_ioc.c
index fcb9bb3..766e48b 100644
--- a/drivers/net/bna/bfa_ioc.c
+++ b/drivers/net/bna/bfa_ioc.c
@@ -2209,6 +2209,65 @@ bfa_nw_ioc_get_mac(struct bfa_ioc *ioc)
 	return ioc->attr->mac;
 }
 
+static int
+bfa_nw_ioc_smem_read(struct bfa_ioc *ioc, void *tbuf, u32 soff, u32 sz)
+{
+	u32 pgnum, loff;
+	__be32 r32;
+	int i, len;
+	u32 *buf = tbuf;
+
+	pgnum = PSS_SMEM_PGNUM(ioc->ioc_regs.smem_pg0, soff);
+	loff = PSS_SMEM_PGOFF(soff);
+
+	/*
+	 *  Hold semaphore to serialize pll init and fwtrc.
+	 */
+	if (!(bfa_nw_ioc_sem_get(ioc->ioc_regs.ioc_init_sem_reg)))
+		return 1;
+
+	writel(pgnum, ioc->ioc_regs.host_page_num_fn);
+
+	len = sz/sizeof(u32);
+	for (i = 0; i < len; i++) {
+		r32 = swab32(readl(ioc->ioc_regs.smem_page_start + loff));
+		buf[i] = be32_to_cpu(r32);
+		loff += sizeof(u32);
+
+		/*
+		 * handle page offset wrap around
+		 */
+		loff = PSS_SMEM_PGOFF(loff);
+		if (loff == 0) {
+			pgnum++;
+			writel(pgnum, ioc->ioc_regs.host_page_num_fn);
+		}
+	}
+	writel(PSS_SMEM_PGNUM(ioc->ioc_regs.smem_pg0, 0),
+			ioc->ioc_regs.host_page_num_fn);
+	/*
+	 *  release semaphore.
+	 */
+	writel(1, ioc->ioc_regs.ioc_init_sem_reg);
+
+	return 0;
+}
+
+int
+bfa_nw_ioc_debug_fwtrc(struct bfa_ioc *ioc, void *trcdata, int *trclen)
+{
+	u32 loff = (BFI_IOC_TRC_OFF + BFA_DBG_FWTRC_LEN * (ioc->port_id));
+	int tlen, status = 0;
+
+	tlen = *trclen;
+	if (tlen > BFA_DBG_FWTRC_LEN)
+		tlen = BFA_DBG_FWTRC_LEN;
+
+	status = bfa_nw_ioc_smem_read(ioc, trcdata, loff, tlen);
+	*trclen = tlen;
+	return status;
+}
+
 /**
  * Firmware failure detected. Start recovery actions.
  */
diff --git a/drivers/net/bna/bfa_ioc.h b/drivers/net/bna/bfa_ioc.h
index bd48abe..383cd59 100644
--- a/drivers/net/bna/bfa_ioc.h
+++ b/drivers/net/bna/bfa_ioc.h
@@ -23,6 +23,9 @@
 #include "bfi.h"
 #include "cna.h"
 
+#define BFA_DBG_FWTRC_LEN	(BFI_IOC_TRC_ENTS * BFI_IOC_TRC_ENT_SZ + \
+				BFI_IOC_TRC_HDR_SZ)
+
 #define BFA_IOC_TOV		3000	/* msecs */
 #define BFA_IOC_HWSEM_TOV	500	/* msecs */
 #define BFA_IOC_HB_TOV		500	/* msecs */
@@ -269,6 +272,7 @@ void bfa_nw_ioc_hbfail_register(struct bfa_ioc *ioc,
 bool bfa_nw_ioc_sem_get(void __iomem *sem_reg);
 void bfa_nw_ioc_sem_release(void __iomem *sem_reg);
 void bfa_nw_ioc_hw_sem_release(struct bfa_ioc *ioc);
+int bfa_nw_ioc_debug_fwtrc(struct bfa_ioc *ioc, void *trcdata, int *trclen);
 void bfa_nw_ioc_fwver_get(struct bfa_ioc *ioc,
 			struct bfi_ioc_image_hdr *fwhdr);
 bool bfa_nw_ioc_fwver_cmp(struct bfa_ioc *ioc,
diff --git a/drivers/net/bna/bfi.h b/drivers/net/bna/bfi.h
index 6050379..ee73b6f 100644
--- a/drivers/net/bna/bfi.h
+++ b/drivers/net/bna/bfi.h
@@ -277,6 +277,8 @@ struct bfi_ioc_getattr_reply {
  */
 #define BFI_IOC_TRC_OFF		(0x4b00)
 #define BFI_IOC_TRC_ENTS	256
+#define BFI_IOC_TRC_ENT_SZ	16
+#define BFI_IOC_TRC_HDR_SZ	32
 
 #define BFI_IOC_FW_SIGNATURE	(0xbfadbfad)
 #define BFI_IOC_MD5SUM_SZ	4
diff --git a/drivers/net/bna/bnad.c b/drivers/net/bna/bnad.c
index e588511..d8d0da0 100644
--- a/drivers/net/bna/bnad.c
+++ b/drivers/net/bna/bnad.c
@@ -25,6 +25,7 @@
 #include <linux/ip.h>
 
 #include "bnad.h"
+#include "bnad_genl.h"
 #include "bna.h"
 #include "cna.h"
 
@@ -44,8 +45,12 @@ MODULE_PARM_DESC(bnad_ioc_auto_recover, "Enable / Disable auto recovery");
 /*
  * Global variables
  */
+u32 bna_id;
 u32 bnad_rxqs_per_cq = 2;
 
+struct mutex bnad_list_mutex;
+LIST_HEAD(bnad_list);
+
 static const u8 bnad_bcast_addr[] =  {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
 /*
@@ -72,6 +77,23 @@ do {								\
 
 #define BNAD_TXRX_SYNC_MDELAY	250	/* 250 msecs */
 
+static void
+bnad_add_to_list(struct bnad *bnad)
+{
+	mutex_lock(&bnad_list_mutex);
+	list_add_tail(&bnad->list_entry, &bnad_list);
+	bna_id++;
+	mutex_unlock(&bnad_list_mutex);
+}
+
+static void
+bnad_remove_from_list(struct bnad *bnad)
+{
+	mutex_lock(&bnad_list_mutex);
+	list_del(&bnad->list_entry);
+	mutex_unlock(&bnad_list_mutex);
+}
+
 /*
  * Reinitialize completions in CQ, once Rx is taken down
  */
@@ -3087,6 +3109,11 @@ bnad_pci_probe(struct pci_dev *pdev,
 	}
 	bnad = netdev_priv(netdev);
 
+	mutex_lock(&bnad_list_mutex);
+	bnad->id = bna_id;
+	mutex_unlock(&bnad_list_mutex);
+
+	bnad_add_to_list(bnad);
 	/*
 	 * PCI initialization
 	 * 	Output : using_dac = 1 for 64 bit DMA
@@ -3189,6 +3216,7 @@ disable_device:
 	bnad_res_free(bnad);
 	bnad_disable_msix(bnad);
 pci_uninit:
+	bnad_remove_from_list(bnad);
 	bnad_pci_uninit(pdev);
 	bnad_lock_uninit(bnad);
 	bnad_uninit(bnad);
@@ -3226,6 +3254,7 @@ bnad_pci_remove(struct pci_dev *pdev)
 
 	bnad_res_free(bnad);
 	bnad_disable_msix(bnad);
+	bnad_remove_from_list(bnad);
 	bnad_pci_uninit(pdev);
 	bnad_lock_uninit(bnad);
 	bnad_uninit(bnad);
@@ -3257,6 +3286,7 @@ bnad_module_init(void)
 
 	pr_info("Brocade 10G Ethernet driver\n");
 
+	mutex_init(&bnad_list_mutex);
 	bfa_nw_ioc_auto_recover(bnad_ioc_auto_recover);
 
 	err = pci_register_driver(&bnad_pci_driver);
@@ -3266,6 +3296,10 @@ bnad_module_init(void)
 		return err;
 	}
 
+	/* Register with generic netlink */
+	if (bnad_genl_init())
+		pr_err("bna: Generic Netlink Register failed\n");
+
 	return 0;
 }
 
@@ -3273,6 +3307,11 @@ static void __exit
 bnad_module_exit(void)
 {
 	pci_unregister_driver(&bnad_pci_driver);
+	mutex_destroy(&bnad_list_mutex);
+
+	/* Unegister with generic netlink */
+	if (bnad_genl_uninit())
+		pr_err("bna: Generic Netlink Unregister failed\n");
 
 	if (bfi_fw)
 		release_firmware(bfi_fw);
diff --git a/drivers/net/bna/bnad.h b/drivers/net/bna/bnad.h
index ccdabad..2afec2b 100644
--- a/drivers/net/bna/bnad.h
+++ b/drivers/net/bna/bnad.h
@@ -279,13 +279,18 @@ struct bnad {
 	char			adapter_name[BNAD_NAME_LEN];
 	char 			port_name[BNAD_NAME_LEN];
 	char			mbox_irq_name[BNAD_NAME_LEN];
+
+	int			id;
+	struct list_head	list_entry;
 };
 
 /*
  * EXTERN VARIABLES
  */
-extern struct firmware *bfi_fw;
-extern u32 		bnad_rxqs_per_cq;
+extern struct firmware		*bfi_fw;
+extern struct mutex		bnad_list_mutex;
+extern struct list_head		bnad_list;
+extern u32			bnad_rxqs_per_cq;
 
 /*
  * EXTERN PROTOTYPES
diff --git a/drivers/net/bna/bnad_genl.c b/drivers/net/bna/bnad_genl.c
new file mode 100644
index 0000000..d8e49ad
--- /dev/null
+++ b/drivers/net/bna/bnad_genl.c
@@ -0,0 +1,228 @@
+/*
+ * Linux network driver for Brocade Converged Network Adapter.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License (GPL) Version 2 as
+ * published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+/*
+ * Copyright (c) 2005-2011 Brocade Communications Systems, Inc.
+ * All rights reserved
+ * www.brocade.com
+ */
+#include "bnad.h"
+#include "bnad_genl.h"
+#include "bna.h"
+
+static struct genl_family bnad_genl_family = {
+	.id = GENL_ID_GENERATE,
+	.name = "BNAD_GENL",
+	.version = BNAD_GENL_VERSION,
+	.hdrsize = 0,
+	.maxattr = BNAD_GENL_ATTR_MAX,
+};
+
+struct bnad_fw_debug_info {
+	char	*debug_buffer;
+	u32	buffer_len;
+};
+
+static struct bnad *
+bnad_get_bnadev(int bna_id)
+{
+	struct bnad *bnad;
+
+	mutex_lock(&bnad_list_mutex);
+	list_for_each_entry(bnad, &bnad_list, list_entry) {
+		if (bnad->id == bna_id) {
+			mutex_unlock(&bnad_list_mutex);
+			return bnad;
+		}
+	}
+	mutex_unlock(&bnad_list_mutex);
+	return NULL;
+}
+
+static int
+bnad_genl_fwtrc_msg_fill(struct sk_buff *skb, u32 pid, u32 seq, int flags,
+			struct bnad_fw_debug_info *fw_debug, long *offset)
+{
+	void *genl_msg_hdr = NULL;
+	int sk_buff_len, err = 0;
+
+	/* genlmsg_put */
+	genl_msg_hdr = genlmsg_put(skb, pid, seq, &bnad_genl_family, flags,
+							BNAD_GENL_CMD_FWTRC);
+	if (!genl_msg_hdr) {
+		pr_err("bna: Failed to get the genl_msg_header\n");
+		return -ENOMEM;
+	}
+
+	/* NLA_PUT
+	 * sk_buff_len - available lenth for attribute payload
+	 * offset - offset in fwtrc buffer
+	 */
+	sk_buff_len = skb_tailroom(skb) - NLA_HDRLEN;
+	if ((fw_debug->buffer_len - *offset) < sk_buff_len)
+		sk_buff_len = fw_debug->buffer_len - *offset;
+
+	NLA_PUT(skb, BNAD_GENL_ATTR_FWTRC, sk_buff_len,
+				((fw_debug->debug_buffer) + *offset));
+
+	*offset += sk_buff_len;
+
+	/* genlmsg_end */
+	err = genlmsg_end(skb, genl_msg_hdr);
+	if (err < 0) {
+		pr_err("bna: Failed to do genlmsg_end\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+
+nla_put_failure:
+	pr_err("bna: Failed to do NLA_PUT\n");
+	genlmsg_cancel(skb, genl_msg_hdr);
+	return -EMSGSIZE;
+}
+
+static int
+bnad_genl_fwtrc_get(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	/*
+	 * cb->args[0] firmware trace offset
+	 * cb->args[1] firmware buffer pointer
+	 */
+	long offset = cb->args[0];
+	int err;
+	struct bnad *bnad = NULL;
+	struct bnad_fw_debug_info *fw_debug = NULL;
+	struct bnad_genl_debug *iocmd = NULL;
+
+	if (offset >= (long)BFA_DBG_FWTRC_LEN) {
+		fw_debug = (struct bnad_fw_debug_info *)cb->args[1];
+		vfree(fw_debug->debug_buffer);
+		fw_debug->debug_buffer = NULL;
+		kfree(fw_debug);
+		fw_debug = NULL;
+		return skb->len;
+	}
+
+	if (offset == 0) {
+		/* Get the driver instance */
+		iocmd = (struct bnad_genl_debug *)
+			(((char *)NLMSG_DATA(cb->nlh)) +
+			GENL_HDRLEN + NLA_HDRLEN);
+		bnad = bnad_get_bnadev(iocmd->inst_no);
+		if (!bnad) {
+			pr_warn("bna: Failed to get driver instance\n");
+			return -EINVAL;
+		}
+
+		/* Allocate memory for fwtrc structure and buffer */
+		fw_debug = kzalloc(sizeof(struct bnad_fw_debug_info),
+								GFP_KERNEL);
+		if (!fw_debug)
+			return -ENOMEM;
+
+		fw_debug->buffer_len = iocmd->bufsz;
+
+		fw_debug->debug_buffer = vzalloc(fw_debug->buffer_len);
+		if (!fw_debug->debug_buffer) {
+			kfree(fw_debug);
+			fw_debug = NULL;
+			pr_err("bnad[%d]: Failed to allocate fwtrc buffer\n",
+								bnad->id);
+			return -ENOMEM;
+		}
+
+		/* Get the firmware trace into the buffer */
+		err = bfa_nw_ioc_debug_fwtrc(&bnad->bna.device.ioc,
+				fw_debug->debug_buffer, &fw_debug->buffer_len);
+		if (err) {
+			vfree(fw_debug->debug_buffer);
+			fw_debug->debug_buffer = NULL;
+			kfree(fw_debug);
+			fw_debug = NULL;
+			pr_err("bnad[%d]: Failed to collect fwtrc\n", bnad->id);
+			return -ENOMEM;;
+		}
+
+		cb->args[1] = (long)fw_debug;
+	} else
+		fw_debug = (struct bnad_fw_debug_info *)cb->args[1];
+
+	err = bnad_genl_fwtrc_msg_fill(skb, NETLINK_CB(cb->skb).pid,
+			cb->nlh->nlmsg_seq, NLM_F_MULTI, fw_debug, &offset);
+	if (err < 0)
+		return err;
+
+	cb->args[0] = offset;
+	return skb->len;
+}
+
+static struct genl_ops bnad_genl_ops[] = {
+	{
+	.cmd = BNAD_GENL_CMD_FWTRC,
+	.flags = 0,
+	.policy = NULL,
+	.doit = NULL,
+	.dumpit = bnad_genl_fwtrc_get,
+	},
+};
+
+int
+bnad_genl_init(void)
+{
+	int i, err = 0;
+
+	/* Register family */
+	err = genl_register_family(&bnad_genl_family);
+	if (err) {
+		pr_err("bna: failed to register with Netlink\n");
+		return err;
+	}
+	pr_info("bna: registered with Netlink\n");
+
+	/* Register ops */
+	for (i = 0; i < sizeof(bnad_genl_ops) / sizeof(bnad_genl_ops[0]); i++) {
+		err = genl_register_ops(&bnad_genl_family, &bnad_genl_ops[i]);
+		if (err)
+			pr_err("bna: failed to register netlink op %u\n",
+				bnad_genl_ops[i].cmd);
+		else
+			pr_info("bna: registered netlink op %u\n",
+				bnad_genl_ops[i].cmd);
+	}
+
+	return err;
+}
+
+int
+bnad_genl_uninit(void)
+{
+	int i, err = 0;
+
+	for (i = 0; i < sizeof(bnad_genl_ops) / sizeof(bnad_genl_ops[0]); i++) {
+		err = genl_unregister_ops(&bnad_genl_family, &bnad_genl_ops[i]);
+		if (err)
+			pr_err("bna: failed to unregister netlink op %u)\n",
+				bnad_genl_ops[i].cmd);
+		else
+			pr_info("bna: unregistered netlink op %u\n",
+				bnad_genl_ops[i].cmd);
+	}
+
+	err = genl_unregister_family(&bnad_genl_family);
+	if (err)
+		pr_err("bna: failed to unregister with Netlink\n");
+	else
+		pr_info("bna: unregistered with Netlink\n");
+
+	return err;
+}
diff --git a/drivers/net/bna/bnad_genl.h b/drivers/net/bna/bnad_genl.h
new file mode 100644
index 0000000..fb75a6e
--- /dev/null
+++ b/drivers/net/bna/bnad_genl.h
@@ -0,0 +1,63 @@
+/*
+ * Linux network driver for Brocade Converged Network Adapter.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License (GPL) Version 2 as
+ * published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+/*
+ * Copyright (c) 2005-2011 Brocade Communications Systems, Inc.
+ * All rights reserved
+ * www.brocade.com
+ */
+#ifndef __BNAD_GENL_H__
+#define __BNAD_GENL_H__
+
+#include <linux/device.h>
+#include <linux/netdevice.h>
+#include <linux/gfp.h>
+#include <linux/skbuff.h>
+#include <linux/ethtool.h>
+#include <linux/rtnetlink.h>
+#include <net/genetlink.h>
+
+/* Attributes */
+enum {
+	BNAD_GENL_ATTR_UNSPEC,
+	BNAD_GENL_ATTR_FWTRC,
+	__BNAD_GENL_ATTR_MAX
+};
+
+/* Effectively a single attribute */
+#define BNAD_GENL_ATTR_MAX (__BNAD_GENL_ATTR_MAX - 1)
+
+enum {
+	BNAD_GENL_VERSION = 1,
+};
+
+/* Commands/Responses */
+enum {
+	BNAD_GENL_CMD_UNSPEC,
+	BNAD_GENL_CMD_FWTRC,
+	__BNAD_GENL_CMD_MAX,
+};
+
+struct bnad_genl_debug {
+	int		status;
+	u16		bnad_num;
+	u16		rsvd;
+	u32		bufsz;
+	int		inst_no;
+	u64		buf_ptr;
+	u64		offset;
+};
+
+extern int bnad_genl_init(void);
+extern int bnad_genl_uninit(void);
+
+#endif /* __BNAD_GENL_H__ */
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH net-next-2.6 0/7] SCTP updates for net-next-2.6
From: David Miller @ 2011-04-27  1:47 UTC (permalink / raw)
  To: yjwei; +Cc: netdev, linux-sctp
In-Reply-To: <4DB76A64.3020708@cn.fujitsu.com>

From: Wei Yongjun <yjwei@cn.fujitsu.com>
Date: Wed, 27 Apr 2011 08:59:16 +0800

> And I have a stupid question about the rule of backport.  Since those
> patchs have existed so long time, when I backport those patchs, I'd better
> fix the bug in the original patch, or create new patch to fix it? Also how
> about some thing need to improvement like the ->dst_saddr() method?

I think, since these are going in as patches, you should make any
corrections to those patches by fixing the patch.

Adding new patches is just going to keep the bug in there if people
try to bisect through these changes, and that's no good.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4,ipv6,bonding: Restore control over number of peer notifications
From: Jay Vosburgh @ 2011-04-27  1:44 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Andy Gospodarek, David Miller, Patrick McHardy, netdev,
	Brian Haley
In-Reply-To: <1303867552.2850.39.camel@bwh-desktop>

Ben Hutchings <bhutchings@solarflare.com> wrote:

>For backward compatibility, we should retain the module parameters and
>sysfs attributes to control the number of peer notifications
>(gratuitous ARPs and unsolicited NAs) sent after bonding failover.
>Also, it is possible for failover to take place even though the new
>active slave does not have link up, and in that case the peer
>notification should be deferred until it does.
>
>Change ipv4 and ipv6 so they do not automatically send peer
>notifications on bonding failover.
>
>Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
>notifications when the link is up, as many times as requested.  Since
>it does not directly control which protocols send notifications, make
>num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
>the bonding version number and update its documentation.
>
>Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

>---
> Documentation/networking/bonding.txt |   34 +++++++++----------
> drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++
> drivers/net/bonding/bond_sysfs.c     |   26 +++++++++++++++
> drivers/net/bonding/bonding.h        |    6 ++-
> net/ipv4/devinet.c                   |    1 -
> net/ipv6/ndisc.c                     |    1 -
> 6 files changed, 105 insertions(+), 22 deletions(-)
>
>diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>index e27202b..1f45bd8 100644
>--- a/Documentation/networking/bonding.txt
>+++ b/Documentation/networking/bonding.txt
>@@ -1,7 +1,7 @@
>
> 		Linux Ethernet Bonding Driver HOWTO
>
>-		Latest update: 23 September 2009
>+		Latest update: 27 April 2011
>
> Initial release : Thomas Davis <tadavis at lbl.gov>
> Corrections, HA extensions : 2000/10/03-15 :
>@@ -585,25 +585,23 @@ mode
> 		chosen.
>
> num_grat_arp
>-
>-	Specifies the number of gratuitous ARPs to be issued after a
>-	failover event.  One gratuitous ARP is issued immediately after
>-	the failover, subsequent ARPs are sent at a rate of one per link
>-	monitor interval (arp_interval or miimon, whichever is active).
>-
>-	The valid range is 0 - 255; the default value is 1.  This option
>-	affects only the active-backup mode.  This option was added for
>-	bonding version 3.3.0.
>-
> num_unsol_na
>
>-	Specifies the number of unsolicited IPv6 Neighbor Advertisements
>-	to be issued after a failover event.  One unsolicited NA is issued
>-	immediately after the failover.
>-
>-	The valid range is 0 - 255; the default value is 1.  This option
>-	affects only the active-backup mode.  This option was added for
>-	bonding version 3.4.0.
>+	Specify the number of peer notifications (gratuitous ARPs and
>+	unsolicited IPv6 Neighbor Advertisements) to be issued after a
>+	failover event.  As soon as the link is up on the new slave
>+	(possibly immediately) a peer notification is sent on the
>+	bonding device and each VLAN sub-device.  This is repeated at
>+	each link monitor interval (arp_interval or miimon, whichever
>+	is active) if the number is greater than 1.
>+
>+	The valid range is 0 - 255; the default value is 1.  These options
>+	affect only the active-backup mode.  These options were added for
>+	bonding versions 3.3.0 and 3.4.0 respectively.
>+
>+	From Linux 2.6.40 and bonding version 3.7.1, these notifications
>+	are generated by the ipv4 and ipv6 code and the numbers of
>+	repetitions cannot be set independently.
>
> primary
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 66d9dc6..22bd03b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -89,6 +89,7 @@
>
> static int max_bonds	= BOND_DEFAULT_MAX_BONDS;
> static int tx_queues	= BOND_DEFAULT_TX_QUEUES;
>+static int num_peer_notif = 1;
> static int miimon	= BOND_LINK_MON_INTERV;
> static int updelay;
> static int downdelay;
>@@ -111,6 +112,10 @@ module_param(max_bonds, int, 0);
> MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
> module_param(tx_queues, int, 0);
> MODULE_PARM_DESC(tx_queues, "Max number of transmit queues (default = 16)");
>+module_param_named(num_grat_arp, num_peer_notif, int, 0644);
>+MODULE_PARM_DESC(num_grat_arp, "Number of peer notifications to send on failover event (alias of num_unsol_na)");
>+module_param_named(num_unsol_na, num_peer_notif, int, 0644);
>+MODULE_PARM_DESC(num_unsol_na, "Number of peer notifications to send on failover event (alias of num_grat_arp)");
> module_param(miimon, int, 0);
> MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
> module_param(updelay, int, 0);
>@@ -1082,6 +1087,21 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
> 	return bestslave;
> }
>
>+static bool bond_should_notify_peers(struct bonding *bond)
>+{
>+	struct slave *slave = bond->curr_active_slave;
>+
>+	pr_debug("bond_should_notify_peers: bond %s slave %s\n",
>+		 bond->dev->name, slave ? slave->dev->name : "NULL");
>+
>+	if (!slave || !bond->send_peer_notif ||
>+	    test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
>+		return false;
>+
>+	bond->send_peer_notif--;
>+	return true;
>+}
>+
> /**
>  * change_active_interface - change the active slave into the specified one
>  * @bond: our bonding struct
>@@ -1149,16 +1169,28 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
> 			bond_set_slave_inactive_flags(old_active);
>
> 		if (new_active) {
>+			bool should_notify_peers = false;
>+
> 			bond_set_slave_active_flags(new_active);
>
> 			if (bond->params.fail_over_mac)
> 				bond_do_fail_over_mac(bond, new_active,
> 						      old_active);
>
>+			if (netif_running(bond->dev)) {
>+				bond->send_peer_notif =
>+					bond->params.num_peer_notif;
>+				should_notify_peers =
>+					bond_should_notify_peers(bond);
>+			}
>+
> 			write_unlock_bh(&bond->curr_slave_lock);
> 			read_unlock(&bond->lock);
>
> 			netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER);
>+			if (should_notify_peers)
>+				netdev_bonding_change(bond->dev,
>+						      NETDEV_NOTIFY_PEERS);
>
> 			read_lock(&bond->lock);
> 			write_lock_bh(&bond->curr_slave_lock);
>@@ -2556,6 +2588,7 @@ void bond_mii_monitor(struct work_struct *work)
> {
> 	struct bonding *bond = container_of(work, struct bonding,
> 					    mii_work.work);
>+	bool should_notify_peers = false;
>
> 	read_lock(&bond->lock);
> 	if (bond->kill_timers)
>@@ -2564,6 +2597,8 @@ void bond_mii_monitor(struct work_struct *work)
> 	if (bond->slave_cnt == 0)
> 		goto re_arm;
>
>+	should_notify_peers = bond_should_notify_peers(bond);
>+
> 	if (bond_miimon_inspect(bond)) {
> 		read_unlock(&bond->lock);
> 		rtnl_lock();
>@@ -2582,6 +2617,12 @@ re_arm:
> 				   msecs_to_jiffies(bond->params.miimon));
> out:
> 	read_unlock(&bond->lock);
>+
>+	if (should_notify_peers) {
>+		rtnl_lock();
>+		netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS);
>+		rtnl_unlock();
>+	}
> }
>
> static __be32 bond_glean_dev_ip(struct net_device *dev)
>@@ -3154,6 +3195,7 @@ void bond_activebackup_arp_mon(struct work_struct *work)
> {
> 	struct bonding *bond = container_of(work, struct bonding,
> 					    arp_work.work);
>+	bool should_notify_peers = false;
> 	int delta_in_ticks;
>
> 	read_lock(&bond->lock);
>@@ -3166,6 +3208,8 @@ void bond_activebackup_arp_mon(struct work_struct *work)
> 	if (bond->slave_cnt == 0)
> 		goto re_arm;
>
>+	should_notify_peers = bond_should_notify_peers(bond);
>+
> 	if (bond_ab_arp_inspect(bond, delta_in_ticks)) {
> 		read_unlock(&bond->lock);
> 		rtnl_lock();
>@@ -3185,6 +3229,12 @@ re_arm:
> 		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
> out:
> 	read_unlock(&bond->lock);
>+
>+	if (should_notify_peers) {
>+		rtnl_lock();
>+		netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS);
>+		rtnl_unlock();
>+	}
> }
>
> /*-------------------------- netdev event handling --------------------------*/
>@@ -3494,6 +3544,8 @@ static int bond_close(struct net_device *bond_dev)
>
> 	write_lock_bh(&bond->lock);
>
>+	bond->send_peer_notif = 0;
>+
> 	/* signal timers not to re-arm */
> 	bond->kill_timers = 1;
>
>@@ -4571,6 +4623,12 @@ static int bond_check_params(struct bond_params *params)
> 		use_carrier = 1;
> 	}
>
>+	if (num_peer_notif < 0 || num_peer_notif > 255) {
>+		pr_warning("Warning: num_grat_arp/num_unsol_na (%d) not in range 0-255 so it was reset to 1\n",
>+			   num_peer_notif);
>+		num_peer_notif = 1;
>+	}
>+
> 	/* reset values for 802.3ad */
> 	if (bond_mode == BOND_MODE_8023AD) {
> 		if (!miimon) {
>@@ -4760,6 +4818,7 @@ static int bond_check_params(struct bond_params *params)
> 	params->mode = bond_mode;
> 	params->xmit_policy = xmit_hashtype;
> 	params->miimon = miimon;
>+	params->num_peer_notif = num_peer_notif;
> 	params->arp_interval = arp_interval;
> 	params->arp_validate = arp_validate_value;
> 	params->updelay = updelay;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 935406a..4059bfc 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -869,6 +869,30 @@ static DEVICE_ATTR(ad_select, S_IRUGO | S_IWUSR,
> 		   bonding_show_ad_select, bonding_store_ad_select);
>
> /*
>+ * Show and set the number of peer notifications to send after a failover event.
>+ */
>+static ssize_t bonding_show_num_peer_notif(struct device *d,
>+					   struct device_attribute *attr,
>+					   char *buf)
>+{
>+	struct bonding *bond = to_bond(d);
>+	return sprintf(buf, "%d\n", bond->params.num_peer_notif);
>+}
>+
>+static ssize_t bonding_store_num_peer_notif(struct device *d,
>+					    struct device_attribute *attr,
>+					    const char *buf, size_t count)
>+{
>+	struct bonding *bond = to_bond(d);
>+	int err = kstrtou8(buf, 10, &bond->params.num_peer_notif);
>+	return err ? err : count;
>+}
>+static DEVICE_ATTR(num_grat_arp, S_IRUGO | S_IWUSR,
>+		   bonding_show_num_peer_notif, bonding_store_num_peer_notif);
>+static DEVICE_ATTR(num_unsol_na, S_IRUGO | S_IWUSR,
>+		   bonding_show_num_peer_notif, bonding_store_num_peer_notif);
>+
>+/*
>  * Show and set the MII monitor interval.  There are two tricky bits
>  * here.  First, if MII monitoring is activated, then we must disable
>  * ARP monitoring.  Second, if the timer isn't running, we must
>@@ -1566,6 +1590,8 @@ static struct attribute *per_bond_attrs[] = {
> 	&dev_attr_lacp_rate.attr,
> 	&dev_attr_ad_select.attr,
> 	&dev_attr_xmit_hash_policy.attr,
>+	&dev_attr_num_grat_arp.attr,
>+	&dev_attr_num_unsol_na.attr,
> 	&dev_attr_miimon.attr,
> 	&dev_attr_primary.attr,
> 	&dev_attr_primary_reselect.attr,
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 85fb822..d08362e 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -24,8 +24,8 @@
> #include "bond_3ad.h"
> #include "bond_alb.h"
>
>-#define DRV_VERSION	"3.7.0"
>-#define DRV_RELDATE	"June 2, 2010"
>+#define DRV_VERSION	"3.7.1"
>+#define DRV_RELDATE	"April 27, 2011"
> #define DRV_NAME	"bonding"
> #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
>
>@@ -149,6 +149,7 @@ struct bond_params {
> 	int mode;
> 	int xmit_policy;
> 	int miimon;
>+	u8 num_peer_notif;
> 	int arp_interval;
> 	int arp_validate;
> 	int use_carrier;
>@@ -231,6 +232,7 @@ struct bonding {
> 	rwlock_t lock;
> 	rwlock_t curr_slave_lock;
> 	s8       kill_timers;
>+	u8	 send_peer_notif;
> 	s8	 setup_by_slave;
> 	s8       igmp_retrans;
> #ifdef CONFIG_PROC_FS
>diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
>index acf553f..5345b0b 100644
>--- a/net/ipv4/devinet.c
>+++ b/net/ipv4/devinet.c
>@@ -1203,7 +1203,6 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
> 			break;
> 		/* fall through */
> 	case NETDEV_NOTIFY_PEERS:
>-	case NETDEV_BONDING_FAILOVER:
> 		/* Send gratuitous ARP to notify of link change */
> 		inetdev_send_gratuitous_arp(dev, in_dev);
> 		break;
>diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>index 69aacd1..7596f07 100644
>--- a/net/ipv6/ndisc.c
>+++ b/net/ipv6/ndisc.c
>@@ -1747,7 +1747,6 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
> 		fib6_run_gc(~0UL, net);
> 		break;
> 	case NETDEV_NOTIFY_PEERS:
>-	case NETDEV_BONDING_FAILOVER:
> 		ndisc_send_unsol_na(dev);
> 		break;
> 	default:
>-- 
>1.7.4
>
>
>-- 
>Ben Hutchings, Senior Software Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next-2.6] ipv4,ipv6,bonding: Restore control over number of peer notifications
From: Ben Hutchings @ 2011-04-27  1:25 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek
  Cc: David Miller, Patrick McHardy, netdev, Brian Haley

For backward compatibility, we should retain the module parameters and
sysfs attributes to control the number of peer notifications
(gratuitous ARPs and unsolicited NAs) sent after bonding failover.
Also, it is possible for failover to take place even though the new
active slave does not have link up, and in that case the peer
notification should be deferred until it does.

Change ipv4 and ipv6 so they do not automatically send peer
notifications on bonding failover.

Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
notifications when the link is up, as many times as requested.  Since
it does not directly control which protocols send notifications, make
num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
the bonding version number and update its documentation.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 Documentation/networking/bonding.txt |   34 +++++++++----------
 drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bond_sysfs.c     |   26 +++++++++++++++
 drivers/net/bonding/bonding.h        |    6 ++-
 net/ipv4/devinet.c                   |    1 -
 net/ipv6/ndisc.c                     |    1 -
 6 files changed, 105 insertions(+), 22 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index e27202b..1f45bd8 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1,7 +1,7 @@
 
 		Linux Ethernet Bonding Driver HOWTO
 
-		Latest update: 23 September 2009
+		Latest update: 27 April 2011
 
 Initial release : Thomas Davis <tadavis at lbl.gov>
 Corrections, HA extensions : 2000/10/03-15 :
@@ -585,25 +585,23 @@ mode
 		chosen.
 
 num_grat_arp
-
-	Specifies the number of gratuitous ARPs to be issued after a
-	failover event.  One gratuitous ARP is issued immediately after
-	the failover, subsequent ARPs are sent at a rate of one per link
-	monitor interval (arp_interval or miimon, whichever is active).
-
-	The valid range is 0 - 255; the default value is 1.  This option
-	affects only the active-backup mode.  This option was added for
-	bonding version 3.3.0.
-
 num_unsol_na
 
-	Specifies the number of unsolicited IPv6 Neighbor Advertisements
-	to be issued after a failover event.  One unsolicited NA is issued
-	immediately after the failover.
-
-	The valid range is 0 - 255; the default value is 1.  This option
-	affects only the active-backup mode.  This option was added for
-	bonding version 3.4.0.
+	Specify the number of peer notifications (gratuitous ARPs and
+	unsolicited IPv6 Neighbor Advertisements) to be issued after a
+	failover event.  As soon as the link is up on the new slave
+	(possibly immediately) a peer notification is sent on the
+	bonding device and each VLAN sub-device.  This is repeated at
+	each link monitor interval (arp_interval or miimon, whichever
+	is active) if the number is greater than 1.
+
+	The valid range is 0 - 255; the default value is 1.  These options
+	affect only the active-backup mode.  These options were added for
+	bonding versions 3.3.0 and 3.4.0 respectively.
+
+	From Linux 2.6.40 and bonding version 3.7.1, these notifications
+	are generated by the ipv4 and ipv6 code and the numbers of
+	repetitions cannot be set independently.
 
 primary
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 66d9dc6..22bd03b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -89,6 +89,7 @@
 
 static int max_bonds	= BOND_DEFAULT_MAX_BONDS;
 static int tx_queues	= BOND_DEFAULT_TX_QUEUES;
+static int num_peer_notif = 1;
 static int miimon	= BOND_LINK_MON_INTERV;
 static int updelay;
 static int downdelay;
@@ -111,6 +112,10 @@ module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
 module_param(tx_queues, int, 0);
 MODULE_PARM_DESC(tx_queues, "Max number of transmit queues (default = 16)");
+module_param_named(num_grat_arp, num_peer_notif, int, 0644);
+MODULE_PARM_DESC(num_grat_arp, "Number of peer notifications to send on failover event (alias of num_unsol_na)");
+module_param_named(num_unsol_na, num_peer_notif, int, 0644);
+MODULE_PARM_DESC(num_unsol_na, "Number of peer notifications to send on failover event (alias of num_grat_arp)");
 module_param(miimon, int, 0);
 MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
 module_param(updelay, int, 0);
@@ -1082,6 +1087,21 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
 	return bestslave;
 }
 
+static bool bond_should_notify_peers(struct bonding *bond)
+{
+	struct slave *slave = bond->curr_active_slave;
+
+	pr_debug("bond_should_notify_peers: bond %s slave %s\n",
+		 bond->dev->name, slave ? slave->dev->name : "NULL");
+
+	if (!slave || !bond->send_peer_notif ||
+	    test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
+		return false;
+
+	bond->send_peer_notif--;
+	return true;
+}
+
 /**
  * change_active_interface - change the active slave into the specified one
  * @bond: our bonding struct
@@ -1149,16 +1169,28 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 			bond_set_slave_inactive_flags(old_active);
 
 		if (new_active) {
+			bool should_notify_peers = false;
+
 			bond_set_slave_active_flags(new_active);
 
 			if (bond->params.fail_over_mac)
 				bond_do_fail_over_mac(bond, new_active,
 						      old_active);
 
+			if (netif_running(bond->dev)) {
+				bond->send_peer_notif =
+					bond->params.num_peer_notif;
+				should_notify_peers =
+					bond_should_notify_peers(bond);
+			}
+
 			write_unlock_bh(&bond->curr_slave_lock);
 			read_unlock(&bond->lock);
 
 			netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER);
+			if (should_notify_peers)
+				netdev_bonding_change(bond->dev,
+						      NETDEV_NOTIFY_PEERS);
 
 			read_lock(&bond->lock);
 			write_lock_bh(&bond->curr_slave_lock);
@@ -2556,6 +2588,7 @@ void bond_mii_monitor(struct work_struct *work)
 {
 	struct bonding *bond = container_of(work, struct bonding,
 					    mii_work.work);
+	bool should_notify_peers = false;
 
 	read_lock(&bond->lock);
 	if (bond->kill_timers)
@@ -2564,6 +2597,8 @@ void bond_mii_monitor(struct work_struct *work)
 	if (bond->slave_cnt == 0)
 		goto re_arm;
 
+	should_notify_peers = bond_should_notify_peers(bond);
+
 	if (bond_miimon_inspect(bond)) {
 		read_unlock(&bond->lock);
 		rtnl_lock();
@@ -2582,6 +2617,12 @@ re_arm:
 				   msecs_to_jiffies(bond->params.miimon));
 out:
 	read_unlock(&bond->lock);
+
+	if (should_notify_peers) {
+		rtnl_lock();
+		netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS);
+		rtnl_unlock();
+	}
 }
 
 static __be32 bond_glean_dev_ip(struct net_device *dev)
@@ -3154,6 +3195,7 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 {
 	struct bonding *bond = container_of(work, struct bonding,
 					    arp_work.work);
+	bool should_notify_peers = false;
 	int delta_in_ticks;
 
 	read_lock(&bond->lock);
@@ -3166,6 +3208,8 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 	if (bond->slave_cnt == 0)
 		goto re_arm;
 
+	should_notify_peers = bond_should_notify_peers(bond);
+
 	if (bond_ab_arp_inspect(bond, delta_in_ticks)) {
 		read_unlock(&bond->lock);
 		rtnl_lock();
@@ -3185,6 +3229,12 @@ re_arm:
 		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
 out:
 	read_unlock(&bond->lock);
+
+	if (should_notify_peers) {
+		rtnl_lock();
+		netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS);
+		rtnl_unlock();
+	}
 }
 
 /*-------------------------- netdev event handling --------------------------*/
@@ -3494,6 +3544,8 @@ static int bond_close(struct net_device *bond_dev)
 
 	write_lock_bh(&bond->lock);
 
+	bond->send_peer_notif = 0;
+
 	/* signal timers not to re-arm */
 	bond->kill_timers = 1;
 
@@ -4571,6 +4623,12 @@ static int bond_check_params(struct bond_params *params)
 		use_carrier = 1;
 	}
 
+	if (num_peer_notif < 0 || num_peer_notif > 255) {
+		pr_warning("Warning: num_grat_arp/num_unsol_na (%d) not in range 0-255 so it was reset to 1\n",
+			   num_peer_notif);
+		num_peer_notif = 1;
+	}
+
 	/* reset values for 802.3ad */
 	if (bond_mode == BOND_MODE_8023AD) {
 		if (!miimon) {
@@ -4760,6 +4818,7 @@ static int bond_check_params(struct bond_params *params)
 	params->mode = bond_mode;
 	params->xmit_policy = xmit_hashtype;
 	params->miimon = miimon;
+	params->num_peer_notif = num_peer_notif;
 	params->arp_interval = arp_interval;
 	params->arp_validate = arp_validate_value;
 	params->updelay = updelay;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 935406a..4059bfc 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -869,6 +869,30 @@ static DEVICE_ATTR(ad_select, S_IRUGO | S_IWUSR,
 		   bonding_show_ad_select, bonding_store_ad_select);
 
 /*
+ * Show and set the number of peer notifications to send after a failover event.
+ */
+static ssize_t bonding_show_num_peer_notif(struct device *d,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct bonding *bond = to_bond(d);
+	return sprintf(buf, "%d\n", bond->params.num_peer_notif);
+}
+
+static ssize_t bonding_store_num_peer_notif(struct device *d,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct bonding *bond = to_bond(d);
+	int err = kstrtou8(buf, 10, &bond->params.num_peer_notif);
+	return err ? err : count;
+}
+static DEVICE_ATTR(num_grat_arp, S_IRUGO | S_IWUSR,
+		   bonding_show_num_peer_notif, bonding_store_num_peer_notif);
+static DEVICE_ATTR(num_unsol_na, S_IRUGO | S_IWUSR,
+		   bonding_show_num_peer_notif, bonding_store_num_peer_notif);
+
+/*
  * Show and set the MII monitor interval.  There are two tricky bits
  * here.  First, if MII monitoring is activated, then we must disable
  * ARP monitoring.  Second, if the timer isn't running, we must
@@ -1566,6 +1590,8 @@ static struct attribute *per_bond_attrs[] = {
 	&dev_attr_lacp_rate.attr,
 	&dev_attr_ad_select.attr,
 	&dev_attr_xmit_hash_policy.attr,
+	&dev_attr_num_grat_arp.attr,
+	&dev_attr_num_unsol_na.attr,
 	&dev_attr_miimon.attr,
 	&dev_attr_primary.attr,
 	&dev_attr_primary_reselect.attr,
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 85fb822..d08362e 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -24,8 +24,8 @@
 #include "bond_3ad.h"
 #include "bond_alb.h"
 
-#define DRV_VERSION	"3.7.0"
-#define DRV_RELDATE	"June 2, 2010"
+#define DRV_VERSION	"3.7.1"
+#define DRV_RELDATE	"April 27, 2011"
 #define DRV_NAME	"bonding"
 #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
 
@@ -149,6 +149,7 @@ struct bond_params {
 	int mode;
 	int xmit_policy;
 	int miimon;
+	u8 num_peer_notif;
 	int arp_interval;
 	int arp_validate;
 	int use_carrier;
@@ -231,6 +232,7 @@ struct bonding {
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;
 	s8       kill_timers;
+	u8	 send_peer_notif;
 	s8	 setup_by_slave;
 	s8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index acf553f..5345b0b 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1203,7 +1203,6 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 			break;
 		/* fall through */
 	case NETDEV_NOTIFY_PEERS:
-	case NETDEV_BONDING_FAILOVER:
 		/* Send gratuitous ARP to notify of link change */
 		inetdev_send_gratuitous_arp(dev, in_dev);
 		break;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 69aacd1..7596f07 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1747,7 +1747,6 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
 		fib6_run_gc(~0UL, net);
 		break;
 	case NETDEV_NOTIFY_PEERS:
-	case NETDEV_BONDING_FAILOVER:
 		ndisc_send_unsol_na(dev);
 		break;
 	default:
-- 
1.7.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* Re: [PATCH net-next-2.6 0/7] SCTP updates for net-next-2.6
From: Wei Yongjun @ 2011-04-27  0:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-sctp
In-Reply-To: <20110426.145120.28826019.davem@davemloft.net>


> Wei, while you are re-spinning this patch set I want to bring up
> something I just noticed in the SCTP code.
>
> The ->dst_saddr() method is not used by anything, it appears.
>
> The ipv4 variant, sctp_v4_dst_saddr() is called internally by the
> ipv4 specific code, but that's it.
>
> So I think the ->dst_saddr member of sctp_pf can be completely
> removed, as can sctp_v6_dst_saddr().
>
> The sctp_v4_dst_saddr() function, of course, will need to be retained.

David, thanks to noticed this. I will cleanup it.

And I have a stupid question about the rule of backport.  Since those
patchs have existed so long time, when I backport those patchs, I'd better
fix the bug in the original patch, or create new patch to fix it? Also how
about some thing need to improvement like the ->dst_saddr() method?

^ permalink raw reply

* Re: [PATCH] net: fix netdev_increment_features()
From: Ben Hutchings @ 2011-04-27  0:26 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev, Jay Vosburgh, Andy Gospodarek, Stephen Hemminger, bridge
In-Reply-To: <20110422163116.3C5C3138DD@rere.qmqm.pl>

On Fri, 2011-04-22 at 18:31 +0200, Michał Mirosław wrote:
> Simplify and fix netdev_increment_features() to conform to what is
> stated in netdevice.h comments about NETIF_F_ONE_FOR_ALL.
> Include FCoE segmentation and VLAN-challedged flags in computation.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
> 
> I noticed this while converting bridge code to hw_features. After
> adding devices to the bridge all features were always cleared regardless
> of features active in those devices.
> 
>  include/linux/netdevice.h |    7 ++++++-
>  net/core/dev.c            |   37 ++++++++++++-------------------------
>  2 files changed, 18 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 405ce21..86140e4 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1107,7 +1107,12 @@ struct net_device {
>  	 */
>  #define NETIF_F_ONE_FOR_ALL	(NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \
>  				 NETIF_F_SG | NETIF_F_HIGHDMA |		\
> -				 NETIF_F_FRAGLIST)
> +				 NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED)

I think we can also add NETIF_F_HW_VLAN_TX, now that
dev_hard_start_xmit() handles VLAN tag insertion.

> +	/*
> +	 * If one device doesn't support one of these features, then disable it
> +	 * for all in netdev_increment_features.
> +	 */
> +#define NETIF_F_ALL_FOR_ALL	(NETIF_F_NOCACHE_COPY | NETIF_F_FSO)

Shouldn't most RX offloads be included in this?  Though it doesn't
really matter that much.

>  	/* changeable features with no special hardware requirements */
>  #define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3871bf6..1a2c91c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6184,33 +6184,20 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>   */
>  u32 netdev_increment_features(u32 all, u32 one, u32 mask)
>  {
> +	if (mask & NETIF_F_GEN_CSUM)
> +		mask |= NETIF_F_ALL_CSUM;
> +	mask |= NETIF_F_VLAN_CHALLENGED;
> +
> +	all |= one & (NETIF_F_ONE_FOR_ALL|NETIF_F_ALL_CSUM) & mask;
> +	all &= one | ~NETIF_F_ALL_FOR_ALL;
> +
>  	/* If device needs checksumming, downgrade to it. */
> -	if (all & NETIF_F_NO_CSUM && !(one & NETIF_F_NO_CSUM))
> -		all ^= NETIF_F_NO_CSUM | (one & NETIF_F_ALL_CSUM);
> -	else if (mask & NETIF_F_ALL_CSUM) {
> -		/* If one device supports v4/v6 checksumming, set for all. */
> -		if (one & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM) &&
> -		    !(all & NETIF_F_GEN_CSUM)) {
> -			all &= ~NETIF_F_ALL_CSUM;
> -			all |= one & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
> -		}
> +	if (all & (NETIF_F_ALL_CSUM & ~NETIF_F_NO_CSUM))
> +		all &= ~NETIF_F_NO_CSUM;
>  
> -		/* If one device supports hw checksumming, set for all. */
> -		if (one & NETIF_F_GEN_CSUM && !(all & NETIF_F_GEN_CSUM)) {
> -			all &= ~NETIF_F_ALL_CSUM;
> -			all |= NETIF_F_HW_CSUM;
> -		}
> -	}
> -
> -	/* If device can't no cache copy, don't do for all */
> -	if (!(one & NETIF_F_NOCACHE_COPY))
> -		all &= ~NETIF_F_NOCACHE_COPY;
> -
> -	one |= NETIF_F_ALL_CSUM;
> -
> -	one |= all & NETIF_F_ONE_FOR_ALL;
> -	all &= one | NETIF_F_LLTX | NETIF_F_GSO | NETIF_F_UFO;
> -	all |= one & mask & NETIF_F_ONE_FOR_ALL;
> +	/* If one device supports hw checksumming, set for all. */
> +	if (all & NETIF_F_GEN_CSUM)
> +		all &= ~(NETIF_F_ALL_CSUM & ~NETIF_F_GEN_CSUM);
>  
>  	return all;
>  }

Well it might be correct, but it's hard to tell.  Can we see your unit
tests, please?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* RE: Kernel crash after using new Intel NIC (igb)
From: Wyborny, Carolyn @ 2011-04-26 23:34 UTC (permalink / raw)
  To: Maximilian Engelhardt, linux-kernel@vger.kernel.org
  Cc: netdev@vger.kernel.org, StuStaNet Vorstand,
	e1000-devel@lists.sourceforge.net
In-Reply-To: <201104250033.03401.maxi@daemonizer.de>



>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Maximilian Engelhardt
>Sent: Sunday, April 24, 2011 3:33 PM
>To: linux-kernel@vger.kernel.org
>Cc: netdev@vger.kernel.org; StuStaNet Vorstand
>Subject: Kernel crash after using new Intel NIC (igb)
>
>Hello,
>
>some time ago we switched some of our servers to a new networking card
>that
>uses the Intel igb driver. Since that time we see regular kernel
>crashes.
>The crashes happen at very irregular intervals, sometimes after a week
>uptime,
>sometimes after a month or even more. They seem to be independent of the
>server load as they also happen in the night when there is low traffic.
>
>The affected server is used as a NAT device with some iptables rules and
>serves
>about 2000 people.
>
>Attached are two logs of the crashes as well as the output of dmesg,
>lspci,
>and /proc/interrupts as well as the used kernel config.
>
>I have no idea what might be wrong but I think it is a kernel bug.
>Perhaps
>someone with more knowledge has a clue.
>
>If needed I can provide additional information or build different
>kernels.
>
>Greetings,
>Maxi

Hello,

I'm sorry you're having crashes since installing our NIC.  Thank you for the data.  I haven't had a chance to review it carefully yet, but it looks to me like the crashes have us in the stack sometimes and sometimes not.  I need to do a bit more research and will need some more information.  Can I get an ethtool -i eth# for the device and also lspci -vvv for the platform its installed on.

If you open an issue at SourceForge we will have a place to keep the logs.

I will research this a bit more and get back to you tomorrow my time.

Thanks,

Carolyn
Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation





^ permalink raw reply

* RE: Strange igb bug, out-of-tree driver seems to work fine.
From: Wyborny, Carolyn @ 2011-04-26 23:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4DB0F965.4080605@candelatech.com>



>-----Original Message-----
>From: Ben Greear [mailto:greearb@candelatech.com]
>Sent: Thursday, April 21, 2011 8:44 PM
>To: Wyborny, Carolyn
>Cc: netdev
>Subject: Re: Strange igb bug, out-of-tree driver seems to work fine.
>
>On 04/21/2011 05:16 PM, Wyborny, Carolyn wrote:
>>
>>
>>> -----Original Message-----
>>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>owner@vger.kernel.org]
>>> On Behalf Of Ben Greear
>>> Sent: Thursday, April 21, 2011 2:39 PM
>>> To: netdev
>>> Subject: Strange igb bug, out-of-tree driver seems to work fine.
>>>
>>> We have a 4-port NIC using the igb driver in a couple of systems,
>>> and saw some strange issues (mostly rx CRC errors).  We tried 2.6.34,
>>> 2.6.36, and 2.6.38 based kernels and all showed issues.  The NICs
>>> are from two different vendors, though both use the igb driver.
>>>
>>> We then tried 2.6.36.3 with the out-of-tree igb-2.4.13.tar.gz
>>> driver.  And everything seems to run clean.
>>>
>>> My kernels are somewhat hacked and tainted, and this testing
>>> is using my tainting module, but since changing only the driver
>>> seems to fix things, it's _probably_ not my fault this time!
>>>
>>> I am running a small hack to make igb work better with VLANs
>>> in my 2.6.36 and .38 kernel, though not in the .34.  Here is the .36
>>> diff:
>
>>
>> Hello,
>>
>> There have been recent patches for igb that I submitted that might
>contain the fix you found in the out-of-tree driver, although I'm not
>sure which one would affect rx CRC errors.  Were there other symptoms?
>Any messages in the log?  I'll look through the patches and see if there
>is anything obvious and get you some commit numbers to try.
>
>There was nothing else that I noticed.  If your patches are in 2.6.39-
>rc4+, I can try
>them there perhaps?
>
>Thanks,
>Ben
>
>
>--
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com


Hello,

I'm sorry for the delay in responding.  I'm really scratching my head on this one as we don't do much in the driver that affects what we get on receive.  I've seen situations where some switches end up transmitting more of these and then we record more of them, but I'm guessing you're testing with the same equipment, just a different driver version.  Let me know if I'm mistaken there.  

So, to answer your question, I believe my patches are there, but I did review them again and I'm not sure they will make any difference.  My latest batch of patches was to add features to the i350 device specifically.

Give it try though and let me know if you see any difference with 2.6.39-rc4+.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




^ permalink raw reply

* Re: [PATCH 09/13] netvm: Set PF_MEMALLOC as appropriate during SKB processing
From: NeilBrown @ 2011-04-26 23:22 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426141048.GG4658@suse.de>

On Tue, 26 Apr 2011 15:10:48 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Tue, Apr 26, 2011 at 10:21:57PM +1000, NeilBrown wrote:
> > On Tue, 26 Apr 2011 08:36:50 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 3871bf6..2d79a20 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -3095,6 +3095,27 @@ static void vlan_on_bond_hook(struct sk_buff *skb)
> > >  	}
> > >  }
> > >  
> > > +/*
> > > + * Limit which protocols can use the PFMEMALLOC reserves to those that are
> > > + * expected to be used for communication with swap.
> > > + */
> > > +static bool skb_pfmemalloc_protocol(struct sk_buff *skb)
> > > +{
> > > +	if (skb_pfmemalloc(skb))
> > > +		switch (skb->protocol) {
> > > +		case __constant_htons(ETH_P_ARP):
> > > +		case __constant_htons(ETH_P_IP):
> > > +		case __constant_htons(ETH_P_IPV6):
> > > +		case __constant_htons(ETH_P_8021Q):
> > > +			break;
> > > +
> > > +		default:
> > > +			return false;
> > > +		}
> > > +
> > > +	return true;
> > > +}
> > 
> > This sort of thing really bugs me :-)
> > Neither the comment nor the function name actually describe what the function
> > is doing.  The function is checking *2* things.
> >    is_pfmemalloc_skb_or_pfmemalloc_protocol()
> > might be a more correct name, but is too verbose.
> > 
> > I would prefer the skb_pfmemalloc test were removed from here and ....
> > 
> > > +	if (!skb_pfmemalloc_protocol(skb))
> > > +		goto drop;
> > > +
> > 
> > ...added here so this becomes:
> > 
> >       if (!skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))
> >                 goto drop;
> > 
> > which actually makes sense.
> > 
> 
> Moving the check is neater but that check should be
> 
> 	if (skb_pfmemalloc(skb) && !skb_pfmemalloc_protocol(skb))
> 
> ? It's only if the skb was allocated from emergency reserves that we
> need to consider dropping it to make way for other packets to be
> received.
> 

Correct.  I got my Boolean algebra all confused. Sorry 'bout that.

NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
From: NeilBrown @ 2011-04-26 23:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
In-Reply-To: <20110426142624.GH4658@suse.de>

On Tue, 26 Apr 2011 15:26:24 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Tue, Apr 26, 2011 at 10:30:59PM +1000, NeilBrown wrote:
> > On Tue, 26 Apr 2011 08:36:53 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > 
> > 
> > > +/*
> > > + * Throttle direct reclaimers if backing storage is backed by the network
> > > + * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> > > + * depleted. kswapd will continue to make progress and wake the processes
> > > + * when the low watermark is reached
> > > + */
> > > +static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > > +					nodemask_t *nodemask)
> > > +{
> > > +	struct zone *zone;
> > > +	int high_zoneidx = gfp_zone(gfp_mask);
> > > +	DEFINE_WAIT(wait);
> > > +
> > > +	/* Check if the pfmemalloc reserves are ok */
> > > +	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> > > +	prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > > +							TASK_INTERRUPTIBLE);
> > > +	if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
> > > +		goto out;
> > > +
> > > +	/* Throttle */
> > > +	do {
> > > +		schedule();
> > > +		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > > +		prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > > +							TASK_INTERRUPTIBLE);
> > > +	} while (!pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx) &&
> > > +			!fatal_signal_pending(current));
> > > +
> > > +out:
> > > +	finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > > +}
> > 
> > You are doing an interruptible wait, but only checking for fatal signals.
> > So if a non-fatal signal arrives, you will busy-wait.
> > 
> > So I suspect you want TASK_KILLABLE, so just use:
> > 
> >     wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
> >                         pgmemalloc_watermark_ok(zone->zone_pgdata,
> >                                                 high_zoneidx));
> > 
> 
> Well, if a normal signal arrives, we do not necessarily want the
> process to enter reclaim. For fatal signals, I allow it to continue
> because it's not likely to be putting the system under more pressure
> if it's exiting.

Yep, I understand that and it doesn't seem unreasonable.

However I don't think the code implements that correctly.

If you get a non-fatal signal, schedule will exit immediately (because of the
TASK_INTERRUPTIBLE setting) and the 'while' clause will succeed because the
signal is not fatal, so it will loop around and try to schedule again, which
will again exit immediately - busy loop.

> 
> > (You also have an extraneous call to finish_wait)
> > 
> 
> Which one? I'm not seeing a flow where finish_wait gets called twice
> without a prepare_to_wait in between. 
> 

You don't need to call finish_wait immediately before prepare_to_wait.

It really is best to just use the appropriate 'wait_event*' macro....

NeilBrown

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 4/4] ipv4: Kill RTO_CONN.
From: David Miller @ 2011-04-26 22:12 UTC (permalink / raw)
  To: netdev


It's not used by anything in the kernel, and defined in net/route.h so
never exported to userspace.

Therefore we can safely remove it.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 79530da..fdbdb92 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -37,10 +37,6 @@
 
 #define RTO_ONLINK	0x01
 
-#define RTO_CONN	0
-/* RTO_CONN is not used (being alias for 0), but preserved not to break
- * some modules referring to it. */
-
 #define RT_CONN_FLAGS(sk)   (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
 
 struct fib_nh;
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 3/4] ipv4: Remove erroneous check in igmpv3_newpack() and igmp_send_report().
From: David Miller @ 2011-04-26 22:12 UTC (permalink / raw)
  To: netdev


Output route resolution never returns a route with rt_src set to zero
(which is INADDR_ANY).

Even if the flow key for the output route lookup specifies INADDR_ANY
for the source address, the output route resolution chooses a real
source address to use in the final route.

This test has existed forever in igmp_send_report() and David Stevens
simply copied over the erroneous test when implementing support for
IGMPv3.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/igmp.c |   10 ----------
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 1fd3d9c..8ae0a57 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -328,11 +328,6 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 		kfree_skb(skb);
 		return NULL;
 	}
-	if (rt->rt_src == 0) {
-		kfree_skb(skb);
-		ip_rt_put(rt);
-		return NULL;
-	}
 
 	skb_dst_set(skb, &rt->dst);
 	skb->dev = dev;
@@ -670,11 +665,6 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc,
 	if (IS_ERR(rt))
 		return -1;
 
-	if (rt->rt_src == 0) {
-		ip_rt_put(rt);
-		return -1;
-	}
-
 	skb = alloc_skb(IGMP_SIZE+LL_ALLOCATED_SPACE(dev), GFP_ATOMIC);
 	if (skb == NULL) {
 		ip_rt_put(rt);
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 2/4] ipv4: Sanitize and simplify ip_route_{connect,newports}()
From: David Miller @ 2011-04-26 22:12 UTC (permalink / raw)
  To: netdev


These functions are used together as a unit for route resolution
during connect().  They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.

It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.

Let the callers provide the on-stack flow object.  That way we only
need to initialize it once in the ip_route_connect() call.

Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.

Also, describe why this set of facilities are needed and how it works
in a big comment.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |   88 ++++++++++++++++++++++++++++++++------------------
 net/dccp/ipv4.c     |   10 +++---
 net/ipv4/af_inet.c  |    3 +-
 net/ipv4/datagram.c |    3 +-
 net/ipv4/tcp_ipv4.c |   10 +++---
 net/l2tp/l2tp_ip.c  |    8 ++--
 6 files changed, 74 insertions(+), 48 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 3684c3e..79530da 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -217,17 +217,37 @@ static inline char rt_tos2priority(u8 tos)
 	return ip_tos2prio[IPTOS_TOS(tos)>>1];
 }
 
-static inline struct rtable *ip_route_connect(__be32 dst, __be32 src, u32 tos,
-					      int oif, u8 protocol,
-					      __be16 sport, __be16 dport,
-					      struct sock *sk, bool can_sleep)
+/* ip_route_connect() and ip_route_newports() work in tandem whilst
+ * binding a socket for a new outgoing connection.
+ *
+ * In order to use IPSEC properly, we must, in the end, have a
+ * route that was looked up using all available keys including source
+ * and destination ports.
+ *
+ * However, if a source port needs to be allocated (the user specified
+ * a wildcard source port) we need to obtain addressing information
+ * in order to perform that allocation.
+ *
+ * So ip_route_connect() looks up a route using wildcarded source and
+ * destination ports in the key, simply so that we can get a pair of
+ * addresses to use for port allocation.
+ *
+ * Later, once the ports are allocated, ip_route_newports() will make
+ * another route lookup if needed to make sure we catch any IPSEC
+ * rules keyed on the port information.
+ *
+ * The callers allocate the flow key on their stack, and must pass in
+ * the same flowi4 object to both the ip_route_connect() and the
+ * ip_route_newports() calls.
+ */
+
+static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32 src,
+					 u32 tos, int oif, u8 protocol,
+					 __be16 sport, __be16 dport,
+					 struct sock *sk, bool can_sleep)
 {
-	struct net *net = sock_net(sk);
-	struct rtable *rt;
-	struct flowi4 fl4;
-	__u8 flow_flags;
+	__u8 flow_flags = 0;
 
-	flow_flags = 0;
 	if (inet_sk(sk)->transparent)
 		flow_flags |= FLOWI_FLAG_ANYSRC;
 	if (protocol == IPPROTO_TCP)
@@ -235,41 +255,45 @@ static inline struct rtable *ip_route_connect(__be32 dst, __be32 src, u32 tos,
 	if (can_sleep)
 		flow_flags |= FLOWI_FLAG_CAN_SLEEP;
 
-	flowi4_init_output(&fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
+	flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
 			   protocol, flow_flags, dst, src, dport, sport);
+}
+
+static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
+					      __be32 dst, __be32 src, u32 tos,
+					      int oif, u8 protocol,
+					      __be16 sport, __be16 dport,
+					      struct sock *sk, bool can_sleep)
+{
+	struct net *net = sock_net(sk);
+	struct rtable *rt;
+
+	ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
+			      sport, dport, sk, can_sleep);
 
 	if (!dst || !src) {
-		rt = __ip_route_output_key(net, &fl4);
+		rt = __ip_route_output_key(net, fl4);
 		if (IS_ERR(rt))
 			return rt;
-		fl4.daddr = rt->rt_dst;
-		fl4.saddr = rt->rt_src;
+		fl4->daddr = rt->rt_dst;
+		fl4->saddr = rt->rt_src;
 		ip_rt_put(rt);
 	}
-	security_sk_classify_flow(sk, flowi4_to_flowi(&fl4));
-	return ip_route_output_flow(net, &fl4, sk);
+	security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
+	return ip_route_output_flow(net, fl4, sk);
 }
 
-static inline struct rtable *ip_route_newports(struct rtable *rt,
-					       u8 protocol, __be16 orig_sport,
-					       __be16 orig_dport, __be16 sport,
-					       __be16 dport, struct sock *sk)
+static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable *rt,
+					       __be16 orig_sport, __be16 orig_dport,
+					       __be16 sport, __be16 dport,
+					       struct sock *sk)
 {
 	if (sport != orig_sport || dport != orig_dport) {
-		struct flowi4 fl4;
-		__u8 flow_flags;
-
-		flow_flags = 0;
-		if (inet_sk(sk)->transparent)
-			flow_flags |= FLOWI_FLAG_ANYSRC;
-		if (protocol == IPPROTO_TCP)
-			flow_flags |= FLOWI_FLAG_PRECOW_METRICS;
-		flowi4_init_output(&fl4, rt->rt_oif, rt->rt_mark, rt->rt_tos,
-				   RT_SCOPE_UNIVERSE, protocol, flow_flags,
-				   rt->rt_dst, rt->rt_src, dport, sport);
+		fl4->fl4_dport = dport;
+		fl4->fl4_sport = sport;
 		ip_rt_put(rt);
-		security_sk_classify_flow(sk, flowi4_to_flowi(&fl4));
-		return ip_route_output_flow(sock_net(sk), &fl4, sk);
+		security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
+		return ip_route_output_flow(sock_net(sk), fl4, sk);
 	}
 	return rt;
 }
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ae451c6..b92ab65 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -40,12 +40,13 @@
 
 int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
+	const struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
 	struct inet_sock *inet = inet_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
-	const struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
 	__be16 orig_sport, orig_dport;
-	struct rtable *rt;
 	__be32 daddr, nexthop;
+	struct flowi4 fl4;
+	struct rtable *rt;
 	int err;
 
 	dp->dccps_role = DCCP_ROLE_CLIENT;
@@ -65,7 +66,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
 	orig_sport = inet->inet_sport;
 	orig_dport = usin->sin_port;
-	rt = ip_route_connect(nexthop, inet->inet_saddr,
+	rt = ip_route_connect(&fl4, nexthop, inet->inet_saddr,
 			      RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			      IPPROTO_DCCP,
 			      orig_sport, orig_dport, sk, true);
@@ -101,8 +102,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (err != 0)
 		goto failure;
 
-	rt = ip_route_newports(rt, IPPROTO_DCCP,
-			       orig_sport, orig_dport,
+	rt = ip_route_newports(&fl4, rt, orig_sport, orig_dport,
 			       inet->inet_sport, inet->inet_dport, sk);
 	if (IS_ERR(rt)) {
 		rt = NULL;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cae75ef..0413af3 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1103,6 +1103,7 @@ static int inet_sk_reselect_saddr(struct sock *sk)
 	struct inet_sock *inet = inet_sk(sk);
 	__be32 old_saddr = inet->inet_saddr;
 	__be32 daddr = inet->inet_daddr;
+	struct flowi4 fl4;
 	struct rtable *rt;
 	__be32 new_saddr;
 
@@ -1110,7 +1111,7 @@ static int inet_sk_reselect_saddr(struct sock *sk)
 		daddr = inet->opt->faddr;
 
 	/* Query new route. */
-	rt = ip_route_connect(daddr, 0, RT_CONN_FLAGS(sk),
+	rt = ip_route_connect(&fl4, daddr, 0, RT_CONN_FLAGS(sk),
 			      sk->sk_bound_dev_if, sk->sk_protocol,
 			      inet->inet_sport, inet->inet_dport, sk, false);
 	if (IS_ERR(rt))
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 85bd24c..216ba23 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -24,6 +24,7 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct sockaddr_in *usin = (struct sockaddr_in *) uaddr;
+	struct flowi4 fl4;
 	struct rtable *rt;
 	__be32 saddr;
 	int oif;
@@ -46,7 +47,7 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		if (!saddr)
 			saddr = inet->mc_addr;
 	}
-	rt = ip_route_connect(usin->sin_addr.s_addr, saddr,
+	rt = ip_route_connect(&fl4, usin->sin_addr.s_addr, saddr,
 			      RT_CONN_FLAGS(sk), oif,
 			      sk->sk_protocol,
 			      inet->inet_sport, usin->sin_port, sk, true);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index edf18bd..310454c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -146,12 +146,13 @@ EXPORT_SYMBOL_GPL(tcp_twsk_unique);
 /* This will initiate an outgoing connection. */
 int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
+	struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
 	struct inet_sock *inet = inet_sk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
 	__be16 orig_sport, orig_dport;
-	struct rtable *rt;
 	__be32 daddr, nexthop;
+	struct flowi4 fl4;
+	struct rtable *rt;
 	int err;
 
 	if (addr_len < sizeof(struct sockaddr_in))
@@ -169,7 +170,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
 	orig_sport = inet->inet_sport;
 	orig_dport = usin->sin_port;
-	rt = ip_route_connect(nexthop, inet->inet_saddr,
+	rt = ip_route_connect(&fl4, nexthop, inet->inet_saddr,
 			      RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			      IPPROTO_TCP,
 			      orig_sport, orig_dport, sk, true);
@@ -236,8 +237,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (err)
 		goto failure;
 
-	rt = ip_route_newports(rt, IPPROTO_TCP,
-			       orig_sport, orig_dport,
+	rt = ip_route_newports(&fl4, rt, orig_sport, orig_dport,
 			       inet->inet_sport, inet->inet_dport, sk);
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index fce9bd3..cc67367 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -296,12 +296,12 @@ out_in_use:
 
 static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
-	int rc;
-	struct inet_sock *inet = inet_sk(sk);
 	struct sockaddr_l2tpip *lsa = (struct sockaddr_l2tpip *) uaddr;
+	struct inet_sock *inet = inet_sk(sk);
+	struct flowi4 fl4;
 	struct rtable *rt;
 	__be32 saddr;
-	int oif;
+	int oif, rc;
 
 	rc = -EINVAL;
 	if (addr_len < sizeof(*lsa))
@@ -320,7 +320,7 @@ static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	if (ipv4_is_multicast(lsa->l2tp_addr.s_addr))
 		goto out;
 
-	rt = ip_route_connect(lsa->l2tp_addr.s_addr, saddr,
+	rt = ip_route_connect(&fl4, lsa->l2tp_addr.s_addr, saddr,
 			      RT_CONN_FLAGS(sk), oif,
 			      IPPROTO_L2TP,
 			      0, 0, sk, true);
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 1/4] ipv4: Respect 'saddr' and 'daddr' args to ip_build_and_send_pkt().
From: David Miller @ 2011-04-26 22:12 UTC (permalink / raw)
  To: netdev


This function ignores the passed in addresses and forces their
settings using rt->rt_dst and rt->rt_src.

There is never a reason to do this, because the socket of the
callers of this function must know what addresses it is using.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/ip_output.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index bdad3d6..e0d0d5d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -158,8 +158,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
 	else
 		iph->frag_off = 0;
 	iph->ttl      = ip_select_ttl(inet, &rt->dst);
-	iph->daddr    = rt->rt_dst;
-	iph->saddr    = rt->rt_src;
+	iph->daddr    = daddr;
+	iph->saddr    = saddr;
 	iph->protocol = sk->sk_protocol;
 	ip_select_ident(iph, &rt->dst, sk);
 
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 0/4] Minor ipv4 routing cleanups.
From: David Miller @ 2011-04-26 22:11 UTC (permalink / raw)
  To: netdev


Just a few cleanups and simplifications I came across while
auditing some aspects of the ipv4 routing code.

I'm going to do some more testing and validation on these changes
before I actually push them out to net-next-2.6

Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* rtl8169 related kernel splat in 2.6.38.4
From: Ben Greear @ 2011-04-26 22:03 UTC (permalink / raw)
  To: netdev

I see this on every (or at least most) boots:

------------[ cut here ]------------
WARNING: at /home/greearb/git/linux-2.6.dev.38.y/kernel/timer.c:983 del_timer_sync+0x25/0x37()
Hardware name: To Be Filled By O.E.M.
Modules linked in: veth 8021q garp stp llc fuse macvlan wanlink(P) pktgen coretemp hwmon nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput arc4 ecb 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq ath9k snd_seq_device mac80211 snd_pcm ath9k_common ath9k_hw snd_timer ath cfg80211 snd r8169 
iTCO_wdt soundcore iTCO_vendor_support microcode i2c_i801 snd_page_alloc serio_raw mii pcspkr i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: 
scsi_wait_scan]
Pid: 3224, comm: ifup Tainted: P            2.6.38.4+ #6
Call Trace:
  [<c043094b>] ? warn_slowpath_common+0x65/0x7a
  [<c04398a0>] ? del_timer_sync+0x25/0x37
  [<c043096f>] ? warn_slowpath_null+0xf/0x13
  [<c04398a0>] ? del_timer_sync+0x25/0x37
  [<c06e94f3>] ? linkwatch_schedule_work+0x68/0x83
  [<c06e95b6>] ? linkwatch_fire_event+0xa8/0xad
  [<c06ef193>] ? netif_carrier_on+0x23/0x34
  [<f8af1b32>] ? __rtl8169_check_link_status+0x4f/0xaf [r8169]
  [<f8af2376>] ? rtl8169_interrupt+0x1e3/0x287 [r8169]
  [<c046c87c>] ? handle_IRQ_event+0x1d/0x9c
  [<c046e130>] ? handle_edge_irq+0xba/0x100
  [<c046e076>] ? handle_edge_irq+0x0/0x100
  <IRQ>  [<c0404026>] ? do_IRQ+0x37/0x90
  [<c04035e9>] ? common_interrupt+0x29/0x30
  [<c0491283>] ? unmap_vmas+0x477/0x5b7
  [<c049298b>] ? __do_fault+0x2f3/0x323
  [<c0495676>] ? exit_mmap+0x7a/0xc3
  [<c042ecd1>] ? mmput+0x61/0xd5
  [<c0432430>] ? exit_mm+0x103/0x10b
  [<c0433cd9>] ? do_exit+0x1b4/0x5cc
  [<c0580fed>] ? copy_to_user+0x2f/0x106
  [<c043a88c>] ? sys_rt_sigaction+0x64/0x77
  [<c0434153>] ? do_group_exit+0x62/0x85
  [<c0772373>] ? do_device_not_available+0x0/0x1a
  [<c0434189>] ? sys_exit_group+0x13/0x17
  [<c040301c>] ? sysenter_do_call+0x12/0x28
---[ end trace da518b58347d6cca ]---

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH] iproute2: improve mqprio inputs for queue offsets and counts
From: Stephen Hemminger @ 2011-04-26 22:00 UTC (permalink / raw)
  To: John Fastabend; +Cc: bhutchings, netdev
In-Reply-To: <20110426194441.23726.23489.stgit@jf-dev1-dcblab>

On Tue, 26 Apr 2011 12:44:42 -0700
John Fastabend <john.r.fastabend@intel.com> wrote:

> This changes mqprio input format to be more user friendly.
> 
> Old usage,
> 
> # ./tc/tc qdisc add dev eth3 root mqprio help
> Usage: ... mqprio [num_tc NUMBER] [map P0 P1...]
>                   [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1|0]
> 
> New uage,
> 
> # ./tc/tc qdisc add dev eth3 root mqprio help
> Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]
>                   [queues count1@offset1 count2@offset2 ...] [hw 1|0]
> 
> Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied (fixed typo in commit message)

-- 

^ permalink raw reply

* RE: [RFC PATCH] netlink: Increase netlink dump skb message size
From: Rose, Gregory V @ 2011-04-26 21:58 UTC (permalink / raw)
  To: Ben Hutchings, Eric Dumazet; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1303844836.2850.6.camel@bwh-desktop>

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Tuesday, April 26, 2011 12:07 PM
> To: Rose, Gregory V; Eric Dumazet
> Cc: David Miller; netdev@vger.kernel.org
> Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size
> 
> On Tue, 2011-04-26 at 09:24 -0700, Rose, Gregory V wrote:
> > > -----Original Message-----
> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > > Sent: Tuesday, April 26, 2011 9:21 AM
> > > To: Rose, Gregory V
> > > Cc: David Miller; netdev@vger.kernel.org; bhutchings@solarflare.com
> > > Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message
> size
> > >
> > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit :
> > >
> > > > I'm fine with however you folks want to approach this, just give me
> some
> > > direction.
> > >
> > > I would just try following patch :
> >
> > I'll test it out, it's certainly a lot simpler.  Perhaps I was getting a
> bit too fancy.
> >
> > Ben would want to make sure it works for 127 VFs, my device does 63.
> 
> I haven't been working on SR-IOV myself so I'll pass this on to a
> colleague.  Thanks, Eric.

Eric's patch works fine for the case of 63 VFs.  For most dumps it's allocating quite a bit more memory than necessary but that's not a big issue since it's transient and not held for long.

- Greg



^ permalink raw reply

* Re: [PATCH net-next-2.6 0/7] SCTP updates for net-next-2.6
From: David Miller @ 2011-04-26 21:51 UTC (permalink / raw)
  To: yjwei; +Cc: netdev, linux-sctp
In-Reply-To: <20110426.001238.183056292.davem@davemloft.net>


Wei, while you are re-spinning this patch set I want to bring up
something I just noticed in the SCTP code.

The ->dst_saddr() method is not used by anything, it appears.

The ipv4 variant, sctp_v4_dst_saddr() is called internally by the
ipv4 specific code, but that's it.

So I think the ->dst_saddr member of sctp_pf can be completely
removed, as can sctp_v6_dst_saddr().

The sctp_v4_dst_saddr() function, of course, will need to be retained.

^ permalink raw reply

* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Dominik Kaspar @ 2011-04-26 21:27 UTC (permalink / raw)
  To: John Heffner; +Cc: Eric Dumazet, Carsten Wolff, netdev
In-Reply-To: <BANLkTikoQfbSxnJi_OR+N6sa5iVNcTO6Ug@mail.gmail.com>

Hi John,

Thanks for your advice. I am very well aware that TCP is not designed
to work under such conditions. I am still surprised how well Linux TCP
handles many situations of excessive, persistent packet reordering. In
scenarios of fairly heterogeneous path characteristics, Linux TCP
aggregates multiple paths close to ideally :-)

If I'm not mistaken, cwnd moderation is a measure to prevent TCP from
sending large bursts if a single ACK covers many segments. In what way
can cwnd moderation prevent TCP from increasing its estimate of packet
reordering?

Greetings,
Dominik

On Tue, Apr 26, 2011 at 10:16 PM, John Heffner <johnwheffner@gmail.com> wrote:
> First, TCP is definitely not designed to work under such conditions.
> For example, assumptions behind RTO calculation and fast retransmit
> heuristics are violated.  However, in this particular case my first
> guess is that you are being limited by "cwnd moderation," which was
> the topic of recent discussion here.  Under persistent reordering,
> cwnd moderation can inhibit the ability of cwnd to grow.
>
> Thanks,
>  -John
>
>
> On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote:
>> Hi Eric,
>>
>> Here are the tcpdump files for the first TSO-disabled experiment, in a
>> full version and a short version with only the first 10000 packets:
>>
>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap
>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap
>>
>> By the way, the packets are sent from the server (x.x.x.189) to the
>> client interfaces (x.x.x.74) and (x.x.x.216) with the following
>> pattern (which is a non-bursty 128-bit approximation of scheduling
>> with a 600:400 ratio over primary path 0 and secondary path 1):
>>
>> 0010010100101001010010100101001010010100101001010010100101001010
>> 0101001010010100101001010010100101001010010100101001010010100101
>>
>> Greetings,
>> Dominik
>>
>> On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit :
>>>> Hi Eric,
>>>>
>>>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> >
>>>> > Since you have at sender a rule to spoof destination address of packets,
>>>> > you should make sure you dont send "super packets (up to 64Kbytes)",
>>>> > because it would stress the multipath more than you wanted to. This way,
>>>> > you send only normal packets (1500 MTU).
>>>> >
>>>> > ethtool -K eth0 tso off
>>>> > ethtool -K eth0 gso off
>>>> >
>>>> > I am pretty sure it should help your (atypic) workload.
>>>>
>>>> I made new experiments with the exact same multipath setup as before,
>>>> but disabled TSO and GSO on all involved Ethernet interfaces. However,
>>>> this did not seem to change much about TCP's behavior when packets are
>>>> striped over heterogeneous paths. You can see the results of four
>>>> 20-minute experiments on this plot:
>>>>
>>>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png
>>>>
>>>> Cheers,
>>>> Dominik
>>>
>>> Hi Dominik
>>>
>>> Any chance to have a pcap file from sender side, of say first 10.000
>>> packets ?
>>>
>>>
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox