* Re: [PATCH 4/4] net_sched: gred: actually perform idling in WRED mode
From: Jamal Hadi Salim @ 2012-09-13 18:08 UTC
To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-4-git-send-email-david.ward@ll.mit.edu>
On 12-09-13 11:22 AM, David Ward wrote:
> gred_dequeue() and gred_drop() do not seem to get called when the
> queue is empty, meaning that we never start idling while in WRED
> mode. And since qidlestart is not stored by gred_store_wred_set(),
> we would never stop idling while in WRED mode if we ever started.
> This messes up the average queue size calculation that influences
> packet marking/dropping behavior.
>
> Now, we start WRED mode idling as we are removing the last packet
> from the queue. Also we now actually stop WRED mode idling when we
> are enqueuing a packet.
>
> Cc: Bruce Osler <brosler@cisco.com>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
This one is not so obvious. I am assuming you vetted it via some tests.
In which case:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
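For readers wondering why idle tracking matters for the average: in textbook RED, an idle period is treated as m samples of an empty queue, so the average decays by (1 - 2^-Wlog) per sample. The sketch below illustrates only that idea. It is not the kernel's code, and the name decay_idle and the parameter m are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, textbook RED only: apply m empty-queue samples
 * to an average that is stored scaled by 2^wlog.  If idling is never
 * started (or never stopped), m is wrong and so is the average.
 */
uint32_t decay_idle(uint32_t qavg_scaled, unsigned int wlog, unsigned int m)
{
	assert(wlog < 32);
	while (m--)
		qavg_scaled -= qavg_scaled >> wlog;	/* avg *= (1 - 2^-wlog) */
	return qavg_scaled;
}
```

With wlog = 3, a scaled average of 8000 drops to 7000 after one idle sample and to 6125 after two.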
* Re: [PATCH 3/4] net_sched: gred: fix qave reporting via netlink
From: Jamal Hadi Salim @ 2012-09-13 18:01 UTC
To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-3-git-send-email-david.ward@ll.mit.edu>
On 12-09-13 11:22 AM, David Ward wrote:
> q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
> to pass q->vars.qavg as the backlog value, we need to un-scale it.
> Additionally, the qave value returned via netlink should not be Wlog
> scaled, so we need to un-scale the result of red_calc_qavg().
>
> This caused artificially high values for "Average Queue" to be shown
> by 'tc -s -d qdisc', but did not affect the actual operation of GRED.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
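To make the scaling issue concrete, here is a minimal sketch of a fixed-point average kept scaled by 2^Wlog, plus the un-scaling step the patch description asks for. The function names are hypothetical; only the shift by Wlog is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold one backlog sample into an average that is
 * stored as avg * 2^wlog.  In fixed point this computes
 * avg += (backlog - avg) / 2^wlog.
 */
uint32_t ewma_update_scaled(uint32_t qavg_scaled, uint32_t backlog,
			    unsigned int wlog)
{
	assert(wlog < 32);
	return qavg_scaled + backlog - (qavg_scaled >> wlog);
}

/* Convert the stored value back to the same units as the backlog. */
uint32_t qavg_unscale(uint32_t qavg_scaled, unsigned int wlog)
{
	assert(wlog < 32);
	return qavg_scaled >> wlog;
}
```

With wlog = 3 and a steady backlog of 1000, the stored value settles at 8000. Reporting it without the shift would show an average eight times too high, which matches the "artificially high" symptom described in the commit message.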
* Re: [PATCH 2/4] net_sched: gred: eliminate redundant DP prio comparisons
From: Jamal Hadi Salim @ 2012-09-13 18:00 UTC
To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-2-git-send-email-david.ward@ll.mit.edu>
On 12-09-13 11:22 AM, David Ward wrote:
> Each pair of DPs only needs to be compared once when searching for
> a non-unique prio value.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
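The "compare each pair once" idea can be sketched in isolation as follows. struct dp and has_duplicate_prio are hypothetical stand-ins, not the GRED structures; the point is that the inner loop starts at i + 1, so the self-comparison and the mirrored pair (j, i) are never visited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dp {
	int prio;	/* hypothetical stand-in for a virtual queue's prio */
};

/* Returns true if any two non-NULL entries share a prio value. */
bool has_duplicate_prio(struct dp *const tab[], size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i] == NULL)
			continue;
		/* Pairs (i, j) with j <= i were covered by earlier rounds. */
		for (size_t j = i + 1; j < n; j++)
			if (tab[j] != NULL && tab[j]->prio == tab[i]->prio)
				return true;
	}
	return false;
}
```

For n slots this performs at most n(n-1)/2 comparisons instead of n^2.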
* Re: [PATCH 1/4] net_sched: gred: correct comment about qavg calculation in RIO mode
From: Jamal Hadi Salim @ 2012-09-13 17:59 UTC
To: David Ward; +Cc: netdev, Bruce Osler, Cyril Chemparathy
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>
On 12-09-13 11:22 AM, David Ward wrote:
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
* Re: NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
From: Sylvain Munaut @ 2012-09-13 17:35 UTC
To: Eric Dumazet; +Cc: Cong Wang, netdev
In-Reply-To: <1347455135.13103.949.camel@edumazet-glaptop>
Hi,
> Yes, but I have some worries about why it is needed.
>
> Isn't it covering a bug elsewhere?
That may very well be.
Of the few test servers I have running the same kernel, I just found
the one with netconsole active to be "stuck".
Not frozen, but all user processes are hung and it's spitting
messages about processes and CPUs being "stuck". The trace is different
in each case depending on what the process was actually doing at the
time it got stuck.
No message identifying the root cause was sent to the netconsole and
nothing was written in the logs ...
Cheers,
Sylvain
* [PATCH] Xen backend support for paged out grant targets.
From: Andres Lagar-Cavilla @ 2012-09-13 17:28 UTC
To: xen-devel
Cc: Konrad Rzeszutek Wilk, Ian Campbell, David Vrabel, David Miller,
linux-kernel, netdev, Andres Lagar-Cavilla
Since Xen-4.2, HVM domains may have portions of their memory paged out. When a
foreign domain (such as dom0) attempts to map these frames, the map will
initially fail. The hypervisor returns a suitable errno, and kicks off an
asynchronous page-in operation carried out by a helper. The foreign domain is
expected to retry the mapping operation until it eventually succeeds. The
foreign domain is not put to sleep because it could itself be the one running
the pager assist (typical scenario for dom0).
This patch adds support for this mechanism for backend drivers using grant
mapping and copying operations. Specifically, this covers the blkback and
gntdev drivers (which map foreign grants), and the netback driver (which copies
foreign grants).
* Add GNTST_eagain, already exposed by Xen, to the grant interface.
* Add a retry method for grants that fail with GNTST_eagain (i.e. because the
target foreign frame is paged out).
* Insert hooks with appropriate macro decorators in the aforementioned drivers.
The retry loop is only invoked if the grant operation status is GNTST_eagain.
It guarantees to leave a new status code different from GNTST_eagain. Any other
status code results in identical code execution as before.
The retry loop performs 256 attempts with increasing time intervals through a
32 second period. It uses msleep to yield while waiting for the next retry.
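The timing of the retry loop can be checked with a small user-space model. This is only a sketch: simulate_retry and struct retry_stats are hypothetical names, the sleep is accumulated rather than performed, and the operation is assumed to return "eagain" until a chosen attempt number.

```c
#include <assert.h>

#define MAX_DELAY 256	/* same bound as in the patch */

struct retry_stats {
	unsigned attempts;	/* times the operation was issued */
	unsigned slept_ms;	/* total milliseconds of simulated sleep */
	int gave_up;		/* 1 if the loop ran into MAX_DELAY */
};

/*
 * Model of the retry helper.  The operation reports "eagain" until
 * attempt number succeed_on (1-based); 0 means it never succeeds.
 */
struct retry_stats simulate_retry(unsigned succeed_on)
{
	struct retry_stats st = { 0, 0, 0 };
	unsigned delay = 1;
	int eagain;

	do {
		st.attempts++;
		eagain = (succeed_on == 0 || st.attempts < succeed_on);
		if (eagain)
			st.slept_ms += delay++;	/* stands in for msleep(delay++) */
	} while (eagain && delay < MAX_DELAY);

	st.gave_up = (delay >= MAX_DELAY);
	return st;
}
```

If the operation never succeeds, the helper issues 255 attempts and sleeps 1 + 2 + ... + 255 = 32640 ms in total. The initial batched hypercall made before the helper runs is the 256th attempt.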
V2 after feedback from David Vrabel:
* Explicit MAX_DELAY instead of wrap-around delay into zero
* Abstract GNTST_eagain check into core grant table code for netback module.
V3 after feedback from Ian Campbell:
* Add placeholder in array of grant table error descriptions for unrelated
error code we jump over.
* Eliminate single map and retry macro in favor of a generic batch flavor.
* Some renaming.
* Bury most implementation in grant_table.c, cleaner interface.
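The placeholder mentioned in the V3 notes exists because the message table is indexed by the negated status code, so a skipped code needs a dummy entry to keep later strings aligned. The sketch below shows the pattern with made-up codes (ST_* and st_strerror are hypothetical, with a gap at -3); in the patch the gap is at -11 and GNTST_eagain is -12.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status codes; note that -3 is deliberately unused. */
enum { ST_OKAY = 0, ST_GENERAL = -1, ST_BAD_PAGE = -2, ST_EAGAIN = -4 };

static const char *const st_msgs[] = {
	"okay",			/*  0 */
	"general error",	/* -1 */
	"bad page",		/* -2 */
	"_placeholder_",	/* -3: unused, keeps the next entry aligned */
	"retry",		/* -4 */
};

/* Look up the message for a status code, indexing by -status. */
const char *st_strerror(int status)
{
	const int nmsgs = (int)(sizeof(st_msgs) / sizeof(st_msgs[0]));

	if (status > 0 || status <= -nmsgs)
		return "unknown";
	return st_msgs[-status];
}
```

Without the placeholder entry, looking up ST_EAGAIN would index past the end of the table.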
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
---
drivers/net/xen-netback/netback.c | 11 ++------
drivers/xen/grant-table.c | 53 +++++++++++++++++++++++++++++++++++
drivers/xen/xenbus/xenbus_client.c | 6 ++--
include/xen/grant_table.h | 12 ++++++++
include/xen/interface/grant_table.h | 5 +++-
5 files changed, 74 insertions(+), 13 deletions(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 682633b..05593d8 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -635,9 +635,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
return;
BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
- ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, &netbk->grant_copy_op,
- npo.copy_prod);
- BUG_ON(ret != 0);
+ gnttab_batch_copy(netbk->grant_copy_op, npo.copy_prod);
while ((skb = __skb_dequeue(&rxq)) != NULL) {
sco = (struct skb_cb_overlay *)skb->cb;
@@ -1460,18 +1458,15 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
static void xen_netbk_tx_action(struct xen_netbk *netbk)
{
unsigned nr_gops;
- int ret;
nr_gops = xen_netbk_tx_build_gops(netbk);
if (nr_gops == 0)
return;
- ret = HYPERVISOR_grant_table_op(GNTTABOP_copy,
- netbk->tx_copy_ops, nr_gops);
- BUG_ON(ret);
- xen_netbk_tx_submit(netbk);
+ gnttab_batch_copy(netbk->tx_copy_ops, nr_gops);
+ xen_netbk_tx_submit(netbk);
}
static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index eea81cf..f5681c8 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -38,6 +38,7 @@
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
#include <linux/io.h>
+#include <linux/delay.h>
#include <linux/hardirq.h>
#include <xen/xen.h>
@@ -823,6 +824,52 @@ unsigned int gnttab_max_grant_frames(void)
}
EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
+/* Handling of paged out grant targets (GNTST_eagain) */
+#define MAX_DELAY 256
+static inline void
+gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
+ const char *func)
+{
+ unsigned delay = 1;
+
+ do {
+ BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
+ if (*status == GNTST_eagain)
+ msleep(delay++);
+ } while ((*status == GNTST_eagain) && (delay < MAX_DELAY));
+
+ if (delay >= MAX_DELAY) {
+ printk(KERN_ERR "%s: %s eagain grant\n", func, current->comm);
+ *status = GNTST_bad_page;
+ }
+}
+
+void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count)
+{
+ struct gnttab_map_grant_ref *op;
+
+ if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, batch, count))
+ BUG();
+ for (op = batch; op < batch + count; op++)
+ if (op->status == GNTST_eagain)
+ gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, op,
+ &op->status, __func__);
+}
+EXPORT_SYMBOL_GPL(gnttab_batch_map);
+
+void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
+{
+ struct gnttab_copy *op;
+
+ if (HYPERVISOR_grant_table_op(GNTTABOP_copy, batch, count))
+ BUG();
+ for (op = batch; op < batch + count; op++)
+ if (op->status == GNTST_eagain)
+ gnttab_retry_eagain_gop(GNTTABOP_copy, op, &op->status,
+ __func__);
+}
+EXPORT_SYMBOL_GPL(gnttab_batch_copy);
+
int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
@@ -836,6 +883,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
if (ret)
return ret;
+ /* Retry eagain maps */
+ for (i = 0; i < count; i++)
+ if (map_ops[i].status == GNTST_eagain)
+ gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, map_ops + i,
+ &map_ops[i].status, __func__);
+
if (xen_feature(XENFEAT_auto_translated_physmap))
return ret;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index b3e146e..bcf3ba4 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -490,8 +490,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
op.host_addr = arbitrary_virt_to_machine(pte).maddr;
- if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
- BUG();
+ gnttab_batch_map(&op, 1);
if (op.status != GNTST_okay) {
free_vm_area(area);
@@ -572,8 +571,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
gnttab_set_map_op(&op, (unsigned long)vaddr, GNTMAP_host_map, gnt_ref,
dev->otherend_id);
- if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
- BUG();
+ gnttab_batch_map(&op, 1);
if (op.status != GNTST_okay) {
xenbus_dev_fatal(dev, op.status,
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 11e27c3..da9386e 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -189,4 +189,16 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
struct page **pages, unsigned int count, bool clear_pte);
+/* Perform a batch of grant map/copy operations. Retry every batch slot
+ * for which the hypervisor returns GNTST_eagain. This is typically due
+ * to paged out target frames.
+ *
+ * Will retry for 1, 2, ... 255 ms, i.e. 256 times during 32 seconds.
+ *
+ * Return value in each and every status field of the batch guaranteed
+ * to not be GNTST_eagain.
+ */
+void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
+void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
+
#endif /* __ASM_GNTTAB_H__ */
diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
index 7da811b..2f912d9 100644
--- a/include/xen/interface/grant_table.h
+++ b/include/xen/interface/grant_table.h
@@ -520,6 +520,7 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
#define GNTST_permission_denied (-8) /* Not enough privilege for operation. */
#define GNTST_bad_page (-9) /* Specified page was invalid for op. */
#define GNTST_bad_copy_arg (-10) /* copy arguments cross page boundary */
+#define GNTST_eagain (-12) /* Retry. */
#define GNTTABOP_error_msgs { \
"okay", \
@@ -532,7 +533,9 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
"no spare translation slot in the I/O MMU", \
"permission denied", \
"bad page", \
- "copy arguments cross page boundary" \
+ "copy arguments cross page boundary", \
+ "_placeholder_", \
+ "retry" \
}
#endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */
--
1.7.9.5
* Re: [PATCH net-next] drivers/net: Enable IOMMU pass through for be2net
From: Craig Hada @ 2012-09-13 17:06 UTC
To: Hada, Craig M, netdev
Cc: sathya.perla, subbu.seetharaman, ajit.khaparde, linux-kernel
In-Reply-To: <4988B24899FF4448BF3D99EFCD2A3562323AD5@G1W3785.americas.hpqcorp.net>
On 9/13/2012 9:27 AM, Hada, Craig M wrote:
> This patch sets the coherent DMA mask to 64-bit once the be2net driver has confirmed that the system is 64-bit DMA capable. The coherent DMA mask is examined by the Intel IOMMU driver to determine whether to allow pass through context mapping for all devices. With this patch, the be2net driver combined with be2net compatible hardware provides comparable performance to the case where vt-d is disabled. The main use case for this change is to decrease the time necessary to copy virtual machine memory during KVM live migration instantiations.
>
> This patch was tested on a system that enables the IOMMU in non-coherent mode. Two DMA remapper issues were encountered, both in the Intel IOMMU driver; patches for them have been submitted upstream but are not yet committed.
>
> Patch 1 - DMAR:[fault reason 02] Present bit in context entry is clear
> https://lkml.org/lkml/2012/6/15/20
My apologies for posting a truncated link for the above. The correct link is https://lkml.org/lkml/2012/6/15/204
-Craig
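As background on what a "64-bit DMA mask" means, a mask with n low bits set describes the addresses a device can reach. The sketch below is modelled on the kernel's DMA_BIT_MASK() macro; the function names are hypothetical, and it says nothing about how be2net or the Intel IOMMU driver actually consult the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the n low address bits set, for 1 <= n <= 64. */
uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	/* Shifting a 64-bit value by 64 is undefined in C, so special-case it. */
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* True if a device limited to `mask` can address `addr` directly. */
bool dma_addr_reachable(uint64_t addr, uint64_t mask)
{
	return (addr & ~mask) == 0;
}
```

An address just above 4 GiB is unreachable under a 32-bit mask but reachable under a 64-bit one.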
* re: gre: Support GRE over IPv6
From: Dan Carpenter @ 2012-09-13 16:01 UTC
To: xeb; +Cc: netdev
Hello Dmitry Kozlov,
The patch c12b395a4664: "gre: Support GRE over IPv6" from Aug 10,
2012, leads to the following warning:
net/ipv6/ip6_gre.c:1299 ip6gre_header_parse()
error: memcpy() 'haddr' too small (8 vs 16)
net/ipv6/ip6_gre.c
1296 static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
1297 {
1298 const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)skb_mac_header(skb);
1299 memcpy(haddr, &ipv6h->saddr, sizeof(struct in6_addr));
^^^^^
Smatch thinks this buffer is only 8 characters sometimes.
1300 return sizeof(struct in6_addr);
1301 }
One call tree where this would happen would be the
(struct sockaddr_ll *)sll->sll_addr[] in packet_rcv().
-> packet_rcv()
-> dev_parse_header()
-> ip6gre_header_parse()
I don't know the code well enough to say if this is a bug or not. Could
you take a look?
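To illustrate the size mismatch Smatch is pointing at, the sketch below copies a 16-byte source into an 8-byte destination with an explicit bound. copy_haddr_bounded and the two constants are hypothetical; this is not a proposed fix for ip6gre_header_parse(), since a truncated IPv6 address may not be meaningful to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HADDR_CAP	8	/* destination size from the Smatch report */
#define IN6_ADDR_LEN	16	/* sizeof(struct in6_addr) */

/* Copy at most cap bytes from src and return how many were copied. */
size_t copy_haddr_bounded(unsigned char *dst, size_t cap,
			  const unsigned char *src, size_t len)
{
	size_t n = len < cap ? len : cap;

	memcpy(dst, src, n);
	return n;
}
```

An unbounded memcpy() of IN6_ADDR_LEN bytes into a HADDR_CAP-sized buffer would write 8 bytes past its end.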
regards,
dan carpenter
* [PATCH 2/4] net_sched: gred: eliminate redundant DP prio comparisons
From: David Ward @ 2012-09-13 15:22 UTC
To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>
Each pair of DPs only needs to be compared once when searching for
a non-unique prio value.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
net/sched/sch_gred.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index fca73cd..e19d4eb 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -102,9 +102,8 @@ static inline int gred_wred_mode_check(struct Qdisc *sch)
if (q == NULL)
continue;
- for (n = 0; n < table->DPs; n++)
- if (table->tab[n] && table->tab[n] != q &&
- table->tab[n]->prio == q->prio)
+ for (n = i + 1; n < table->DPs; n++)
+ if (table->tab[n] && table->tab[n]->prio == q->prio)
return 1;
}
--
1.7.4.1
* [PATCH net-next V4] IB/ipoib: Add rtnl_link_ops support
From: Or Gerlitz @ 2012-09-13 15:56 UTC
To: davem, roland; +Cc: netdev, Or Gerlitz, Erez Shitrit
In-Reply-To: <1347551797-2495-1-git-send-email-ogerlitz@mellanox.com>
Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Child devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.
Adding support for RTNL children involved refactoring ipoib_vlan_add,
which is now used by both the sysfs and the link_ops code.
Also added an ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removing the calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.
Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
Documentation/infiniband/ipoib.txt | 3 +
drivers/infiniband/ulp/ipoib/Makefile | 3 +-
drivers/infiniband/ulp/ipoib/ipoib.h | 13 +++
drivers/infiniband/ulp/ipoib/ipoib_main.c | 25 +++++-
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 114 ++++++++++++++++++++++++++
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 102 +++++++++++++----------
include/linux/if_link.h | 11 +++
7 files changed, 220 insertions(+), 51 deletions(-)
create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_netlink.c
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 64eeb55..f2cfe26 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -24,6 +24,9 @@ Partitions and P_Keys
The P_Key for any interface is given by the "pkey" file, and the
main interface for a subinterface is in "parent."
+ Child interface create/delete can also be done using IPoIB's
+ rtnl_link_ops, where children created either way behave the same.
+
Datagram vs Connected modes
The IPoIB driver supports two modes of operation: datagram and
diff --git a/drivers/infiniband/ulp/ipoib/Makefile b/drivers/infiniband/ulp/ipoib/Makefile
index 3090100..e5430dd 100644
--- a/drivers/infiniband/ulp/ipoib/Makefile
+++ b/drivers/infiniband/ulp/ipoib/Makefile
@@ -5,7 +5,8 @@ ib_ipoib-y := ipoib_main.o \
ipoib_multicast.o \
ipoib_verbs.o \
ipoib_vlan.o \
- ipoib_ethtool.o
+ ipoib_ethtool.o \
+ ipoib_netlink.o
ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_CM) += ipoib_cm.o
ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_DEBUG) += ipoib_fs.o
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index ca43901..381f51b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -104,6 +104,10 @@ enum {
MAX_SEND_CQE = 16,
IPOIB_CM_COPYBREAK = 256,
+
+ IPOIB_NON_CHILD = 0,
+ IPOIB_LEGACY_CHILD = 1,
+ IPOIB_RTNL_CHILD = 2,
};
#define IPOIB_OP_RECV (1ul << 31)
@@ -350,6 +354,7 @@ struct ipoib_dev_priv {
struct net_device *parent;
struct list_head child_intfs;
struct list_head list;
+ int child_type;
#ifdef CONFIG_INFINIBAND_IPOIB_CM
struct ipoib_cm_dev_priv cm;
@@ -509,6 +514,14 @@ void ipoib_event(struct ib_event_handler *handler,
int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey);
int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
+int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
+ u16 pkey, int child_type);
+
+int __init ipoib_netlink_init(void);
+void __exit ipoib_netlink_fini(void);
+
+void ipoib_setup(struct net_device *dev);
+
void ipoib_pkey_poll(struct work_struct *work);
int ipoib_pkey_dev_delay_open(struct net_device *dev);
void ipoib_drain_cq(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3e2085a..b3e9709 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -173,6 +173,11 @@ static int ipoib_stop(struct net_device *dev)
return 0;
}
+static void ipoib_uninit(struct net_device *dev)
+{
+ ipoib_dev_cleanup(dev);
+}
+
static netdev_features_t ipoib_fix_features(struct net_device *dev, netdev_features_t features)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -1262,6 +1267,9 @@ out:
void ipoib_dev_cleanup(struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev), *cpriv, *tcpriv;
+ LIST_HEAD(head);
+
+ ASSERT_RTNL();
ipoib_delete_debug_files(dev);
@@ -1270,10 +1278,9 @@ void ipoib_dev_cleanup(struct net_device *dev)
/* Stop GC on child */
set_bit(IPOIB_STOP_NEIGH_GC, &cpriv->flags);
cancel_delayed_work(&cpriv->neigh_reap_task);
- unregister_netdev(cpriv->dev);
- ipoib_dev_cleanup(cpriv->dev);
- free_netdev(cpriv->dev);
+ unregister_netdevice_queue(cpriv->dev, &head);
}
+ unregister_netdevice_many(&head);
ipoib_ib_dev_cleanup(dev);
@@ -1291,6 +1298,7 @@ static const struct header_ops ipoib_header_ops = {
};
static const struct net_device_ops ipoib_netdev_ops = {
+ .ndo_uninit = ipoib_uninit,
.ndo_open = ipoib_open,
.ndo_stop = ipoib_stop,
.ndo_change_mtu = ipoib_change_mtu,
@@ -1300,7 +1308,7 @@ static const struct net_device_ops ipoib_netdev_ops = {
.ndo_set_rx_mode = ipoib_set_mcast_list,
};
-static void ipoib_setup(struct net_device *dev)
+void ipoib_setup(struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -1662,7 +1670,6 @@ static void ipoib_remove_one(struct ib_device *device)
flush_workqueue(ipoib_workqueue);
unregister_netdev(priv->dev);
- ipoib_dev_cleanup(priv->dev);
free_netdev(priv->dev);
}
@@ -1714,8 +1721,15 @@ static int __init ipoib_init_module(void)
if (ret)
goto err_sa;
+ ret = ipoib_netlink_init();
+ if (ret)
+ goto err_client;
+
return 0;
+err_client:
+ ib_unregister_client(&ipoib_client);
+
err_sa:
ib_sa_unregister_client(&ipoib_sa_client);
destroy_workqueue(ipoib_workqueue);
@@ -1728,6 +1742,7 @@ err_fs:
static void __exit ipoib_cleanup_module(void)
{
+ ipoib_netlink_fini();
ib_unregister_client(&ipoib_client);
ib_sa_unregister_client(&ipoib_sa_client);
ipoib_unregister_debugfs();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
new file mode 100644
index 0000000..a7dc5ea
--- /dev/null
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. - All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/netdevice.h>
+#include <linux/module.h>
+#include <net/rtnetlink.h>
+#include "ipoib.h"
+
+static const struct nla_policy ipoib_policy[IFLA_IPOIB_MAX + 1] = {
+ [IFLA_IPOIB_PKEY] = { .type = NLA_U16 },
+};
+
+static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+ struct net_device *pdev;
+ struct ipoib_dev_priv *ppriv;
+ u16 child_pkey;
+ int err;
+
+ if (!tb[IFLA_LINK])
+ return -EINVAL;
+
+ pdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+ if (!pdev)
+ return -ENODEV;
+
+ ppriv = netdev_priv(pdev);
+
+ if (test_bit(IPOIB_FLAG_SUBINTERFACE, &ppriv->flags)) {
+ ipoib_warn(ppriv, "child creation disallowed for child devices\n");
+ return -EINVAL;
+ }
+
+ if (!data || !data[IFLA_IPOIB_PKEY]) {
+ ipoib_dbg(ppriv, "no pkey specified, using parent pkey\n");
+ child_pkey = ppriv->pkey;
+ } else
+ child_pkey = nla_get_u16(data[IFLA_IPOIB_PKEY]);
+
+ err = __ipoib_vlan_add(ppriv, netdev_priv(dev), child_pkey, IPOIB_RTNL_CHILD);
+
+ return err;
+}
+
+static void ipoib_unregister_child_dev(struct net_device *dev, struct list_head *head)
+{
+ struct ipoib_dev_priv *priv, *ppriv;
+
+ priv = netdev_priv(dev);
+ ppriv = netdev_priv(priv->parent);
+
+ mutex_lock(&ppriv->vlan_mutex);
+ unregister_netdevice_queue(dev, head);
+ list_del(&priv->list);
+ mutex_unlock(&ppriv->vlan_mutex);
+}
+
+static size_t ipoib_get_size(const struct net_device *dev)
+{
+ return nla_total_size(2); /* IFLA_IPOIB_PKEY */
+}
+
+static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
+ .kind = "ipoib",
+ .maxtype = IFLA_IPOIB_MAX,
+ .policy = ipoib_policy,
+ .priv_size = sizeof(struct ipoib_dev_priv),
+ .setup = ipoib_setup,
+ .newlink = ipoib_new_child_link,
+ .dellink = ipoib_unregister_child_dev,
+ .get_size = ipoib_get_size,
+};
+
+int __init ipoib_netlink_init(void)
+{
+ return rtnl_link_register(&ipoib_link_ops);
+}
+
+void __exit ipoib_netlink_fini(void)
+{
+ rtnl_link_unregister(&ipoib_link_ops);
+}
+
+MODULE_ALIAS_RTNL_LINK("ipoib");
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index d7e9740..238bbf9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -49,47 +49,11 @@ static ssize_t show_parent(struct device *d, struct device_attribute *attr,
}
static DEVICE_ATTR(parent, S_IRUGO, show_parent, NULL);
-int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
+ u16 pkey, int type)
{
- struct ipoib_dev_priv *ppriv, *priv;
- char intf_name[IFNAMSIZ];
int result;
- if (!capable(CAP_NET_ADMIN))
- return -EPERM;
-
- ppriv = netdev_priv(pdev);
-
- if (!rtnl_trylock())
- return restart_syscall();
- mutex_lock(&ppriv->vlan_mutex);
-
- /*
- * First ensure this isn't a duplicate. We check the parent device and
- * then all of the child interfaces to make sure the Pkey doesn't match.
- */
- if (ppriv->pkey == pkey) {
- result = -ENOTUNIQ;
- priv = NULL;
- goto err;
- }
-
- list_for_each_entry(priv, &ppriv->child_intfs, list) {
- if (priv->pkey == pkey) {
- result = -ENOTUNIQ;
- priv = NULL;
- goto err;
- }
- }
-
- snprintf(intf_name, sizeof intf_name, "%s.%04x",
- ppriv->dev->name, pkey);
- priv = ipoib_intf_alloc(intf_name);
- if (!priv) {
- result = -ENOMEM;
- goto err;
- }
-
priv->max_ib_mtu = ppriv->max_ib_mtu;
/* MTU will be reset when mcast join happens */
priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu);
@@ -134,14 +98,13 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
if (device_create_file(&priv->dev->dev, &dev_attr_parent))
goto sysfs_failed;
+ priv->child_type = type;
list_add_tail(&priv->list, &ppriv->child_intfs);
- mutex_unlock(&ppriv->vlan_mutex);
- rtnl_unlock();
-
return 0;
sysfs_failed:
+ result = -ENOMEM;
ipoib_delete_debug_files(priv->dev);
unregister_netdevice(priv->dev);
@@ -149,11 +112,60 @@ register_failed:
ipoib_dev_cleanup(priv->dev);
err:
+ return result;
+}
+
+int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+{
+ struct ipoib_dev_priv *ppriv, *priv;
+ char intf_name[IFNAMSIZ];
+ struct ipoib_dev_priv *tpriv;
+ int result;
+
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
+ ppriv = netdev_priv(pdev);
+
+ snprintf(intf_name, sizeof intf_name, "%s.%04x",
+ ppriv->dev->name, pkey);
+ priv = ipoib_intf_alloc(intf_name);
+ if (!priv)
+ return -ENOMEM;
+
+ if (!rtnl_trylock())
+ return restart_syscall();
+
+ mutex_lock(&ppriv->vlan_mutex);
+
+ /*
+ * First ensure this isn't a duplicate. We check the parent device and
+ * then all of the legacy child interfaces to make sure the Pkey
+ * doesn't match.
+ */
+ if (ppriv->pkey == pkey) {
+ result = -ENOTUNIQ;
+ goto out;
+ }
+
+ list_for_each_entry(tpriv, &ppriv->child_intfs, list) {
+ if (tpriv->pkey == pkey &&
+ tpriv->child_type == IPOIB_LEGACY_CHILD) {
+ result = -ENOTUNIQ;
+ goto out;
+ }
+ }
+
+ result = __ipoib_vlan_add(ppriv, priv, pkey, IPOIB_LEGACY_CHILD);
+
+out:
mutex_unlock(&ppriv->vlan_mutex);
- rtnl_unlock();
- if (priv)
+
+ if (result)
free_netdev(priv->dev);
+ rtnl_unlock();
+
return result;
}
@@ -171,9 +183,9 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
return restart_syscall();
mutex_lock(&ppriv->vlan_mutex);
list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
- if (priv->pkey == pkey) {
+ if (priv->pkey == pkey &&
+ priv->child_type == IPOIB_LEGACY_CHILD) {
unregister_netdevice(priv->dev);
- ipoib_dev_cleanup(priv->dev);
list_del(&priv->list);
dev = priv->dev;
break;
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index ac173bd..24c0dd0 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -398,4 +398,15 @@ struct ifla_port_vsi {
__u8 pad[3];
};
+
+/* IPoIB section */
+
+enum {
+ IFLA_IPOIB_UNSPEC,
+ IFLA_IPOIB_PKEY,
+ __IFLA_IPOIB_MAX
+};
+
+#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
+
#endif /* _LINUX_IF_LINK_H */
--
1.7.1
* [PATCH iproute2 2/2] iplink: added support for ipoib rtnl link ops
From: Or Gerlitz @ 2012-09-13 15:56 UTC
To: davem, roland; +Cc: netdev, Or Gerlitz
In-Reply-To: <1347551797-2495-1-git-send-email-ogerlitz@mellanox.com>
Added basic support to create/delete IPoIB child devices,
where the user can optionally specify the IB PKEY (Partition Key)
to be used by the newly created device. If nothing is provided,
the child will use the same pkey as the parent.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
include/linux/if_link.h | 9 ++++++
ip/Makefile | 2 +-
ip/iplink.c | 2 +-
ip/iplink_ipoib.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 84 insertions(+), 2 deletions(-)
create mode 100644 ip/iplink_ipoib.c
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 46f03db..7e64aa7 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -384,4 +384,13 @@ struct ifla_port_vsi {
__u8 pad[3];
};
+/* IPoIB section */
+enum {
+ IFLA_IPOIB_UNSPEC,
+ IFLA_IPOIB_CHILD_PKEY,
+ __IFLA_IPOIB_MAX
+};
+
+#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
+
#endif /* _LINUX_IF_LINK_H */
diff --git a/ip/Makefile b/ip/Makefile
index 6a518f8..ba2e2e6 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -3,7 +3,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
- iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o
+ iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o iplink_ipoib.o
RTMONOBJ=rtmon.o
diff --git a/ip/iplink.c b/ip/iplink.c
index 4111871..7451aa0 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -83,7 +83,7 @@ void iplink_usage(void)
if (iplink_have_newlink()) {
fprintf(stderr, "\n");
- fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can | bridge }\n");
+ fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can | bridge | ipoib }\n");
}
exit(-1);
}
diff --git a/ip/iplink_ipoib.c b/ip/iplink_ipoib.c
new file mode 100644
index 0000000..7dea357
--- /dev/null
+++ b/ip/iplink_ipoib.c
@@ -0,0 +1,73 @@
+/*
+ * iplink_ipoib.c IPoIB device support
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Or Gerlitz <ogerlitz@mellanox.com>
+ * copied iflink_vlan.c authored by Patrick McHardy <kaber@trash.net>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/if_link.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void explain(void)
+{
+ fprintf(stderr,
+ "Usage: ... ipoib [pkey PKEY]\n"
+ "\n"
+ "PKEY := 0x8001-0xffff\n"
+ );
+}
+
+static int ipoib_parse_opt(struct link_util *lu, int argc, char **argv,
+ struct nlmsghdr *n)
+{
+ __u16 pkey;
+
+ while (argc > 0) {
+ if (matches(*argv, "pkey") == 0) {
+ NEXT_ARG();
+ if (get_u16(&pkey, *argv, 0))
+ invarg("pkey is invalid", *argv);
+ addattr_l(n, 1024, IFLA_IPOIB_CHILD_PKEY, &pkey, 2);
+ } else if (matches(*argv, "help") == 0) {
+ explain();
+ return -1;
+ } else {
+ fprintf(stderr, "ipoib: what is \"%s\"?\n", *argv);
+ explain();
+ return -1;
+ }
+ argc--, argv++;
+ }
+
+ return 0;
+}
+
+static void ipoib_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+ if (!tb)
+ return;
+
+ if (!tb[IFLA_IPOIB_CHILD_PKEY] ||
+ RTA_PAYLOAD(tb[IFLA_IPOIB_CHILD_PKEY]) < sizeof(__u16))
+ return;
+
+ fprintf(f, "pkey %.4x ", rta_getattr_u16(tb[IFLA_IPOIB_CHILD_PKEY]));
+}
+
+struct link_util ipoib_link_util = {
+ .id = "ipoib",
+ .maxattr = IFLA_IPOIB_MAX,
+ .parse_opt = ipoib_parse_opt,
+ .print_opt = ipoib_print_opt,
+};
--
1.7.1
^ permalink raw reply related
* [PATCH net-next V4 0/2] Add rtnl_link_ops support to IPoIB
From: Or Gerlitz @ 2012-09-13 15:56 UTC (permalink / raw)
To: davem, roland; +Cc: netdev, Or Gerlitz
This is about adding rtnl_link_ops to IPoIB, primarily addressing feedback
from Dave on a similar patch that was part of the eIPoIB submission.
Also added the related iproute2 patch, for the sake of review and
testing. Example usages:
$ ip link add link ib0 name ib0.1 type ipoib
$ ip link add link ib0 name ib0.8001 type ipoib pkey 0x8001
and the obvious
$ ip link delete dev ib0.8001
$ ip link delete dev ib0.1
Dave, as I reported in the V3 thread (http://marc.info/?l=linux-netdev&m=134744666921839&w=2),
with the latest net-next I see a lockdep warning when the module is unloaded. I'm
not sure whether it is real or a false positive, so I decided to go ahead with the
submission, but I'm open to comments.
Roland, this patch has been pending for quite a while (a few months) without
any comment from you. If it makes things easier, I would like to merge it through
net-next. Does that make sense?
Or.
Changes from V3:
- addressed feedback from Patrick McHardy to move the IFLA_IPOIB_yyy ipoib
rtnl definitions into include/linux/if_link.h
- changed IFLA_IPOIB_CHILD_PKEY to be named IFLA_IPOIB_PKEY which will cope
with more IFLA_IPOIB_yyy entries to be added once the basic support is in
Changes from V2:
- removed the notion of a user-defined index per child, since we can do well without it
- to that end, make a distinction (internal to ipoib) between legacy children created
through the old sysfs way and children created using rtnl link ops
Changes from V1:
- applied feedback from Dave Miller to avoid using sysfs
- added rtnl_link_ops support in ipoib and use them to add/delete children
Or Gerlitz (1):
IB/ipoib: Add rtnl_link_ops support
Documentation/infiniband/ipoib.txt | 3 +
drivers/infiniband/ulp/ipoib/Makefile | 3 +-
drivers/infiniband/ulp/ipoib/ipoib.h | 13 +++
drivers/infiniband/ulp/ipoib/ipoib_main.c | 25 +++++-
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 114 ++++++++++++++++++++++++++
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 102 +++++++++++++----------
include/linux/if_link.h | 11 +++
7 files changed, 220 insertions(+), 51 deletions(-)
create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_netlink.c
^ permalink raw reply
* Re: bnx2 cards intermittantly going offline
From: Sven Ulland @ 2012-09-13 15:45 UTC (permalink / raw)
To: netdev; +Cc: Marc A. Donges, Michael Chan
In-Reply-To: <20120913135108.GC3650@abomination.net.united.domain>
On 09/13/2012 03:51 PM, Marc A. Donges wrote:
> After 55 days of operation the machine (A) suddenly was no longer
> reachable via network. Strangely, a second machine (B) that should
> take over the IP addresses (keepalived) did not take over. Only
> after shutting the switchport to which A is attached did B take
> over.
Hi. We've had the same symptom with our BCM5709S [14e4:163a] on
Debian. Like you, we were on stable's 2.6.32-41squeeze2. Google led us
to many similar issues [1,2,3]. They concluded with the fix being in
mainline commit c441b8d2 [4]: "bnx2: Fix lost MSI-X problem on 5709
NICs".
Broadcom: Can you publish a tool that decodes ethtool -d dumps to make
debugging easier, or do you deem it no longer necessary with the
register dump commits in 555069da?
Now, Debian's 2.6.32-41squeeze2 is based on longterm release 2.6.32.54
[5]. That version includes commit 0b7817ed [6], which is a backport of
the already mentioned mainline commit c441b8d2.
So we tried digging further and applying some seemingly relevant
commits [7,8] to our 2.6.32, but without any change in behaviour. Our
temporary fix was to run 'ethtool -t ethX' to reset the device every
time it locked up.
This dragged on with various builds, until we ended up on mainline
2.6.38 where we no longer saw any symptoms. I don't know in which
kernel version it was fixed, but we ended up on that one, sort of by
chance. Unfortunately, it had severe issues with kswapd memory
compaction causing CPU soft lockups [9], so we went straight to
squeeze-backports' 3.2.23-1~bpo60+2. We've been happy since then.
> We have five pairs of basically identical machines performing the
> same task (each pair for one site). The error has not occured with
> any other one, but this site is the busiest:
We also saw the issue only at a site with generally higher load
compared to other sites.
I'd love to know exactly which commit fixed the issue, but it's fairly
tricky to reproduce the issue, and the bisect count is fairly high (it
need not be a specific fix for bnx2).
sven
[1]: bnx2 driver crashes under random circumstances
https://bugzilla.redhat.com/show_bug.cgi?id=520888
[2]: Access denied. Come on, Red Hat!
https://bugzilla.redhat.com/show_bug.cgi?id=511368
[3]: NIC doesn't register packets [rhel-5.5.z]
https://bugzilla.redhat.com/show_bug.cgi?id=587799
[4]: bnx2: Fix lost MSI-X problem on 5709 NICs.
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=object;h=c441b8d2cb2194b05550a558d6d95d8944e56a84
[5]: Debian Changelog linux-2.6 (2.6.32-45)
http://packages.debian.org/changelogs/pool/main/l/linux-2.6/linux-2.6_2.6.32-45/changelog#version2.6.32-41
[6]: bnx2: Fix lost MSI-X problem on 5709 NICs.
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=0b7817edda5e44e5fa769645bd1220f5e7b0beb5
[7]: bnx2: reset_task is crashing the kernel. Fixing it.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4529819c45161e4a119134f56ef504e69420bc98
[8]: bnx2: fixing a timout error due not refreshing TX timers correctly
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e6bf95ffa8d6f8f4b7ee33ea01490d95b0bbeb6e
[9]: [PATCH] remove compaction from kswapd
http://thread.gmane.org/gmane.linux.kernel.mm/58962
https://lkml.org/lkml/2011/3/25/664
^ permalink raw reply
* [PATCH 3/4] net_sched: gred: fix qave reporting via netlink
From: David Ward @ 2012-09-13 15:22 UTC (permalink / raw)
To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>
q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
to pass q->vars.qavg as the backlog value, we need to un-scale it.
Additionally, the qave value returned via netlink should not be Wlog
scaled, so we need to un-scale the result of red_calc_qavg().
This caused artificially high values for "Average Queue" to be shown
by 'tc -s -d qdisc', but did not affect the actual operation of GRED.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
net/sched/sch_gred.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index e19d4eb..b2570b5 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -534,6 +534,7 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
for (i = 0; i < MAX_DPs; i++) {
struct gred_sched_data *q = table->tab[i];
struct tc_gred_qopt opt;
+ unsigned long qavg;
memset(&opt, 0, sizeof(opt));
@@ -565,7 +566,9 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
if (gred_wred_mode(table))
gred_load_wred_set(table, q);
- opt.qave = red_calc_qavg(&q->parms, &q->vars, q->vars.qavg);
+ qavg = red_calc_qavg(&q->parms, &q->vars,
+ q->vars.qavg >> q->parms.Wlog);
+ opt.qave = qavg >> q->parms.Wlog;
append_opt:
if (nla_append(skb, sizeof(opt), &opt) < 0)
--
1.7.4.1
^ permalink raw reply related
* [PATCH 1/4] net_sched: gred: correct comment about qavg calculation in RIO mode
From: David Ward @ 2012-09-13 15:22 UTC (permalink / raw)
To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward
Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
net/sched/sch_gred.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index e901583..fca73cd 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -176,7 +176,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
skb->tc_index = (skb->tc_index & ~GRED_VQ_MASK) | dp;
}
- /* sum up all the qaves of prios <= to ours to get the new qave */
+ /* sum up all the qaves of prios < ours to get the new qave */
if (!gred_wred_mode(t) && gred_rio_mode(t)) {
int i;
--
1.7.4.1
^ permalink raw reply related
* [PATCH 4/4] net_sched: gred: actually perform idling in WRED mode
From: David Ward @ 2012-09-13 15:22 UTC (permalink / raw)
To: netdev; +Cc: Bruce Osler, Cyril Chemparathy, Jamal Hadi Salim, David Ward
In-Reply-To: <1347549755-19438-1-git-send-email-david.ward@ll.mit.edu>
gred_dequeue() and gred_drop() do not seem to get called when the
queue is empty, meaning that we never start idling while in WRED
mode. And since qidlestart is not stored by gred_store_wred_set(),
we would never stop idling while in WRED mode if we ever started.
This messes up the average queue size calculation that influences
packet marking/dropping behavior.
Now, we start WRED mode idling as we are removing the last packet
from the queue. Also we now actually stop WRED mode idling when we
are enqueuing a packet.
Cc: Bruce Osler <brosler@cisco.com>
Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
net/sched/sch_gred.c | 26 +++++++++++++++-----------
1 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index b2570b5..d42234c 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -136,6 +136,7 @@ static inline void gred_store_wred_set(struct gred_sched *table,
struct gred_sched_data *q)
{
table->wred_set.qavg = q->vars.qavg;
+ table->wred_set.qidlestart = q->vars.qidlestart;
}
static inline int gred_use_ecn(struct gred_sched *t)
@@ -259,16 +260,18 @@ static struct sk_buff *gred_dequeue(struct Qdisc *sch)
} else {
q->backlog -= qdisc_pkt_len(skb);
- if (!q->backlog && !gred_wred_mode(t))
- red_start_of_idle_period(&q->vars);
+ if (gred_wred_mode(t)) {
+ if (!sch->qstats.backlog)
+ red_start_of_idle_period(&t->wred_set);
+ } else {
+ if (!q->backlog)
+ red_start_of_idle_period(&q->vars);
+ }
}
return skb;
}
- if (gred_wred_mode(t) && !red_is_idling(&t->wred_set))
- red_start_of_idle_period(&t->wred_set);
-
return NULL;
}
@@ -290,19 +293,20 @@ static unsigned int gred_drop(struct Qdisc *sch)
q->backlog -= len;
q->stats.other++;
- if (!q->backlog && !gred_wred_mode(t))
- red_start_of_idle_period(&q->vars);
+ if (gred_wred_mode(t)) {
+ if (!sch->qstats.backlog)
+ red_start_of_idle_period(&t->wred_set);
+ } else {
+ if (!q->backlog)
+ red_start_of_idle_period(&q->vars);
+ }
}
qdisc_drop(skb, sch);
return len;
}
- if (gred_wred_mode(t) && !red_is_idling(&t->wred_set))
- red_start_of_idle_period(&t->wred_set);
-
return 0;
-
}
static void gred_reset(struct Qdisc *sch)
--
1.7.4.1
^ permalink raw reply related
* [PATCH] mISDN: Fix wrong usage of flush_work_sync while holding locks
From: Karsten Keil @ 2012-09-13 14:36 UTC (permalink / raw)
To: davem; +Cc: netdev, stable
It is a bad idea to hold a spinlock and call flush_work_sync.
Move the workqueue cleanup outside the spinlock and use cancel_work_sync;
on closing the channel this seems to be the more correct function.
Remove the never-used and constant return value of mISDN_freebchannel.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Cc: <stable@kernel.org>
---
drivers/isdn/hardware/mISDN/avmfritz.c | 3 ++-
drivers/isdn/hardware/mISDN/mISDNipac.c | 3 ++-
drivers/isdn/hardware/mISDN/mISDNisar.c | 3 ++-
drivers/isdn/hardware/mISDN/netjet.c | 3 ++-
drivers/isdn/hardware/mISDN/w6692.c | 3 ++-
drivers/isdn/mISDN/hwchannel.c | 9 ++++-----
include/linux/mISDNhw.h | 2 +-
7 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/drivers/isdn/hardware/mISDN/avmfritz.c b/drivers/isdn/hardware/mISDN/avmfritz.c
index fa6ca47..dceaec8 100644
--- a/drivers/isdn/hardware/mISDN/avmfritz.c
+++ b/drivers/isdn/hardware/mISDN/avmfritz.c
@@ -857,8 +857,9 @@ avm_bctrl(struct mISDNchannel *ch, u32 cmd, void *arg)
switch (cmd) {
case CLOSE_CHANNEL:
test_and_clear_bit(FLG_OPEN, &bch->Flags);
+ cancel_work_sync(&bch->workq);
spin_lock_irqsave(&fc->lock, flags);
- mISDN_freebchannel(bch);
+ mISDN_clear_bchannel(bch);
modehdlc(bch, ISDN_P_NONE);
spin_unlock_irqrestore(&fc->lock, flags);
ch->protocol = ISDN_P_NONE;
diff --git a/drivers/isdn/hardware/mISDN/mISDNipac.c b/drivers/isdn/hardware/mISDN/mISDNipac.c
index 752e082..ccd7d85 100644
--- a/drivers/isdn/hardware/mISDN/mISDNipac.c
+++ b/drivers/isdn/hardware/mISDN/mISDNipac.c
@@ -1406,8 +1406,9 @@ hscx_bctrl(struct mISDNchannel *ch, u32 cmd, void *arg)
switch (cmd) {
case CLOSE_CHANNEL:
test_and_clear_bit(FLG_OPEN, &bch->Flags);
+ cancel_work_sync(&bch->workq);
spin_lock_irqsave(hx->ip->hwlock, flags);
- mISDN_freebchannel(bch);
+ mISDN_clear_bchannel(bch);
hscx_mode(hx, ISDN_P_NONE);
spin_unlock_irqrestore(hx->ip->hwlock, flags);
ch->protocol = ISDN_P_NONE;
diff --git a/drivers/isdn/hardware/mISDN/mISDNisar.c b/drivers/isdn/hardware/mISDN/mISDNisar.c
index be5973d..182ecf0 100644
--- a/drivers/isdn/hardware/mISDN/mISDNisar.c
+++ b/drivers/isdn/hardware/mISDN/mISDNisar.c
@@ -1588,8 +1588,9 @@ isar_bctrl(struct mISDNchannel *ch, u32 cmd, void *arg)
switch (cmd) {
case CLOSE_CHANNEL:
test_and_clear_bit(FLG_OPEN, &bch->Flags);
+ cancel_work_sync(&bch->workq);
spin_lock_irqsave(ich->is->hwlock, flags);
- mISDN_freebchannel(bch);
+ mISDN_clear_bchannel(bch);
modeisar(ich, ISDN_P_NONE);
spin_unlock_irqrestore(ich->is->hwlock, flags);
ch->protocol = ISDN_P_NONE;
diff --git a/drivers/isdn/hardware/mISDN/netjet.c b/drivers/isdn/hardware/mISDN/netjet.c
index c3e3e76..9bcade5 100644
--- a/drivers/isdn/hardware/mISDN/netjet.c
+++ b/drivers/isdn/hardware/mISDN/netjet.c
@@ -812,8 +812,9 @@ nj_bctrl(struct mISDNchannel *ch, u32 cmd, void *arg)
switch (cmd) {
case CLOSE_CHANNEL:
test_and_clear_bit(FLG_OPEN, &bch->Flags);
+ cancel_work_sync(&bch->workq);
spin_lock_irqsave(&card->lock, flags);
- mISDN_freebchannel(bch);
+ mISDN_clear_bchannel(bch);
mode_tiger(bc, ISDN_P_NONE);
spin_unlock_irqrestore(&card->lock, flags);
ch->protocol = ISDN_P_NONE;
diff --git a/drivers/isdn/hardware/mISDN/w6692.c b/drivers/isdn/hardware/mISDN/w6692.c
index 26a86b8..335fe64 100644
--- a/drivers/isdn/hardware/mISDN/w6692.c
+++ b/drivers/isdn/hardware/mISDN/w6692.c
@@ -1054,8 +1054,9 @@ w6692_bctrl(struct mISDNchannel *ch, u32 cmd, void *arg)
switch (cmd) {
case CLOSE_CHANNEL:
test_and_clear_bit(FLG_OPEN, &bch->Flags);
+ cancel_work_sync(&bch->workq);
spin_lock_irqsave(&card->lock, flags);
- mISDN_freebchannel(bch);
+ mISDN_clear_bchannel(bch);
w6692_mode(bc, ISDN_P_NONE);
spin_unlock_irqrestore(&card->lock, flags);
ch->protocol = ISDN_P_NONE;
diff --git a/drivers/isdn/mISDN/hwchannel.c b/drivers/isdn/mISDN/hwchannel.c
index ef34fd4..2602be2 100644
--- a/drivers/isdn/mISDN/hwchannel.c
+++ b/drivers/isdn/mISDN/hwchannel.c
@@ -148,17 +148,16 @@ mISDN_clear_bchannel(struct bchannel *ch)
ch->next_minlen = ch->init_minlen;
ch->maxlen = ch->init_maxlen;
ch->next_maxlen = ch->init_maxlen;
+ skb_queue_purge(&ch->rqueue);
+ ch->rcount = 0;
}
EXPORT_SYMBOL(mISDN_clear_bchannel);
-int
+void
mISDN_freebchannel(struct bchannel *ch)
{
+ cancel_work_sync(&ch->workq);
mISDN_clear_bchannel(ch);
- skb_queue_purge(&ch->rqueue);
- ch->rcount = 0;
- flush_work_sync(&ch->workq);
- return 0;
}
EXPORT_SYMBOL(mISDN_freebchannel);
diff --git a/include/linux/mISDNhw.h b/include/linux/mISDNhw.h
index d0752ec..9d96d5d 100644
--- a/include/linux/mISDNhw.h
+++ b/include/linux/mISDNhw.h
@@ -183,7 +183,7 @@ extern int mISDN_initbchannel(struct bchannel *, unsigned short,
unsigned short);
extern int mISDN_freedchannel(struct dchannel *);
extern void mISDN_clear_bchannel(struct bchannel *);
-extern int mISDN_freebchannel(struct bchannel *);
+extern void mISDN_freebchannel(struct bchannel *);
extern int mISDN_ctrl_bchannel(struct bchannel *, struct mISDN_ctrl_req *);
extern void queue_ch_frame(struct mISDNchannel *, u_int,
int, struct sk_buff *);
--
1.7.7
^ permalink raw reply related
* Re: [PATCH v4 0/8] cgroup: Assign subsystem IDs during compile time
From: Neil Horman @ 2012-09-13 14:01 UTC (permalink / raw)
To: Daniel Wagner
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Daniel Wagner, David S. Miller, Paul E. McKenney, Andrew Morton,
Eric Dumazet, Gao feng, Glauber Costa, Herbert Xu,
Jamal Hadi Salim, John Fastabend, Kamezawa Hiroyuki, Li Zefan,
Tejun Heo
In-Reply-To: <1347459128-32236-1-git-send-email-wagi-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>
On Wed, Sep 12, 2012 at 04:12:00PM +0200, Daniel Wagner wrote:
> From: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>
>
> Hi,
>
> I've removed the useless test in patch #4 and updated the commit message
> on patch #7.
>
> While rewriting the commit message #7 I realized the pointer check was
> completely wrong. Instead testing the return value of
> task_subsys_state() I tested the pointer return by container_of. For
> more details on this see the commit message.
>
> Because of this I added Herbert and Paul to the Cc list. Please have
> close look at my rambling on the RCU part in patch #7. Thanks a lot!
>
> This series is against
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.7
>
> cheers,
> daniel
>
>
> Previous cover letters:
>
> v3:
>
> In this version I tried to concentrate on the main topic of this
> series, so I removed some of the things which were not really needed
> and I have to admit the result looks much better. So I hope that will
> simplify the review for you.
>
> I reordered some of the patches and dropped the jump label
> optimization for now. When this series is applied, then I can follow
> up with those changes.
>
> Overall, I tried to address all comments I got from v2. I didn't address
> Tejun comment on
>
> cgroup: Assign subsystem IDs during compile time
>
> to split the net_cls and net_prio changes from that patch. But I
> tried to 'fix' this by beeing a bit more verbose.
>
> The last patch is then the sweet one which gives some memory
> back.
>
> v2:
>
> Most notable changes are, that enabling/disabling of the jump labels
> are not inside the cgroup_lock anymore (create/destroy cb). Instead
> the corresponding functions will be called on module load or unload.
>
> CGROUP_BUILTIN_SUBSYS_COUNT is also gone in this version. This time I
> trade space for speed. Some extra cycles are spend to identify the
> modules in the for loops, e.g.
>
> for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> struct cgroup_subsys_state *ss = cgrp->subsys[i];
>
> /* at bootup time, we don't worry about modular subsystems */
> if (!ss || (ss && ss->module))
> continue;
>
> [...]
> }
>
> CGROUP_SUBSYS_COUNT is currently 12 if all controllers are built. I
> haven't found any other way to get rid of CGROUP_BUILTIN_SUBSYS_COUNT
> without real dirty preprocessor tricks.
>
> Finally, the two versions of task_cls_classid() and task_netprioidx()
> are merged together.
>
> v1:
>
> I was able to 'fix' CGROUP_BUILTIN_SUBSYS_COUNT defition. With this
> version there is no unused subsys_id.
>
> The number of builtin subsystem are counted with gcc's predefined
> __COUNTER__ macro. This is a bit fragile, because __COUNTER__
> is only reset to 0 per compile unit. There is a workaround for this.
> When starting to enumate we need to store the current value of
> __COUNTER__ and then subtract that from all enums we define.
>
> Not sure if that is okay or not.
>
> v0:
>
> The patch #1 and #2 are there to be able to introduce (#3, #4) the
> jump labels in task_cls_classid() and task_netprioidx(). The jump
> labels are needed to know when it is safe to access the controller.
> For example not safe means the module is not yet loaded.
>
> All those patches are just preparation for the center piece (#5)
> of these series. This one will remove the dynamic subsystem ID
> generation and falls back to compile time generated IDs.
>
> This is the first result from the discussion around on the
> "cgroup cls & netprio 'cleanups'" patches.
>
> This patches are against net-next
>
> v4: - removed unnecessary testing in patch #4
> - updated commit message in patch #7
> - fixed wrong pointer check in patch #7
> v3: - dropping unrelated patches such as the jump label patch
> - reordered the patches
> - splitted "cgroup: Assign subsystem IDs during compile time" patch a bit
> - fixed the ordering dependency when assigning the subsystems
> - removed synchronize_rcu() calls
> - more verbose commit messages
> v2: - do not use dirty precompiler tricks:
> use ss->module to identify modules in the loops.
> - enable/disable jump labels in module load/unload functions
> - merge builtin/module versions of task_cls_classid() and task_netprioidx
> v1: - only use jump labels when built as module (#3, #4)
> - get rid of the additional 'pointer' (#5)
> v0: - initial version
>
> Signed-off-by: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Cc: Eric Dumazet <edumazet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Cc: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> Cc: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
> Cc: Jamal Hadi Salim <jhs-jkUAjuhPggJWk0Htik3J/w@public.gmane.org>
> Cc: John Fastabend <john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Cc: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>
> Daniel Wagner (8):
> cgroup: net_cls: Move sock_update_classid() declaration to
> cls_cgroup.h
> cgroup: net_cls: Do not define task_cls_classid() when not selected
> cgroup: net_prio: Do not define task_netpioidx() when not selected
> cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
> cgroup: Wrap subsystem selection macro
> cgroup: Do not depend on a given order when populating the subsys
> array
> cgroup: Assign subsystem IDs during compile time
> cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
>
> include/linux/cgroup.h | 12 +++---
> include/linux/cgroup_subsys.h | 24 +++++------
> include/net/cls_cgroup.h | 27 ++++++------
> include/net/netprio_cgroup.h | 30 +++++--------
> include/net/sock.h | 8 ----
> kernel/cgroup.c | 98 ++++++++++++++++++++++---------------------
> net/core/netprio_cgroup.c | 11 -----
> net/core/sock.c | 15 ++-----
> net/sched/cls_cgroup.c | 13 ------
> 9 files changed, 97 insertions(+), 141 deletions(-)
>
> --
> 1.7.12.315.g682ce8b
>
>
Looks good, thanks. For the series:
Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
^ permalink raw reply
* Re: bnx2 cards intermittantly going offline
From: Marc A. Donges @ 2012-09-13 13:51 UTC (permalink / raw)
To: netdev; +Cc: Michael Chan
In-Reply-To: <1295373358.8131.4.camel@HP1>
[This is a reply to a somewhat older thread]
"Michael Chan" wrote:
> On Tue, 2011-01-18 at 02:54 -0800, Mills, Tony wrote:
>> Last night i setup a machine to monitor overnight and at 3:52 this
>> morning it became unresponsive.
>>
>
> When it becomes unresponsive, please send some packets to the NIC (such
> as ping) and monitor statistics with ethtool -S. See if the packets are
> being received or discarded. Also, run tcpdump on the machine to see if
> the packets are properly received by the stack. Thanks.
Hi Michael, hi netdev,
I appear to be having the same problem as Tony (or at least a problem matching
his description).
The machine uses the BCM5709 chipset:
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
It is running Debian stable (with the Debian stable firmware-bnx2 package).
After 55 days of operation the machine (A) suddenly was no longer reachable via
network. Strangely, a second machine (B) that should take over the IP addresses
(keepalived) did not take over. Only after shutting the switchport to which A
is attached did B take over.
Logging in to the machine via serial, I noticed that it did not receive any
packets via the network interface (after unshutting the switchport), only
traffic sent by the host A was visible in tcpdump, no traffic that was sent to
it (there should have been at least ARP traffic). In order to verify this, I
dumped traffic on another host in the broadcast domain and indeed, the traffic
sent out by A is seen on the network, it just doesn't receive any that is sent
to it.
This explains the lack of failover of keepalived, because A still considers
itself master and is able to announce that to the network, while it cannot see
the packets from its partner B (that wants to take over because of its,
meanwhile, higher priority).
No neighbors see the machine in their ARP tables any more.
I think the number of packets that are sent to the host is reflected in the
interface variable rx_ftq_discards: It increases by about 10 per second while
idle, and by about 80 per second when I send floodpings to the machine. Here
you see two dumps of the interface statistics taken ten seconds apart, while
floodpinging the host:
A:~# ethtool -S eth0; sleep 10; echo ---; ethtool -S eth0
NIC statistics:
rx_bytes: 35498373071360
rx_error_bytes: 0
tx_bytes: 35475382869262
tx_error_bytes: 0
rx_ucast_packets: 45479514105
rx_mcast_packets: 9800399
rx_bcast_packets: 4901866
tx_ucast_packets: 45364190447
tx_mcast_packets: 7285029
tx_bcast_packets: 3111
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 3465587589
rx_65_to_127_byte_packets: 422897833
rx_128_to_255_byte_packets: 3996306350
rx_256_to_511_byte_packets: 1500221686
rx_512_to_1023_byte_packets: 1351649898
rx_1024_to_1522_byte_packets: 397814646
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 3451623430
tx_65_to_127_byte_packets: 366024709
tx_128_to_255_byte_packets: 3954496418
tx_256_to_511_byte_packets: 1499757422
tx_512_to_1023_byte_packets: 1351506958
tx_1024_to_1522_byte_packets: 388331444
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 81
tx_xoff_frames: 81
rx_mac_ctrl_frames: 0
rx_filtered_packets: 26701433
rx_ftq_discards: 1796839
rx_discards: 369
rx_fw_discards: 0
---
NIC statistics:
rx_bytes: 35498373162770
rx_error_bytes: 0
tx_bytes: 35475382869262
tx_error_bytes: 0
rx_ucast_packets: 45479514920
rx_mcast_packets: 9800483
rx_bcast_packets: 4901876
tx_ucast_packets: 45364190447
tx_mcast_packets: 7285029
tx_bcast_packets: 3111
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 3465587625
rx_65_to_127_byte_packets: 422898706
rx_128_to_255_byte_packets: 3996306350
rx_256_to_511_byte_packets: 1500221686
rx_512_to_1023_byte_packets: 1351649898
rx_1024_to_1522_byte_packets: 397814646
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 3451623430
tx_65_to_127_byte_packets: 366024709
tx_128_to_255_byte_packets: 3954496418
tx_256_to_511_byte_packets: 1499757422
tx_512_to_1023_byte_packets: 1351506958
tx_1024_to_1522_byte_packets: 388331444
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 81
tx_xoff_frames: 81
rx_mac_ctrl_frames: 0
rx_filtered_packets: 26701433
rx_ftq_discards: 1797748
rx_discards: 369
rx_fw_discards: 0
The number of interrupts for the NIC is no longer increasing on host A. It is increasing on the otherwise identical and now active host B.
A:~# cat /proc/interrupts | fgrep eth0; sleep 10; echo ---; cat /proc/interrupts | fgrep eth0
74: 7353715 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-0
75: 150160682 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-1
76: 261739096 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-2
77: 3118389637 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-3
78: 3538415303 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-4
79: 3437432016 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-5
80: 4130864322 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-6
81: 3844677189 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-7
---
74: 7353715 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-0
75: 150160682 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-1
76: 261739096 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-2
77: 3118389637 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-3
78: 3538415303 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-4
79: 3437432016 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-5
80: 4130864322 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-6
81: 3844677189 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-7
B:~# cat /proc/interrupts | fgrep eth0; sleep 10; echo ---; cat /proc/interrupts | fgrep eth0
74: 8496700 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-0
75: 2605649299 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-1
76: 2278350057 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-2
77: 2119009356 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-3
78: 2004958460 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-4
79: 2005171437 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-5
80: 2318332903 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-6
81: 2087470150 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-7
---
74: 8496713 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-0
75: 2605688265 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-1
76: 2278397958 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-2
77: 2119043500 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-3
78: 2005000430 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-4
79: 2005205617 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-5
80: 2318373260 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-6
81: 2087518969 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-7
There are no (significant) interface errors on the switchport of machine A (Cisco 6500):
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 3354643
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 73000 bits/sec, 90 packets/sec
139005756894 packets input, 106028470724434 bytes, 0 no buffer
Received 41673355 broadcasts (41644823 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
139565849434 packets output, 106109148647056 bytes, 0 underruns
0 output errors, 0 collisions, 3 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
For reference, switchport of machine B:
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 561319
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 168420000 bits/sec, 27846 packets/sec
5 minute output rate 168547000 bits/sec, 27951 packets/sec
12477681177 packets input, 9891434829664 bytes, 0 no buffer
Received 4452361 broadcasts (4434737 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
12725512555 packets output, 9944380037353 bytes, 0 underruns
0 output errors, 0 collisions, 2 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
This error occurred about five hours ago, and the interface has not recovered.
We have five pairs of basically identical machines performing the same task
(one pair per site). The error has not occurred on any of the others, but
this site is the busiest:
eth0 Link encap:Ethernet HWaddr 3c:d9:2b:ef:f6:3c
inet addr:172.16.100.23 Bcast:172.16.100.63 Mask:255.255.255.192
inet6 addr: fe80::3ed9:2bff:feef:f63c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:45494315484 errors:1896322 dropped:1896322 overruns:0 frame:1896322
TX packets:45371478602 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:35498383041926 (32.2 TiB) TX bytes:35475382870222 (32.2 TiB)
Interrupt:30 Memory:f4000000-f4012800
The host performs NAT with eth0 as both the input and the output interface, so the RX and TX counters are similar.
I would appreciate any suggestions for diagnosing this further.
Kind regards
Marc
^ permalink raw reply
* Re: [PATCH] sch_red: fix weighted average calculation
From: Eric Dumazet @ 2012-09-13 13:53 UTC (permalink / raw)
To: Cyril Chemparathy
Cc: linux-kernel, netdev, davem, david.ward, jdowdal, paul.gortmaker
In-Reply-To: <1347543820-27548-1-git-send-email-cyril@ti.com>
On Thu, 2012-09-13 at 09:43 -0400, Cyril Chemparathy wrote:
> This patch fixes an apparent bug in the running weighted average calculation
> used in the RED algorithm.
>
> Going by the described formula:
> qavg = qavg*(1-W) + backlog*W
> => qavg = qavg + (backlog - qavg) * W
>
> ... with W converted to a pre-calculated shift, this then becomes:
> qavg = qavg + (backlog - qavg) >> logW
>
> ... giving the modified expression introduced by this patch.
>
> Signed-off-by: John Dowdal <jdowdal@ti.com>
> ---
> include/net/red.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/red.h b/include/net/red.h
> index ef46058..05960a4 100644
> --- a/include/net/red.h
> +++ b/include/net/red.h
> @@ -287,7 +287,7 @@ static inline unsigned long red_calc_qavg_no_idle_time(const struct red_parms *p
> *
> * --ANK (980924)
> */
> - return v->qavg + (backlog - (v->qavg >> p->Wlog));
> + return v->qavg + (backlog - v->qavg) >> p->Wlog;
> }
>
> static inline unsigned long red_calc_qavg(const struct red_parms *p,
This is going to be an FPP (Frequently Posted Patch).
The current formula is fine.
That's because backlog, at the start of red_calc_qavg_no_idle_time(), is not
yet scaled by p->Wlog. v->qavg is scaled, but backlog is not.
Have you tested RED after your patch?
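The scaling argument can be checked with a small stand-alone sketch. It assumes, as stated above, that the stored average is kept multiplied by 2^Wlog while backlog is unscaled. Multiplying qavg = qavg + (backlog - qavg) * W by 2^Wlog then gives exactly the existing expression. The function names are hypothetical and only mirror the two expressions from the diff; this is not the kernel code itself. Note also that in C '+' binds tighter than '>>', which the second function makes explicit:

```c
#include <assert.h>

/* Existing expression: qavg_scaled holds (qavg << wlog), backlog is
 * unscaled. Adding (backlog - qavg) to the scaled value applies the
 * weight W = 2^-wlog to the unscaled average. */
static unsigned long ewma_current(unsigned long qavg_scaled,
				  unsigned long backlog, unsigned int wlog)
{
	assert(wlog < 8 * sizeof(unsigned long));
	return qavg_scaled + (backlog - (qavg_scaled >> wlog));
}

/* Proposed expression, with the parentheses C applies implicitly:
 * "a + (b - a) >> w" parses as "(a + (b - a)) >> w". */
static unsigned long ewma_proposed(unsigned long qavg_scaled,
				   unsigned long backlog, unsigned int wlog)
{
	assert(wlog < 8 * sizeof(unsigned long));
	return (qavg_scaled + (backlog - qavg_scaled)) >> wlog;
}
```

With Wlog = 2, a stored value of 32 (an average of 8) and a backlog of 16, the existing expression yields 40, i.e. an average of 10, a quarter of the way from 8 to 16. The proposed expression yields 4, which is just backlog >> Wlog and discards the previous average.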
^ permalink raw reply
* [PATCH] sch_red: fix weighted average calculation
From: Cyril Chemparathy @ 2012-09-13 13:43 UTC (permalink / raw)
To: linux-kernel, netdev
Cc: davem, david.ward, eric.dumazet, jdowdal, paul.gortmaker,
Cyril Chemparathy
This patch fixes an apparent bug in the running weighted average calculation
used in the RED algorithm.
Going by the described formula:
qavg = qavg*(1-W) + backlog*W
=> qavg = qavg + (backlog - qavg) * W
... with W converted to a pre-calculated shift, this then becomes:
qavg = qavg + (backlog - qavg) >> logW
... giving the modified expression introduced by this patch.
Signed-off-by: John Dowdal <jdowdal@ti.com>
---
include/net/red.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/red.h b/include/net/red.h
index ef46058..05960a4 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -287,7 +287,7 @@ static inline unsigned long red_calc_qavg_no_idle_time(const struct red_parms *p
*
* --ANK (980924)
*/
- return v->qavg + (backlog - (v->qavg >> p->Wlog));
+ return v->qavg + (backlog - v->qavg) >> p->Wlog;
}
static inline unsigned long red_calc_qavg(const struct red_parms *p,
--
1.7.9.5
^ permalink raw reply related
* Re: GRO aggregation
From: Eric Dumazet @ 2012-09-13 13:22 UTC (permalink / raw)
To: Or Gerlitz
Cc: Shlomo Pongartz, Rick Jones, netdev@vger.kernel.org, Tom Herbert,
Yevgeny Petrilin
In-Reply-To: <CAJZOPZLgQVq+pS1PTU2SM2C_dPPuHx8EnVL8zH077zm5O9aafQ@mail.gmail.com>
On Thu, 2012-09-13 at 15:47 +0300, Or Gerlitz wrote:
> Shlomo is dealing with making the IPoIB driver work well with GRO,
> thanks for the
> comments on the Mellanox Ethernet driver, we will look there too
> (added Yevgeny)...
>
> As for IPoIB it has two modes, connected which irrelevant for this
> discussion, and datagram
> - who is under the scope here. Its MTU is typically 2044 but can be
> 4092 as well, the allocation
> of skb's for this mode is done in ipoib_alloc_rx_skb() -- which you've
> patched recently...
>
> Following your comment we noted that if using the lower/typical mtu of
> 2044 which means
> we are below the ipoib_ud_need_sg() threshold, skbs are allocated on
> one "form" and if using
> the 4092 mtu in another "form" - do you see each of the form to fall
> into different GRO flow, e.g
> 2044 to the "slow" and 4092 to the "fast"?!
Seems fine to me both ways, because you use dev_alloc_skb(), and you
don't pull the TCP payload into tcp->head.
You might try adding prefetch() as well, to bring the IP/TCP headers
into the CPU cache before they are needed in the GRO layers.
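The prefetch idea can be illustrated outside the kernel. The following is a user-space sketch only, using the GCC/Clang builtin __builtin_prefetch rather than the kernel's prefetch(); struct rx_frame and count_ipv4 are hypothetical names made up for the illustration and do not correspond to IPoIB or GRO code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a received frame; not a kernel structure. */
struct rx_frame {
	const uint8_t *data;	/* start of the IP header */
	size_t len;
};

/* Walk a batch of frames and read one header byte from each (the first
 * byte, whose high nibble is the IP version). Before touching the
 * current frame, issue a prefetch for the next frame's header so that
 * it is more likely to be in cache when it is needed. */
static unsigned int count_ipv4(const struct rx_frame *frames, size_t n)
{
	unsigned int ipv4 = 0;
	size_t i;

	assert(frames != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (i + 1 < n)
			__builtin_prefetch(frames[i + 1].data);
		if (frames[i].len > 0 && (frames[i].data[0] >> 4) == 4)
			ipv4++;
	}
	return ipv4;
}
```

Whether a prefetch helps depends on how much work sits between issuing it and the first access, so any gain would have to be measured on the actual receive path.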
^ permalink raw reply
* [net-next PATCH 3/3] bnx2x: Utilize Link Flap Avoidance
From: Yuval Mintz @ 2012-09-13 12:56 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yuval Mintz, Yaniv Rosner
In-Reply-To: <1347540981-16198-1-git-send-email-yuvalmin@broadcom.com>
Change various flows in the bnx2x driver which up until now flapped
the link - these flows now benefit from the link flap avoidance mechanism.
This includes the removal of the link reset made upon nic init, as it is
possible the link is already active at that time.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Yaniv Rosner <yaniv.rosner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 +++---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 16 +++++++--
.../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 10 +++--
drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 3 ++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 34 +++++++++++++-------
6 files changed, 51 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index af20c6e..ca80487 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2283,7 +2283,7 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
/* Wait for all pending SP commands to complete */
if (!bnx2x_wait_sp_comp(bp, ~0x0UL)) {
BNX2X_ERR("Timeout waiting for SP elements to complete\n");
- bnx2x_nic_unload(bp, UNLOAD_CLOSE);
+ bnx2x_nic_unload(bp, UNLOAD_CLOSE, false);
return -EBUSY;
}
@@ -2331,7 +2331,7 @@ load_error0:
}
/* must be called with rtnl_lock */
-int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
+int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
{
int i;
bool global = false;
@@ -2393,7 +2393,7 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
/* Cleanup the chip if needed */
if (unload_mode != UNLOAD_RECOVERY)
- bnx2x_chip_cleanup(bp, unload_mode);
+ bnx2x_chip_cleanup(bp, unload_mode, keep_link);
else {
/* Send the UNLOAD_REQUEST to the MCP */
bnx2x_send_unload_req(bp, unload_mode);
@@ -2417,7 +2417,7 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode)
bnx2x_free_irq(bp);
/* Report UNLOAD_DONE to MCP */
- bnx2x_send_unload_done(bp);
+ bnx2x_send_unload_done(bp, false);
}
/*
@@ -3768,7 +3768,7 @@ int bnx2x_reload_if_running(struct net_device *dev)
if (unlikely(!netif_running(dev)))
return 0;
- bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+ bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
return bnx2x_nic_load(bp, LOAD_NORMAL);
}
@@ -3965,7 +3965,7 @@ int bnx2x_suspend(struct pci_dev *pdev, pm_message_t state)
netif_device_detach(dev);
- bnx2x_nic_unload(bp, UNLOAD_CLOSE);
+ bnx2x_nic_unload(bp, UNLOAD_CLOSE, false);
bnx2x_set_power_state(bp, pci_choose_state(pdev, state));
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 21b5532..96e998c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -83,8 +83,9 @@ u32 bnx2x_send_unload_req(struct bnx2x *bp, int unload_mode);
* bnx2x_send_unload_done - send UNLOAD_DONE command to the MCP.
*
* @bp: driver handle
+ * @keep_link: true iff link should be kept up
*/
-void bnx2x_send_unload_done(struct bnx2x *bp);
+void bnx2x_send_unload_done(struct bnx2x *bp, bool keep_link);
/**
* bnx2x_config_rss_pf - configure RSS parameters in a PF.
@@ -153,6 +154,14 @@ u8 bnx2x_initial_phy_init(struct bnx2x *bp, int load_mode);
void bnx2x_link_set(struct bnx2x *bp);
/**
+ * bnx2x_force_link_reset - Forces link reset, and put the PHY
+ * in reset as well.
+ *
+ * @bp: driver handle
+ */
+void bnx2x_force_link_reset(struct bnx2x *bp);
+
+/**
* bnx2x_link_test - query link status.
*
* @bp: driver handle
@@ -312,12 +321,13 @@ void bnx2x_set_num_queues(struct bnx2x *bp);
*
* @bp: driver handle
* @unload_mode: COMMON, PORT, FUNCTION
+ * @keep_link: true iff link should be kept up.
*
* - Cleanup MAC configuration.
* - Closes clients.
* - etc.
*/
-void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode);
+void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode, bool keep_link);
/**
* bnx2x_acquire_hw_lock - acquire HW lock.
@@ -446,7 +456,7 @@ void bnx2x_fw_dump_lvl(struct bnx2x *bp, const char *lvl);
bool bnx2x_test_firmware_version(struct bnx2x *bp, bool is_err);
/* dev_close main block */
-int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode);
+int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link);
/* dev_open main block */
int bnx2x_nic_load(struct bnx2x *bp, int load_mode);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index c37a68d..19d2fc5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -905,6 +905,7 @@ static int bnx2x_nway_reset(struct net_device *dev)
if (netif_running(dev)) {
bnx2x_stats_handle(bp, STATS_EVENT_STOP);
+ bnx2x_force_link_reset(bp);
bnx2x_link_set(bp);
}
@@ -1733,6 +1734,7 @@ static int bnx2x_set_eee(struct net_device *dev, struct ethtool_eee *edata)
/* Restart link to propogate changes */
if (netif_running(dev)) {
bnx2x_stats_handle(bp, STATS_EVENT_STOP);
+ bnx2x_force_link_reset(bp);
bnx2x_link_set(bp);
}
@@ -2257,7 +2259,7 @@ static int bnx2x_test_ext_loopback(struct bnx2x *bp)
if (!netif_running(bp->dev))
return BNX2X_EXT_LOOPBACK_FAILED;
- bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+ bnx2x_nic_unload(bp, UNLOAD_NORMAL, false);
rc = bnx2x_nic_load(bp, LOAD_LOOPBACK_EXT);
if (rc) {
DP(BNX2X_MSG_ETHTOOL,
@@ -2408,7 +2410,7 @@ static void bnx2x_self_test(struct net_device *dev,
link_up = bp->link_vars.link_up;
- bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+ bnx2x_nic_unload(bp, UNLOAD_NORMAL, false);
rc = bnx2x_nic_load(bp, LOAD_DIAG);
if (rc) {
etest->flags |= ETH_TEST_FL_FAILED;
@@ -2440,7 +2442,7 @@ static void bnx2x_self_test(struct net_device *dev,
etest->flags |= ETH_TEST_FL_EXTERNAL_LB_DONE;
}
- bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+ bnx2x_nic_unload(bp, UNLOAD_NORMAL, false);
/* restore input for TX port IF */
REG_WR(bp, NIG_REG_EGRESS_UMP0_IN_EN + port*4, val);
@@ -2934,7 +2936,7 @@ static int bnx2x_set_channels(struct net_device *dev,
bnx2x_change_num_queues(bp, channels->combined_count);
return 0;
}
- bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+ bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
bnx2x_change_num_queues(bp, channels->combined_count);
return bnx2x_nic_load(bp, LOAD_NORMAL);
}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index df14006..c795cfc 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -1286,6 +1286,9 @@ struct drv_func_mb {
#define DRV_MSG_CODE_SET_MF_BW_MIN_MASK 0x00ff0000
#define DRV_MSG_CODE_SET_MF_BW_MAX_MASK 0xff000000
+ #define DRV_MSG_CODE_UNLOAD_SKIP_LINK_RESET 0x00000002
+
+ #define DRV_MSG_CODE_LOAD_REQ_WITH_LFA 0x0000100a
u32 fw_mb_header;
#define FW_MSG_CODE_MASK 0xffff0000
#define FW_MSG_CODE_DRV_LOAD_COMMON 0x10100000
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
index 5b64d3d..3cd2391 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
@@ -359,7 +359,7 @@ int bnx2x_phy_init(struct link_params *params, struct link_vars *vars);
to 0 */
int bnx2x_link_reset(struct link_params *params, struct link_vars *vars,
u8 reset_ext_phy);
-
+int bnx2x_lfa_reset(struct link_params *params, struct link_vars *vars);
/* bnx2x_link_update should be called upon link interrupt */
int bnx2x_link_update(struct link_params *params, struct link_vars *vars);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2105498..dfc5b60 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -2171,7 +2171,6 @@ void bnx2x_link_set(struct bnx2x *bp)
{
if (!BP_NOMCP(bp)) {
bnx2x_acquire_phy_lock(bp);
- bnx2x_link_reset(&bp->link_params, &bp->link_vars, 1);
bnx2x_phy_init(&bp->link_params, &bp->link_vars);
bnx2x_release_phy_lock(bp);
@@ -2184,12 +2183,19 @@ static void bnx2x__link_reset(struct bnx2x *bp)
{
if (!BP_NOMCP(bp)) {
bnx2x_acquire_phy_lock(bp);
- bnx2x_link_reset(&bp->link_params, &bp->link_vars, 1);
+ bnx2x_lfa_reset(&bp->link_params, &bp->link_vars);
bnx2x_release_phy_lock(bp);
} else
BNX2X_ERR("Bootcode is missing - can not reset link\n");
}
+void bnx2x_force_link_reset(struct bnx2x *bp)
+{
+ bnx2x_acquire_phy_lock(bp);
+ bnx2x_link_reset(&bp->link_params, &bp->link_vars, 1);
+ bnx2x_release_phy_lock(bp);
+}
+
u8 bnx2x_link_test(struct bnx2x *bp, u8 is_serdes)
{
u8 rc = 0;
@@ -6757,7 +6763,6 @@ static int bnx2x_init_hw_port(struct bnx2x *bp)
u32 low, high;
u32 val;
- bnx2x__link_reset(bp);
DP(NETIF_MSG_HW, "starting port init port %d\n", port);
@@ -8244,12 +8249,15 @@ u32 bnx2x_send_unload_req(struct bnx2x *bp, int unload_mode)
* bnx2x_send_unload_done - send UNLOAD_DONE command to the MCP.
*
* @bp: driver handle
+ * @keep_link: true iff link should be kept up
*/
-void bnx2x_send_unload_done(struct bnx2x *bp)
+void bnx2x_send_unload_done(struct bnx2x *bp, bool keep_link)
{
+ u32 reset_param = keep_link ? DRV_MSG_CODE_UNLOAD_SKIP_LINK_RESET : 0;
+
/* Report UNLOAD_DONE to MCP */
if (!BP_NOMCP(bp))
- bnx2x_fw_command(bp, DRV_MSG_CODE_UNLOAD_DONE, 0);
+ bnx2x_fw_command(bp, DRV_MSG_CODE_UNLOAD_DONE, reset_param);
}
static int bnx2x_func_wait_started(struct bnx2x *bp)
@@ -8318,7 +8326,7 @@ static int bnx2x_func_wait_started(struct bnx2x *bp)
return 0;
}
-void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode)
+void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode, bool keep_link)
{
int port = BP_PORT(bp);
int i, rc = 0;
@@ -8440,7 +8448,7 @@ unload_error:
/* Report UNLOAD_DONE to MCP */
- bnx2x_send_unload_done(bp);
+ bnx2x_send_unload_done(bp, keep_link);
}
void bnx2x_disable_close_the_gate(struct bnx2x *bp)
@@ -8852,7 +8860,8 @@ int bnx2x_leader_reset(struct bnx2x *bp)
* driver is owner of the HW
*/
if (!global && !BP_NOMCP(bp)) {
- load_code = bnx2x_fw_command(bp, DRV_MSG_CODE_LOAD_REQ, 0);
+ load_code = bnx2x_fw_command(bp, DRV_MSG_CODE_LOAD_REQ,
+ DRV_MSG_CODE_LOAD_REQ_WITH_LFA);
if (!load_code) {
BNX2X_ERR("MCP response failure, aborting\n");
rc = -EAGAIN;
@@ -8958,7 +8967,7 @@ static void bnx2x_parity_recover(struct bnx2x *bp)
/* Stop the driver */
/* If interface has been removed - break */
- if (bnx2x_nic_unload(bp, UNLOAD_RECOVERY))
+ if (bnx2x_nic_unload(bp, UNLOAD_RECOVERY, false))
return;
bp->recovery_state = BNX2X_RECOVERY_WAIT;
@@ -9124,7 +9133,7 @@ static void bnx2x_sp_rtnl_task(struct work_struct *work)
bp->sp_rtnl_state = 0;
smp_mb();
- bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+ bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
bnx2x_nic_load(bp, LOAD_NORMAL);
goto sp_rtnl_exit;
@@ -9310,7 +9319,8 @@ static void __devinit bnx2x_prev_unload_undi_inc(struct bnx2x *bp, u8 port,
static int __devinit bnx2x_prev_mcp_done(struct bnx2x *bp)
{
- u32 rc = bnx2x_fw_command(bp, DRV_MSG_CODE_UNLOAD_DONE, 0);
+ u32 rc = bnx2x_fw_command(bp, DRV_MSG_CODE_UNLOAD_DONE,
+ DRV_MSG_CODE_UNLOAD_SKIP_LINK_RESET);
if (!rc) {
BNX2X_ERR("MCP response failure, aborting\n");
return -EBUSY;
@@ -11005,7 +11015,7 @@ static int bnx2x_close(struct net_device *dev)
struct bnx2x *bp = netdev_priv(dev);
/* Unload the driver, release IRQs */
- bnx2x_nic_unload(bp, UNLOAD_CLOSE);
+ bnx2x_nic_unload(bp, UNLOAD_CLOSE, false);
/* Power off */
bnx2x_set_power_state(bp, PCI_D3hot);
--
1.7.9.rc2
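For readers skimming the diff, the keep_link plumbing can be summarized in a few lines. This is a stand-alone sketch, not driver code: the DRV_MSG_CODE_UNLOAD_SKIP_LINK_RESET value is copied from the bnx2x_hsi.h hunk above, while enum unload_flow, flow_keeps_link and unload_done_param are hypothetical names that only restate which call sites in this patch pass true or false:

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the bnx2x_hsi.h hunk in this patch. */
#define DRV_MSG_CODE_UNLOAD_SKIP_LINK_RESET 0x00000002

/* Hypothetical labels for the callers of bnx2x_nic_unload() in the diff. */
enum unload_flow {
	FLOW_RELOAD_IF_RUNNING,	/* bnx2x_reload_if_running() */
	FLOW_SET_CHANNELS,	/* bnx2x_set_channels() */
	FLOW_SP_RTNL_RESET,	/* unload/load pair in bnx2x_sp_rtnl_task() */
	FLOW_SUSPEND,		/* bnx2x_suspend() */
	FLOW_SELF_TEST,		/* bnx2x_self_test(), ext loopback test */
	FLOW_PARITY_RECOVERY,	/* bnx2x_parity_recover() */
	FLOW_CLOSE,		/* bnx2x_close() */
};

/* True for the flows that pass keep_link = true in this patch. */
static bool flow_keeps_link(enum unload_flow flow)
{
	switch (flow) {
	case FLOW_RELOAD_IF_RUNNING:
	case FLOW_SET_CHANNELS:
	case FLOW_SP_RTNL_RESET:
		return true;
	default:
		return false;
	}
}

/* Parameter sent along with UNLOAD_DONE, as in bnx2x_send_unload_done(). */
static uint32_t unload_done_param(bool keep_link)
{
	return keep_link ? DRV_MSG_CODE_UNLOAD_SKIP_LINK_RESET : 0;
}
```

In other words, only the flows that immediately reload the NIC with LOAD_NORMAL ask to skip the link reset; close, suspend, the self tests and parity recovery still pass false.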
^ permalink raw reply related
* [net-next PATCH 2/3] bnx2x: Link Flap Avoidance
From: Yuval Mintz @ 2012-09-13 12:56 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yaniv Rosner, Yuval Mintz
In-Reply-To: <1347540981-16198-1-git-send-email-yuvalmin@broadcom.com>
From: Yaniv Rosner <yaniv.rosner@broadcom.com>
Various flows in the bnx2x driver cause a link-flap - if the link
is up, it would be toggled down (after a mac/phy reset) and then
taken back up.
In many of these cases, there is no need to cause such a flap,
as the associated flows should not actually affect the link.
This patch adds the 'Link Flap Avoidance' mechanism, which allows
the driver to better determine if a given flow requires a link change,
and thus minimize the number of link flaps caused by the driver.
Signed-off-by: Yaniv Rosner <yaniv.rosner@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h | 48 +++
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 435 +++++++++++++++++++---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h | 2 +
3 files changed, 437 insertions(+), 48 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index 76b6e65..df14006 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -1909,6 +1909,54 @@ struct lldp_local_mib {
};
/***END OF DCBX STRUCTURES DECLARATIONS***/
+/***********************************************************/
+/* Elink section */
+/***********************************************************/
+#define SHMEM_LINK_CONFIG_SIZE 2
+struct shmem_lfa {
+ u32 req_duplex;
+ #define REQ_DUPLEX_PHY0_MASK 0x0000ffff
+ #define REQ_DUPLEX_PHY0_SHIFT 0
+ #define REQ_DUPLEX_PHY1_MASK 0xffff0000
+ #define REQ_DUPLEX_PHY1_SHIFT 16
+ u32 req_flow_ctrl;
+ #define REQ_FLOW_CTRL_PHY0_MASK 0x0000ffff
+ #define REQ_FLOW_CTRL_PHY0_SHIFT 0
+ #define REQ_FLOW_CTRL_PHY1_MASK 0xffff0000
+ #define REQ_FLOW_CTRL_PHY1_SHIFT 16
+ u32 req_line_speed; /* Also determine AutoNeg */
+ #define REQ_LINE_SPD_PHY0_MASK 0x0000ffff
+ #define REQ_LINE_SPD_PHY0_SHIFT 0
+ #define REQ_LINE_SPD_PHY1_MASK 0xffff0000
+ #define REQ_LINE_SPD_PHY1_SHIFT 16
+ u32 speed_cap_mask[SHMEM_LINK_CONFIG_SIZE];
+ u32 additional_config;
+ #define REQ_FC_AUTO_ADV_MASK 0x0000ffff
+ #define REQ_FC_AUTO_ADV0_SHIFT 0
+ #define NO_LFA_DUE_TO_DCC_MASK 0x00010000
+ u32 lfa_sts;
+ #define LFA_LINK_FLAP_REASON_OFFSET 0
+ #define LFA_LINK_FLAP_REASON_MASK 0x000000ff
+ #define LFA_LINK_DOWN 0x1
+ #define LFA_LOOPBACK_ENABLED 0x2
+ #define LFA_DUPLEX_MISMATCH 0x3
+ #define LFA_MFW_IS_TOO_OLD 0x4
+ #define LFA_LINK_SPEED_MISMATCH 0x5
+ #define LFA_FLOW_CTRL_MISMATCH 0x6
+ #define LFA_SPEED_CAP_MISMATCH 0x7
+ #define LFA_DCC_LFA_DISABLED 0x8
+ #define LFA_EEE_MISMATCH 0x9
+
+ #define LINK_FLAP_AVOIDANCE_COUNT_OFFSET 8
+ #define LINK_FLAP_AVOIDANCE_COUNT_MASK 0x0000ff00
+
+ #define LINK_FLAP_COUNT_OFFSET 16
+ #define LINK_FLAP_COUNT_MASK 0x00ff0000
+
+ #define LFA_FLAGS_MASK 0xff000000
+ #define SHMEM_LFA_DONT_CLEAR_STAT (1<<24)
+};
+
struct ncsi_oem_fcoe_features {
u32 fcoe_features1;
#define FCOE_FEATURES1_IOS_PER_CONNECTION_MASK 0x0000FFFF
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index 05620ef..8eabd33 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -321,6 +321,127 @@ static u32 bnx2x_bits_dis(struct bnx2x *bp, u32 reg, u32 bits)
return val;
}
+/*
+ * bnx2x_check_lfa - This function checks if link reinitialization is required,
+ * or link flap can be avoided.
+ *
+ * @params: link parameters
+ * Returns 0 if Link Flap Avoidance conditions are met otherwise, the failed
+ * condition code.
+ */
+static int bnx2x_check_lfa(struct link_params *params)
+{
+ u32 link_status, cfg_idx, lfa_mask, cfg_size;
+ u32 cur_speed_cap_mask, cur_req_fc_auto_adv, additional_config;
+ u32 saved_val, req_val, eee_status;
+ struct bnx2x *bp = params->bp;
+
+ additional_config =
+ REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, additional_config));
+
+ /* NOTE: must be first condition checked -
+ * to verify DCC bit is cleared in any case!
+ */
+ if (additional_config & NO_LFA_DUE_TO_DCC_MASK) {
+ DP(NETIF_MSG_LINK, "No LFA due to DCC flap after clp exit\n");
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, additional_config),
+ additional_config & ~NO_LFA_DUE_TO_DCC_MASK);
+ return LFA_DCC_LFA_DISABLED;
+ }
+
+ /* Verify that link is up */
+ link_status = REG_RD(bp, params->shmem_base +
+ offsetof(struct shmem_region,
+ port_mb[params->port].link_status));
+ if (!(link_status & LINK_STATUS_LINK_UP))
+ return LFA_LINK_DOWN;
+
+ /* Verify that loopback mode is not set */
+ if (params->loopback_mode)
+ return LFA_LOOPBACK_ENABLED;
+
+ /* Verify that MFW supports LFA */
+ if (!params->lfa_base)
+ return LFA_MFW_IS_TOO_OLD;
+
+ if (params->num_phys == 3) {
+ cfg_size = 2;
+ lfa_mask = 0xffffffff;
+ } else {
+ cfg_size = 1;
+ lfa_mask = 0xffff;
+ }
+
+ /* Compare Duplex */
+ saved_val = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, req_duplex));
+ req_val = params->req_duplex[0] | (params->req_duplex[1] << 16);
+ if ((saved_val & lfa_mask) != (req_val & lfa_mask)) {
+ DP(NETIF_MSG_LINK, "Duplex mismatch %x vs. %x\n",
+ (saved_val & lfa_mask), (req_val & lfa_mask));
+ return LFA_DUPLEX_MISMATCH;
+ }
+ /* Compare Flow Control */
+ saved_val = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, req_flow_ctrl));
+ req_val = params->req_flow_ctrl[0] | (params->req_flow_ctrl[1] << 16);
+ if ((saved_val & lfa_mask) != (req_val & lfa_mask)) {
+ DP(NETIF_MSG_LINK, "Flow control mismatch %x vs. %x\n",
+ (saved_val & lfa_mask), (req_val & lfa_mask));
+ return LFA_FLOW_CTRL_MISMATCH;
+ }
+ /* Compare Link Speed */
+ saved_val = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, req_line_speed));
+ req_val = params->req_line_speed[0] | (params->req_line_speed[1] << 16);
+ if ((saved_val & lfa_mask) != (req_val & lfa_mask)) {
+ DP(NETIF_MSG_LINK, "Link speed mismatch %x vs. %x\n",
+ (saved_val & lfa_mask), (req_val & lfa_mask));
+ return LFA_LINK_SPEED_MISMATCH;
+ }
+
+ for (cfg_idx = 0; cfg_idx < cfg_size; cfg_idx++) {
+ cur_speed_cap_mask = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa,
+ speed_cap_mask[cfg_idx]));
+
+ if (cur_speed_cap_mask != params->speed_cap_mask[cfg_idx]) {
+ DP(NETIF_MSG_LINK, "Speed Cap mismatch %x vs. %x\n",
+ cur_speed_cap_mask,
+ params->speed_cap_mask[cfg_idx]);
+ return LFA_SPEED_CAP_MISMATCH;
+ }
+ }
+
+ cur_req_fc_auto_adv =
+ REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, additional_config)) &
+ REQ_FC_AUTO_ADV_MASK;
+
+ if ((u16)cur_req_fc_auto_adv != params->req_fc_auto_adv) {
+ DP(NETIF_MSG_LINK, "Flow Ctrl AN mismatch %x vs. %x\n",
+ cur_req_fc_auto_adv, params->req_fc_auto_adv);
+ return LFA_FLOW_CTRL_MISMATCH;
+ }
+
+ eee_status = REG_RD(bp, params->shmem2_base +
+ offsetof(struct shmem2_region,
+ eee_status[params->port]));
+
+ if (((eee_status & SHMEM_EEE_LPI_REQUESTED_BIT) ^
+ (params->eee_mode & EEE_MODE_ENABLE_LPI)) ||
+ ((eee_status & SHMEM_EEE_REQUESTED_BIT) ^
+ (params->eee_mode & EEE_MODE_ADV_LPI))) {
+ DP(NETIF_MSG_LINK, "EEE mismatch %x vs. %x\n", params->eee_mode,
+ eee_status);
+ return LFA_EEE_MISMATCH;
+ }
+
+ /* LFA conditions are met */
+ return 0;
+}
/******************************************************************/
/* EPIO/GPIO section */
/******************************************************************/
@@ -1606,16 +1727,23 @@ static void bnx2x_set_xumac_nig(struct link_params *params,
NIG_REG_P0_MAC_PAUSE_OUT_EN, tx_pause_en);
}
-static void bnx2x_umac_disable(struct link_params *params)
+static void bnx2x_set_umac_rxtx(struct link_params *params, u8 en)
{
u32 umac_base = params->port ? GRCBASE_UMAC1 : GRCBASE_UMAC0;
+ u32 val;
struct bnx2x *bp = params->bp;
if (!(REG_RD(bp, MISC_REG_RESET_REG_2) &
(MISC_REGISTERS_RESET_REG_2_UMAC0 << params->port)))
return;
-
+ val = REG_RD(bp, umac_base + UMAC_REG_COMMAND_CONFIG);
+ if (en)
+ val |= (UMAC_COMMAND_CONFIG_REG_TX_ENA |
+ UMAC_COMMAND_CONFIG_REG_RX_ENA);
+ else
+ val &= ~(UMAC_COMMAND_CONFIG_REG_TX_ENA |
+ UMAC_COMMAND_CONFIG_REG_RX_ENA);
/* Disable RX and TX */
- REG_WR(bp, umac_base + UMAC_REG_COMMAND_CONFIG, 0);
+ REG_WR(bp, umac_base + UMAC_REG_COMMAND_CONFIG, val);
}
static void bnx2x_umac_enable(struct link_params *params,
@@ -1766,11 +1894,12 @@ static void bnx2x_xmac_init(struct link_params *params, u32 max_speed)
}
-static void bnx2x_xmac_disable(struct link_params *params)
+static void bnx2x_set_xmac_rxtx(struct link_params *params, u8 en)
{
u8 port = params->port;
struct bnx2x *bp = params->bp;
u32 pfc_ctrl, xmac_base = (port) ? GRCBASE_XMAC1 : GRCBASE_XMAC0;
+ u32 val;
if (REG_RD(bp, MISC_REG_RESET_REG_2) &
MISC_REGISTERS_RESET_REG_2_XMAC) {
@@ -1784,7 +1913,12 @@ static void bnx2x_xmac_disable(struct link_params *params)
REG_WR(bp, xmac_base + XMAC_REG_PFC_CTRL_HI,
(pfc_ctrl | (1<<1)));
DP(NETIF_MSG_LINK, "Disable XMAC on port %x\n", port);
- REG_WR(bp, xmac_base + XMAC_REG_CTRL, 0);
+ val = REG_RD(bp, xmac_base + XMAC_REG_CTRL);
+ if (en)
+ val |= (XMAC_CTRL_REG_TX_EN | XMAC_CTRL_REG_RX_EN);
+ else
+ val &= ~(XMAC_CTRL_REG_TX_EN | XMAC_CTRL_REG_RX_EN);
+ REG_WR(bp, xmac_base + XMAC_REG_CTRL, val);
}
}
@@ -2825,16 +2959,18 @@ static int bnx2x_bmac2_enable(struct link_params *params,
static int bnx2x_bmac_enable(struct link_params *params,
struct link_vars *vars,
- u8 is_lb)
+ u8 is_lb, u8 reset_bmac)
{
int rc = 0;
u8 port = params->port;
struct bnx2x *bp = params->bp;
u32 val;
/* Reset and unreset the BigMac */
- REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_CLEAR,
- (MISC_REGISTERS_RESET_REG_2_RST_BMAC0 << port));
- usleep_range(1000, 2000);
+ if (reset_bmac) {
+ REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_CLEAR,
+ (MISC_REGISTERS_RESET_REG_2_RST_BMAC0 << port));
+ usleep_range(1000, 2000);
+ }
REG_WR(bp, GRCBASE_MISC + MISC_REGISTERS_RESET_REG_2_SET,
(MISC_REGISTERS_RESET_REG_2_RST_BMAC0 << port));
@@ -2866,37 +3002,28 @@ static int bnx2x_bmac_enable(struct link_params *params,
return rc;
}
-static void bnx2x_bmac_rx_disable(struct bnx2x *bp, u8 port)
+static void bnx2x_set_bmac_rx(struct bnx2x *bp, u32 chip_id, u8 port, u8 en)
{
u32 bmac_addr = port ? NIG_REG_INGRESS_BMAC1_MEM :
NIG_REG_INGRESS_BMAC0_MEM;
u32 wb_data[2];
u32 nig_bmac_enable = REG_RD(bp, NIG_REG_BMAC0_REGS_OUT_EN + port*4);
+ if (CHIP_IS_E2(bp))
+ bmac_addr += BIGMAC2_REGISTER_BMAC_CONTROL;
+ else
+ bmac_addr += BIGMAC_REGISTER_BMAC_CONTROL;
/* Only if the bmac is out of reset */
if (REG_RD(bp, MISC_REG_RESET_REG_2) &
(MISC_REGISTERS_RESET_REG_2_RST_BMAC0 << port) &&
nig_bmac_enable) {
-
- if (CHIP_IS_E2(bp)) {
- /* Clear Rx Enable bit in BMAC_CONTROL register */
- REG_RD_DMAE(bp, bmac_addr +
- BIGMAC2_REGISTER_BMAC_CONTROL,
- wb_data, 2);
- wb_data[0] &= ~BMAC_CONTROL_RX_ENABLE;
- REG_WR_DMAE(bp, bmac_addr +
- BIGMAC2_REGISTER_BMAC_CONTROL,
- wb_data, 2);
- } else {
- /* Clear Rx Enable bit in BMAC_CONTROL register */
- REG_RD_DMAE(bp, bmac_addr +
- BIGMAC_REGISTER_BMAC_CONTROL,
- wb_data, 2);
+ /* Clear Rx Enable bit in BMAC_CONTROL register */
+ REG_RD_DMAE(bp, bmac_addr, wb_data, 2);
+ if (en)
+ wb_data[0] |= BMAC_CONTROL_RX_ENABLE;
+ else
wb_data[0] &= ~BMAC_CONTROL_RX_ENABLE;
- REG_WR_DMAE(bp, bmac_addr +
- BIGMAC_REGISTER_BMAC_CONTROL,
- wb_data, 2);
- }
+ REG_WR_DMAE(bp, bmac_addr, wb_data, 2);
usleep_range(1000, 2000);
}
}
@@ -4407,7 +4534,7 @@ static void bnx2x_warpcore_config_init(struct bnx2x_phy *phy,
"serdes_net_if = 0x%x\n",
vars->line_speed, serdes_net_if);
bnx2x_set_aer_mmd(params, phy);
-
+ bnx2x_warpcore_reset_lane(bp, phy, 1);
vars->phy_flags |= PHY_XGXS_FLAG;
if ((serdes_net_if == PORT_HW_CFG_NET_SERDES_IF_SGMII) ||
(phy->req_line_speed &&
@@ -6526,12 +6653,9 @@ static int bnx2x_update_link_down(struct link_params *params,
usleep_range(10000, 20000);
/* Reset BigMac/Xmac */
if (CHIP_IS_E1x(bp) ||
- CHIP_IS_E2(bp)) {
- bnx2x_bmac_rx_disable(bp, params->port);
- REG_WR(bp, GRCBASE_MISC +
- MISC_REGISTERS_RESET_REG_2_CLEAR,
- (MISC_REGISTERS_RESET_REG_2_RST_BMAC0 << port));
- }
+ CHIP_IS_E2(bp))
+ bnx2x_set_bmac_rx(bp, params->chip_id, params->port, 0);
+
if (CHIP_IS_E3(bp)) {
/* Prevent LPI Generation by chip */
REG_WR(bp, MISC_REG_CPMU_LP_FW_ENABLE_P0 + (params->port << 2),
@@ -6543,8 +6667,8 @@ static int bnx2x_update_link_down(struct link_params *params,
SHMEM_EEE_ACTIVE_BIT);
bnx2x_update_mng_eee(params, vars->eee_status);
- bnx2x_xmac_disable(params);
- bnx2x_umac_disable(params);
+ bnx2x_set_xmac_rxtx(params, 0);
+ bnx2x_set_umac_rxtx(params, 0);
}
return 0;
@@ -6596,7 +6720,7 @@ static int bnx2x_update_link_up(struct link_params *params,
if ((CHIP_IS_E1x(bp) ||
CHIP_IS_E2(bp))) {
if (link_10g) {
- if (bnx2x_bmac_enable(params, vars, 0) ==
+ if (bnx2x_bmac_enable(params, vars, 0, 1) ==
-ESRCH) {
DP(NETIF_MSG_LINK, "Found errors on BMAC\n");
vars->link_up = 0;
@@ -12171,7 +12295,7 @@ void bnx2x_init_bmac_loopback(struct link_params *params,
bnx2x_xgxs_deassert(params);
/* set bmac loopback */
- bnx2x_bmac_enable(params, vars, 1);
+ bnx2x_bmac_enable(params, vars, 1, 1);
REG_WR(bp, NIG_REG_EGRESS_DRAIN0_MODE + params->port*4, 0);
}
@@ -12263,7 +12387,7 @@ void bnx2x_init_xgxs_loopback(struct link_params *params,
if (USES_WARPCORE(bp))
bnx2x_xmac_enable(params, vars, 0);
else
- bnx2x_bmac_enable(params, vars, 0);
+ bnx2x_bmac_enable(params, vars, 0, 1);
}
if (params->loopback_mode == LOOPBACK_XGXS) {
@@ -12288,8 +12412,161 @@ void bnx2x_init_xgxs_loopback(struct link_params *params,
bnx2x_set_led(params, vars, LED_MODE_OPER, vars->line_speed);
}
+static void bnx2x_set_rx_filter(struct link_params *params, u8 en)
+{
+ struct bnx2x *bp = params->bp;
+ u8 val = en * 0x1F;
+
+ /* Open the gate between the NIG to the BRB */
+ if (!CHIP_IS_E1x(bp))
+ val |= en * 0x20;
+ REG_WR(bp, NIG_REG_LLH0_BRB1_DRV_MASK + params->port*4, val);
+
+ if (!CHIP_IS_E1(bp)) {
+ REG_WR(bp, NIG_REG_LLH0_BRB1_DRV_MASK_MF + params->port*4,
+ en*0x3);
+ }
+
+ REG_WR(bp, (params->port ? NIG_REG_LLH1_BRB1_NOT_MCP :
+ NIG_REG_LLH0_BRB1_NOT_MCP), en);
+}
+static int bnx2x_avoid_link_flap(struct link_params *params,
+ struct link_vars *vars)
+{
+ u32 phy_idx;
+ u32 dont_clear_stat, lfa_sts;
+ struct bnx2x *bp = params->bp;
+
+ /* Sync the link parameters */
+ bnx2x_link_status_update(params, vars);
+
+ /*
+ * The module verification was already done by previous link owner,
+ * so this call is meant only to get warning message
+ */
+
+ for (phy_idx = INT_PHY; phy_idx < params->num_phys; phy_idx++) {
+ struct bnx2x_phy *phy = &params->phy[phy_idx];
+ if (phy->phy_specific_func) {
+ DP(NETIF_MSG_LINK, "Calling PHY specific func\n");
+ phy->phy_specific_func(phy, params, PHY_INIT);
+ }
+ if ((phy->media_type == ETH_PHY_SFPP_10G_FIBER) ||
+ (phy->media_type == ETH_PHY_SFP_1G_FIBER) ||
+ (phy->media_type == ETH_PHY_DA_TWINAX))
+ bnx2x_verify_sfp_module(phy, params);
+ }
+ lfa_sts = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa,
+ lfa_sts));
+
+ dont_clear_stat = lfa_sts & SHMEM_LFA_DONT_CLEAR_STAT;
+
+ /* Re-enable the NIG/MAC */
+ if (CHIP_IS_E3(bp)) {
+ if (!dont_clear_stat) {
+ REG_WR(bp, GRCBASE_MISC +
+ MISC_REGISTERS_RESET_REG_2_CLEAR,
+ (MISC_REGISTERS_RESET_REG_2_MSTAT0 <<
+ params->port));
+ REG_WR(bp, GRCBASE_MISC +
+ MISC_REGISTERS_RESET_REG_2_SET,
+ (MISC_REGISTERS_RESET_REG_2_MSTAT0 <<
+ params->port));
+ }
+ if (vars->line_speed < SPEED_10000)
+ bnx2x_umac_enable(params, vars, 0);
+ else
+ bnx2x_xmac_enable(params, vars, 0);
+ } else {
+ if (vars->line_speed < SPEED_10000)
+ bnx2x_emac_enable(params, vars, 0);
+ else
+ bnx2x_bmac_enable(params, vars, 0, !dont_clear_stat);
+ }
+
+ /* Increment LFA count */
+ lfa_sts = ((lfa_sts & ~LINK_FLAP_AVOIDANCE_COUNT_MASK) |
+ (((((lfa_sts & LINK_FLAP_AVOIDANCE_COUNT_MASK) >>
+ LINK_FLAP_AVOIDANCE_COUNT_OFFSET) + 1) & 0xff)
+ << LINK_FLAP_AVOIDANCE_COUNT_OFFSET));
+ /* Clear link flap reason */
+ lfa_sts &= ~LFA_LINK_FLAP_REASON_MASK;
+
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, lfa_sts), lfa_sts);
+
+ /* Disable NIG DRAIN */
+ REG_WR(bp, NIG_REG_EGRESS_DRAIN0_MODE + params->port*4, 0);
+
+ /* Enable interrupts */
+ bnx2x_link_int_enable(params);
+ return 0;
+}
+
+static void bnx2x_cannot_avoid_link_flap(struct link_params *params,
+ struct link_vars *vars,
+ int lfa_status)
+{
+ u32 lfa_sts, cfg_idx, tmp_val;
+ struct bnx2x *bp = params->bp;
+
+ bnx2x_link_reset(params, vars, 1);
+
+ if (!params->lfa_base)
+ return;
+ /* Store the new link parameters */
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, req_duplex),
+ params->req_duplex[0] | (params->req_duplex[1] << 16));
+
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, req_flow_ctrl),
+ params->req_flow_ctrl[0] | (params->req_flow_ctrl[1] << 16));
+
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, req_line_speed),
+ params->req_line_speed[0] | (params->req_line_speed[1] << 16));
+
+ for (cfg_idx = 0; cfg_idx < SHMEM_LINK_CONFIG_SIZE; cfg_idx++) {
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa,
+ speed_cap_mask[cfg_idx]),
+ params->speed_cap_mask[cfg_idx]);
+ }
+
+ tmp_val = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, additional_config));
+ tmp_val &= ~REQ_FC_AUTO_ADV_MASK;
+ tmp_val |= params->req_fc_auto_adv;
+
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, additional_config), tmp_val);
+
+ lfa_sts = REG_RD(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, lfa_sts));
+
+ /* Clear the "Don't Clear Statistics" bit, and set reason */
+ lfa_sts &= ~SHMEM_LFA_DONT_CLEAR_STAT;
+
+ /* Set link flap reason */
+ lfa_sts &= ~LFA_LINK_FLAP_REASON_MASK;
+ lfa_sts |= ((lfa_status & LFA_LINK_FLAP_REASON_MASK) <<
+ LFA_LINK_FLAP_REASON_OFFSET);
+
+ /* Increment link flap counter */
+ lfa_sts = ((lfa_sts & ~LINK_FLAP_COUNT_MASK) |
+ (((((lfa_sts & LINK_FLAP_COUNT_MASK) >>
+ LINK_FLAP_COUNT_OFFSET) + 1) & 0xff)
+ << LINK_FLAP_COUNT_OFFSET));
+ REG_WR(bp, params->lfa_base +
+ offsetof(struct shmem_lfa, lfa_sts), lfa_sts);
+ /* Proceed with regular link initialization */
+}
+
int bnx2x_phy_init(struct link_params *params, struct link_vars *vars)
{
+ int lfa_status;
struct bnx2x *bp = params->bp;
DP(NETIF_MSG_LINK, "Phy Initialization started\n");
DP(NETIF_MSG_LINK, "(1) req_speed %d, req_flowctrl %d\n",
@@ -12304,6 +12581,19 @@ int bnx2x_phy_init(struct link_params *params, struct link_vars *vars)
vars->flow_ctrl = BNX2X_FLOW_CTRL_NONE;
vars->mac_type = MAC_TYPE_NONE;
vars->phy_flags = 0;
+ /* Driver opens NIG-BRB filters */
+ bnx2x_set_rx_filter(params, 1);
+ /* Check if link flap can be avoided */
+ lfa_status = bnx2x_check_lfa(params);
+
+ if (lfa_status == 0) {
+ DP(NETIF_MSG_LINK, "Link Flap Avoidance in progress\n");
+ return bnx2x_avoid_link_flap(params, vars);
+ }
+
+ DP(NETIF_MSG_LINK, "Cannot avoid link flap lfa_sta=0x%x\n",
+ lfa_status);
+ bnx2x_cannot_avoid_link_flap(params, vars, lfa_status);
/* Disable attentions */
bnx2x_bits_dis(bp, NIG_REG_MASK_INTERRUPT_PORT0 + params->port*4,
@@ -12386,13 +12676,12 @@ int bnx2x_link_reset(struct link_params *params, struct link_vars *vars,
REG_WR(bp, NIG_REG_EGRESS_EMAC0_OUT_EN + port*4, 0);
}
- /* Stop BigMac rx */
- if (!CHIP_IS_E3(bp))
- bnx2x_bmac_rx_disable(bp, port);
- else {
- bnx2x_xmac_disable(params);
- bnx2x_umac_disable(params);
- }
+ if (!CHIP_IS_E3(bp)) {
+ bnx2x_set_bmac_rx(bp, params->chip_id, port, 0);
+ } else {
+ bnx2x_set_xmac_rxtx(params, 0);
+ bnx2x_set_umac_rxtx(params, 0);
+ }
/* Disable emac */
if (!CHIP_IS_E3(bp))
REG_WR(bp, NIG_REG_NIG_EMAC0_EN + port*4, 0);
@@ -12450,6 +12739,56 @@ int bnx2x_link_reset(struct link_params *params, struct link_vars *vars,
vars->phy_flags = 0;
return 0;
}
+int bnx2x_lfa_reset(struct link_params *params,
+ struct link_vars *vars)
+{
+ struct bnx2x *bp = params->bp;
+ vars->link_up = 0;
+ vars->phy_flags = 0;
+ if (!params->lfa_base)
+ return bnx2x_link_reset(params, vars, 1);
+ /*
+ * Activate NIG drain so that during this time the device won't send
+ * anything while it is unable to respond.
+ */
+ REG_WR(bp, NIG_REG_EGRESS_DRAIN0_MODE + params->port*4, 1);
+
+ /*
+ * Close gracefully the gate from BMAC to NIG such that no half packets
+ * are passed.
+ */
+ if (!CHIP_IS_E3(bp))
+ bnx2x_set_bmac_rx(bp, params->chip_id, params->port, 0);
+
+ if (CHIP_IS_E3(bp)) {
+ bnx2x_set_xmac_rxtx(params, 0);
+ bnx2x_set_umac_rxtx(params, 0);
+ }
+ /* Wait 10ms for the pipe to clean up */
+ usleep_range(10000, 20000);
+
+ /* Clean the NIG-BRB using the network filters in a way that will
+ * not cut a packet in the middle.
+ */
+ bnx2x_set_rx_filter(params, 0);
+
+ /*
+ * Re-open the gate between the BMAC and the NIG, after verifying the
+ * gate to the BRB is closed, otherwise packets may arrive to the
+ * firmware before driver had initialized it. The target is to achieve
+ * minimum management protocol down time.
+ */
+ if (!CHIP_IS_E3(bp))
+ bnx2x_set_bmac_rx(bp, params->chip_id, params->port, 1);
+
+ if (CHIP_IS_E3(bp)) {
+ bnx2x_set_xmac_rxtx(params, 1);
+ bnx2x_set_umac_rxtx(params, 1);
+ }
+ /* Disable NIG drain */
+ REG_WR(bp, NIG_REG_EGRESS_DRAIN0_MODE + params->port*4, 0);
+ return 0;
+}
/****************************************************************************/
/* Common function */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
index 600ffda..5b64d3d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
@@ -305,6 +305,8 @@ struct link_params {
struct bnx2x *bp;
u16 req_fc_auto_adv; /* Should be set to TX / BOTH when
req_flow_ctrl is set to AUTO */
+ u16 rsrv1;
+ u32 lfa_base;
};
/* Output parameters */
--
1.7.9.rc2
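Reader's note on the counter updates in bnx2x_avoid_link_flap() and bnx2x_cannot_avoid_link_flap(): both increment an 8-bit count packed inside the 32-bit lfa_sts word while leaving the other bits alone. The sketch below shows the same read-modify-write step in isolation. The DEMO_* offset and mask are hypothetical values chosen for illustration; the real LINK_FLAP_AVOIDANCE_COUNT_* and LINK_FLAP_COUNT_* definitions are not part of this hunk.

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. The driver's real
 * mask and offset are defined elsewhere and may differ.
 */
#define DEMO_COUNT_OFFSET 16
#define DEMO_COUNT_MASK   (0xffu << DEMO_COUNT_OFFSET)

/* Increment the 8-bit counter stored inside a 32-bit status word,
 * keeping every bit outside the field unchanged. The counter wraps
 * from 0xff back to 0 because of the "& 0xff".
 */
static uint32_t demo_bump_count(uint32_t sts)
{
	uint32_t count = (sts & DEMO_COUNT_MASK) >> DEMO_COUNT_OFFSET;

	count = (count + 1) & 0xff;
	return (sts & ~DEMO_COUNT_MASK) | (count << DEMO_COUNT_OFFSET);
}
```

Extracting the field first and masking after the increment is what keeps an overflow from carrying into the neighbouring bits of the status word.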
* [net-next PATCH 1/3] bnx2x: link code refactoring
From: Yuval Mintz @ 2012-09-13 12:56 UTC (permalink / raw)
To: davem, netdev; +Cc: eilong, ariele, Yaniv Rosner, Yuval Mintz
In-Reply-To: <1347540981-16198-1-git-send-email-yuvalmin@broadcom.com>
From: Yaniv Rosner <yaniv.rosner@broadcom.com>
Separate the interrupt setting part of each external PHY to a specific
function.
This allows calling the interrupt setting in case of link-flap avoidance,
since some link owners may not enable the interrupt on their own.
Signed-off-by: Yaniv Rosner <yaniv.rosner@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 192 +++++++++++++---------
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h | 1 +
2 files changed, 114 insertions(+), 79 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index f4beb46..05620ef 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -7203,6 +7203,22 @@ static void bnx2x_8073_set_pause_cl37(struct link_params *params,
msleep(500);
}
+static void bnx2x_8073_specific_func(struct bnx2x_phy *phy,
+ struct link_params *params,
+ u32 action)
+{
+ struct bnx2x *bp = params->bp;
+ switch (action) {
+ case PHY_INIT:
+ /* Enable LASI */
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD, MDIO_PMA_LASI_RXCTRL, (1<<2));
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD, MDIO_PMA_LASI_CTRL, 0x0004);
+ break;
+ }
+}
+
static int bnx2x_8073_config_init(struct bnx2x_phy *phy,
struct link_params *params,
struct link_vars *vars)
@@ -7223,12 +7239,7 @@ static int bnx2x_8073_config_init(struct bnx2x_phy *phy,
bnx2x_set_gpio(bp, MISC_REGISTERS_GPIO_1,
MISC_REGISTERS_GPIO_OUTPUT_HIGH, gpio_port);
- /* Enable LASI */
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_LASI_RXCTRL, (1<<2));
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_LASI_CTRL, 0x0004);
-
+ bnx2x_8073_specific_func(phy, params, PHY_INIT);
bnx2x_8073_set_pause_cl37(params, phy, vars);
bnx2x_cl45_read(bp, phy,
@@ -8263,7 +8274,7 @@ static void bnx2x_8727_specific_func(struct bnx2x_phy *phy,
u32 action)
{
struct bnx2x *bp = params->bp;
-
+ u16 val;
switch (action) {
case DISABLE_TX:
bnx2x_sfp_set_transmitter(params, phy, 0);
@@ -8272,6 +8283,40 @@ static void bnx2x_8727_specific_func(struct bnx2x_phy *phy,
if (!(phy->flags & FLAGS_SFP_NOT_APPROVED))
bnx2x_sfp_set_transmitter(params, phy, 1);
break;
+ case PHY_INIT:
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD, MDIO_PMA_LASI_RXCTRL,
+ (1<<2) | (1<<5));
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD, MDIO_PMA_LASI_TXCTRL,
+ 0);
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD, MDIO_PMA_LASI_CTRL, 0x0006);
+ /* Make MOD_ABS give interrupt on change */
+ bnx2x_cl45_read(bp, phy, MDIO_PMA_DEVAD,
+ MDIO_PMA_REG_8727_PCS_OPT_CTRL,
+ &val);
+ val |= (1<<12);
+ if (phy->flags & FLAGS_NOC)
+ val |= (3<<5);
+ /* Set 8727 GPIOs to input to allow reading from the 8727 GPIO0
+ * status which reflect SFP+ module over-current
+ */
+ if (!(phy->flags & FLAGS_NOC))
+ val &= 0xff8f; /* Reset bits 4-6 */
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD, MDIO_PMA_REG_8727_PCS_OPT_CTRL,
+ val);
+
+ /* Set 2-wire transfer rate of SFP+ module EEPROM
+ * to 100Khz since some DACs(direct attached cables) do
+ * not work at 400Khz.
+ */
+ bnx2x_cl45_write(bp, phy,
+ MDIO_PMA_DEVAD,
+ MDIO_PMA_REG_8727_TWO_WIRE_SLAVE_ADDR,
+ 0xa001);
+ break;
default:
DP(NETIF_MSG_LINK, "Function 0x%x not supported by 8727\n",
action);
@@ -9054,28 +9099,15 @@ static int bnx2x_8727_config_init(struct bnx2x_phy *phy,
struct link_vars *vars)
{
u32 tx_en_mode;
- u16 tmp1, val, mod_abs, tmp2;
- u16 rx_alarm_ctrl_val;
- u16 lasi_ctrl_val;
+ u16 tmp1, mod_abs, tmp2;
struct bnx2x *bp = params->bp;
/* Enable PMD link, MOD_ABS_FLT, and 1G link alarm */
bnx2x_wait_reset_complete(bp, phy, params);
- rx_alarm_ctrl_val = (1<<2) | (1<<5) ;
- /* Should be 0x6 to enable XS on Tx side. */
- lasi_ctrl_val = 0x0006;
DP(NETIF_MSG_LINK, "Initializing BCM8727\n");
- /* Enable LASI */
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_LASI_RXCTRL,
- rx_alarm_ctrl_val);
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_LASI_TXCTRL,
- 0);
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_LASI_CTRL, lasi_ctrl_val);
+ bnx2x_8727_specific_func(phy, params, PHY_INIT);
/* Initially configure MOD_ABS to interrupt when module is
* presence( bit 8)
*/
@@ -9091,25 +9123,9 @@ static int bnx2x_8727_config_init(struct bnx2x_phy *phy,
bnx2x_cl45_write(bp, phy,
MDIO_PMA_DEVAD, MDIO_PMA_REG_PHY_IDENTIFIER, mod_abs);
-
/* Enable/Disable PHY transmitter output */
bnx2x_set_disable_pmd_transmit(params, phy, 0);
- /* Make MOD_ABS give interrupt on change */
- bnx2x_cl45_read(bp, phy, MDIO_PMA_DEVAD, MDIO_PMA_REG_8727_PCS_OPT_CTRL,
- &val);
- val |= (1<<12);
- if (phy->flags & FLAGS_NOC)
- val |= (3<<5);
-
- /* Set 8727 GPIOs to input to allow reading from the 8727 GPIO0
- * status which reflect SFP+ module over-current
- */
- if (!(phy->flags & FLAGS_NOC))
- val &= 0xff8f; /* Reset bits 4-6 */
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_REG_8727_PCS_OPT_CTRL, val);
-
bnx2x_8727_power_module(bp, phy, 1);
bnx2x_cl45_read(bp, phy,
@@ -9119,13 +9135,7 @@ static int bnx2x_8727_config_init(struct bnx2x_phy *phy,
MDIO_PMA_DEVAD, MDIO_PMA_LASI_RXSTAT, &tmp1);
bnx2x_8727_config_speed(phy, params);
- /* Set 2-wire transfer rate of SFP+ module EEPROM
- * to 100Khz since some DACs(direct attached cables) do
- * not work at 400Khz.
- */
- bnx2x_cl45_write(bp, phy,
- MDIO_PMA_DEVAD, MDIO_PMA_REG_8727_TWO_WIRE_SLAVE_ADDR,
- 0xa001);
+
/* Set TX PreEmphasis if needed */
if ((params->feature_config_flags &
@@ -9554,6 +9564,29 @@ static void bnx2x_848xx_set_led(struct bnx2x *bp,
0xFFFB, 0xFFFD);
}
+static void bnx2x_848xx_specific_func(struct bnx2x_phy *phy,
+ struct link_params *params,
+ u32 action)
+{
+ struct bnx2x *bp = params->bp;
+ switch (action) {
+ case PHY_INIT:
+ if (phy->type != PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) {
+ /* Save spirom version */
+ bnx2x_save_848xx_spirom_version(phy, bp, params->port);
+ }
+ /* This phy uses the NIG latch mechanism since link indication
+ * arrives through its LED4 and not via its LASI signal, so we
+ * get steady signal instead of clear on read
+ */
+ bnx2x_bits_en(bp, NIG_REG_LATCH_BC_0 + params->port*4,
+ 1 << NIG_LATCH_BC_ENABLE_MI_INT);
+
+ bnx2x_848xx_set_led(bp, phy);
+ break;
+ }
+}
+
static int bnx2x_848xx_cmn_config_init(struct bnx2x_phy *phy,
struct link_params *params,
struct link_vars *vars)
@@ -9561,22 +9594,10 @@ static int bnx2x_848xx_cmn_config_init(struct bnx2x_phy *phy,
struct bnx2x *bp = params->bp;
u16 autoneg_val, an_1000_val, an_10_100_val, an_10g_val;
- if (phy->type != PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) {
- /* Save spirom version */
- bnx2x_save_848xx_spirom_version(phy, bp, params->port);
- }
- /* This phy uses the NIG latch mechanism since link indication
- * arrives through its LED4 and not via its LASI signal, so we
- * get steady signal instead of clear on read
- */
- bnx2x_bits_en(bp, NIG_REG_LATCH_BC_0 + params->port*4,
- 1 << NIG_LATCH_BC_ENABLE_MI_INT);
-
+ bnx2x_848xx_specific_func(phy, params, PHY_INIT);
bnx2x_cl45_write(bp, phy,
MDIO_PMA_DEVAD, MDIO_PMA_REG_CTRL, 0x0000);
- bnx2x_848xx_set_led(bp, phy);
-
/* set 1000 speed advertisement */
bnx2x_cl45_read(bp, phy,
MDIO_AN_DEVAD, MDIO_AN_REG_8481_1000T_CTRL,
@@ -10565,6 +10586,35 @@ static void bnx2x_848xx_set_link_led(struct bnx2x_phy *phy,
/******************************************************************/
/* 54618SE PHY SECTION */
/******************************************************************/
+static void bnx2x_54618se_specific_func(struct bnx2x_phy *phy,
+ struct link_params *params,
+ u32 action)
+{
+ struct bnx2x *bp = params->bp;
+ u16 temp;
+ switch (action) {
+ case PHY_INIT:
+ /* Configure LED4: set to INTR (0x6). */
+ /* Accessing shadow register 0xe. */
+ bnx2x_cl22_write(bp, phy,
+ MDIO_REG_GPHY_SHADOW,
+ MDIO_REG_GPHY_SHADOW_LED_SEL2);
+ bnx2x_cl22_read(bp, phy,
+ MDIO_REG_GPHY_SHADOW,
+ &temp);
+ temp &= ~(0xf << 4);
+ temp |= (0x6 << 4);
+ bnx2x_cl22_write(bp, phy,
+ MDIO_REG_GPHY_SHADOW,
+ MDIO_REG_GPHY_SHADOW_WR_ENA | temp);
+ /* Configure INTR based on link status change. */
+ bnx2x_cl22_write(bp, phy,
+ MDIO_REG_INTR_MASK,
+ ~MDIO_REG_INTR_MASK_LINK_STATUS);
+ break;
+ }
+}
+
static int bnx2x_54618se_config_init(struct bnx2x_phy *phy,
struct link_params *params,
struct link_vars *vars)
@@ -10602,24 +10652,8 @@ static int bnx2x_54618se_config_init(struct bnx2x_phy *phy,
/* Wait for GPHY to reset */
msleep(50);
- /* Configure LED4: set to INTR (0x6). */
- /* Accessing shadow register 0xe. */
- bnx2x_cl22_write(bp, phy,
- MDIO_REG_GPHY_SHADOW,
- MDIO_REG_GPHY_SHADOW_LED_SEL2);
- bnx2x_cl22_read(bp, phy,
- MDIO_REG_GPHY_SHADOW,
- &temp);
- temp &= ~(0xf << 4);
- temp |= (0x6 << 4);
- bnx2x_cl22_write(bp, phy,
- MDIO_REG_GPHY_SHADOW,
- MDIO_REG_GPHY_SHADOW_WR_ENA | temp);
- /* Configure INTR based on link status change. */
- bnx2x_cl22_write(bp, phy,
- MDIO_REG_INTR_MASK,
- ~MDIO_REG_INTR_MASK_LINK_STATUS);
+ bnx2x_54618se_specific_func(phy, params, PHY_INIT);
/* Flip the signal detect polarity (set 0x1c.0x1e[8]). */
bnx2x_cl22_write(bp, phy,
MDIO_REG_GPHY_SHADOW,
@@ -11349,7 +11383,7 @@ static struct bnx2x_phy phy_8073 = {
.format_fw_ver = (format_fw_ver_t)bnx2x_format_ver,
.hw_reset = (hw_reset_t)NULL,
.set_link_led = (set_link_led_t)NULL,
- .phy_specific_func = (phy_specific_func_t)NULL
+ .phy_specific_func = (phy_specific_func_t)bnx2x_8073_specific_func
};
static struct bnx2x_phy phy_8705 = {
.type = PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM8705,
@@ -11542,7 +11576,7 @@ static struct bnx2x_phy phy_84823 = {
.format_fw_ver = (format_fw_ver_t)bnx2x_848xx_format_ver,
.hw_reset = (hw_reset_t)NULL,
.set_link_led = (set_link_led_t)bnx2x_848xx_set_link_led,
- .phy_specific_func = (phy_specific_func_t)NULL
+ .phy_specific_func = (phy_specific_func_t)bnx2x_848xx_specific_func
};
static struct bnx2x_phy phy_84833 = {
@@ -11578,7 +11612,7 @@ static struct bnx2x_phy phy_84833 = {
.format_fw_ver = (format_fw_ver_t)bnx2x_848xx_format_ver,
.hw_reset = (hw_reset_t)bnx2x_84833_hw_reset_phy,
.set_link_led = (set_link_led_t)bnx2x_848xx_set_link_led,
- .phy_specific_func = (phy_specific_func_t)NULL
+ .phy_specific_func = (phy_specific_func_t)bnx2x_848xx_specific_func
};
static struct bnx2x_phy phy_54618se = {
@@ -11612,7 +11646,7 @@ static struct bnx2x_phy phy_54618se = {
.format_fw_ver = (format_fw_ver_t)NULL,
.hw_reset = (hw_reset_t)NULL,
.set_link_led = (set_link_led_t)bnx2x_5461x_set_link_led,
- .phy_specific_func = (phy_specific_func_t)NULL
+ .phy_specific_func = (phy_specific_func_t)bnx2x_54618se_specific_func
};
/*****************************************************************/
/* */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
index 51cac81..600ffda 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
@@ -216,6 +216,7 @@ struct bnx2x_phy {
phy_specific_func_t phy_specific_func;
#define DISABLE_TX 1
#define ENABLE_TX 2
+#define PHY_INIT 3
};
/* Inputs parameters to the CLC */
--
1.7.9.rc2
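Reader's note on the refactoring in this patch: each PHY gets an optional per-PHY hook, and a caller can walk the PHY list and invoke the hook with PHY_INIT only where one is set. The sketch below shows that dispatch pattern in isolation. The action codes match the ones in bnx2x_link.h after this patch; struct demo_phy, demo_irq_specific_func() and demo_init_all() are hypothetical stand-ins, not driver code.

```c
#include <stddef.h>

/* Action codes as defined in bnx2x_link.h after this patch. */
#define DISABLE_TX 1
#define ENABLE_TX  2
#define PHY_INIT   3

struct demo_phy;
typedef void (*demo_specific_func_t)(struct demo_phy *phy,
				     unsigned int action);

/* Hypothetical, much reduced stand-in for struct bnx2x_phy. */
struct demo_phy {
	int irq_enabled;
	demo_specific_func_t specific_func;	/* NULL if the PHY has no hook */
};

/* Example hook: only PHY_INIT is handled, other actions are ignored. */
static void demo_irq_specific_func(struct demo_phy *phy, unsigned int action)
{
	switch (action) {
	case PHY_INIT:
		phy->irq_enabled = 1;
		break;
	default:
		break;
	}
}

/* Call the PHY_INIT hook on every PHY that provides one.
 * Returns the number of hooks that were invoked.
 */
static int demo_init_all(struct demo_phy *phys, size_t n)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		if (!phys[i].specific_func)
			continue;
		phys[i].specific_func(&phys[i], PHY_INIT);
		called++;
	}
	return called;
}
```

Keeping the interrupt setup behind one action code lets the regular config_init path and a link-flap-avoidance path share the same code, which appears to be the motivation stated in the commit message.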