Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Nicolas Dichtel @ 2012-12-13 17:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka
In-Reply-To: <87sj7beyc1.fsf@xmission.com>

Le 12/12/2012 22:48, Eric W. Biederman a écrit :
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> It is very wrong to presume that without context you know the reason for
>> the exsitence of any network namespace and that you should or even that
>> you can manage it.  Think of running your multi-network namespace
>> managing application in a container.
>
> A good example of a network namespace you don't want to mess with are
> the network namespaces created by vsftp and chrome for security purposes
> to remove any possibility of creating new connections to the network.
>
Ok, I get the point.

A last question: from an administration point of view, is it intended to
not be able to monitor which netns are currently used? Like it can be done
for sockets, files, ...

^ permalink raw reply

* [RFC][PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Rafał Miłecki @ 2012-12-13 17:43 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Rafał Miłecki

BCMA is a Broadcom specific bus with devices AKA cores. All recent BCMA
based SoCs have gigabit ethernet provided by the GBit MAC core. This
patch adds driver for such a cores registering itself as a netdev. It
has been tested on a BCM4706 and BCM4718 chipsets.

In the kernel tree there is already b44 driver which has some common
things with bgmac, however there are many differences that has led to
the decision or writing a new driver:
1) GBit MAC cores appear on BCMA bus (not SSB as in case of b44)
2) There is 64bit DMA engine which differs from 32bit one
3) There is no CAM (Content Addressable Memory) in GBit MAC
4) We have 4 TX queues on GBit MAC devices (instead of 1)
5) Many registers have different addresses/values
6) RX header flags are also different

The driver in it's state is functional how, however there is of course
place for improvements:
1) Supporting more net_device_ops
2) SUpporting more ethtool_ops
3) Unaligned addressing in DMA
4) Writing separated PHY driver

It's just a RFC, so be aware of two ugly parts in this version:
1) bgmac_stop doesn't stop device and free resources
2) bgmac_dma_rx_read doesn't check for alloc result
---
 drivers/bcma/driver_chipcommon_pmu.c        |    3 +-
 drivers/net/ethernet/broadcom/Kconfig       |    9 +
 drivers/net/ethernet/broadcom/Makefile      |    1 +
 drivers/net/ethernet/broadcom/bgmac.c       | 1396 +++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bgmac.h       |  425 ++++++++
 include/linux/bcma/bcma_driver_chipcommon.h |    2 +
 6 files changed, 1835 insertions(+), 1 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac.h

diff --git a/drivers/bcma/driver_chipcommon_pmu.c b/drivers/bcma/driver_chipcommon_pmu.c
index c62c788..932b101 100644
--- a/drivers/bcma/driver_chipcommon_pmu.c
+++ b/drivers/bcma/driver_chipcommon_pmu.c
@@ -264,7 +264,7 @@ static u32 bcma_pmu_pll_clock_bcm4706(struct bcma_drv_cc *cc, u32 pll0, u32 m)
 }
 
 /* query bus clock frequency for PMU-enabled chipcommon */
-static u32 bcma_pmu_get_bus_clock(struct bcma_drv_cc *cc)
+u32 bcma_pmu_get_bus_clock(struct bcma_drv_cc *cc)
 {
 	struct bcma_bus *bus = cc->core->bus;
 
@@ -293,6 +293,7 @@ static u32 bcma_pmu_get_bus_clock(struct bcma_drv_cc *cc)
 	}
 	return BCMA_CC_PMU_HT_CLOCK;
 }
+EXPORT_SYMBOL_GPL(bcma_pmu_get_bus_clock);
 
 /* query cpu clock frequency for PMU-enabled chipcommon */
 u32 bcma_pmu_get_cpu_clock(struct bcma_drv_cc *cc)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 4bd416b..fc3d99d 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -120,4 +120,13 @@ config BNX2X
 	  To compile this driver as a module, choose M here: the module
 	  will be called bnx2x.  This is recommended.
 
+config BGMAC
+	tristate "BCMA bus GBit core support"
+	depends on BCMA_HOST_SOC && HAS_DMA
+	---help---
+	  This driver supports GBit MAC and BCM4706 GBit MAC cores on BCMA bus.
+	  They can be found on BCM47xx SoCs and provide gigabit ethernet.
+	  In case of using this driver on BCM4706 it's also requires to enable
+	  BCMA_DRIVER_GMAC_CMN to make it work.
+
 endif # NET_VENDOR_BROADCOM
diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
index b789605..68efa1a 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
+obj-$(CONFIG_BGMAC) += bgmac.o
diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
new file mode 100644
index 0000000..ba746a5
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -0,0 +1,1396 @@
+/*
+ * Broadcom specific AMBA
+ * Bus subsystem
+ *
+ * Copyright (C) 2012 Rafał Miłecki <zajec5@gmail.com>
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt)		KBUILD_MODNAME ": " fmt
+
+#define bgmac_err(bgmac, fmt, ...) \
+	pr_err("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
+#define bgmac_warn(bgmac, fmt, ...) \
+	pr_warn("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
+#define bgmac_info(bgmac, fmt, ...) \
+	pr_info("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
+#define bgmac_debug(bgmac, fmt, ...) \
+	pr_debug("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/etherdevice.h>
+#include <linux/mii.h>
+#include <linux/interrupt.h>
+#include <linux/dma-mapping.h>
+#include <asm/mach-bcm47xx/nvram.h>
+
+#include "bgmac.h"
+
+#define BGMAC_WEIGHT	64
+
+#define ETHER_MAX_LEN   1518
+
+static const struct bcma_device_id bgmac_bcma_tbl[] = {
+	BCMA_CORE(BCMA_MANUF_BCM, BCMA_CORE_4706_MAC_GBIT, BCMA_ANY_REV, BCMA_ANY_CLASS),
+	BCMA_CORE(BCMA_MANUF_BCM, BCMA_CORE_MAC_GBIT, BCMA_ANY_REV, BCMA_ANY_CLASS),
+	BCMA_CORETABLE_END
+};
+MODULE_DEVICE_TABLE(bcma, bgmac_bcma_tbl);
+
+static bool bgmac_wait_value(struct bcma_device *core, u16 reg, u32 mask,
+			     u32 value, int timeout)
+{
+	u32 val;
+	int i;
+
+	for (i = 0; i < timeout / 10; i++) {
+		val = bcma_read32(core, reg);
+		if ((val & mask) == value)
+			return true;
+		udelay(10);
+	}
+	pr_err("Timeout waiting for reg 0x%X\n", reg);
+	return false;
+}
+
+/**************************************************
+ * DMA
+ **************************************************/
+
+static void bgmac_dma_tx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
+{
+	u32 val;
+	int i;
+
+	if (!ring->mmio_base)
+		return;
+
+	/* Susend DMA TX ring first.
+	 * bgmac_wait_value doesn't support waiting for any of few values, so
+	 * implement whole loop here.
+	 */
+	bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL,
+		    BGMAC_DMA_TX_SUSPEND);
+	for (i = 0; i < 10000 / 10; i++) {
+		val = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_STATUS);
+		val &= BGMAC_DMA_TX_STAT;
+		if (val == BGMAC_DMA_TX_STAT_DISABLED ||
+		    val == BGMAC_DMA_TX_STAT_IDLEWAIT ||
+		    val == BGMAC_DMA_TX_STAT_STOPPED) {
+			i = 0;
+			break;
+		}
+		udelay(10);
+	}
+	if (i)
+		bgmac_err(bgmac, "Timeout suspending DMA TX ring 0x%X (BGMAC_DMA_TX_STAT: 0x%08X)\n",
+			  ring->mmio_base, val);
+
+	/* Remove SUSPEND bit */
+	bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL, 0);
+	if (!bgmac_wait_value(bgmac->core,
+			      ring->mmio_base + BGMAC_DMA_TX_STATUS,
+			      BGMAC_DMA_TX_STAT, BGMAC_DMA_TX_STAT_DISABLED,
+			      10000)) {
+		bgmac_warn(bgmac, "DMA TX ring 0x%X wasn't disabled on time, waiting additional 300us\n", ring->mmio_base);
+		udelay(300);
+		val = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_STATUS);
+		if ((val & BGMAC_DMA_TX_STAT) != BGMAC_DMA_TX_STAT_DISABLED)
+			bgmac_err(bgmac, "Reset of DMA TX ring 0x%X failed\n", ring->mmio_base);
+	}
+}
+
+static void bgmac_dma_tx_enable(struct bgmac *bgmac,
+				struct bgmac_dma_ring *ring)
+{
+	u32 ctl;
+
+	ctl = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL);
+	ctl |= BGMAC_DMA_TX_ENABLE;
+	ctl |= BGMAC_DMA_TX_PARITY_DISABLE;
+	bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL, ctl);
+}
+
+static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
+				    struct bgmac_dma_ring *ring,
+				    struct sk_buff *skb)
+{
+	struct device *dma_dev = bgmac->core->dma_dev;
+	struct bgmac_dma_desc *dma_desc;
+	struct bgmac_slot_info *slot;
+	u32 ctl0, ctl1;
+	int free_slots;
+
+	if (skb->len > BGMAC_DESC_CTL1_LEN) {
+		bgmac_err(bgmac, "Too long skb (%d)\n", skb->len);
+		return NETDEV_TX_BUSY;
+	}
+
+	if (ring->start <= ring->end)
+		free_slots = ring->start - ring->end + BGMAC_TX_RING_SLOTS;
+	else
+		free_slots = ring->start - ring->end;
+	if (free_slots <= 1) {
+		bgmac_err(bgmac, "No free slots on ring 0x%X!\n", ring->mmio_base);
+		netif_stop_queue(bgmac->net_dev);
+		return NETDEV_TX_BUSY;
+	}
+
+	slot = &ring->slots[ring->end];
+	slot->skb = skb;
+	slot->dma_addr = dma_map_single(dma_dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(dma_dev, slot->dma_addr)) {
+		bgmac_err(bgmac, "Mapping error of skb on ring 0x%X\n", ring->mmio_base);
+		return NETDEV_TX_BUSY;
+	}
+
+	ctl0 = BGMAC_DESC_CTL0_IOC | BGMAC_DESC_CTL0_SOF | BGMAC_DESC_CTL0_EOF;
+	if (ring->end == ring->num_slots - 1)
+		ctl0 |= BGMAC_DESC_CTL0_EOT;
+	ctl1 = skb->len & BGMAC_DESC_CTL1_LEN;
+
+	dma_desc = ring->cpu_base;
+	dma_desc += ring->end;
+	dma_desc->addr_low = cpu_to_le32(lower_32_bits(slot->dma_addr));
+	dma_desc->addr_high = cpu_to_le32(upper_32_bits(slot->dma_addr));
+	dma_desc->ctl0 = cpu_to_le32(ctl0);
+	dma_desc->ctl1 = cpu_to_le32(ctl1);
+
+	wmb();
+
+	/* Increase ring->end to point empty slot. We tell hardware the first
+	 * slot it shold *not* read.
+	 */
+	if (++ring->end >= BGMAC_TX_RING_SLOTS)
+		ring->end = 0;
+	bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_INDEX,
+		    ring->end * sizeof(struct bgmac_dma_desc));
+
+	return NETDEV_TX_OK;
+}
+
+/* Free transmitted packets */
+static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
+{
+	struct device *dma_dev = bgmac->core->dma_dev;
+	struct bgmac_dma_desc *dma_desc;
+	struct bgmac_slot_info *slot;
+	int empty_slot;
+
+	/* The last slot that hardware didn't consume yet */
+	empty_slot = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_STATUS);
+	empty_slot &= BGMAC_DMA_TX_STATDPTR;
+	empty_slot /= sizeof(struct bgmac_dma_desc);
+
+	while (ring->start != empty_slot) {
+		/* Set pointers */
+		dma_desc = ring->cpu_base;
+		dma_desc += ring->start;
+		slot = &ring->slots[ring->start];
+
+		if (slot->skb) {
+			/* Unmap no longer used buffer */
+			dma_unmap_single(dma_dev, slot->dma_addr,
+					 slot->skb->len, DMA_TO_DEVICE);
+			slot->dma_addr = 0;
+
+			/* Free memory! :) */
+			dev_kfree_skb_any(slot->skb);
+			slot->skb = NULL;
+		} else {
+			bgmac_err(bgmac, "Hardware reported transmission for empty TX ring slot %d! End of ring: %d", ring->start, ring->end);
+		}
+
+		if (++ring->start >= BGMAC_TX_RING_SLOTS)
+			ring->start = 0;
+	}
+}
+
+static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
+{
+	if (!ring->mmio_base)
+		return;
+
+	bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_CTL, 0);
+	if (!bgmac_wait_value(bgmac->core,
+			      ring->mmio_base + BGMAC_DMA_RX_STATUS,
+			      BGMAC_DMA_RX_STAT, BGMAC_DMA_RX_STAT_DISABLED,
+			      10000))
+		bgmac_err(bgmac, "Reset of ring 0x%X RX failed\n", ring->mmio_base);
+}
+
+static void bgmac_dma_rx_enable(struct bgmac *bgmac,
+				struct bgmac_dma_ring *ring)
+{
+	u32 ctl;
+
+	ctl = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_RX_CTL);
+	ctl &= BGMAC_DMA_RX_ADDREXT_MASK;
+	ctl |= BGMAC_DMA_RX_ENABLE;
+	ctl |= BGMAC_DMA_RX_PARITY_DISABLE;
+	ctl |= BGMAC_DMA_RX_OVERFLOW_CONT;
+	ctl |= BGMAC_RX_FRAME_OFFSET << BGMAC_DMA_RX_FRAME_OFFSET_SHIFT;
+	bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_CTL, ctl);
+}
+
+static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
+				     struct bgmac_slot_info *slot)
+{
+	struct device *dma_dev = bgmac->core->dma_dev;
+	struct bgmac_rx_header *rx;
+
+	/* Alloc skb */
+	slot->skb = netdev_alloc_skb(bgmac->net_dev, BGMAC_RX_BUF_SIZE);
+	if (!slot->skb) {
+		bgmac_err(bgmac, "Allocation of skb failed!\n");
+		return -ENOMEM;
+	}
+
+	/* Poison - if everything goes fine, hardware will overwrite it */
+	rx = (struct bgmac_rx_header *)slot->skb->data;
+	rx->len = cpu_to_le16(0xdead);
+	rx->flags = cpu_to_le16(0xbeef);
+
+	/* Map skb for the DMA */
+	slot->dma_addr = dma_map_single(dma_dev, slot->skb->data,
+					BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+	if (dma_mapping_error(dma_dev, slot->dma_addr)) {
+		bgmac_err(bgmac, "DMA mapping error\n");
+		return -ENOMEM;
+	}
+	if (slot->dma_addr & 0xC0000000)
+		bgmac_warn(bgmac, "DMA address using 0xC0000000 bit(s), it may need translation trick\n");
+
+	return 0;
+}
+
+static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
+			     int weight)
+{
+	struct device *dma_dev = bgmac->core->dma_dev;
+	struct bgmac_dma_desc *dma_desc;
+	struct bgmac_slot_info *slot;
+	struct sk_buff *skb;
+	struct bgmac_rx_header *rx;
+	u32 end_slot;
+	u16 len, flags;
+	int handled = 0;
+
+	end_slot = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_RX_STATUS);
+	end_slot &= BGMAC_DMA_RX_STATDPTR;
+	end_slot /= sizeof(struct bgmac_dma_desc);
+
+	ring->end = end_slot;
+
+	while (ring->start != ring->end) {
+		/* Set pointers */
+		dma_desc = ring->cpu_base;
+		dma_desc += ring->start;
+		slot = &ring->slots[ring->start];
+		skb = slot->skb;
+
+		/* Unmap buffer to make it accessible to the CPU */
+		dma_unmap_single(dma_dev, slot->dma_addr, BGMAC_RX_BUF_SIZE,
+				 DMA_FROM_DEVICE);
+
+		/* Get info from the header */
+		rx = (struct bgmac_rx_header *)skb->data;
+		len = le16_to_cpu(rx->len);
+		flags = le16_to_cpu(rx->flags);
+
+		/* Check for poison and drop or pass packet */
+		if (len == 0xdead && flags == 0xbeef) {
+			bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n", ring->start);
+			dev_kfree_skb_any(skb);
+		} else {
+			/* Remove header from the skb and pass it to the net */
+			skb_put(skb, BGMAC_RX_FRAME_OFFSET + len);
+			skb_pull(skb, BGMAC_RX_FRAME_OFFSET);
+			skb->protocol = eth_type_trans(skb, bgmac->net_dev);
+			netif_receive_skb(skb);
+			handled++;
+		}
+
+		/* Alloc new skb */
+		bgmac_dma_rx_skb_for_slot(bgmac, slot); /* FIXME: can fail! */
+		dma_desc->addr_low = cpu_to_le32(lower_32_bits(slot->dma_addr));
+		dma_desc->addr_high = cpu_to_le32(upper_32_bits(slot->dma_addr));
+
+		if (++ring->start >= BGMAC_RX_RING_SLOTS)
+			ring->start = 0;
+
+		if (handled >= weight) /* Should never be bigger */
+			break;
+	}
+
+	return handled;
+}
+
+/* Does ring support unaligned addressing? */
+static bool bgmac_dma_unaligned(struct bgmac *bgmac,
+				struct bgmac_dma_ring *ring)
+{
+	if (ring->tx) {
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_RINGLO,
+			    0xff0);
+		if (bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_RINGLO))
+			return true;
+	} else {
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_RINGLO,
+			    0xff0);
+		if (bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_RX_RINGLO))
+			return true;
+	}
+	return false;
+}
+
+static void bgmac_dma_ring_free(struct bgmac *bgmac,
+				struct bgmac_dma_ring *ring)
+{
+	struct device *dma_dev = bgmac->core->dma_dev;
+	struct bgmac_slot_info *slot;
+	int size;
+	int i;
+
+	for (i = 0; i < ring->num_slots; i++) {
+		slot = &ring->slots[i];
+		if (slot->skb) {
+			if (slot->dma_addr)
+				dma_unmap_single(dma_dev, slot->dma_addr,
+						 slot->skb->len, DMA_TO_DEVICE);
+			dev_kfree_skb_any(slot->skb);
+		}
+	}
+
+	if (ring->cpu_base) {
+		/* Free ring of descriptors */
+		size = ring->num_slots * sizeof(struct bgmac_dma_desc);
+		dma_free_coherent(dma_dev, size, ring->cpu_base,
+				  ring->dma_base);
+	}
+}
+
+static void bgmac_dma_free(struct bgmac *bgmac)
+{
+	int i;
+
+	for (i = 0; i < BGMAC_MAX_TX_RINGS; i++)
+		bgmac_dma_ring_free(bgmac, &bgmac->tx_ring[i]);
+	for (i = 0; i < BGMAC_MAX_RX_RINGS; i++)
+		bgmac_dma_ring_free(bgmac, &bgmac->rx_ring[i]);
+}
+
+static int bgmac_dma_alloc(struct bgmac *bgmac)
+{
+	struct device *dma_dev = bgmac->core->dma_dev;
+	struct bgmac_dma_ring *ring;
+	u16 ring_base[] = { BGMAC_DMA_BASE0, BGMAC_DMA_BASE1, BGMAC_DMA_BASE2,
+			    BGMAC_DMA_BASE3, };
+	int size; /* ring size: different for Tx and Rx */
+	int err;
+	int i;
+
+	BUILD_BUG_ON(BGMAC_MAX_TX_RINGS > ARRAY_SIZE(ring_base));
+	BUILD_BUG_ON(BGMAC_MAX_RX_RINGS > ARRAY_SIZE(ring_base));
+
+	if (!(bcma_aread32(bgmac->core, BCMA_IOST) & BCMA_IOST_DMA64)) {
+		bgmac_err(bgmac, "Core does not report 64-bit DMA\n");
+		return -ENOTSUPP;
+	}
+
+	for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
+		ring = &bgmac->tx_ring[i];
+		ring->tx = true;
+		ring->num_slots = BGMAC_TX_RING_SLOTS;
+		ring->mmio_base = ring_base[i];
+		if (bgmac_dma_unaligned(bgmac, ring))
+			bgmac_warn(bgmac, "TX on ring 0x%X supports unaligned addressing but this feature is not implemented\n", ring->mmio_base);
+
+		/* Alloc ring of descriptors */
+		size = ring->num_slots * sizeof(struct bgmac_dma_desc);
+		ring->cpu_base = dma_zalloc_coherent(dma_dev, size,
+						     &(ring->dma_base),
+						     GFP_KERNEL);
+		if (!ring->cpu_base) {
+			bgmac_err(bgmac, "Allocation of TX ring 0x%X failed\n", ring->mmio_base);
+			goto err_dma_free;
+		}
+		if (ring->dma_base & 0xC0000000)
+			bgmac_warn(bgmac, "DMA address using 0xC0000000 bit(s), it may need translation trick\n");
+
+		/* No need to alloc TX slots yet */
+	}
+
+	for (i = 0; i < BGMAC_MAX_RX_RINGS; i++) {
+		ring = &bgmac->rx_ring[i];
+		ring->tx = false;
+		ring->num_slots = BGMAC_RX_RING_SLOTS;
+		ring->mmio_base = ring_base[i];
+		if (bgmac_dma_unaligned(bgmac, ring))
+			bgmac_warn(bgmac, "RX on ring 0x%X supports unaligned addressing but this feature is not implemented\n", ring->mmio_base);
+
+		/* Alloc ring of descriptors */
+		size = ring->num_slots * sizeof(struct bgmac_dma_desc);
+		ring->cpu_base = dma_zalloc_coherent(dma_dev, size,
+						     &(ring->dma_base),
+						     GFP_KERNEL);
+		if (!ring->cpu_base) {
+			bgmac_err(bgmac, "Allocation of RX ring 0x%X failed\n", ring->mmio_base);
+			err = -ENOMEM;
+			goto err_dma_free;
+		}
+		if (ring->dma_base & 0xC0000000)
+			bgmac_warn(bgmac, "DMA address using 0xC0000000 bit(s), it may need translation trick\n");
+
+		/* Alloc RX slots */
+		for (i = 0; i < ring->num_slots; i++) {
+			err = bgmac_dma_rx_skb_for_slot(bgmac, &ring->slots[i]);
+			if (err) {
+				bgmac_err(bgmac, "Can't allocate skb for slot in RX ring\n");
+				goto err_dma_free;
+			}
+		}
+	}
+
+	return 0;
+
+err_dma_free:
+	bgmac_dma_free(bgmac);
+	return -ENOMEM;
+}
+
+static void bgmac_dma_init(struct bgmac *bgmac)
+{
+	struct bgmac_dma_ring *ring;
+	struct bgmac_dma_desc *dma_desc;
+	u32 ctl0, ctl1;
+	int i;
+
+	for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
+		ring = &bgmac->tx_ring[i];
+
+		/* We don't implement unaligned addressing, so enable first */
+		bgmac_dma_tx_enable(bgmac, ring);
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_RINGLO,
+			    lower_32_bits(ring->dma_base));
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_RINGHI,
+			    upper_32_bits(ring->dma_base));
+
+		ring->start = 0;
+		ring->end = 0;	/* Points the slot that should *not* be read */
+	}
+
+	for (i = 0; i < BGMAC_MAX_RX_RINGS; i++) {
+		ring = &bgmac->rx_ring[i];
+
+		/* We don't implement unaligned addressing, so enable first */
+		bgmac_dma_rx_enable(bgmac, ring);
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_RINGLO,
+			    lower_32_bits(ring->dma_base));
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_RINGHI,
+			    upper_32_bits(ring->dma_base));
+
+		for (i = 0, dma_desc = ring->cpu_base; i < ring->num_slots;
+		     i++, dma_desc++) {
+			ctl0 = ctl1 = 0;
+
+			if (i == ring->num_slots - 1)
+				ctl0 |= BGMAC_DESC_CTL0_EOT;
+			ctl1 |= BGMAC_RX_BUF_SIZE & BGMAC_DESC_CTL1_LEN;
+			/* Is there any BGMAC device that requires extension? */
+			/* ctl1 |= (addrext << B43_DMA64_DCTL1_ADDREXT_SHIFT) &
+			 * B43_DMA64_DCTL1_ADDREXT_MASK;
+			 */
+
+			dma_desc->addr_low = cpu_to_le32(lower_32_bits(ring->slots[i].dma_addr));
+			dma_desc->addr_high = cpu_to_le32(upper_32_bits(ring->slots[i].dma_addr));
+			dma_desc->ctl0 = cpu_to_le32(ctl0);
+			dma_desc->ctl1 = cpu_to_le32(ctl1);
+		}
+
+		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_INDEX,
+			    ring->num_slots * sizeof(struct bgmac_dma_desc));
+
+		ring->start = 0;
+		ring->end = 0;
+	}
+}
+
+/**************************************************
+ * PHY ops
+ **************************************************/
+
+u16 bgmac_phy_read(struct bgmac *bgmac, u8 phyaddr, u8 reg)
+{
+	struct bcma_device *core;
+	u16 phy_access_addr;
+	u16 phy_ctl_addr;
+	u32 tmp;
+
+	BUILD_BUG_ON(BGMAC_PA_DATA_MASK != BCMA_GMAC_CMN_PA_DATA_MASK);
+	BUILD_BUG_ON(BGMAC_PA_ADDR_MASK != BCMA_GMAC_CMN_PA_ADDR_MASK);
+	BUILD_BUG_ON(BGMAC_PA_ADDR_SHIFT != BCMA_GMAC_CMN_PA_ADDR_SHIFT);
+	BUILD_BUG_ON(BGMAC_PA_REG_MASK != BCMA_GMAC_CMN_PA_REG_MASK);
+	BUILD_BUG_ON(BGMAC_PA_REG_SHIFT != BCMA_GMAC_CMN_PA_REG_SHIFT);
+	BUILD_BUG_ON(BGMAC_PA_WRITE != BCMA_GMAC_CMN_PA_WRITE);
+	BUILD_BUG_ON(BGMAC_PA_START != BCMA_GMAC_CMN_PA_START);
+	BUILD_BUG_ON(BGMAC_PC_EPA_MASK != BCMA_GMAC_CMN_PC_EPA_MASK);
+	BUILD_BUG_ON(BGMAC_PC_MCT_MASK != BCMA_GMAC_CMN_PC_MCT_MASK);
+	BUILD_BUG_ON(BGMAC_PC_MCT_SHIFT != BCMA_GMAC_CMN_PC_MCT_SHIFT);
+	BUILD_BUG_ON(BGMAC_PC_MTE != BCMA_GMAC_CMN_PC_MTE);
+
+	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT) {
+		core = bgmac->core->bus->drv_gmac_cmn.core;
+		phy_access_addr = BCMA_GMAC_CMN_PHY_ACCESS;
+		phy_ctl_addr = BCMA_GMAC_CMN_PHY_CTL;
+	} else {
+		core = bgmac->core;
+		phy_access_addr = BGMAC_PHY_ACCESS;
+		phy_ctl_addr = BGMAC_PHY_CNTL;
+	}
+
+	tmp = bcma_read32(core, phy_ctl_addr);
+	tmp &= ~BGMAC_PC_EPA_MASK;
+	tmp |= phyaddr;
+	bcma_write32(core, phy_ctl_addr, tmp);
+
+	tmp = BGMAC_PA_START;
+	tmp |= phyaddr << BGMAC_PA_ADDR_SHIFT;
+	tmp |= reg << BGMAC_PA_REG_SHIFT;
+	bcma_write32(core, phy_access_addr, tmp);
+
+	if (!bgmac_wait_value(core, phy_access_addr, BGMAC_PA_START, 0, 1000)) {
+		bgmac_err(bgmac, "Reading PHY %d register 0x%X failed\n",
+			  phyaddr, reg);
+		return 0xffff;
+	}
+
+	return bcma_read32(core, phy_access_addr) & BGMAC_PA_DATA_MASK;
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphywr */
+void bgmac_phy_write(struct bgmac *bgmac, u8 phyaddr, u8 reg, u16 value)
+{
+	struct bcma_device *core;
+	u16 phy_access_addr;
+	u16 phy_ctl_addr;
+	u32 tmp;
+
+	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT) {
+		core = bgmac->core->bus->drv_gmac_cmn.core;
+		phy_access_addr = BCMA_GMAC_CMN_PHY_ACCESS;
+		phy_ctl_addr = BCMA_GMAC_CMN_PHY_CTL;
+	} else {
+		core = bgmac->core;
+		phy_access_addr = BGMAC_PHY_ACCESS;
+		phy_ctl_addr = BGMAC_PHY_CNTL;
+	}
+
+	tmp = bcma_read32(core, phy_ctl_addr);
+	tmp &= ~BGMAC_PC_EPA_MASK;
+	tmp |= phyaddr;
+	bcma_write32(core, phy_ctl_addr, tmp);
+
+	bgmac_write(bgmac, BGMAC_INT_STATUS, BGMAC_IS_MDIO);
+	if (bgmac_read(bgmac, BGMAC_INT_STATUS) & BGMAC_IS_MDIO)
+		bgmac_warn(bgmac, "Error setting MDIO int\n");
+
+	tmp = BGMAC_PA_START;
+	tmp |= BGMAC_PA_WRITE;
+	tmp |= phyaddr << BGMAC_PA_ADDR_SHIFT;
+	tmp |= reg << BGMAC_PA_REG_SHIFT;
+	tmp |= value;
+	bcma_write32(core, phy_access_addr, tmp);
+
+	if (!bgmac_wait_value(core, phy_access_addr, BGMAC_PA_START, 0, 1000))
+		bgmac_err(bgmac, "Writing to PHY %d register 0x%X failed\n",
+			  phyaddr, reg);
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyforce */
+static void bgmac_phy_force(struct bgmac *bgmac)
+{
+	u16 ctl;
+	u16 mask = ~(BGMAC_PHY_CTL_SPEED | BGMAC_PHY_CTL_SPEED_MSB |
+		     BGMAC_PHY_CTL_ANENAB | BGMAC_PHY_CTL_DUPLEX);
+
+	if (bgmac->phyaddr == BGMAC_PHY_NOREGS)
+		return;
+
+	if (bgmac->autoneg)
+		return;
+
+	ctl = bgmac_phy_read(bgmac, bgmac->phyaddr, BGMAC_PHY_CTL);
+	ctl &= mask;
+	if (bgmac->full_duplex)
+		ctl |= BGMAC_PHY_CTL_DUPLEX;
+	if (bgmac->speed == BGMAC_SPEED_100)
+		ctl |= BGMAC_PHY_CTL_SPEED_100;
+	else if (bgmac->speed == BGMAC_SPEED_1000)
+		ctl |= BGMAC_PHY_CTL_SPEED_1000;
+	bgmac_phy_write(bgmac, bgmac->phyaddr, BGMAC_PHY_CTL, ctl);
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyadvertise */
+static void bgmac_phy_advertise(struct bgmac *bgmac)
+{
+	u16 adv;
+
+	if (bgmac->phyaddr == BGMAC_PHY_NOREGS)
+		return;
+
+	if (!bgmac->autoneg)
+		return;
+
+	/* Adv selected 10/100 speeds */
+	adv = bgmac_phy_read(bgmac, bgmac->phyaddr, BGMAC_PHY_ADV);
+	adv &= ~(BGMAC_PHY_ADV_10HALF | BGMAC_PHY_ADV_10FULL |
+		 BGMAC_PHY_ADV_100HALF | BGMAC_PHY_ADV_100FULL);
+	if (!bgmac->full_duplex && bgmac->speed & BGMAC_SPEED_10)
+		adv |= BGMAC_PHY_ADV_10HALF;
+	if (!bgmac->full_duplex && bgmac->speed & BGMAC_SPEED_100)
+		adv |= BGMAC_PHY_ADV_100HALF;
+	if (bgmac->full_duplex && bgmac->speed & BGMAC_SPEED_10)
+		adv |= BGMAC_PHY_ADV_10FULL;
+	if (bgmac->full_duplex && bgmac->speed & BGMAC_SPEED_100)
+		adv |= BGMAC_PHY_ADV_100FULL;
+	bgmac_phy_write(bgmac, bgmac->phyaddr, BGMAC_PHY_ADV, adv);
+
+	/* Adv selected 1000 speeds */
+	adv = bgmac_phy_read(bgmac, bgmac->phyaddr, BGMAC_PHY_ADV2);
+	adv &= ~(BGMAC_PHY_ADV2_1000HALF | BGMAC_PHY_ADV2_1000FULL);
+	if (!bgmac->full_duplex && bgmac->speed & BGMAC_SPEED_1000)
+		adv |= BGMAC_PHY_ADV2_1000HALF;
+	if (bgmac->full_duplex && bgmac->speed & BGMAC_SPEED_1000)
+		adv |= BGMAC_PHY_ADV2_1000FULL;
+	bgmac_phy_write(bgmac, bgmac->phyaddr, BGMAC_PHY_ADV2, adv);
+
+	/* Restart */
+	bgmac_phy_write(bgmac, bgmac->phyaddr, BGMAC_PHY_CTL,
+			bgmac_phy_read(bgmac, bgmac->phyaddr, BGMAC_PHY_CTL) |
+			BGMAC_PHY_CTL_RESTART);
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyinit */
+static void bgmac_phy_init(struct bgmac *bgmac)
+{
+	struct bcma_chipinfo *ci = &bgmac->core->bus->chipinfo;
+	struct bcma_drv_cc *cc = &bgmac->core->bus->drv_cc;
+	u8 i;
+
+	if (ci->id == BCMA_CHIP_ID_BCM5356) {
+		for (i = 0; i < 5; i++) {
+			bgmac_phy_write(bgmac, i, 0x1f, 0x008b);
+			bgmac_phy_write(bgmac, i, 0x15, 0x0100);
+			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
+			bgmac_phy_write(bgmac, i, 0x12, 0x2aaa);
+			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
+		}
+	}
+	if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg != 10) ||
+	    (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg != 10) ||
+	    (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg != 9)) {
+		bcma_chipco_chipctl_maskset(cc, 2, ~0xc0000000, 0);
+		bcma_chipco_chipctl_maskset(cc, 4, ~0x80000000, 0);
+		for (i = 0; i < 5; i++) {
+			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
+			bgmac_phy_write(bgmac, i, 0x16, 0x5284);
+			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
+			bgmac_phy_write(bgmac, i, 0x17, 0x0010);
+			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
+			bgmac_phy_write(bgmac, i, 0x16, 0x5296);
+			bgmac_phy_write(bgmac, i, 0x17, 0x1073);
+			bgmac_phy_write(bgmac, i, 0x17, 0x9073);
+			bgmac_phy_write(bgmac, i, 0x16, 0x52b6);
+			bgmac_phy_write(bgmac, i, 0x17, 0x9273);
+			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
+		}
+	}
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyreset */
+static void bgmac_phy_reset(struct bgmac *bgmac)
+{
+	if (bgmac->phyaddr == BGMAC_PHY_NOREGS)
+		return;
+
+	bgmac_phy_write(bgmac, bgmac->phyaddr, BGMAC_PHY_CTL,
+			BGMAC_PHY_CTL_RESET);
+	udelay(100);
+	if (bgmac_phy_read(bgmac, bgmac->phyaddr, BGMAC_PHY_CTL) &
+	    BGMAC_PHY_CTL_RESET)
+		bgmac_err(bgmac, "PHY reset failed\n");
+	bgmac_phy_init(bgmac);
+}
+
+/**************************************************
+ * Chip ops
+ **************************************************/
+
+/* TODO: can we just drop @force? Can we don't reset MAC at all if there is
+ * nothing to change? Try if after stabilizng driver.
+ */
+static void bgmac_cmdcfg_maskset(struct bgmac *bgmac, u32 mask, u32 set,
+				 bool force)
+{
+	u32 cmdcfg = bgmac_read(bgmac, BGMAC_CMDCFG);
+	u32 new_val = (cmdcfg & mask) | set;
+
+	bgmac_set(bgmac, BGMAC_CMDCFG, BGMAC_CMDCFG_SR);
+	udelay(2);
+
+	if (new_val != cmdcfg || force)
+		bgmac_write(bgmac, BGMAC_CMDCFG, new_val);
+
+	bgmac_mask(bgmac, BGMAC_CMDCFG, ~BGMAC_CMDCFG_SR);
+	udelay(2);
+}
+
+static void bgmac_chip_stats_update(struct bgmac *bgmac)
+{
+	int i;
+
+	if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {
+		for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
+			bgmac->mib_tx_regs[i] =
+				bgmac_read(bgmac,
+					   BGMAC_TX_GOOD_OCTETS + (i * 4));
+		for (i = 0; i < BGMAC_NUM_MIB_RX_REGS; i++)
+			bgmac->mib_rx_regs[i] =
+				bgmac_read(bgmac,
+					   BGMAC_RX_GOOD_OCTETS + (i * 4));
+	}
+
+	/* TODO: what else? how to handle BCM4706? */
+}
+
+static void bgmac_clear_mib(struct bgmac *bgmac)
+{
+	int i;
+
+	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
+		return;
+
+	bgmac_set(bgmac, BGMAC_DEV_CTL, BGMAC_DC_MROR);
+	for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
+		bgmac_read(bgmac, BGMAC_TX_GOOD_OCTETS + (i * 4));
+	for (i = 0; i < BGMAC_NUM_MIB_RX_REGS; i++)
+		bgmac_read(bgmac, BGMAC_RX_GOOD_OCTETS + (i * 4));
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/gmac_speed */
+static void bgmac_speed(struct bgmac *bgmac, int speed)
+{
+	u32 mask = ~(BGMAC_CMDCFG_ES_MASK | BGMAC_CMDCFG_HD);
+	u32 set = 0;
+
+	if (speed & BGMAC_SPEED_10)
+		set |= BGMAC_CMDCFG_ES_10;
+	if (speed & BGMAC_SPEED_100)
+		set |= BGMAC_CMDCFG_ES_100;
+	if (speed & BGMAC_SPEED_1000)
+		set |= BGMAC_CMDCFG_ES_1000;
+	if (!bgmac->full_duplex)
+		set |= BGMAC_CMDCFG_HD;
+	bgmac_cmdcfg_maskset(bgmac, mask, set, true);
+}
+
+static void bgmac_miiconfig(struct bgmac *bgmac)
+{
+	u8 imode = (bgmac_read(bgmac, BGMAC_DEV_STATUS) & BGMAC_DS_MM_MASK) >>
+			BGMAC_DS_MM_SHIFT;
+	if (imode == 0 || imode == 1) {
+		if (bgmac->autoneg)
+			bgmac_speed(bgmac, BGMAC_SPEED_100);
+		else
+			bgmac_speed(bgmac, bgmac->speed);
+	}
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipreset */
+static void bgmac_chip_reset(struct bgmac *bgmac)
+{
+	struct bcma_device *core = bgmac->core;
+	struct bcma_bus *bus = core->bus;
+	struct bcma_chipinfo *ci = &bus->chipinfo;
+	u32 flags = 0;
+	u32 iost;
+	int i;
+
+	if (bcma_core_is_enabled(core)) {
+		if (!bgmac->stats_grabbed) {
+			bgmac_chip_stats_update(bgmac);
+			bgmac->stats_grabbed = true;
+		}
+
+		for (i = 0; i < BGMAC_MAX_TX_RINGS; i++)
+			bgmac_dma_tx_reset(bgmac, &bgmac->tx_ring[i]);
+
+		bgmac_cmdcfg_maskset(bgmac, ~0, BGMAC_CMDCFG_ML, false);
+		udelay(1);
+
+		for (i = 0; i < BGMAC_MAX_RX_RINGS; i++)
+			bgmac_dma_rx_reset(bgmac, &bgmac->rx_ring[i]);
+
+		/* TODO: Clear software multicast filter list */
+	}
+
+	iost = bcma_aread32(core, BCMA_IOST);
+	if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg == 10) ||
+	    (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg == 10) ||
+	    (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg == 9))
+		iost &= ~BGMAC_BCMA_IOST_ATTACHED;
+
+	if (iost & BGMAC_BCMA_IOST_ATTACHED) {
+		flags = BGMAC_BCMA_IOCTL_SW_CLKEN;
+		if (!bgmac->has_robosw)
+			flags |= BGMAC_BCMA_IOCTL_SW_RESET;
+	}
+
+	bcma_core_enable(core, flags);
+
+	if (core->id.rev > 2) {
+		bgmac_set(bgmac, BCMA_CLKCTLST, 1 << 8);
+		bgmac_wait_value(bgmac->core, BCMA_CLKCTLST, 1 << 24, 1 << 24,
+				 1000);
+	}
+
+	if (ci->id == BCMA_CHIP_ID_BCM5357 || ci->id == BCMA_CHIP_ID_BCM4749 ||
+	    ci->id == BCMA_CHIP_ID_BCM53572) {
+		struct bcma_drv_cc *cc = &bgmac->core->bus->drv_cc;
+		u8 et_swtype = 0;
+		u8 sw_type = BGMAC_CHIPCTL_1_SW_TYPE_EPHY |
+			     BGMAC_CHIPCTL_1_IF_TYPE_RMII;
+		char buf[2];
+		if (nvram_getenv("et_swtype", buf, 1) > 0) {
+			if (kstrtou8(buf, 0, &et_swtype))
+				bgmac_err(bgmac, "Failed to parse et_swtype (%s)\n", buf);
+			et_swtype &= 0x0f;
+			et_swtype <<= 4;
+			sw_type = et_swtype;
+		} else if (ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg == 9) {
+			sw_type = BGMAC_CHIPCTL_1_SW_TYPE_EPHYRMII;
+		} else if (0) {
+			/* TODO */
+		}
+		bcma_chipco_chipctl_maskset(cc, 1,
+					    ~(BGMAC_CHIPCTL_1_IF_TYPE_MASK |
+					      BGMAC_CHIPCTL_1_SW_TYPE_MASK),
+					    sw_type);
+	}
+
+	if (iost & BGMAC_BCMA_IOST_ATTACHED && !bgmac->has_robosw)
+		bcma_awrite32(core, BCMA_IOCTL,
+			      bcma_aread32(core, BCMA_IOCTL) &
+			      ~BGMAC_BCMA_IOCTL_SW_RESET);
+
+	/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/gmac_reset
+	 * Specs don't say about using BGMAC_CMDCFG_SR, but in this routine
+	 * BGMAC_CMDCFG is read _after_ putting chip in a reset. So it has to
+	 * be keps until taking MAC out of the reset.
+	 */
+	bgmac_cmdcfg_maskset(bgmac,
+			     ~(BGMAC_CMDCFG_TE |
+			       BGMAC_CMDCFG_RE |
+			       BGMAC_CMDCFG_RPI |
+			       BGMAC_CMDCFG_TAI |
+			       BGMAC_CMDCFG_HD |
+			       BGMAC_CMDCFG_ML |
+			       BGMAC_CMDCFG_CFE |
+			       BGMAC_CMDCFG_RL |
+			       BGMAC_CMDCFG_RED |
+			       BGMAC_CMDCFG_PE |
+			       BGMAC_CMDCFG_TPI |
+			       BGMAC_CMDCFG_PAD_EN |
+			       BGMAC_CMDCFG_PF),
+			     BGMAC_CMDCFG_PROM |
+			     BGMAC_CMDCFG_NLC |
+			     BGMAC_CMDCFG_CFE |
+			     BGMAC_CMDCFG_SR,
+			     false);
+
+	bgmac_clear_mib(bgmac);
+	if (core->id.id == BCMA_CORE_4706_MAC_GBIT)
+		bcma_maskset32(bgmac->cmn, BCMA_GMAC_CMN_PHY_CTL, ~0,
+			       BCMA_GMAC_CMN_PC_MTE);
+	else
+		bgmac_set(bgmac, BGMAC_PHY_CNTL, BGMAC_PC_MTE);
+	bgmac_miiconfig(bgmac);
+	bgmac_phy_init(bgmac);
+
+	bgmac->int_status = 0;
+}
+
+static void bgmac_chip_intrs_on(struct bgmac *bgmac)
+{
+	bgmac_write(bgmac, BGMAC_INT_MASK, bgmac->int_mask);
+}
+
+static void bgmac_chip_intrs_off(struct bgmac *bgmac)
+{
+	bgmac_write(bgmac, BGMAC_INT_MASK, 0);
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/gmac_enable */
+static void bgmac_enable(struct bgmac *bgmac)
+{
+	struct bcma_chipinfo *ci = &bgmac->core->bus->chipinfo;
+	u32 cmdcfg;
+	u32 mode;
+	u32 rxq_ctl;
+	u32 fl_ctl;
+	u16 bp_clk;
+	u8 mdp;
+
+	cmdcfg = bgmac_read(bgmac, BGMAC_CMDCFG);
+	bgmac_cmdcfg_maskset(bgmac, ~(BGMAC_CMDCFG_TE | BGMAC_CMDCFG_RE),
+			     BGMAC_CMDCFG_SR, true);
+	udelay(2);
+	cmdcfg |= BGMAC_CMDCFG_TE | BGMAC_CMDCFG_RE;
+	bgmac_write(bgmac, BGMAC_CMDCFG, cmdcfg);
+
+	mode = (bgmac_read(bgmac, BGMAC_DEV_STATUS) & BGMAC_DS_MM_MASK) >>
+		BGMAC_DS_MM_SHIFT;
+	if (ci->id != BCMA_CHIP_ID_BCM47162 || mode != 0)
+		bgmac_set(bgmac, BCMA_CLKCTLST, BCMA_CLKCTLST_FORCEHT);
+	if (ci->id == BCMA_CHIP_ID_BCM47162 && mode == 2)
+		bcma_chipco_chipctl_maskset(&bgmac->core->bus->drv_cc, 1, ~0,
+					    BGMAC_CHIPCTL_1_RXC_DLL_BYPASS);
+
+	switch (ci->id) {
+	case BCMA_CHIP_ID_BCM5357:
+	case BCMA_CHIP_ID_BCM4749:
+	case BCMA_CHIP_ID_BCM53572:
+	case BCMA_CHIP_ID_BCM4716:
+	case BCMA_CHIP_ID_BCM47162:
+		fl_ctl = 0x03cb04cb;
+		if (ci->id == BCMA_CHIP_ID_BCM5357 ||
+		    ci->id == BCMA_CHIP_ID_BCM4749 ||
+		    ci->id == BCMA_CHIP_ID_BCM53572)
+			fl_ctl = 0x2300e1;
+		bgmac_write(bgmac, BGMAC_FLOW_CTL_THRESH, fl_ctl);
+		bgmac_write(bgmac, BGMAC_PAUSE_CTL, 0x27fff);
+		break;
+	}
+
+	rxq_ctl = bgmac_read(bgmac, BGMAC_RXQ_CTL);
+	rxq_ctl &= ~BGMAC_RXQ_CTL_MDP_MASK;
+	bp_clk = bcma_pmu_get_bus_clock(&bgmac->core->bus->drv_cc) / 1000000;
+	mdp = (bp_clk * 128 / 1000) - 3;
+	rxq_ctl |= (mdp << BGMAC_RXQ_CTL_MDP_SHIFT);
+	bgmac_write(bgmac, BGMAC_RXQ_CTL, rxq_ctl);
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipinit */
+static void bgmac_chip_init(struct bgmac *bgmac, bool full_init)
+{
+	struct bgmac_dma_ring *ring;
+	u8 *mac = bgmac->net_dev->dev_addr;
+	u32 tmp;
+	int i;
+
+	/* 1 interrupt per received frame */
+	bgmac_write(bgmac, BGMAC_INT_RECV_LAZY, 1 << BGMAC_IRL_FC_SHIFT);
+
+	/* Enable 802.3x tx flow control (honor received PAUSE frames) */
+	bgmac_cmdcfg_maskset(bgmac, ~BGMAC_CMDCFG_RPI, 0, true);
+
+	if (bgmac->net_dev->flags & IFF_PROMISC)
+		bgmac_cmdcfg_maskset(bgmac, ~0, BGMAC_CMDCFG_PROM, false);
+	else
+		bgmac_warn(bgmac, "Software filtering is not supported yet\n");
+
+	/* Set MAC addr */
+	tmp = (mac[0] << 24) | (mac[1] << 16) | (mac[2] << 8) | mac[3];
+	bgmac_write(bgmac, BGMAC_MACADDR_HIGH, tmp);
+	tmp = (mac[4] << 8) | mac[5];
+	bgmac_write(bgmac, BGMAC_MACADDR_LOW, tmp);
+
+	if (bgmac->loopback)
+		bgmac_cmdcfg_maskset(bgmac, ~0, BGMAC_CMDCFG_ML, true);
+	else
+		bgmac_cmdcfg_maskset(bgmac, ~BGMAC_CMDCFG_ML, 0, true);
+
+	bgmac_write(bgmac, BGMAC_RXMAX_LENGTH, 32 + ETHER_MAX_LEN);
+
+	if (!bgmac->autoneg) {
+		bgmac_speed(bgmac, bgmac->speed);
+		bgmac_phy_force(bgmac);
+	} else if (bgmac->speed) { /* if there is anything to adv */
+		bgmac_phy_advertise(bgmac);
+	}
+
+	if (full_init) {
+		bgmac_dma_init(bgmac);
+		if (1) /* FIXME: is there any case we don't want IRQs? */
+			bgmac_chip_intrs_on(bgmac);
+	} else {
+		for (i = 0; i < BGMAC_MAX_RX_RINGS; i++) {
+			ring = &bgmac->rx_ring[i];
+			bgmac_dma_rx_enable(bgmac, ring);
+		}
+	}
+
+	bgmac_enable(bgmac);
+}
+
+static irqreturn_t bgmac_interrupt(int irq, void *dev_id)
+{
+	struct bgmac *bgmac = netdev_priv(dev_id);
+
+	u32 int_status = bgmac_read(bgmac, BGMAC_INT_STATUS);
+	int_status &= bgmac->int_mask;
+
+	if (!int_status)
+		return IRQ_NONE;
+
+	/* Ack */
+	bgmac_write(bgmac, BGMAC_INT_STATUS, int_status);
+
+	/* Disable new interrupts until handling existing ones */
+	bgmac_chip_intrs_off(bgmac);
+
+	bgmac->int_status = int_status;
+
+	napi_schedule(&bgmac->napi);
+
+	return IRQ_HANDLED;
+}
+
+static int bgmac_poll(struct napi_struct *napi, int weight)
+{
+	struct bgmac *bgmac = container_of(napi, struct bgmac, napi);
+	struct bgmac_dma_ring *ring;
+	int handled = 0;
+
+	if (bgmac->int_status & BGMAC_IS_TX0) {
+		ring = &bgmac->tx_ring[0];
+		bgmac_dma_tx_free(bgmac, ring);
+		bgmac->int_status &= ~BGMAC_IS_TX0;
+	}
+
+	if (bgmac->int_status & BGMAC_IS_RX) {
+		ring = &bgmac->rx_ring[0];
+		handled += bgmac_dma_rx_read(bgmac, ring, weight);
+		bgmac->int_status &= ~BGMAC_IS_RX;
+	}
+
+	if (bgmac->int_status) {
+		bgmac_err(bgmac, "Unknown IRQs: 0x%08X\n", bgmac->int_status);
+		bgmac->int_status = 0;
+	}
+
+	if (handled < weight)
+		napi_complete(napi);
+
+	bgmac_chip_intrs_on(bgmac);
+
+	return handled;
+}
+
+/**************************************************
+ * net_device_ops
+ **************************************************/
+
+static int bgmac_open(struct net_device *net_dev)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+
+	/* Reset */
+	bgmac_chip_reset(bgmac);
+	/* Specs say about reclaiming rings here, but we do that in DMA init */
+	bgmac_chip_init(bgmac, true);
+
+	/* Enable IRQs */
+	if (request_irq(bgmac->core->irq, bgmac_interrupt, IRQF_SHARED,
+			KBUILD_MODNAME, net_dev) < 0)
+		bgmac_err(bgmac, "IRQ request error!\n");
+	napi_enable(&bgmac->napi);
+
+	return 0;
+}
+
+static int bgmac_stop(struct net_device *net_dev)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+
+	/* Disable IRQs */
+	napi_disable(&bgmac->napi);
+	bgmac_chip_intrs_off(bgmac);
+
+	free_irq(bgmac->core->irq, net_dev);
+
+	/* TODO */
+
+	return 0;
+}
+
+static netdev_tx_t bgmac_start_xmit(struct sk_buff *skb,
+				    struct net_device *net_dev)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+	struct bgmac_dma_ring *ring;
+
+	/* No QOS support yet */
+	ring = &bgmac->tx_ring[0];
+	return bgmac_dma_tx_add(bgmac, ring, skb);
+}
+
+static int bgmac_ioctl(struct net_device *net_dev, struct ifreq *ifr, int cmd)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+	struct mii_ioctl_data *data = if_mii(ifr);
+
+	switch (cmd) {
+	case SIOCGMIIPHY:
+		data->phy_id = bgmac->phyaddr;
+		/* fallthru */
+	case SIOCGMIIREG:
+		if (!netif_running(net_dev))
+			return -EAGAIN;
+		data->val_out = bgmac_phy_read(bgmac, bgmac->phyaddr,
+					       data->reg_num & 0x1f);
+		return 0;
+	case SIOCSMIIREG:
+		if (!netif_running(net_dev))
+			return -EAGAIN;
+		bgmac_phy_write(bgmac, bgmac->phyaddr, data->reg_num & 0x1f,
+				data->val_in);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static const struct net_device_ops bgmac_netdev_ops = {
+	.ndo_open		= bgmac_open,
+	.ndo_stop		= bgmac_stop,
+	.ndo_start_xmit		= bgmac_start_xmit,
+	.ndo_set_mac_address	= eth_mac_addr, /* generic, sets dev_addr */
+	.ndo_do_ioctl           = bgmac_ioctl,
+};
+
+/**************************************************
+ * ethtool_ops
+ **************************************************/
+
+static int bgmac_get_settings(struct net_device *net_dev,
+			      struct ethtool_cmd *cmd)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+
+	cmd->supported = SUPPORTED_10baseT_Half |
+			 SUPPORTED_10baseT_Full |
+			 SUPPORTED_100baseT_Half |
+			 SUPPORTED_100baseT_Full |
+			 SUPPORTED_1000baseT_Half |
+			 SUPPORTED_1000baseT_Full |
+			 SUPPORTED_Autoneg;
+
+	if (bgmac->autoneg) {
+		WARN_ON(cmd->advertising);
+		if (bgmac->full_duplex) {
+			if (bgmac->speed & BGMAC_SPEED_10)
+				cmd->advertising |= ADVERTISED_10baseT_Full;
+			if (bgmac->speed & BGMAC_SPEED_100)
+				cmd->advertising |= ADVERTISED_100baseT_Full;
+			if (bgmac->speed & BGMAC_SPEED_1000)
+				cmd->advertising |= ADVERTISED_1000baseT_Full;
+		} else {
+			if (bgmac->speed & BGMAC_SPEED_10)
+				cmd->advertising |= ADVERTISED_10baseT_Half;
+			if (bgmac->speed & BGMAC_SPEED_100)
+				cmd->advertising |= ADVERTISED_100baseT_Half;
+			if (bgmac->speed & BGMAC_SPEED_1000)
+				cmd->advertising |= ADVERTISED_1000baseT_Half;
+		}
+	} else {
+		switch (bgmac->speed) {
+		case BGMAC_SPEED_10:
+			ethtool_cmd_speed_set(cmd, SPEED_10);
+			break;
+		case BGMAC_SPEED_100:
+			ethtool_cmd_speed_set(cmd, SPEED_100);
+			break;
+		case BGMAC_SPEED_1000:
+			ethtool_cmd_speed_set(cmd, SPEED_1000);
+			break;
+		}
+	}
+
+	cmd->duplex = bgmac->full_duplex ? DUPLEX_FULL : DUPLEX_HALF;
+
+	cmd->autoneg = bgmac->autoneg;
+
+	return 0;
+}
+
+#if 0
+static int bgmac_set_settings(struct net_device *net_dev,
+			      struct ethtool_cmd *cmd)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+
+	return -1;
+}
+#endif
+
+static void bgmac_get_drvinfo(struct net_device *net_dev,
+			      struct ethtool_drvinfo *info)
+{
+	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
+	strlcpy(info->bus_info, "BCMA", sizeof(info->bus_info));
+}
+
+static const struct ethtool_ops bgmac_ethtool_ops = {
+	.get_settings		= bgmac_get_settings,
+	.get_drvinfo		= bgmac_get_drvinfo,
+};
+
+/**************************************************
+ * BCMA bus ops
+ **************************************************/
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipattach */
+static int bgmac_probe(struct bcma_device *core)
+{
+	struct net_device *net_dev;
+	struct bgmac *bgmac;
+	struct ssb_sprom *sprom = &core->bus->sprom;
+	u8 *mac = core->core_unit ? sprom->et1mac : sprom->et0mac;
+	int err;
+
+	/* We don't support 2nd, 3rd, ... units, SPROM has to be adjusted */
+	if (core->core_unit > 1) {
+		pr_err("Unsupported core_unit %d\n", core->core_unit);
+		return -ENOTSUPP;
+	}
+
+	/* Allocation and references */
+	net_dev = alloc_etherdev(sizeof(*bgmac));
+	if (!net_dev)
+		return -ENOMEM;
+	net_dev->netdev_ops = &bgmac_netdev_ops;
+	net_dev->irq = core->irq;
+	SET_ETHTOOL_OPS(net_dev, &bgmac_ethtool_ops);
+	bgmac = netdev_priv(net_dev);
+	bgmac->net_dev = net_dev;
+	bgmac->core = core;
+	bcma_set_drvdata(core, bgmac);
+
+	/* Defaults */
+	bgmac->autoneg = true;
+	bgmac->full_duplex = true;
+	bgmac->speed = BGMAC_SPEED_10 | BGMAC_SPEED_100 | BGMAC_SPEED_1000;
+	memcpy(bgmac->net_dev->dev_addr, mac, ETH_ALEN);
+
+	/* On BCM4706 we need common core to access PHY */
+	if (core->id.id == BCMA_CORE_4706_MAC_GBIT &&
+	    !core->bus->drv_gmac_cmn.core) {
+		bgmac_err(bgmac, "GMAC CMN core not found (required for BCM4706)\n");
+		err = -ENODEV;
+		goto err_netdev_free;
+	}
+	bgmac->cmn = core->bus->drv_gmac_cmn.core;
+
+	bgmac->phyaddr = core->core_unit ? sprom->et1phyaddr :
+			 sprom->et0phyaddr;
+	bgmac->phyaddr &= BGMAC_PHY_MASK;
+	if (bgmac->phyaddr == BGMAC_PHY_MASK) {
+		bgmac_err(bgmac, "No PHY found\n");
+		err = -ENODEV;
+		goto err_netdev_free;
+	}
+	bgmac_info(bgmac, "Found PHY addr: %d\n", bgmac->phyaddr);
+
+	if (core->bus->hosttype == BCMA_HOSTTYPE_PCI) {
+		bgmac_err(bgmac, "PCI setup not implemented\n");
+		err = -ENOTSUPP;
+		goto err_netdev_free;
+	}
+
+	bgmac_chip_reset(bgmac);
+
+	err = bgmac_dma_alloc(bgmac);
+	if (err) {
+		bgmac_err(bgmac, "Unable to alloc memory for DMA\n");
+		goto err_netdev_free;
+	}
+
+	bgmac->int_mask = BGMAC_IS_ERRMASK | BGMAC_IS_RX | BGMAC_IS_TX_MASK;
+	if (nvram_getenv("et0_no_txint", NULL, 0) == 0)
+		bgmac->int_mask &= ~BGMAC_IS_TX_MASK;
+
+	/* TODO: reset the external phy */
+	bgmac_phy_reset(bgmac);
+
+	bgmac->has_robosw = !!(core->bus->sprom.boardflags_lo &
+			       BGMAC_BFL_ENETROBO);
+	if (bgmac->has_robosw)
+		bgmac_err(bgmac, "Support for Roboswitch not implemented\n");
+
+	if (core->bus->sprom.boardflags_lo & BGMAC_BFL_ENETADM)
+		bgmac_err(bgmac, "Support for ADMtek ethernet switch not implemented\n");
+
+	err = register_netdev(bgmac->net_dev);
+	if (err) {
+		bgmac_err(bgmac, "Cannot register net device\n");
+		err = -ENOTSUPP;
+		goto err_dma_free;
+	}
+
+	netif_napi_add(net_dev, &bgmac->napi, bgmac_poll, BGMAC_WEIGHT);
+
+	return 0;
+
+err_dma_free:
+	bgmac_dma_free(bgmac);
+
+err_netdev_free:
+	bcma_set_drvdata(core, NULL);
+	free_netdev(net_dev);
+
+	return err;
+}
+
+static void bgmac_remove(struct bcma_device *core)
+{
+	struct bgmac *bgmac = bcma_get_drvdata(core);
+
+	unregister_netdev(bgmac->net_dev);
+	free_netdev(bgmac->net_dev);
+	bcma_set_drvdata(core, NULL);
+}
+
+static struct bcma_driver bgmac_bcma_driver = {
+	.name		= KBUILD_MODNAME,
+	.id_table	= bgmac_bcma_tbl,
+	.probe		= bgmac_probe,
+	.remove		= bgmac_remove,
+};
+
+static int __init bgmac_init(void)
+{
+	int err;
+
+	err = bcma_driver_register(&bgmac_bcma_driver);
+	if (err)
+		return err;
+	pr_info("Broadcom 47xx GMAC driver loaded\n");
+
+	return 0;
+}
+
+static void __exit bgmac_exit(void)
+{
+	bcma_driver_unregister(&bgmac_bcma_driver);
+}
+
+module_init(bgmac_init)
+module_exit(bgmac_exit)
+
+MODULE_AUTHOR("Rafał Miłecki");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
new file mode 100644
index 0000000..3899f1b
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac.h
@@ -0,0 +1,425 @@
+#ifndef _BGMAC_H
+#define _BGMAC_H
+
+#include <linux/bcma/bcma.h>
+
+#define BGMAC_DEV_CTL				0x000
+#define  BGMAC_DC_TSM				0x00000002
+#define  BGMAC_DC_CFCO				0x00000004
+#define  BGMAC_DC_RLSS				0x00000008
+#define  BGMAC_DC_MROR				0x00000010
+#define  BGMAC_DC_FCM_MASK			0x00000060
+#define  BGMAC_DC_FCM_SHIFT			5
+#define  BGMAC_DC_NAE				0x00000080
+#define  BGMAC_DC_TF				0x00000100
+#define  BGMAC_DC_RDS_MASK			0x00030000
+#define  BGMAC_DC_RDS_SHIFT			16
+#define  BGMAC_DC_TDS_MASK			0x000c0000
+#define  BGMAC_DC_TDS_SHIFT			18
+#define BGMAC_DEV_STATUS			0x004		/* Configuration of the interface */
+#define  BGMAC_DS_RBF				0x00000001
+#define  BGMAC_DS_RDF				0x00000002
+#define  BGMAC_DS_RIF				0x00000004
+#define  BGMAC_DS_TBF				0x00000008
+#define  BGMAC_DS_TDF				0x00000010
+#define  BGMAC_DS_TIF				0x00000020
+#define  BGMAC_DS_PO				0x00000040
+#define  BGMAC_DS_MM_MASK			0x00000300	/* Mode of the interface */
+#define  BGMAC_DS_MM_SHIFT			8
+#define BGMAC_BIST_STATUS			0x00c
+#define BGMAC_INT_STATUS			0x020		/* Interrupt status */
+#define  BGMAC_IS_MRO				0x00000001
+#define  BGMAC_IS_MTO				0x00000002
+#define  BGMAC_IS_TFD				0x00000004
+#define  BGMAC_IS_LS				0x00000008
+#define  BGMAC_IS_MDIO				0x00000010
+#define  BGMAC_IS_MR				0x00000020
+#define  BGMAC_IS_MT				0x00000040
+#define  BGMAC_IS_TO				0x00000080
+#define  BGMAC_IS_DESC_ERR			0x00000400	/* Descriptor error */
+#define  BGMAC_IS_DATA_ERR			0x00000800	/* Data error */
+#define  BGMAC_IS_DESC_PROT_ERR			0x00001000	/* Descriptor protocol error */
+#define  BGMAC_IS_RX_DESC_UNDERF		0x00002000	/* Receive descriptor underflow */
+#define  BGMAC_IS_RX_F_OVERF			0x00004000	/* Receive FIFO overflow */
+#define  BGMAC_IS_TX_F_UNDERF			0x00008000	/* Transmit FIFO underflow */
+#define  BGMAC_IS_RX				0x00010000	/* Interrupt for RX queue 0 */
+#define  BGMAC_IS_TX0				0x01000000	/* Interrupt for TX queue 0 */
+#define  BGMAC_IS_TX1				0x02000000	/* Interrupt for TX queue 1 */
+#define  BGMAC_IS_TX2				0x04000000	/* Interrupt for TX queue 2 */
+#define  BGMAC_IS_TX3				0x08000000	/* Interrupt for TX queue 3 */
+#define  BGMAC_IS_TX_MASK			0x0f000000
+#define  BGMAC_IS_INTMASK			0x0f01fcff
+#define  BGMAC_IS_ERRMASK			0x0000fc00
+#define BGMAC_INT_MASK				0x024		/* Interrupt mask */
+#define BGMAC_GP_TIMER				0x028
+#define BGMAC_INT_RECV_LAZY			0x100
+#define  BGMAC_IRL_TO_MASK			0x00ffffff
+#define  BGMAC_IRL_FC_MASK			0xff000000
+#define  BGMAC_IRL_FC_SHIFT			24		/* Shift the number of interrupts triggered per received frame */
+#define BGMAC_FLOW_CTL_THRESH			0x104		/* Flow control thresholds */
+#define BGMAC_WRRTHRESH				0x108
+#define BGMAC_GMAC_IDLE_CNT_THRESH		0x10c
+#define BGMAC_PHY_ACCESS			0x180		/* PHY access address */
+#define  BGMAC_PA_DATA_MASK			0x0000ffff
+#define  BGMAC_PA_ADDR_MASK			0x001f0000
+#define  BGMAC_PA_ADDR_SHIFT			16
+#define  BGMAC_PA_REG_MASK			0x1f000000
+#define  BGMAC_PA_REG_SHIFT			24
+#define  BGMAC_PA_WRITE				0x20000000
+#define  BGMAC_PA_START				0x40000000
+#define BGMAC_PHY_CNTL				0x188		/* PHY control address */
+#define  BGMAC_PC_EPA_MASK			0x0000001f
+#define  BGMAC_PC_MCT_MASK			0x007f0000
+#define  BGMAC_PC_MCT_SHIFT			16
+#define  BGMAC_PC_MTE				0x00800000
+#define BGMAC_TXQ_CTL				0x18c
+#define  BGMAC_TXQ_CTL_DBT_MASK			0x00000fff
+#define  BGMAC_TXQ_CTL_DBT_SHIFT		0
+#define BGMAC_RXQ_CTL				0x190
+#define  BGMAC_RXQ_CTL_DBT_MASK			0x00000fff
+#define  BGMAC_RXQ_CTL_DBT_SHIFT		0
+#define  BGMAC_RXQ_CTL_PTE			0x00001000
+#define  BGMAC_RXQ_CTL_MDP_MASK			0x3f000000
+#define  BGMAC_RXQ_CTL_MDP_SHIFT		24
+#define BGMAC_GPIO_SELECT			0x194
+#define BGMAC_GPIO_OUTPUT_EN			0x198
+/* For 0x1e0 see BCMA_CLKCTLST */
+#define BGMAC_HW_WAR				0x1e4
+#define BGMAC_PWR_CTL				0x1e8
+#define BGMAC_DMA_BASE0				0x200		/* Tx and Rx controller */
+#define BGMAC_DMA_BASE1				0x240		/* Tx controller only */
+#define BGMAC_DMA_BASE2				0x280		/* Tx controller only */
+#define BGMAC_DMA_BASE3				0x2C0		/* Tx controller only */
+#define BGMAC_TX_GOOD_OCTETS			0x300
+#define BGMAC_TX_GOOD_OCTETS_HIGH		0x304
+#define BGMAC_TX_GOOD_PKTS			0x308
+#define BGMAC_TX_OCTETS				0x30c
+#define BGMAC_TX_OCTETS_HIGH			0x310
+#define BGMAC_TX_PKTS				0x314
+#define BGMAC_TX_BROADCAST_PKTS			0x318
+#define BGMAC_TX_MULTICAST_PKTS			0x31c
+#define BGMAC_TX_LEN_64				0x320
+#define BGMAC_TX_LEN_65_TO_127			0x324
+#define BGMAC_TX_LEN_128_TO_255			0x328
+#define BGMAC_TX_LEN_256_TO_511			0x32c
+#define BGMAC_TX_LEN_512_TO_1023		0x330
+#define BGMAC_TX_LEN_1024_TO_1522		0x334
+#define BGMAC_TX_LEN_1523_TO_2047		0x338
+#define BGMAC_TX_LEN_2048_TO_4095		0x33c
+#define BGMAC_TX_LEN_4095_TO_8191		0x340
+#define BGMAC_TX_LEN_8192_TO_MAX		0x344
+#define BGMAC_TX_JABBER_PKTS			0x348		/* Error */
+#define BGMAC_TX_OVERSIZE_PKTS			0x34c		/* Error */
+#define BGMAC_TX_FRAGMENT_PKTS			0x350
+#define BGMAC_TX_UNDERRUNS			0x354		/* Error */
+#define BGMAC_TX_TOTAL_COLS			0x358
+#define BGMAC_TX_SINGLE_COLS			0x35c
+#define BGMAC_TX_MULTIPLE_COLS			0x360
+#define BGMAC_TX_EXCESSIVE_COLS			0x364		/* Error */
+#define BGMAC_TX_LATE_COLS			0x368		/* Error */
+#define BGMAC_TX_DEFERED			0x36c
+#define BGMAC_TX_CARRIER_LOST			0x370
+#define BGMAC_TX_PAUSE_PKTS			0x374
+#define BGMAC_TX_UNI_PKTS			0x378
+#define BGMAC_TX_Q0_PKTS			0x37c
+#define BGMAC_TX_Q0_OCTETS			0x380
+#define BGMAC_TX_Q0_OCTETS_HIGH			0x384
+#define BGMAC_TX_Q1_PKTS			0x388
+#define BGMAC_TX_Q1_OCTETS			0x38c
+#define BGMAC_TX_Q1_OCTETS_HIGH			0x390
+#define BGMAC_TX_Q2_PKTS			0x394
+#define BGMAC_TX_Q2_OCTETS			0x398
+#define BGMAC_TX_Q2_OCTETS_HIGH			0x39c
+#define BGMAC_TX_Q3_PKTS			0x3a0
+#define BGMAC_TX_Q3_OCTETS			0x3a4
+#define BGMAC_TX_Q3_OCTETS_HIGH			0x3a8
+#define BGMAC_RX_GOOD_OCTETS			0x3b0
+#define BGMAC_RX_GOOD_OCTETS_HIGH		0x3b4
+#define BGMAC_RX_GOOD_PKTS			0x3b8
+#define BGMAC_RX_OCTETS				0x3bc
+#define BGMAC_RX_OCTETS_HIGH			0x3c0
+#define BGMAC_RX_PKTS				0x3c4
+#define BGMAC_RX_BROADCAST_PKTS			0x3c8
+#define BGMAC_RX_MULTICAST_PKTS			0x3cc
+#define BGMAC_RX_LEN_64				0x3d0
+#define BGMAC_RX_LEN_65_TO_127			0x3d4
+#define BGMAC_RX_LEN_128_TO_255			0x3d8
+#define BGMAC_RX_LEN_256_TO_511			0x3dc
+#define BGMAC_RX_LEN_512_TO_1023		0x3e0
+#define BGMAC_RX_LEN_1024_TO_1522		0x3e4
+#define BGMAC_RX_LEN_1523_TO_2047		0x3e8
+#define BGMAC_RX_LEN_2048_TO_4095		0x3ec
+#define BGMAC_RX_LEN_4095_TO_8191		0x3f0
+#define BGMAC_RX_LEN_8192_TO_MAX		0x3f4
+#define BGMAC_RX_JABBER_PKTS			0x3f8		/* Error */
+#define BGMAC_RX_OVERSIZE_PKTS			0x3fc		/* Error */
+#define BGMAC_RX_FRAGMENT_PKTS			0x400
+#define BGMAC_RX_MISSED_PKTS			0x404		/* Error */
+#define BGMAC_RX_CRC_ALIGN_ERRS			0x408		/* Error */
+#define BGMAC_RX_UNDERSIZE			0x40c		/* Error */
+#define BGMAC_RX_CRC_ERRS			0x410		/* Error */
+#define BGMAC_RX_ALIGN_ERRS			0x414		/* Error */
+#define BGMAC_RX_SYMBOL_ERRS			0x418		/* Error */
+#define BGMAC_RX_PAUSE_PKTS			0x41c
+#define BGMAC_RX_NONPAUSE_PKTS			0x420
+#define BGMAC_RX_SACHANGES			0x424
+#define BGMAC_RX_UNI_PKTS			0x428
+#define BGMAC_UNIMAC_VERSION			0x800
+#define BGMAC_HDBKP_CTL				0x804
+#define BGMAC_CMDCFG				0x808		/* Configuration */
+#define  BGMAC_CMDCFG_TE			0x00000001	/* Set to activate TX */
+#define  BGMAC_CMDCFG_RE			0x00000002	/* Set to activate RX */
+#define  BGMAC_CMDCFG_ES_MASK			0x0000000c	/* Ethernet speed see gmac_speed */
+#define   BGMAC_CMDCFG_ES_10			0x00000000
+#define   BGMAC_CMDCFG_ES_100			0x00000004
+#define   BGMAC_CMDCFG_ES_1000			0x00000008
+#define  BGMAC_CMDCFG_PROM			0x00000010	/* Set to activate promiscuous mode */
+#define  BGMAC_CMDCFG_PAD_EN			0x00000020
+#define  BGMAC_CMDCFG_CF			0x00000040
+#define  BGMAC_CMDCFG_PF			0x00000080
+#define  BGMAC_CMDCFG_RPI			0x00000100	/* Unset to enable 802.3x tx flow control */
+#define  BGMAC_CMDCFG_TAI			0x00000200
+#define  BGMAC_CMDCFG_HD			0x00000400	/* Set if in half duplex mode */
+#define  BGMAC_CMDCFG_HD_SHIFT			10
+#define  BGMAC_CMDCFG_SR			0x00000800	/* Set to reset mode */
+#define  BGMAC_CMDCFG_ML			0x00008000	/* Set to activate mac loopback mode */
+#define  BGMAC_CMDCFG_AE			0x00400000
+#define  BGMAC_CMDCFG_CFE			0x00800000
+#define  BGMAC_CMDCFG_NLC			0x01000000
+#define  BGMAC_CMDCFG_RL			0x02000000
+#define  BGMAC_CMDCFG_RED			0x04000000
+#define  BGMAC_CMDCFG_PE			0x08000000
+#define  BGMAC_CMDCFG_TPI			0x10000000
+#define  BGMAC_CMDCFG_AT			0x20000000
+#define BGMAC_MACADDR_HIGH			0x80c		/* High 4 octets of own mac address */
+#define BGMAC_MACADDR_LOW			0x810		/* Low 2 octets of own mac address */
+#define BGMAC_RXMAX_LENGTH			0x814		/* Max receive frame length with vlan tag */
+#define BGMAC_PAUSEQUANTA			0x818
+#define BGMAC_MAC_MODE				0x844
+#define BGMAC_OUTERTAG				0x848
+#define BGMAC_INNERTAG				0x84c
+#define BGMAC_TXIPG				0x85c
+#define BGMAC_PAUSE_CTL				0xb30
+#define BGMAC_TX_FLUSH				0xb34
+#define BGMAC_RX_STATUS				0xb38
+#define BGMAC_TX_STATUS				0xb3c
+
+#define BGMAC_PHY_CTL				0x00
+#define  BGMAC_PHY_CTL_SPEED_MSB		0x0040
+#define  BGMAC_PHY_CTL_DUPLEX			0x0100		/* duplex mode */
+#define  BGMAC_PHY_CTL_RESTART			0x0200		/* restart autonegotiation */
+#define  BGMAC_PHY_CTL_ANENAB			0x1000		/* enable autonegotiation */
+#define  BGMAC_PHY_CTL_SPEED			0x2000
+#define  BGMAC_PHY_CTL_LOOP			0x4000		/* loopback */
+#define  BGMAC_PHY_CTL_RESET			0x8000		/* reset */
+/* Helpers */
+#define  BGMAC_PHY_CTL_SPEED_10			0
+#define  BGMAC_PHY_CTL_SPEED_100		BGMAC_PHY_CTL_SPEED
+#define  BGMAC_PHY_CTL_SPEED_1000		BGMAC_PHY_CTL_SPEED_MSB
+#define BGMAC_PHY_ADV				0x04
+#define  BGMAC_PHY_ADV_10HALF			0x0020		/* advertise 10MBits/s half duplex */
+#define  BGMAC_PHY_ADV_10FULL			0x0040		/* advertise 10MBits/s full duplex */
+#define  BGMAC_PHY_ADV_100HALF			0x0080		/* advertise 100MBits/s half duplex */
+#define  BGMAC_PHY_ADV_100FULL			0x0100		/* advertise 100MBits/s full duplex */
+#define BGMAC_PHY_ADV2				0x09
+#define  BGMAC_PHY_ADV2_1000HALF		0x0100		/* advertise 1000MBits/s half duplex */
+#define  BGMAC_PHY_ADV2_1000FULL		0x0200		/* advertise 1000MBits/s full duplex */
+
+/* BCMA GMAC core specific IO Control (BCMA_IOCTL) flags */
+#define BGMAC_BCMA_IOCTL_SW_CLKEN		0x00000004	/* PHY Clock Enable */
+#define BGMAC_BCMA_IOCTL_SW_RESET		0x00000008	/* PHY Reset */
+
+/* BCMA GMAC core specific IO status (BCMA_IOST) flags */
+#define BGMAC_BCMA_IOST_ATTACHED		0x00000800
+
+#define BGMAC_NUM_MIB_TX_REGS	\
+		(((BGMAC_TX_Q3_OCTETS_HIGH - BGMAC_TX_GOOD_OCTETS) / 4) + 1)
+#define BGMAC_NUM_MIB_RX_REGS	\
+		(((BGMAC_RX_UNI_PKTS - BGMAC_RX_GOOD_OCTETS) / 4) + 1)
+
+#define BGMAC_DMA_TX_CTL			0x00
+#define  BGMAC_DMA_TX_ENABLE			0x00000001
+#define  BGMAC_DMA_TX_SUSPEND			0x00000002
+#define  BGMAC_DMA_TX_LOOPBACK			0x00000004
+#define  BGMAC_DMA_TX_FLUSH			0x00000010
+#define  BGMAC_DMA_TX_PARITY_DISABLE		0x00000800
+#define  BGMAC_DMA_TX_ADDREXT_MASK		0x00030000
+#define  BGMAC_DMA_TX_ADDREXT_SHIFT		16
+#define BGMAC_DMA_TX_INDEX			0x04
+#define BGMAC_DMA_TX_RINGLO			0x08
+#define BGMAC_DMA_TX_RINGHI			0x0C
+#define BGMAC_DMA_TX_STATUS			0x10
+#define  BGMAC_DMA_TX_STATDPTR			0x00001FFF
+#define  BGMAC_DMA_TX_STAT			0xF0000000
+#define   BGMAC_DMA_TX_STAT_DISABLED		0x00000000
+#define   BGMAC_DMA_TX_STAT_ACTIVE		0x10000000
+#define   BGMAC_DMA_TX_STAT_IDLEWAIT		0x20000000
+#define   BGMAC_DMA_TX_STAT_STOPPED		0x30000000
+#define   BGMAC_DMA_TX_STAT_SUSP		0x40000000
+#define BGMAC_DMA_TX_ERROR			0x14
+#define  BGMAC_DMA_TX_ERRDPTR			0x0001FFFF
+#define  BGMAC_DMA_TX_ERR			0xF0000000
+#define   BGMAC_DMA_TX_ERR_NOERR		0x00000000
+#define   BGMAC_DMA_TX_ERR_PROT			0x10000000
+#define   BGMAC_DMA_TX_ERR_UNDERRUN		0x20000000
+#define   BGMAC_DMA_TX_ERR_TRANSFER		0x30000000
+#define   BGMAC_DMA_TX_ERR_DESCREAD		0x40000000
+#define   BGMAC_DMA_TX_ERR_CORE			0x50000000
+#define BGMAC_DMA_RX_CTL			0x20
+#define  BGMAC_DMA_RX_ENABLE			0x00000001
+#define  BGMAC_DMA_RX_FRAME_OFFSET_MASK		0x000000FE
+#define  BGMAC_DMA_RX_FRAME_OFFSET_SHIFT	1
+#define  BGMAC_DMA_RX_DIRECT_FIFO		0x00000100
+#define  BGMAC_DMA_RX_OVERFLOW_CONT		0x00000400
+#define  BGMAC_DMA_RX_PARITY_DISABLE		0x00000800
+#define  BGMAC_DMA_RX_ADDREXT_MASK		0x00030000
+#define  BGMAC_DMA_RX_ADDREXT_SHIFT		16
+#define BGMAC_DMA_RX_INDEX			0x24
+#define BGMAC_DMA_RX_RINGLO			0x28
+#define BGMAC_DMA_RX_RINGHI			0x2C
+#define BGMAC_DMA_RX_STATUS			0x30
+#define  BGMAC_DMA_RX_STATDPTR			0x00001FFF
+#define  BGMAC_DMA_RX_STAT			0xF0000000
+#define   BGMAC_DMA_RX_STAT_DISABLED		0x00000000
+#define   BGMAC_DMA_RX_STAT_ACTIVE		0x10000000
+#define   BGMAC_DMA_RX_STAT_IDLEWAIT		0x20000000
+#define   BGMAC_DMA_RX_STAT_STOPPED		0x30000000
+#define   BGMAC_DMA_RX_STAT_SUSP		0x40000000
+#define BGMAC_DMA_RX_ERROR			0x34
+#define  BGMAC_DMA_RX_ERRDPTR			0x0001FFFF
+#define  BGMAC_DMA_RX_ERR			0xF0000000
+#define   BGMAC_DMA_RX_ERR_NOERR		0x00000000
+#define   BGMAC_DMA_RX_ERR_PROT			0x10000000
+#define   BGMAC_DMA_RX_ERR_UNDERRUN		0x20000000
+#define   BGMAC_DMA_RX_ERR_TRANSFER		0x30000000
+#define   BGMAC_DMA_RX_ERR_DESCREAD		0x40000000
+#define   BGMAC_DMA_RX_ERR_CORE			0x50000000
+
+#define BGMAC_DESC_CTL0_EOT			0x10000000	/* End of ring */
+#define BGMAC_DESC_CTL0_IOC			0x20000000	/* IRQ on complete */
+#define BGMAC_DESC_CTL0_SOF			0x40000000	/* Start of frame */
+#define BGMAC_DESC_CTL0_EOF			0x80000000	/* End of frame */
+#define BGMAC_DESC_CTL1_LEN			0x00001FFF
+
+#define BGMAC_PHY_NOREGS			0x1E
+#define BGMAC_PHY_MASK				0x1F
+
+#define BGMAC_MAX_TX_RINGS			4
+#define BGMAC_MAX_RX_RINGS			1
+
+#define BGMAC_TX_RING_SLOTS			128
+#define BGMAC_RX_RING_SLOTS			512 - 1		/* Why -1? Well, Broadcom does that... */
+
+#define BGMAC_RX_HEADER_LEN			28		/* Last 24 bytes are unused. Well... */
+#define BGMAC_RX_FRAME_OFFSET			30		/* There are 2 unused bytes between header and real data */
+#define BGMAC_RX_MAX_FRAME_SIZE			1536		/* Copied from b44/tg3 */
+#define BGMAC_RX_BUF_SIZE			(BGMAC_RX_FRAME_OFFSET + BGMAC_RX_MAX_FRAME_SIZE)
+
+#define BGMAC_BFL_ENETROBO			0x0010		/* has ephy roboswitch spi */
+#define BGMAC_BFL_ENETADM			0x0080		/* has ADMtek switch */
+#define BGMAC_BFL_ENETVLAN			0x0100		/* can do vlan */
+
+#define BGMAC_CHIPCTL_1_IF_TYPE_MASK		0x00000030
+#define BGMAC_CHIPCTL_1_IF_TYPE_RMII		0x00000000
+#define BGMAC_CHIPCTL_1_IF_TYPE_MI		0x00000010
+#define BGMAC_CHIPCTL_1_IF_TYPE_RGMII		0x00000020
+#define BGMAC_CHIPCTL_1_SW_TYPE_MASK		0x000000C0
+#define BGMAC_CHIPCTL_1_SW_TYPE_EPHY		0x00000000
+#define BGMAC_CHIPCTL_1_SW_TYPE_EPHYMII		0x00000040
+#define BGMAC_CHIPCTL_1_SW_TYPE_EPHYRMII	0x00000080
+#define BGMAC_CHIPCTL_1_SW_TYPE_RGMI		0x000000C0
+#define BGMAC_CHIPCTL_1_RXC_DLL_BYPASS		0x00010000
+
+#define BGMAC_SPEED_10				0x0001
+#define BGMAC_SPEED_100				0x0002
+#define BGMAC_SPEED_1000			0x0004
+
+struct bgmac_slot_info {
+	struct sk_buff *skb;
+	dma_addr_t dma_addr;
+};
+
+struct bgmac_dma_desc {
+	__le32 ctl0;
+	__le32 ctl1;
+	__le32 addr_low;
+	__le32 addr_high;
+} __packed;
+
+struct bgmac_dma_ring {
+	bool tx;
+	u16 num_slots;
+	u16 start;
+	u16 end;
+
+	u16 mmio_base;
+	void *cpu_base;
+	dma_addr_t dma_base;
+
+	struct bgmac_slot_info slots[BGMAC_RX_RING_SLOTS];
+};
+
+struct bgmac_rx_header {
+	__le16 len;
+	__le16 flags;
+	__le16 pad[12];
+};
+
+struct bgmac {
+	struct bcma_device *core;
+	struct bcma_device *cmn; /* Reference to CMN core for BCM4706 */
+	struct net_device *net_dev;
+	struct napi_struct napi;
+
+	u8 phyaddr;
+	bool has_robosw;
+
+	u32 int_mask;
+	u32 int_status;
+
+	bool loopback;
+
+	bool autoneg;
+	bool full_duplex;
+	int speed;
+
+	/* DMA */
+	struct bgmac_dma_ring tx_ring[BGMAC_MAX_TX_RINGS];
+	struct bgmac_dma_ring rx_ring[BGMAC_MAX_RX_RINGS];
+
+	/* Stats */
+	bool stats_grabbed;
+	u32 mib_tx_regs[BGMAC_NUM_MIB_TX_REGS];
+	u32 mib_rx_regs[BGMAC_NUM_MIB_RX_REGS];
+};
+
+static inline u32 bgmac_read(struct bgmac *bgmac, u16 offset)
+{
+	return bcma_read32(bgmac->core, offset);
+}
+
+static inline void bgmac_write(struct bgmac *bgmac, u16 offset, u32 value)
+{
+	bcma_write32(bgmac->core, offset, value);
+}
+
+static inline void bgmac_maskset(struct bgmac *bgmac, u16 offset, u32 mask,
+				   u32 set)
+{
+	bgmac_write(bgmac, offset, (bgmac_read(bgmac, offset) & mask) | set);
+}
+
+static inline void bgmac_mask(struct bgmac *bgmac, u16 offset, u32 mask)
+{
+	bgmac_maskset(bgmac, offset, mask, 0);
+}
+
+static inline void bgmac_set(struct bgmac *bgmac, u16 offset, u32 set)
+{
+	bgmac_maskset(bgmac, offset, ~0, set);
+}
+
+u16 bgmac_phy_read(struct bgmac *bgmac, u8 phyaddr, u8 reg);
+void bgmac_phy_write(struct bgmac *bgmac, u8 phyaddr, u8 reg, u16 value);
+
+#endif /* _BGMAC_H */
diff --git a/include/linux/bcma/bcma_driver_chipcommon.h b/include/linux/bcma/bcma_driver_chipcommon.h
index 8d719e9..c6f5057 100644
--- a/include/linux/bcma/bcma_driver_chipcommon.h
+++ b/include/linux/bcma/bcma_driver_chipcommon.h
@@ -638,4 +638,6 @@ extern void bcma_chipco_regctl_maskset(struct bcma_drv_cc *cc,
 				       u32 offset, u32 mask, u32 set);
 extern void bcma_pmu_spuravoid_pllupdate(struct bcma_drv_cc *cc, int spuravoid);
 
+extern u32 bcma_pmu_get_bus_clock(struct bcma_drv_cc *cc);
+
 #endif /* LINUX_BCMA_DRIVER_CC_H_ */
-- 
1.7.7

^ permalink raw reply related

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Stephen Hemminger @ 2012-12-13 17:47 UTC (permalink / raw)
  To: vyasevic; +Cc: Or Gerlitz, netdev, davem, mst, john.r.fastabend
In-Reply-To: <50C91506.70903@redhat.com>

On Wed, 12 Dec 2012 18:36:38 -0500
Vlad Yasevich <vyasevic@redhat.com> wrote:

> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
> > On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:  
> >> This series of patches provides an ability to add VLANs to the bridge
> >> 

The bigger question is why is this impossible or too awkward with existing
netfilter (ebtables) functionality? As a practical matter, I like to keep
the bridging code as simple as possible and move the complexity away from
the core.

Also, if the functionality lived in netfilter rules, the developer and user
would have a more freedom to implement complex rulesets.

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Flavio Leitner @ 2012-12-13 17:54 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, Stephen Hemminger, netdev, davem, edumazet,
	bhutchings, mirqus, greearb
In-Reply-To: <50CA0D3D.5000503@gmail.com>

On Thu, 13 Dec 2012 09:15:41 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> [...]
> 
> >> That is what the operstate mechanism was for. Why did we build that mechanism
> >> if it doesn't work from userspace.
> >>
> >> Maybe the fix is to make setting linkstate also set carrier bits.
> >
> > Hmm. You mean to call netif_carrier_on/off as a reaction to operstate
> > change? How exactly would you like to do that?
> >
> > Thanks
> >
> > Jiri
> 
> This would break existing applications and would not really be in the
> spirit of the operstate mechanism as I read the documentation:
> 
> ./Documentation/networking/operstates.txt
>   63 IF_OPER_DORMANT (5):
>   64  Interface is L1 up, but waiting for an external event, f.e. for a
>   65  protocol to establish. (802.1X)
> 
> The L1 up is netif_carrier_on here.

Agreed. I am also not finding how to change carrier bits from the
current states.

> We use this in user space when we do not want applications to start
> using the link until we have negotiated and configured some link layer
> attributes. To do this we set IFLA_LINKMODE and then use the
> IF_OPER_DORMANT event to trigger the application eventually setting
> IF_OPER_UP when the link layer negotiation is complete. If you take the
> carrier down this breaks. In my case the protocol is LLDP but I think
> there are other examples. This is basically the example Stephen already
> gave.
> 
> I guess I still am missing why teamd doesn't just set IFLA_LINKMODE
> then manage the operstate this way? Sure teamd would have to become a
> bit smarter but it would save adding additional interfaces to the
> kernel. In the LinkAgg case you could just pin the operstate down this
> would also allow protocols to run under the linkagg over the LLC in
> IEEE speak which I think is being discussed in the latest round of
> LLDP/LinkAgg spec updates (I'll check on that later today).

Think about the ARP monitoring as an example. If the teamd (userlevel)
doesn't receive the reply packet, it should set the master interface 
as down in terms of carrier because there isn't a valid L1 from the
master interface perspective.

LACP needs the same action. If the ports can't move to DISTRIBUTING/
COLLECTING state, it should set master's carrier bit to down.

Even the lack of ports should keep the master with carrier down.

I am saying this because people are used to and there are scripts out
there using something like:
# ethtool <iface> | grep 'Link'
to react an interface failure.

Therefore, I am not seeing how setting operstate to down helps with the
current code.  Can ethtool report Link based on a logic between operstate
and netif_carrier_ok()? If operstate is down, it would report down regardless
of netif_carrier_ok() for instance.

Calling netif_carrier_off() also calls dev_deactivate() to stop everything.
So, if team sets operstate to down, the interface remains active sending/
receiving as far as I can see, right? Not sure if this would be a problem.

-- 
fbl

^ permalink raw reply

* Re: [PATCH v2] ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
From: David Miller @ 2012-12-13 17:59 UTC (permalink / raw)
  To: djduanjiong; +Cc: steffen.klassert, netdev
In-Reply-To: <50C9BA41.1010507@gmail.com>

From: Duan Jiong <djduanjiong@gmail.com>
Date: Thu, 13 Dec 2012 19:21:37 +0800

> +	if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) {
> +		ND_PRINTK(2, warn,
> +			  "Redirect: invalid ND options\n");

Do not add more uses of such baroque kernel logging mechanisms.

> +	if (!ndopts.nd_opts_rh) {
> +		return;
> +	}

Single statement basic blocks should never be surrounded by
curly braces, they just waste lines.

> +	hdr = (u8 *)ndopts.nd_opts_rh;

(u8 *)[SPACE]ndopts...

> +	if(!pskb_pull(skb, hdr - skb_transport_header(skb))) {

if[SPACE](...

^ permalink raw reply

* Re: remove noisy message from llcp_sock_sendmsg
From: David Miller @ 2012-12-13 18:01 UTC (permalink / raw)
  To: davej; +Cc: netdev
In-Reply-To: <20121213041134.GA1611@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Wed, 12 Dec 2012 23:11:34 -0500

> This is easily triggerable when fuzz-testing as an unprivileged user.
> We could rate-limit it, but given we don't print similar messages
> for other protocols, I just removed it.
> 
> Signed-off-by: Dave Jones <davej@redhat.com>

Applied.

Please provide appropriate "xxx: " prefixes in your subject lines
for patch submissions.  "nfc: " would have been appropriate for
this one.

Thanks.

^ permalink raw reply

* Re: GPF in skb_flow_dissect
From: David Miller @ 2012-12-13 18:01 UTC (permalink / raw)
  To: erdnetdev; +Cc: davej, jasowang, netdev
In-Reply-To: <1355376177.12271.244.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Wed, 12 Dec 2012 21:22:57 -0800

> [PATCH] tuntap: dont use skb after netif_rx_ni(skb)
> 
> commit 96442e4242 (tuntap: choose the txq based on rxq) added
> a use after free.
> 
> Cache rxhash in a temp variable before calling netif_rx_ni()
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ndisc: Fix padding error in link-layer address option.
From: David Miller @ 2012-12-13 18:01 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <201212131429.qBDETaa6029537@94.43.138.210.xn.2iij.net>

From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Thu, 13 Dec 2012 23:29:36 +0900

> If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad,
> post padding is not initialized correctly.
> 
> (Un)fortunately, the only type that requires pad is Infiniband,
> whose pad is 2 and data_len is 20, and this logical error has not
> become obvious, but it is better to fix.
> 
> Note that ndisc_opt_addr_space() handles the situation described
> above correctly.
> 
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: ethool: Document struct ethtool_flow_ext
From: David Miller @ 2012-12-13 18:01 UTC (permalink / raw)
  To: yanb; +Cc: ogerlitz, amirv, netdev, bhutchings
In-Reply-To: <1355412059-25663-1-git-send-email-yanb@mellanox.com>

From: Yan Burman <yanb@mellanox.com>
Date: Thu, 13 Dec 2012 17:20:59 +0200

> Add documentation for struct ethtool_flow_ext especially in regard
> to what flags are needed for which fields.
> 
> Signed-off-by: Yan Burman <yanb@mellanox.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bridge: fix icmpv6 endian bug and other sparse warnings
From: David Miller @ 2012-12-13 18:02 UTC (permalink / raw)
  To: shemminger; +Cc: fengguang.wu, amwang, netdev
In-Reply-To: <20121213085128.1db11ee4@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 13 Dec 2012 08:51:28 -0800

> Fix the warnings reported by sparse on recent bridge multicast
> changes. Mostly just rcu annotation issues but in this case
> sparse found a real bug! The ICMPv6 mld2 query mrc
> values is in network byte order.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: netconsole fun
From: Neil Horman @ 2012-12-13 18:08 UTC (permalink / raw)
  To: Peter Hurley; +Cc: Cong Wang, netdev
In-Reply-To: <1355410171.2605.17.camel@thor>

On Thu, Dec 13, 2012 at 09:49:31AM -0500, Peter Hurley wrote:
> On Thu, 2012-12-13 at 07:36 -0500, Neil Horman wrote:
> > On Wed, Dec 12, 2012 at 03:59:17PM -0500, Peter Hurley wrote:
> > > On Tue, 2012-12-11 at 11:45 -0500, Neil Horman wrote:
> > > > On Tue, Dec 11, 2012 at 10:16:51AM -0500, Peter Hurley wrote:
> > > > > On Tue, 2012-12-11 at 09:30 -0500, Neil Horman wrote:
> > > > > > On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> > > > > > > On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > > > > > > > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > > > > > > > Now that netpoll has been disabled for slaved devices, is there a
> > > > > > > > > recommended method of running netconsole on a machine that has a slaved
> > > > > > > > > device?
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Yes, running it on the master device instead.
> > > > > > > 
> > > > > > > Thanks for the suggestion, but:
> > > > > > > 
> > > > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> > > > > > > ...
> > > > > > > [ 5.289869] netpoll: netconsole: local port 6665
> > > > > > > [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> > > > > > > [ 5.289892] netpoll: netconsole: interface 'br0'
> > > > > > > [ 5.289898] netpoll: netconsole: remote port 30000
> > > > > > > [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> > > > > > > [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> > > > > > > [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> > > > > > > [ 5.289929] netconsole: cleaning up
> > > > > > > ...
> > > > > > > [ 9.392291] Bridge firewalling registered
> > > > > > > [ 9.396805] device eth1 entered promiscuous mode
> > > > > > > [ 9.418350] eth1:  setting full-duplex.
> > > > > > > [ 9.421268] br0: port 1(eth1) entered forwarding state
> > > > > > > [ 9.423354] br0: port 1(eth1) entered forwarding state
> > > > > > > 
> > > > > > > 
> > > > > > > Is there a way to control or associate network device names prior to
> > > > > > > udev renaming?
> > > > > > > 
> > > > > > That looks like a systemd problem (or more specifically a boot dependency
> > > > > > problem).  You need to modify your netconsole unit/service file to start after
> > > > > > all your networking is up.  NetworkManager provides a dummy service file for
> > > > > > this purpose, called networkmanager-wait-online.service
> > > > > 
> > > > > Ok. So with a single physical network interface that will be bridged,
> > > > > netconsole cannot used for kernel boot messages.
> > > > > 
> > > > > With a machine with multiple nics, is there a way to control device
> > > > > naming so that the interface name to be used by netconsole specified on
> > > > > the boot command line will actually corresponding to the intended
> > > > > device. For example,
> > > > > 
> > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.1.123/eth0,30000@192.168.1.139/xx:xx:xx:xx:xx:xx
> > > > > ....
> > > > > [ 4.092184] 3c59x: Donald Becker and others.
> > > > > [ 4.092204] 0000:07:05.0: 3Com PCI 3c905C Tornado at ffffc9000186cf80.
> > > > > [ 4.094035] tg3.c:v3.125 (September 26, 2012)
> > > > > ....
> > > > > [ 4.125038] tg3 0000:08:00.0 eth1: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
> > > > > [ 4.125055] tg3 0000:08:00.0 eth1: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> > > > > [ 4.125062] tg3 0000:08:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > > > > [ 4.125068] tg3 0000:08:00.0 eth1: dma_rwctrl[76180000] dma_mask[64-bit]
> > > > > 
> > > > > This is attaching netconsole to the wrong device because bus
> > > > > enumeration, and therefore load order, is not consistent from boot to
> > > > > boot.
> > > > > 
> > > > No, theres no way to do that.  As you note device ennumeration isn't consistent
> > > > accross boots, thats why udev creates rules to rename devices based on immutable
> > > > (or semi-immutable) data, like mac addresses, or pci bus locations).  Once that
> > > > happens, you'll have consistent names for your interfaces, and that work will be
> > > > guaranteed to be done after networkmanager has finished opening all the
> > > > interfaces that it needs (hence my suggestion to make netconsole service
> > > > dependent on networkmanager service startup completing).
> > > 
> > > Just wondering if you think something like the patch below is
> > > suitable/acceptable for insulating netconsole from inconsistent device
> > > name scenarios without changing the existing semantics. The basic idea
> > > is to allow an ethernet MAC address in the <dev> field of the
> > > netconsole= options, and if a MAC address was specified rather than a
> > > device name, to do the dev lookup from the MAC address instead.
> > > 
> > > This doesn't extend to, but also doesn't interfere with, the dynamic
> > > config of netconsole via configfs.
> > > 
> > > Would you mind reviewing it?
> > > 
> > > Regards,
> > > Peter
> > > 
> > This looks like a pretty good idea to me.  That said, something occured to me
> > when you wrote your summary above.  Have you looked at the netconsole service
> > scripts that most distros provide in their packaging?  I'm almost positive Red
> > Hat/Fedora (and also like Suse and Ubuntu), already implement this functionality
> > from user space.  Basically, instead of people just modprobing netconsole, they
> > create a service script that parses a config file that has contains all the
> > options needed to load the netconsole module, and it has the intellegence to see
> > if you specified a mac address rather than a device.  If you did that it finds
> > the corresponding device mac address and uses that as the device.  I'm sorry, I
> > don't know why I didn't think of that before.  Check that out though, that will
> > likey give you exactly what you need
> 
> Even with a udev rule to load netconsole that runs immediately after
> device renaming (so before scripting), most of the dynamic module
> loading has already happened so netconsole misses it. At least with the
> patch, netconsole will load and attach to the proper interface much
> earlier in the boot so that module-load-time messages will be caught.
> 
I'm not sure what you mean by this.  I get that other drivers are loaded and
that if the netconsole service runs too soon, you might not have a network path
to the netconsole host you want, but as long as the network is up, the service
script (if properly written), shouldn't make you specify a network device.
Instead of having to specify the device to the netconsole module directly, the
config file for the service script lets you specify the destination ip address
of the netconsole server, and figures out the output device based on the routing
tables, whatever that device name happens to be on that boot.  So you don't
really need this patch, because user space can do this work (and more) for you.
You just have to wait for the network to come up, which you need to do anyway.

> There is an unforeseen consequence of the patch: it breaks device
> renaming because the device will already be in use by netconsole. Which
> is the whole problem with userspace device renaming to begin with...
> 
That is bad, but see above, the netconsole service can work around this for you,
allowing you to never have to specify a particular device at all.

> I guess that leaves only the option of building in netconsole and the
> driver that supplies the interface.
> 

> Oh well.
> 
> Regards,
> Peter
> 
> 

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-13 18:09 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: John Fastabend, Jiri Pirko, netdev, davem, edumazet, bhutchings,
	mirqus, greearb
In-Reply-To: <20121213155423.07c47307@obelix.rh>

On Thu, 13 Dec 2012 15:54:23 -0200
Flavio Leitner <fbl@redhat.com> wrote:

> I am saying this because people are used to and there are scripts out
> there using something like:
> # ethtool <iface> | grep 'Link'
> to react an interface failure.

Then the script is broken. It is asking about hardware state.

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Flavio Leitner @ 2012-12-13 18:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: John Fastabend, Jiri Pirko, netdev, davem, edumazet, bhutchings,
	mirqus, greearb
In-Reply-To: <20121213100933.6d68a8e5@nehalam.linuxnetplumber.net>

On Thu, 13 Dec 2012 10:09:33 -0800
Stephen Hemminger <shemminger@vyatta.com> wrote:

> On Thu, 13 Dec 2012 15:54:23 -0200
> Flavio Leitner <fbl@redhat.com> wrote:
> 
> > I am saying this because people are used to and there are scripts out
> > there using something like:
> > # ethtool <iface> | grep 'Link'
> > to react an interface failure.
> 
> Then the script is broken. It is asking about hardware state.

I was talking about the team master interface, so it makes sense
to check its 'hardware' state. Just think on 'bond0' interface
with no slaves. It should report Link detected: no.

See bond_release(), what happens if bond->slave_cnt == 0, for instance.

-- 
fbl

^ permalink raw reply

* Re: [RFC] net : add tx timestamp to packet mmap.
From: Richard Cochran @ 2012-12-13 18:17 UTC (permalink / raw)
  To: Paul Chavent; +Cc: davem, edumazet, daniel.borkmann, xemul, ebiederm, netdev
In-Reply-To: <50C9FEC4.5050804@onera.fr>

On Thu, Dec 13, 2012 at 05:13:56PM +0100, Paul Chavent wrote:
> >
> >In order for time stamps to appear, somebody has to call
> >skb_tx_timestamp() ...
> Yes. "Somebody" means "the hardware driver" after completing xmit.
> That's true ?

Yes, the MAC driver must call this helper function, but not many
drivers do this yet. You didn't say which MAC driver you are using and
whether it supports Tx SO_TIMESTAMPING or not.

> Yes, it only sets some flags. I thought that those flags was
> required by the skb_tx_timestamp() in order to make the appropriate
> timestamping (hardware, software, etc).
> 
> So in order to have tx timestamp that work, both calls are needed ?

Yes.

> Why sock_tx_timestamp is called in packet_fill_skb and
> packet_sendmsg_spkt and not in tpacket_fill_skb ?
> Why i can retrieve timestamps when i add this call ?

Sorry, I don't know much about packet mmap. Last time I tried it, some
years ago, it wasn't really working.

Richard

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-13 18:20 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: John Fastabend, Jiri Pirko, netdev, davem, edumazet, bhutchings,
	mirqus, greearb
In-Reply-To: <20121213161733.7bffe897@obelix.rh>

On Thu, 13 Dec 2012 16:17:33 -0200
Flavio Leitner <fbl@redhat.com> wrote:

> On Thu, 13 Dec 2012 10:09:33 -0800
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
> > On Thu, 13 Dec 2012 15:54:23 -0200
> > Flavio Leitner <fbl@redhat.com> wrote:
> > 
> > > I am saying this because people are used to and there are scripts out
> > > there using something like:
> > > # ethtool <iface> | grep 'Link'
> > > to react an interface failure.
> > 
> > Then the script is broken. It is asking about hardware state.
> 
> I was talking about the team master interface, so it makes sense
> to check its 'hardware' state. Just think on 'bond0' interface
> with no slaves. It should report Link detected: no.
> 
> See bond_release(), what happens if bond->slave_cnt == 0, for instance.
> 

I was thinking more that ethtool operation for reporting link on
the team device should use the proper check rather than just using netif_carrier_ok(),
the team ethtool operation for get_link should be check IFF_RUNNING flag
in dev->flags which is controlled by operstate transistions.

^ permalink raw reply

* Re: [RFC PATCH net-next 4/4 V4] try to fix performance regression
From: Rick Jones @ 2012-12-13 18:25 UTC (permalink / raw)
  To: Weiping Pan; +Cc: David Laight, davem, brutus, netdev
In-Reply-To: <50C9E0A0.2040409@redhat.com>

On 12/13/2012 06:05 AM, Weiping Pan wrote:
> But if I just run normal tcp loopback for each message size, then the
> performance is stable.
> [root@intel-s3e3432-01 ~]# cat base.sh
> for s in 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768
> 65536 131072 262144 524288 1048576
> do
> netperf -i -2,10 -I 95,20 -- -m $s -M $s | tail -n1
> done

The -i option goes max,min iterations:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#index-g_t_002di_002c-Global-28

and src/netsh.c will apply some silent clipping to that:

     case 'i':
       /* set the iterations min and max for confidence intervals */
       break_args(optarg,arg1,arg2);
       if (arg1[0]) {
	iteration_max = convert(arg1);
       }
       if (arg2[0] ) {
	iteration_min = convert(arg2);
       }
       /* if the iteration_max is < iteration_min make iteration_max
	 equal iteration_min */
       if (iteration_max < iteration_min) iteration_max = iteration_min;
       /* limit minimum to 3 iterations */
       if (iteration_max < 3) iteration_max = 3;
       if (iteration_min < 3) iteration_min = 3;
       /* limit maximum to 30 iterations */
       if (iteration_max > 30) iteration_max = 30;
       if (iteration_min > 30) iteration_min = 30;
       if (confidence_level == 0) confidence_level = 99;
       if (interval == 0.0) interval = 0.05; /* five percent */
       break;

So, what will happen with your netperf command line above is it will set 
iteration max to 10 iterations and it will always run 10 iterations 
since min will equal max.  If you want it to possibly terminate sooner 
upon hitting the confidence intervals you would want to go with -i 10,3. 
  That will have netperf always run at least three and no more than 10 
iterations.

If I'm not mistaken, the use of the "| tail -n 1" there will cause the 
"classic" confidence intervals not met warning to be tossed (unless I 
suppose it is actually going to stderr?).

If you use the "omni" tests directly rather than via "migration" you 
will no longer get warnings about not hitting the confidence interval, 
but you can have netperf emit the confidence level it actually achieved 
as well as the number of iterations it took to get there.  You would use 
the omni output selection to do that.

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection

These may have been mentioned before...

Judging from that command line you have the potential variability of the 
socket buffer auto-tuning.  Does AF_UNIX do the same sort of auto 
tuning?  It may be desirable to add some test-specific -s and -S options 
to have a fixed socket buffer size.

Since the MTU for loopback is ~16K, the send sizes below that will 
probably have differing interactions with the Nagle algorithm. 
Particularly as I suspect the timing will differ between friends and no 
friends.

I would guess the most "consistent" comparison with AF_UNIX would be 
when Nagle is disabled for the TCP_STREAM tests.  That would be a 
test-specific -D option.

Perhaps a more "stable" way to compare friends, no-friends and unix 
would be to use the _RR tests.  That will be a more direct, less-prone 
to other heuristics measure of path-length differences - both in the 
reported transactions per second and in any CPU utilization/service 
demand if you enable that via -c.  I'm not sure it would be necessary to 
take the request/response size out beyond a couple KB.  Take it out to 
the MB level and you will probably return to the question of auto-tuning 
of the socket buffer sizes.

happy benchmarking,

rick jones

^ permalink raw reply

* [PATCH] Fix comment for packets without data
From: Florent Fourcot @ 2012-12-13 18:27 UTC (permalink / raw)
  To: pablo, yoshfuji; +Cc: netdev, netfilter-devel, Florent Fourcot

The double negation is probably a typo error

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
---
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 00ee17c..d9efe32 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -81,7 +81,7 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
 	}
 	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
 	/*
-	 * (protoff == skb->len) mean that the packet doesn't have no data
+	 * (protoff == skb->len) mean that the packet does not have any data
 	 * except of IPv6 & ext headers. but it's tracked anyway. - YK
 	 */
 	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
-- 
1.7.10.4


^ permalink raw reply related

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Dan Williams @ 2012-12-13 18:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Flavio Leitner, John Fastabend, Jiri Pirko, netdev, davem,
	edumazet, bhutchings, mirqus, greearb
In-Reply-To: <20121213102051.72ad69aa@nehalam.linuxnetplumber.net>

On Thu, 2012-12-13 at 10:20 -0800, Stephen Hemminger wrote:
> On Thu, 13 Dec 2012 16:17:33 -0200
> Flavio Leitner <fbl@redhat.com> wrote:
> 
> > On Thu, 13 Dec 2012 10:09:33 -0800
> > Stephen Hemminger <shemminger@vyatta.com> wrote:
> > 
> > > On Thu, 13 Dec 2012 15:54:23 -0200
> > > Flavio Leitner <fbl@redhat.com> wrote:
> > > 
> > > > I am saying this because people are used to and there are scripts out
> > > > there using something like:
> > > > # ethtool <iface> | grep 'Link'
> > > > to react an interface failure.
> > > 
> > > Then the script is broken. It is asking about hardware state.
> > 
> > I was talking about the team master interface, so it makes sense
> > to check its 'hardware' state. Just think on 'bond0' interface
> > with no slaves. It should report Link detected: no.
> > 
> > See bond_release(), what happens if bond->slave_cnt == 0, for instance.
> > 
> 
> I was thinking more that ethtool operation for reporting link on
> the team device should use the proper check rather than just using netif_carrier_ok(),
> the team ethtool operation for get_link should be check IFF_RUNNING flag
> in dev->flags which is controlled by operstate transistions.

That's not entirely sufficient, because not everything uses ethtool to
deterine whether the link/carrier is active.  eg, anything listenting to
netlink for IFF_RUNNING won't know anything about this.  I may have
missed previous conversation, but IFF_RUNNING is AFAIK the sole
mechanism to indicate to userspace that the link is operational and
ready for general traffic.  Can the team drivers simply not set
IFF_RUNNING until the interface is actually ready for general traffic?

I'd just caution people to be careful when it comes to carrier/link
states, as a lot of stuff relies on it, and people keep trying to
slightly change the meaning of it.  We need it to be consistent across
net interfaces and driver implementations, not have specific meanings to
certain drivers that don't apply to others.

If teamd needs to manage the carrier state of the teamX interface,
perhaps instead of munging the actual carrier state itself, it can flip
some team-private value that is one component of determining whether
IFF_RUNNING gets set or not?

Dan

^ permalink raw reply

* Re: [RFC][PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Joe Perches @ 2012-12-13 18:46 UTC (permalink / raw)
  To: Rafał Miłecki; +Cc: netdev, David S. Miller
In-Reply-To: <1355420611-25764-1-git-send-email-zajec5@gmail.com>

On Thu, 2012-12-13 at 18:43 +0100, Rafał Miłecki wrote:
> BCMA is a Broadcom specific bus with devices AKA cores. All recent BCMA
> based SoCs have gigabit ethernet provided by the GBit MAC core. This
> patch adds driver for such a cores registering itself as a netdev. It
> has been tested on a BCM4706 and BCM4718 chipsets.
[]
> It's just a RFC, so be aware of two ugly parts in this version:

Just some trivial notes.  Feel free to ignore.

> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c

[]

> +#define pr_fmt(fmt)		KBUILD_MODNAME ": " fmt
> +
> +#define bgmac_err(bgmac, fmt, ...) \
> +	pr_err("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
> +#define bgmac_warn(bgmac, fmt, ...) \
> +	pr_warn("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
> +#define bgmac_info(bgmac, fmt, ...) \
> +	pr_info("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
> +#define bgmac_debug(bgmac, fmt, ...) \
> +	pr_debug("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)

Most subsystems use prefix_dbg now instead of prefix_debug.
Why not change these from pr_<level> to dev_<level>?
Maybe these should be in the bgmac.h file.

> +static bool bgmac_wait_value(struct bcma_device *core, u16 reg, u32 mask,
> +			     u32 value, int timeout)
> +{
[]
> +	pr_err("Timeout waiting for reg 0x%X\n", reg);

Maybe add new bmca_dev_(err|warn|info|dbg) macros too?

> +/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyinit */
> +static void bgmac_phy_init(struct bgmac *bgmac)
> +{
> +	struct bcma_chipinfo *ci = &bgmac->core->bus->chipinfo;
> +	struct bcma_drv_cc *cc = &bgmac->core->bus->drv_cc;
> +	u8 i;
> +
> +	if (ci->id == BCMA_CHIP_ID_BCM5356) {
> +		for (i = 0; i < 5; i++) {
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x008b);
> +			bgmac_phy_write(bgmac, i, 0x15, 0x0100);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
> +			bgmac_phy_write(bgmac, i, 0x12, 0x2aaa);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
> +		}
> +	}
> +	if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg != 10) ||
> +	    (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg != 10) ||
> +	    (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg != 9)) {
> +		bcma_chipco_chipctl_maskset(cc, 2, ~0xc0000000, 0);
> +		bcma_chipco_chipctl_maskset(cc, 4, ~0x80000000, 0);
> +		for (i = 0; i < 5; i++) {
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
> +			bgmac_phy_write(bgmac, i, 0x16, 0x5284);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x0010);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
> +			bgmac_phy_write(bgmac, i, 0x16, 0x5296);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x1073);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x9073);
> +			bgmac_phy_write(bgmac, i, 0x16, 0x52b6);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x9273);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);

That's a lot of magic numbers.

> +static void bgmac_chip_stats_update(struct bgmac *bgmac)
> +{
> +	int i;
> +
> +	if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {

Save an indent level by returning a call to
a new bgmac_4706_chip_stats_update instead?

	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
		return bgmac_407_chip_stats_update(bgmac);

> +		for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
> +			bgmac->mib_tx_regs[i] =
> +				bgmac_read(bgmac,
> +					   BGMAC_TX_GOOD_OCTETS + (i * 4));
> +		for (i = 0; i < BGMAC_NUM_MIB_RX_REGS; i++)
> +			bgmac->mib_rx_regs[i] =
> +				bgmac_read(bgmac,
> +					   BGMAC_RX_GOOD_OCTETS + (i * 4));
> +	}

unindent

> +
> +	/* TODO: what else? how to handle BCM4706? */

and delete?

> +}
> +
> +static void bgmac_clear_mib(struct bgmac *bgmac)
> +{
> +	int i;
> +
> +	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
> +		return;

Something like what this function does.

[]

> diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h

[]

> +struct bgmac {
> +	struct bcma_device *core;
> +	struct bcma_device *cmn; /* Reference to CMN core for BCM4706 */
> +	struct net_device *net_dev;
> +	struct napi_struct napi;
> +
> +	u8 phyaddr;
> +	bool has_robosw;
> +
> +	u32 int_mask;
> +	u32 int_status;
> +
> +	bool loopback;
> +
> +	bool autoneg;
> +	bool full_duplex;
> +	int speed;
> +
> +	/* DMA */
> +	struct bgmac_dma_ring tx_ring[BGMAC_MAX_TX_RINGS];
> +	struct bgmac_dma_ring rx_ring[BGMAC_MAX_RX_RINGS];
> +
> +	/* Stats */
> +	bool stats_grabbed;
> +	u32 mib_tx_regs[BGMAC_NUM_MIB_TX_REGS];
> +	u32 mib_rx_regs[BGMAC_NUM_MIB_RX_REGS];
> +};

Maybe some minor reordering of the u8/bools to reduce padding.

^ permalink raw reply

* Re: [PATCH] Fix comment for packets without data
From: Rick Jones @ 2012-12-13 18:51 UTC (permalink / raw)
  To: Florent Fourcot; +Cc: pablo, yoshfuji, netdev, netfilter-devel
In-Reply-To: <1355423248-12448-1-git-send-email-florent.fourcot@enst-bretagne.fr>

On 12/13/2012 10:27 AM, Florent Fourcot wrote:
> The double negation is probably a typo error
>
> Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
> ---
>   net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> index 00ee17c..d9efe32 100644
> --- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> @@ -81,7 +81,7 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
>   	}
>   	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
>   	/*
> -	 * (protoff == skb->len) mean that the packet doesn't have no data
> +	 * (protoff == skb->len) mean that the packet does not have any data
>   	 * except of IPv6 & ext headers. but it's tracked anyway. - YK
>   	 */
>   	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
>

Might as well take it to completion and have it read "(protoff == 
skb->len) means the packet does not have any data"  or "(protoff == 
skb->len) means the packet has no data"  Not sure about that next line 
with the except.  Perhaps "(protoff == skb->len) means the packet has no 
data, just IPv6 and extension headers, but it is tracked anyway."

rick jones

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-13 18:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Or Gerlitz, netdev, davem, mst, john.r.fastabend
In-Reply-To: <20121213094719.3a7a9408@nehalam.linuxnetplumber.net>

On 12/13/2012 12:47 PM, Stephen Hemminger wrote:
> On Wed, 12 Dec 2012 18:36:38 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
>
>> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
>>> On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
>>>> This series of patches provides an ability to add VLANs to the bridge
>>>>
>
> The bigger question is why is this impossible or too awkward with existing
> netfilter (ebtables) functionality? As a practical matter, I like to keep
> the bridging code as simple as possible and move the complexity away from
> the core.

Basic filtering can be achieved, but it is awkward and possible slow. 
We've seen results where long chains (due to a lot of running VMs) cause
a drastic regression (about 20%).   I suppose vlan tagging/stripping 
could also be done by the tables, but that would involve ever more chains.

Really interesting thing is that some products, when using vlans under 
the bridge, explicitly don't include the physical device into the bridge 
so that they don't have to specify a ton of rules or open security holes.

The desire here was to provide basic VLAN switching functionality on the 
bridge.

>
> Also, if the functionality lived in netfilter rules, the developer and user
> would have a more freedom to implement complex rulesets.

I guess some of the issue.  Complex rulesets to get something relatively 
simple causes performance regressions.  Also, in a VM environment, as 
the number of VMs increases, the number of rules also
increases causing management and more performance issues.

When I started working on this a while ago, I've asked in the first RFC 
series if this was worth pursuing.  Here is a quote from you:

> Initial reaction is that this is a useful. You can already do the same thing
> with ebtables, and ebtables allows more flexibility. But ebtables does slow
> things down, and is harder to configure.

Additionally, this is just an option.  If the new filtering is not 
configured, the bridge behaves exactly like it did before.  All the 
extra things one can do with the ebtables are still there as well..

-vlad
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* ethtool 3.7 released
From: Ben Hutchings @ 2012-12-13 18:57 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

ethtool version 3.7 has been released.

Home page: https://ftp.kernel.org/pub/software/network/ethtool/
Download link:
https://ftp.kernel.org/pub/software/network/ethtool/ethtool-3.7.tar.xz

Release notes:

	* Fix: Gracefully handle failure of register pretty-printer (-d option)
	* Feature: Add support for et131x registers (-d option)
	* Feature: Basic optical diagnostics for SFF-8472 modules (-m option)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: David Miller @ 2012-12-13 19:00 UTC (permalink / raw)
  To: shemminger; +Cc: vyasevic, or.gerlitz, netdev, mst, john.r.fastabend
In-Reply-To: <20121213094719.3a7a9408@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 13 Dec 2012 09:47:19 -0800

> On Wed, 12 Dec 2012 18:36:38 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
> 
>> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
>> > On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:  
>> >> This series of patches provides an ability to add VLANs to the bridge
>> >> 
> 
> The bigger question is why is this impossible or too awkward with existing
> netfilter (ebtables) functionality? As a practical matter, I like to keep
> the bridging code as simple as possible and move the complexity away from
> the core.
> 
> Also, if the functionality lived in netfilter rules, the developer and user
> would have a more freedom to implement complex rulesets.

I do not consider it wise to create more, rather then fewer, users
of ebtables.

It is one of the most poorly constructed subsystems in the entire
networking.

Just my $0.02

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Stephen Hemminger @ 2012-12-13 19:04 UTC (permalink / raw)
  To: David Miller; +Cc: vyasevic, or.gerlitz, netdev, mst, john.r.fastabend
In-Reply-To: <20121213.140023.2131448980265576282.davem@davemloft.net>

On Thu, 13 Dec 2012 14:00:23 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Thu, 13 Dec 2012 09:47:19 -0800
> 
> > On Wed, 12 Dec 2012 18:36:38 -0500
> > Vlad Yasevich <vyasevic@redhat.com> wrote:
> > 
> >> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
> >> > On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:  
> >> >> This series of patches provides an ability to add VLANs to the bridge
> >> >> 
> > 
> > The bigger question is why is this impossible or too awkward with existing
> > netfilter (ebtables) functionality? As a practical matter, I like to keep
> > the bridging code as simple as possible and move the complexity away from
> > the core.
> > 
> > Also, if the functionality lived in netfilter rules, the developer and user
> > would have a more freedom to implement complex rulesets.
> 
> I do not consider it wise to create more, rather then fewer, users
> of ebtables.
> 
> It is one of the most poorly constructed subsystems in the entire
> networking.

Maybe a better filtering architecture (at the bridge level) would
be a good project.

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Eric W. Biederman @ 2012-12-13 19:08 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka
In-Reply-To: <50CA135A.7060802@6wind.com>

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 12/12/2012 22:48, Eric W. Biederman a écrit :
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>> It is very wrong to presume that without context you know the reason for
>>> the exsitence of any network namespace and that you should or even that
>>> you can manage it.  Think of running your multi-network namespace
>>> managing application in a container.
>>
>> A good example of a network namespace you don't want to mess with are
>> the network namespaces created by vsftp and chrome for security purposes
>> to remove any possibility of creating new connections to the network.
>>
> Ok, I get the point.
>
> A last question: from an administration point of view, is it intended to
> not be able to monitor which netns are currently used? Like it can be done
> for sockets, files, ...

No.  The difficulty monitoring which network namespaces are being used
is an unintended side effect.

My pending changes to /proc/<pid>/ns/net and friends that allow you to
stat those files and compare if two network are the same network
namespace should make that monitoring much easier.  It isn't perfect as
there currently isn't a way to take a socket and say which network
namespace is this socket in.  But the current solution should tell you
what is happening most of the time.

struct net allocates it's own slab type so /proc/slabinfo on a good day
can tell you how many network namespace structures have been allocated
and are in use.

Eric

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox