* [PATCH v1 1/3] dtb: Add SGMII based 1GbE node to APM X-Gene SoC device tree
From: Iyappan Subramanian @ 2014-10-10 20:18 UTC (permalink / raw)
To: davem, netdev, devicetree
Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian
In-Reply-To: <1412972316-16344-1-git-send-email-isubramanian@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
---
arch/arm64/boot/dts/apm-mustang.dts | 4 ++++
arch/arm64/boot/dts/apm-storm.dtsi | 24 ++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/arch/arm64/boot/dts/apm-mustang.dts b/arch/arm64/boot/dts/apm-mustang.dts
index 2ae782b..71a1489 100644
--- a/arch/arm64/boot/dts/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm-mustang.dts
@@ -33,6 +33,10 @@
status = "ok";
};
+&sgenet0 {
+ status = "ok";
+};
+
&xgenet {
status = "ok";
};
diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index d16cc03..f45bbfe 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -176,6 +176,16 @@
clock-output-names = "menetclk";
};
+ sge0clk: sge0clk@1f21c000 {
+ compatible = "apm,xgene-device-clock";
+ #clock-cells = <1>;
+ clocks = <&socplldiv2 0>;
+ reg = <0x0 0x1f21c000 0x0 0x1000>;
+ reg-names = "csr-reg";
+ csr-mask = <0x3>;
+ clock-output-names = "sge0clk";
+ };
+
xge0clk: xge0clk@1f61c000 {
compatible = "apm,xgene-device-clock";
#clock-cells = <1>;
@@ -446,6 +456,20 @@
};
};
+ sgenet0: ethernet@1f210000 {
+ compatible = "apm,xgene-enet";
+ status = "disabled";
+ reg = <0x0 0x1f210000 0x0 0x10000>,
+ <0x0 0x1f200000 0x0 0X10000>,
+ <0x0 0x1B000000 0x0 0X20000>;
+ reg-names = "enet_csr", "ring_csr", "ring_cmd";
+ interrupts = <0x0 0xA0 0x4>;
+ dma-coherent;
+ clocks = <&sge0clk 0>;
+ local-mac-address = [00 00 00 00 00 00];
+ phy-connection-type = "sgmii";
+ };
+
xgenet: ethernet@1f610000 {
compatible = "apm,xgene-enet";
status = "disabled";
--
1.9.1
^ permalink raw reply related
* [PATCH] fm10k: Add skb->xmit_more support
From: alexander.duyck @ 2014-10-10 20:17 UTC (permalink / raw)
To: e1000-devel, netdev, jeffrey.t.kirsher; +Cc: matthew.vick
From: Alexander Duyck <alexander.h.duyck@redhat.com>
This change adds suport for skb->xmit_more based on the changes that were
made to igb to support the feature. The main changes are moving up the
check for maybe_stop_tx so that we can check netif_xmit_stopped to determine
if we must write the tail because we can add no further buffers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 65 +++++++++++++------------
1 file changed, 34 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 6c800a3..8ad7ff4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -930,6 +930,30 @@ static bool fm10k_tx_desc_push(struct fm10k_ring *tx_ring,
return i == tx_ring->count;
}
+static int __fm10k_maybe_stop_tx(struct fm10k_ring *tx_ring, u16 size)
+{
+ netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+
+ smp_mb();
+
+ /* We need to check again in a case another CPU has just
+ * made room available. */
+ if (likely(fm10k_desc_unused(tx_ring) < size))
+ return -EBUSY;
+
+ /* A reprieve! - use start_queue because it doesn't call schedule */
+ netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+ ++tx_ring->tx_stats.restart_queue;
+ return 0;
+}
+
+static inline int fm10k_maybe_stop_tx(struct fm10k_ring *tx_ring, u16 size)
+{
+ if (likely(fm10k_desc_unused(tx_ring) >= size))
+ return 0;
+ return __fm10k_maybe_stop_tx(tx_ring, size);
+}
+
static void fm10k_tx_map(struct fm10k_ring *tx_ring,
struct fm10k_tx_buffer *first)
{
@@ -1023,13 +1047,18 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
tx_ring->next_to_use = i;
+ /* Make sure there is space in the ring for the next send. */
+ fm10k_maybe_stop_tx(tx_ring, DESC_NEEDED);
+
/* notify HW of packet */
- writel(i, tx_ring->tail);
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ writel(i, tx_ring->tail);
- /* we need this if more than one processor can write to our tail
- * at a time, it synchronizes IO on IA64/Altix systems
- */
- mmiowb();
+ /* we need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
+ }
return;
dma_error:
@@ -1049,30 +1078,6 @@ dma_error:
tx_ring->next_to_use = i;
}
-static int __fm10k_maybe_stop_tx(struct fm10k_ring *tx_ring, u16 size)
-{
- netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
-
- smp_mb();
-
- /* We need to check again in a case another CPU has just
- * made room available. */
- if (likely(fm10k_desc_unused(tx_ring) < size))
- return -EBUSY;
-
- /* A reprieve! - use start_queue because it doesn't call schedule */
- netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
- return 0;
-}
-
-static inline int fm10k_maybe_stop_tx(struct fm10k_ring *tx_ring, u16 size)
-{
- if (likely(fm10k_desc_unused(tx_ring) >= size))
- return 0;
- return __fm10k_maybe_stop_tx(tx_ring, size);
-}
-
netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
struct fm10k_ring *tx_ring)
{
@@ -1117,8 +1122,6 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
fm10k_tx_map(tx_ring, first);
- fm10k_maybe_stop_tx(tx_ring, DESC_NEEDED);
-
return NETDEV_TX_OK;
out_drop:
^ permalink raw reply related
* Re: [PATCH net-next] net: bcmgenet: enable driver to work without a device tree
From: Petri Gynther @ 2014-10-10 19:46 UTC (permalink / raw)
To: Florian Fainelli; +Cc: netdev, David Miller
In-Reply-To: <54382CA0.6040105@gmail.com>
Hi Florian,
On Fri, Oct 10, 2014 at 11:59 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 10/10/2014 11:35 AM, Petri Gynther wrote:
>> Broadcom 7xxx MIPS-based STB platforms do not use device trees.
>> Modify bcmgenet driver so that it can be used on those platforms.
>>
>> Signed-off-by: Petri Gynther <pgynther@google.com>
>> ---
>
> [snip]
>
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> index dbf524e..5191e3f 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> @@ -17,6 +17,17 @@
>> #include <linux/if_vlan.h>
>> #include <linux/phy.h>
>>
>> +struct bcmgenet_platform_data {
>> + void __iomem *base_reg;
>> + int irq0;
>> + int irq1;
>
> Why would these members here? The platform device should provide those
> as standard resources that the driver fetches using
> platform_get_resource() and platform_get_irq().
>
I modeled this on struct bcmemac_platform_data that was used in the
legacy BRCMSTB code.
include/linux/brcmstb/brcmstb.h:
struct bcmemac_platform_data {
/* used by the BSP code only */
uintptr_t base_reg;
int irq0;
int irq1;
int phy_type;
int phy_id;
int phy_speed;
u8 macaddr[ETH_ALEN];
};
The legacy BRCMSTB code stores all relevant GENET info in this struct
and then creates the resources from that info:
static void __init brcm_register_genet(int id, struct bcmemac_platform_data *pd)
{
struct resource res[3];
struct platform_device *pdev;
memset(&res, 0, sizeof(res));
res[0].start = BPHYSADDR(pd->base_reg);
res[0].end = BPHYSADDR(pd->base_reg + 0x4fff);
res[0].flags = IORESOURCE_MEM;
res[1].start = res[1].end = pd->irq0;
res[1].flags = IORESOURCE_IRQ;
res[2].start = res[2].end = pd->irq1;
res[2].flags = IORESOURCE_IRQ;
brcm_alloc_macaddr(pd->macaddr);
pdev = platform_device_alloc("bcmgenet", id);
platform_device_add_resources(pdev, res, 3);
platform_device_add_data(pdev, pd, sizeof(*pd));
platform_device_add(pdev);
}
>> + int phy_type;
>> + int phy_addr;
>> + int phy_speed;
>> + u8 macaddr[ETH_ALEN];
>> + int genet_version;
>> +};
>
> I would rather we put this in include/linux/platform_data/bcmgenet.h
> where it belongs.
>
I wasn't aware of the directory include/linux/platform_data/. Yes,
that's where this belongs.
>> +
>> /* total number of Buffer Descriptors, same for Rx/Tx */
>> #define TOTAL_DESC 256
>>
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
>> index 9ff799a..e5ff913 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
>> @@ -157,6 +157,21 @@ static void bcmgenet_mii_setup(struct net_device *dev)
>> phy_print_status(phydev);
>> }
>>
>> +static int bcmgenet_moca_fphy_update(struct net_device *dev,
>> + struct fixed_phy_status *status)
>> +{
>> + struct bcmgenet_priv *priv = netdev_priv(dev);
>> + struct phy_device *phydev = priv->phydev;
>> +
>> + /*
>> + * MoCA daemon updates phydev->autoneg to reflect the link status.
>> + * Update MoCA fixed PHY status accordingly, so that the PHY state
>> + * machine becomes aware of the real link status.
>> + */
>> + status->link = phydev->autoneg;
>> + return 0;
>> +}
>
> I don't want to see that in the upstream driver, please enable the link
> interrupts like I suggested before and do not use the autoneg field at
> all, which should require no MoCA daemon modifications.
>
I added debug printk's to bcmgenet_isr0 to check on UMAC_IRQ_LINK_UP
and UMAC_IRQ_LINK_DOWN.
I am not getting those interrupts on eth1 (MoCA) port when coax is
removed/inserted.
But, they do work on eth0.
I'll modify init_umac() to enable those interrupts for MoCA port and retest.
> [snip]
>
>>
>> priv->phydev = phydev;
>> @@ -437,6 +464,104 @@ static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
>> return 0;
>> }
>>
>> +static int bcmgenet_mii_pd_init(struct bcmgenet_priv *priv)
>> +{
>> + struct device *kdev = &priv->pdev->dev;
>> + struct bcmgenet_platform_data *pd = kdev->platform_data;
>> + struct mii_bus *mdio = priv->mii_bus;
>> + int phy_addr = pd->phy_addr;
>> + struct phy_device *phydev;
>> + int ret;
>> + int i;
>> +
>> + /* Disable automatic MDIO bus scan */
>> + mdio->phy_mask = ~0;
>> +
>> + /* Clear all the IRQ properties */
>> + if (mdio->irq)
>> + for (i = 0; i < PHY_MAX_ADDR; i++)
>> + mdio->irq[i] = PHY_POLL;
>> +
>> + /* Register the MDIO bus */
>> + ret = mdiobus_register(mdio);
>> + if (ret) {
>> + dev_err(kdev, "failed to register MDIO bus\n");
>> + return ret;
>> + }
>> +
>> + /*
>> + * bcmgenet_platform_data needs to pass a valid PHY address for
>> + * internal/external PHY or -1 for MoCA PHY.
>> + */
>> + if (phy_addr >= 0 && phy_addr < PHY_MAX_ADDR) {
>
> We do too much low-level PHY device handling, and since you already have
> the phy_type provided via platform_data, we can use that hint to do the
> following:
>
> 1) an internal or external PHY with MDIO accesses should leave the bus
> auto-probing on with the specified PHY address in the mdio bus phy_mask
>
> 2) a MoCA PHY or an external PHY with MDIO accesses disabled should use
> the fixed-0 MII bus instead.
>
> This would look like this:
>
> if (pd->phy_type != PHY_INTERFACE_MODE_MOCA || pd->mdio_enabled)
> mdio->phy_mask = ~(1 << pd->phy_addr);
>
> ...
> mdiobus_register()
>
> priv->phydev = mdio->bus->phy_map[pd->phy_addr];
>
> phydev->phy_flags |= mask;
>
> if (pd->phy_type == PHY_INTERFACE_MODE_MOCA || !pd->mdio_enabled)
> priv->phydev = fixed_phy_register(...);
>
> and in both cases, later on you do connect to the PHY device
>
> I can cook a patch to illustrate what I think this could look like since
> I realize using pseudo-code to explain might not be the best thing.
>
I think I understand what you mean. I'll make a change.
>> + /*
>> + * 10/100/1000 Ethernet port with external or internal PHY.
>> + */
>> + phydev = get_phy_device(mdio, phy_addr, false);
>> + if (!phydev || IS_ERR(phydev)) {
>> + dev_err(kdev, "failed to create PHY device\n");
>> + mdiobus_unregister(mdio);
>> + return 1;
>> + }
>> +
>> + phydev->irq = PHY_POLL;
>> +
>> + ret = phy_device_register(phydev);
>> + if (ret) {
>> + dev_err(kdev, "failed to register PHY device\n");
>> + phy_device_free(phydev);
>> + mdiobus_unregister(mdio);
>> + return 1;
>> + }
>> +
>> + priv->phydev = phydev;
>> + priv->phy_interface = pd->phy_type;
>> + } else {
>> + /*
>> + * MoCA port with no MDIO-accessible PHY.
>> + * We need to use 1000/HD fixed PHY to represent the link layer.
>> + * MoCA daemon interacts with this PHY via ethtool.
>> + */
>> + struct fixed_phy_status moca_fphy_status = {
>> + .link = 0,
>> + .duplex = 0,
>
> This should be DUPLEX_FULL here, the link between GENET and the MoCA
> Ethernet convergence layer is full-duplex by nature (despite we report
> the PHY being half-duplex, which is a mistake in the downstream driver),
> the MoCA medium on the coaxial cable is half-duplex though, but that is
> handled by the MoCA HW.
>
> NB: I had issues in the past using a half-duplex link with the MoCA
> ethernet convergence layer, causing various types of packet loss because
> we use a simplified signaling internally in the hardware.
>
I picked this setting from 3.3 GENET driver. I'll test 1000/FULL on my
platform to see if it works.
>> + .speed = 1000,
>> + .pause = 0,
>> + .asym_pause = 0,
>> + };
>> +
>> + phydev = fixed_phy_register(PHY_POLL, &moca_fphy_status, NULL);
>> + if (!phydev || IS_ERR(phydev)) {
>> + dev_err(kdev, "failed to register fixed PHY device\n");
>> + mdiobus_unregister(mdio);
>> + return 1;
>> + }
>> +
>> + phydev->autoneg = AUTONEG_DISABLE;
>> +
>> + ret = fixed_phy_set_link_update(phydev,
>> + bcmgenet_moca_fphy_update);
>> + if (ret) {
>> + dev_err(kdev, "failed to set fixed PHY link update\n");
>> + }
>
> Should not we propagate this error to the caller?
Good catch. Yes.
> --
> Florian
^ permalink raw reply
* Re: [PATCH net v2 0/3] net: bcmgenet & systemport fixes
From: David Miller @ 2014-10-10 19:39 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, pgynther, jaedon.shin
In-Reply-To: <1412963514-7718-1-git-send-email-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 10 Oct 2014 10:51:51 -0700
> This patch series fixes an off-by-one error introduced during a previous
> change, and the two other fixes fix a wake depth imbalance situation for
> the Wake-on-LAN interrupt line.
Series applied, thanks Florian.
^ permalink raw reply
* Re: [PATCH v2 net 0/5] net: fix races accessing page->_count
From: David Miller @ 2014-10-10 19:37 UTC (permalink / raw)
To: edumazet
Cc: netdev, alexander.h.duyck, jeffrey.t.kirsher, andreslc, gthelen,
hughd, rientjes
In-Reply-To: <1412941698-17502-1-git-send-email-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 10 Oct 2014 04:48:13 -0700
> This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
> the page. Other entities in the kernel need to use get_page_unless_zero()
> to get a reference to the page before testing page properties, so we could
> loose a refcount increment.
>
> The only case it is valid is when page->_count is 0, we can use this in
> __netdev_alloc_frag()
>
> Note that I never seen crashes caused by these races, the issue was reported
> by Andres Lagar-Cavilla and Hugh Dickins.
Series applied, thanks Eric.
^ permalink raw reply
* Re: [PATCH v2] net/phy: micrel: Add clock support for KSZ8021/KSZ8031
From: David Miller @ 2014-10-10 19:36 UTC (permalink / raw)
To: s.hauer; +Cc: f.fainelli, netdev, linux-kernel, kernel
In-Reply-To: <1412927285-6809-1-git-send-email-s.hauer@pengutronix.de>
From: Sascha Hauer <s.hauer@pengutronix.de>
Date: Fri, 10 Oct 2014 09:48:05 +0200
> The KSZ8021 and KSZ8031 support RMII reference input clocks of 25MHz
> and 50MHz. Both PHYs differ in the default frequency they expect
> after reset. If this differs from the actual input clock, then
> register 0x1f bit 7 must be changed.
>
> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH v4] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: David Miller @ 2014-10-10 19:34 UTC (permalink / raw)
To: alexander.duyck; +Cc: netdev, eric.dumazet
In-Reply-To: <20141010190716.8311.98709.stgit@ahduyck-workstation.home>
From: alexander.duyck@gmail.com
Date: Fri, 10 Oct 2014 12:09:12 -0700
> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>
> This patch addresses a kernel unaligned access bug seen on a sparc64 system
> with an igb adapter. Specifically the __skb_flow_get_ports was returning a
> be32 pointer which was then having the value directly returned.
>
> In order to prevent this it is actually easier to simply not populate the
> ports or address values when an skb is not present. In this case the
> assumption is that the data isn't needed and rather than slow down the
> faster aligned accesses by making them have to assume the unaligned path on
> architectures that don't support efficent unaligned access it makes more
> sense to simply switch off the bits that were copying the source and
> destination address/port for the case where we only care about the protocol
> types and lengths which are normally 16 bit fields anyway.
>
> Reported-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---
>
> v2: Fixed alignment to __be16 on ports
> v3: Discarded previous approach and instead simplified things by
> not populating ports, or src/dst addresses if skb is not present.
> By doing this we avoid the unaligned access issue entirely and do not
> populate fields that will not be used by the eth_get_headlen function.
> v4: Minor whitespace cleanups
> Added workaround for access of doff resulting in unaligned access
Applied, thanks Alex.
^ permalink raw reply
* Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: David Miller @ 2014-10-10 19:32 UTC (permalink / raw)
To: alexander.h.duyck; +Cc: alexander.duyck, eric.dumazet, David.Laight, netdev
In-Reply-To: <54382B2F.8030203@redhat.com>
From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Fri, 10 Oct 2014 11:53:35 -0700
> On 10/10/2014 11:22 AM, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Fri, 10 Oct 2014 14:15:59 -0400 (EDT)
>>
>>> Your original code works because you do things like "byte[12] & 0xf0"
>>> to
>>> extract these fields.
>> Changing that th->doff sequence to instead be:
>>
>> const u8 *bp;
>> u8 buf[13];
>>
>> bp = __skb_header_pointer(skb, poff, sizeof(buf),
>> data, hlen, &buf);
>> if (!bp)
>> return poff;
>>
>> poff += max_t(u32, sizeof(struct tcphdr), (bp[12] & 0xf0) >> 2);
>> break;
>>
>> on top of your v3 patch works for me.
>>
>> Please double-check my calculations.
>
> Any reason why you are grabbing all 13 bytes instead of just the 1 we
> care about? Seems like we could just use a u8 buf instead of the
> array since we are only grabbing doff.
No reason, just a thinko, my brain implemented this as if skb_header_pointer
worked like skb_pull :-/
^ permalink raw reply
* Re: [PATCH][net-next][V2] net: filter: fix the comments
From: David Miller @ 2014-10-10 19:12 UTC (permalink / raw)
To: roy.qing.li; +Cc: netdev, ast, alexei.starovoitov
In-Reply-To: <1412920611-2094-1-git-send-email-roy.qing.li@gmail.com>
From: roy.qing.li@gmail.com
Date: Fri, 10 Oct 2014 13:56:51 +0800
> From: Li RongQing <roy.qing.li@gmail.com>
>
> 1. sk_run_filter has been renamed, sk_filter() is using SK_RUN_FILTER.
> 2. Remove wrong comments about storing intermediate value.
> 3. replace sk_run_filter with __bpf_prog_run for check_load_and_stores's
> comments
>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH][net-next] Documentation: replace __sk_run_filter with __bpf_prog_run
From: David Miller @ 2014-10-10 19:11 UTC (permalink / raw)
To: roy.qing.li; +Cc: netdev
In-Reply-To: <1412912214-964-1-git-send-email-roy.qing.li@gmail.com>
From: roy.qing.li@gmail.com
Date: Fri, 10 Oct 2014 11:36:54 +0800
> From: Li RongQing <roy.qing.li@gmail.com>
>
> __sk_run_filter has been renamed as __bpf_prog_run, so replace them in comments
>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 0/2] macvlan: optimize receive path
From: David Miller @ 2014-10-10 19:10 UTC (permalink / raw)
To: jbaron; +Cc: eric.dumazet, stephen, vyasevich, kaber, netdev
In-Reply-To: <cover.1412908929.git.jbaron@akamai.com>
From: Jason Baron <jbaron@akamai.com>
Date: Fri, 10 Oct 2014 03:13:24 +0000 (GMT)
> So after porting this optimization to net-next, I found that the netperf
> results of TCP_RR regress right at the maximum peak of transactions/sec. That
> is as I increase the number of threads via the first argument to super_netperf,
> the number of transactions/sec keep increasing, peak, and then start
> decreasing. It is right at the peak, that I see a small regression with this
> patch (see results in patch 2/2).
>
> Without the patch, the ksoftirqd threads are the top cpu consumers threads on
> the system, since the extra 'netif_rx()', is queuing more softirq work, whereas
> with the patch, the ksoftirqd threads are below all of the 'netserver' threads
> in terms of their cpu usage. So there appears to be some interaction between how
> softirqs are serviced at the peak here and this patch. I think the test results
> are still supportive of this approach, but I wanted to be clear on my findings.
I think this is definitely the right thing to do, applied, thanks!
^ permalink raw reply
* [PATCH v4] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: alexander.duyck @ 2014-10-10 19:09 UTC (permalink / raw)
To: netdev, davem; +Cc: eric.dumazet
From: Alexander Duyck <alexander.h.duyck@redhat.com>
This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter. Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.
In order to prevent this it is actually easier to simply not populate the
ports or address values when an skb is not present. In this case the
assumption is that the data isn't needed and rather than slow down the
faster aligned accesses by making them have to assume the unaligned path on
architectures that don't support efficent unaligned access it makes more
sense to simply switch off the bits that were copying the source and
destination address/port for the case where we only care about the protocol
types and lengths which are normally 16 bit fields anyway.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
v2: Fixed alignment to __be16 on ports
v3: Discarded previous approach and instead simplified things by
not populating ports, or src/dst addresses if skb is not present.
By doing this we avoid the unaligned access issue entirely and do not
populate fields that will not be used by the eth_get_headlen function.
v4: Minor whitespace cleanups
Added workaround for access of doff resulting in unaligned access
net/core/flow_dissector.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 8560dea..4508493 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -100,6 +100,13 @@ ip:
if (ip_is_fragment(iph))
ip_proto = 0;
+ /* skip the address processing if skb is NULL. The assumption
+ * here is that if there is no skb we are not looking for flow
+ * info but lengths and protocols.
+ */
+ if (!skb)
+ break;
+
iph_to_flow_copy_addrs(flow, iph);
break;
}
@@ -114,17 +121,15 @@ ipv6:
return false;
ip_proto = iph->nexthdr;
- flow->src = (__force __be32)ipv6_addr_hash(&iph->saddr);
- flow->dst = (__force __be32)ipv6_addr_hash(&iph->daddr);
nhoff += sizeof(struct ipv6hdr);
- /* skip the flow label processing if skb is NULL. The
- * assumption here is that if there is no skb we are not
- * looking for flow info as much as we are length.
- */
+ /* see comment above in IPv4 section */
if (!skb)
break;
+ flow->src = (__force __be32)ipv6_addr_hash(&iph->saddr);
+ flow->dst = (__force __be32)ipv6_addr_hash(&iph->daddr);
+
flow_label = ip6_flowlabel(iph);
if (flow_label) {
/* Awesome, IPv6 packet has a flow label so we can
@@ -231,9 +236,13 @@ ipv6:
flow->n_proto = proto;
flow->ip_proto = ip_proto;
- flow->ports = __skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen);
flow->thoff = (u16) nhoff;
+ /* unless skb is set we don't need to record port info */
+ if (skb)
+ flow->ports = __skb_flow_get_ports(skb, nhoff, ip_proto,
+ data, hlen);
+
return true;
}
EXPORT_SYMBOL(__skb_flow_dissect);
@@ -334,15 +343,16 @@ u32 __skb_get_poff(const struct sk_buff *skb, void *data,
switch (keys->ip_proto) {
case IPPROTO_TCP: {
- const struct tcphdr *tcph;
- struct tcphdr _tcph;
+ /* access doff as u8 to avoid unaligned access */
+ const u8 *doff;
+ u8 _doff;
- tcph = __skb_header_pointer(skb, poff, sizeof(_tcph),
- data, hlen, &_tcph);
- if (!tcph)
+ doff = __skb_header_pointer(skb, poff + 12, sizeof(_doff),
+ data, hlen, &_doff);
+ if (!doff)
return poff;
- poff += max_t(u32, sizeof(struct tcphdr), tcph->doff * 4);
+ poff += max_t(u32, sizeof(struct tcphdr), (*doff & 0xF0) >> 2);
break;
}
case IPPROTO_UDP:
^ permalink raw reply related
* Re: [PATCH v4 0/6] Add 10GbE support to APM X-Gene SoC ethernet driver
From: David Miller @ 2014-10-10 19:07 UTC (permalink / raw)
To: isubramanian; +Cc: netdev, devicetree, linux-arm-kernel, patches, kchudgar
In-Reply-To: <1412904727-23485-1-git-send-email-isubramanian@apm.com>
From: Iyappan Subramanian <isubramanian@apm.com>
Date: Thu, 9 Oct 2014 18:32:01 -0700
> Adding 10GbE support to APM X-Gene SoC ethernet driver.
>
> v4: Address comments from v3
> * dtb: resolved merge conflict for the net tree
>
> v3: Address comments from v2
> * dtb: changed to use all-zeros for the mac address
>
> v2: Address comments from v1
> * created preparatory patch to review before adding new functionality
> * dtb: updated to use tabs consistently
>
> v1:
> * Initial version
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net] net: bpf: fix bpf syscall dependence on anon_inodes
From: David Miller @ 2014-10-10 19:03 UTC (permalink / raw)
To: ast; +Cc: sojkam1, netdev, linux-kernel
In-Reply-To: <1412893001-8401-1-git-send-email-ast@plumgrid.com>
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Thu, 9 Oct 2014 15:16:41 -0700
> minimal configurations where EPOLL, PERF_EVENTS, etc are disabled,
> but NET is enabled, are failing to build with link error:
> kernel/built-in.o: In function `bpf_prog_load':
> syscall.c:(.text+0x3b728): undefined reference to `anon_inode_getfd'
>
> fix it by selecting ANON_INODES when NET is enabled
>
> Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
>
> I understand that 'select' is highly non-recommended for all good reasons,
> but here 'depends on' is very user unfriendly, since ANON_INODES is
> a hidden config that users cannot select directly.
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 0/2] Netfilter fixes for net-next
From: David Miller @ 2014-10-10 19:01 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1412879261-25045-1-git-send-email-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 9 Oct 2014 20:27:39 +0200
> This batch contains two fixes for what you have in your net-next,
> they are:
>
> 1) Remove nf_send_reset6() from header file. This function now resides
> in the nf_reject_ipv6 module. Reported by Eric Dumazet.
>
> 2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
> errors reported by Dan Carpenter's static analysis tools.
>
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
Pulled, thanks Pablo.
^ permalink raw reply
* Re: [PATCH net-next] net: bcmgenet: enable driver to work without a device tree
From: Florian Fainelli @ 2014-10-10 18:59 UTC (permalink / raw)
To: Petri Gynther, netdev; +Cc: davem
In-Reply-To: <20141010183501.0189F1008A3@puck.mtv.corp.google.com>
On 10/10/2014 11:35 AM, Petri Gynther wrote:
> Broadcom 7xxx MIPS-based STB platforms do not use device trees.
> Modify bcmgenet driver so that it can be used on those platforms.
>
> Signed-off-by: Petri Gynther <pgynther@google.com>
> ---
[snip]
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> index dbf524e..5191e3f 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> @@ -17,6 +17,17 @@
> #include <linux/if_vlan.h>
> #include <linux/phy.h>
>
> +struct bcmgenet_platform_data {
> + void __iomem *base_reg;
> + int irq0;
> + int irq1;
Why would these members here? The platform device should provide those
as standard resources that the driver fetches using
platform_get_resource() and platform_get_irq().
> + int phy_type;
> + int phy_addr;
> + int phy_speed;
> + u8 macaddr[ETH_ALEN];
> + int genet_version;
> +};
I would rather we put this in include/linux/platform_data/bcmgenet.h
where it belongs.
> +
> /* total number of Buffer Descriptors, same for Rx/Tx */
> #define TOTAL_DESC 256
>
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
> index 9ff799a..e5ff913 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
> @@ -157,6 +157,21 @@ static void bcmgenet_mii_setup(struct net_device *dev)
> phy_print_status(phydev);
> }
>
> +static int bcmgenet_moca_fphy_update(struct net_device *dev,
> + struct fixed_phy_status *status)
> +{
> + struct bcmgenet_priv *priv = netdev_priv(dev);
> + struct phy_device *phydev = priv->phydev;
> +
> + /*
> + * MoCA daemon updates phydev->autoneg to reflect the link status.
> + * Update MoCA fixed PHY status accordingly, so that the PHY state
> + * machine becomes aware of the real link status.
> + */
> + status->link = phydev->autoneg;
> + return 0;
> +}
I don't want to see that in the upstream driver, please enable the link
interrupts like I suggested before and do not use the autoneg field at
all, which should require no MoCA daemon modifications.
[snip]
>
> priv->phydev = phydev;
> @@ -437,6 +464,104 @@ static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
> return 0;
> }
>
> +static int bcmgenet_mii_pd_init(struct bcmgenet_priv *priv)
> +{
> + struct device *kdev = &priv->pdev->dev;
> + struct bcmgenet_platform_data *pd = kdev->platform_data;
> + struct mii_bus *mdio = priv->mii_bus;
> + int phy_addr = pd->phy_addr;
> + struct phy_device *phydev;
> + int ret;
> + int i;
> +
> + /* Disable automatic MDIO bus scan */
> + mdio->phy_mask = ~0;
> +
> + /* Clear all the IRQ properties */
> + if (mdio->irq)
> + for (i = 0; i < PHY_MAX_ADDR; i++)
> + mdio->irq[i] = PHY_POLL;
> +
> + /* Register the MDIO bus */
> + ret = mdiobus_register(mdio);
> + if (ret) {
> + dev_err(kdev, "failed to register MDIO bus\n");
> + return ret;
> + }
> +
> + /*
> + * bcmgenet_platform_data needs to pass a valid PHY address for
> + * internal/external PHY or -1 for MoCA PHY.
> + */
> + if (phy_addr >= 0 && phy_addr < PHY_MAX_ADDR) {
We do too much low-level PHY device handling, and since you already have
the phy_type provided via platform_data, we can use that hint to do the
following:
1) an internal or external PHY with MDIO accesses should leave the bus
auto-probing on with the specified PHY address in the mdio bus phy_mask
2) a MoCA PHY or an external PHY with MDIO accesses disabled should use
the fixed-0 MII bus instead.
This would look like this:
if (pd->phy_type != PHY_INTERFACE_MODE_MOCA || pd->mdio_enabled)
mdio->phy_mask = ~(1 << pd->phy_addr);
...
mdiobus_register()
priv->phydev = mdio->bus->phy_map[pd->phy_addr];
phydev->phy_flags |= mask;
if (pd->phy_type == PHY_INTERFACE_MODE_MOCA || !pd->mdio_enabled)
priv->phydev = fixed_phy_register(...);
and in both cases, later on you do connect to the PHY device
I can cook a patch to illustrate what I think this could look like since
I realize using pseudo-code to explain might not be the best thing.
> + /*
> + * 10/100/1000 Ethernet port with external or internal PHY.
> + */
> + phydev = get_phy_device(mdio, phy_addr, false);
> + if (!phydev || IS_ERR(phydev)) {
> + dev_err(kdev, "failed to create PHY device\n");
> + mdiobus_unregister(mdio);
> + return 1;
> + }
> +
> + phydev->irq = PHY_POLL;
> +
> + ret = phy_device_register(phydev);
> + if (ret) {
> + dev_err(kdev, "failed to register PHY device\n");
> + phy_device_free(phydev);
> + mdiobus_unregister(mdio);
> + return 1;
> + }
> +
> + priv->phydev = phydev;
> + priv->phy_interface = pd->phy_type;
> + } else {
> + /*
> + * MoCA port with no MDIO-accessible PHY.
> + * We need to use 1000/HD fixed PHY to represent the link layer.
> + * MoCA daemon interacts with this PHY via ethtool.
> + */
> + struct fixed_phy_status moca_fphy_status = {
> + .link = 0,
> + .duplex = 0,
This should be DUPLEX_FULL here, the link between GENET and the MoCA
Ethernet convergence layer is full-duplex by nature (despite we report
the PHY being half-duplex, which is a mistake in the downstream driver),
the MoCA medium on the coaxial cable is half-duplex though, but that is
handled by the MoCA HW.
NB: I had issues in the past using a half-duplex link with the MoCA
ethernet convergence layer, causing various types of packet loss because
we use a simplified signaling internally in the hardware.
> + .speed = 1000,
> + .pause = 0,
> + .asym_pause = 0,
> + };
> +
> + phydev = fixed_phy_register(PHY_POLL, &moca_fphy_status, NULL);
> + if (!phydev || IS_ERR(phydev)) {
> + dev_err(kdev, "failed to register fixed PHY device\n");
> + mdiobus_unregister(mdio);
> + return 1;
> + }
> +
> + phydev->autoneg = AUTONEG_DISABLE;
> +
> + ret = fixed_phy_set_link_update(phydev,
> + bcmgenet_moca_fphy_update);
> + if (ret) {
> + dev_err(kdev, "failed to set fixed PHY link update\n");
> + }
Should not we propagate this error to the caller?
--
Florian
^ permalink raw reply
* Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: Alexander Duyck @ 2014-10-10 18:53 UTC (permalink / raw)
To: David Miller, alexander.duyck; +Cc: eric.dumazet, David.Laight, netdev
In-Reply-To: <20141010.142255.359367991309840826.davem@davemloft.net>
On 10/10/2014 11:22 AM, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Fri, 10 Oct 2014 14:15:59 -0400 (EDT)
>
>> Your original code works because you do things like "byte[12] & 0xf0" to
>> extract these fields.
> Changing that th->doff sequence to instead be:
>
> const u8 *bp;
> u8 buf[13];
>
> bp = __skb_header_pointer(skb, poff, sizeof(buf),
> data, hlen, &buf);
> if (!bp)
> return poff;
>
> poff += max_t(u32, sizeof(struct tcphdr), (bp[12] & 0xf0) >> 2);
> break;
>
> on top of your v3 patch works for me.
>
> Please double-check my calculations.
Any reason why you are grabbing all 13 bytes instead of just the 1 we
care about? Seems like we could just use a u8 buf instead of the array
since we are only grabbing doff.
Thanks,
Alex
^ permalink raw reply
* Re: pull request: wireless-next 2014-10-09
From: David Miller @ 2014-10-10 18:50 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20141009154402.GB13986@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Thu, 9 Oct 2014 11:44:02 -0400
> Please pull this batch of fixes intended for the 3.18 stream!
>
> Andrea Merello makes rtl818x_pci use a more reasonable transmission
> rate for HW generated frames.
>
> Fabian Frederick tweaks some kernel-doc bits to avoid warnings.
>
> Larry Finger corrects a possible unaligned access in the rtlwifi code.
>
> Marek Puzyniak avoids a kernel panic in ath9k_hw_reset.
>
> Sujith Manoharan goes for the hat trick -- he fixes a smatch warning
> in the shared ath code, he fixes a crash in ath9k, and he corrects
> a sequence number assignment problem in ath9k too.
>
> For ease of merging, I pulled the last bits of the wireless tree as well...
>
> Please let me know if there are problems!
Pulled, thank you John.
^ permalink raw reply
* Re: [PATCH net] stmmac: correct mc_filter local variable in set_filter and set_mac_addr call
From: David Miller @ 2014-10-10 18:48 UTC (permalink / raw)
To: vbridger; +Cc: peppe.cavallaro, netdev, linux-kernel, vbridger
In-Reply-To: <1412867436-22153-1-git-send-email-vbridger@opensource.altera.com>
From: Vince Bridgers <vbridger@opensource.altera.com>
Date: Thu, 9 Oct 2014 10:10:36 -0500
> Testing revealed that the local variable mc_filter was dimensioned
> incorrectly for all possible configurations and get_mac_addr should
> have been set_mac_addr (a typo). Make sure mc_filter is dimensioned
> to 8 32-bit unsigned longs - the largest size of the Synopsys
> multicast filter register set.
>
> Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Applied, thanks vince.
^ permalink raw reply
* [PATCH RFC v4 net 1/3] ipv6: Remove BACKTRACK macro
From: Martin KaFai Lau @ 2014-10-10 18:48 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Hannes Frederic Sowa
In-Reply-To: <1412966888-31384-1-git-send-email-kafai@fb.com>
It is the prep work to reduce the number of calls to fib6_lookup().
The BACKTRACK macro could be hard-to-read and error-prone due to
its side effects (mainly goto).
This patch is to:
1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
return values:
* If it is backtrack-able, returns next fn for retry.
* If it reaches the root, returns NULL.
2. The caller needs to decide if a backtrack is needed (by testing
rt == net->ipv6.ip6_null_entry).
3. Rename the goto labels in ip6_pol_route() to make the next few
patches easier to read.
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
include/net/ip6_fib.h | 3 +++
net/ipv6/ip6_fib.c | 17 ++++++++++++++++
net/ipv6/route.c | 55 ++++++++++++++++++++++-----------------------------
3 files changed, 44 insertions(+), 31 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 8eea35d..1f1d7f8 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -288,6 +288,9 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
const struct in6_addr *daddr, int dst_len,
const struct in6_addr *saddr, int src_len);
+struct fib6_node *fib6_backtrack(struct fib6_node *fn,
+ struct in6_addr *saddr);
+
void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
void *arg);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index b2d1838..594d189 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1139,6 +1139,23 @@ struct fib6_node *fib6_locate(struct fib6_node *root,
return NULL;
}
+struct fib6_node* fib6_backtrack(struct fib6_node *fn,
+ struct in6_addr *saddr)
+{
+ struct fib6_node *pn;
+ while (1) {
+ if (fn->fn_flags & RTN_TL_ROOT)
+ return NULL;
+ pn = fn->parent;
+ if (FIB6_SUBTREE(pn) && FIB6_SUBTREE(pn) != fn)
+ fn = fib6_lookup(FIB6_SUBTREE(pn), NULL, saddr);
+ else
+ fn = pn;
+ if (fn->fn_flags & RTN_RTINFO)
+ return fn;
+ }
+}
+
/*
* Deletion
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a318dd89..797a6e7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -772,24 +772,6 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len,
}
#endif
-#define BACKTRACK(__net, saddr) \
-do { \
- if (rt == __net->ipv6.ip6_null_entry) { \
- struct fib6_node *pn; \
- while (1) { \
- if (fn->fn_flags & RTN_TL_ROOT) \
- goto out; \
- pn = fn->parent; \
- if (FIB6_SUBTREE(pn) && FIB6_SUBTREE(pn) != fn) \
- fn = fib6_lookup(FIB6_SUBTREE(pn), NULL, saddr); \
- else \
- fn = pn; \
- if (fn->fn_flags & RTN_RTINFO) \
- goto restart; \
- } \
- } \
-} while (0)
-
static struct rt6_info *ip6_pol_route_lookup(struct net *net,
struct fib6_table *table,
struct flowi6 *fl6, int flags)
@@ -804,8 +786,11 @@ restart:
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
rt = rt6_multipath_select(rt, fl6, fl6->flowi6_oif, flags);
- BACKTRACK(net, &fl6->saddr);
-out:
+ if (rt == net->ipv6.ip6_null_entry) {
+ fn = fib6_backtrack(fn, &fl6->saddr);
+ if (fn)
+ goto restart;
+ }
dst_use(&rt->dst, jiffies);
read_unlock_bh(&table->tb6_lock);
return rt;
@@ -924,19 +909,25 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
strict |= flags & RT6_LOOKUP_F_IFACE;
-relookup:
+redo_fib6_lookup_lock:
read_lock_bh(&table->tb6_lock);
-restart_2:
+redo_fib6_lookup:
fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
-restart:
+redo_rt6_select:
rt = rt6_select(fn, oif, strict | reachable);
if (rt->rt6i_nsiblings)
rt = rt6_multipath_select(rt, fl6, oif, strict | reachable);
- BACKTRACK(net, &fl6->saddr);
- if (rt == net->ipv6.ip6_null_entry ||
- rt->rt6i_flags & RTF_CACHE)
+ if (rt == net->ipv6.ip6_null_entry) {
+ fn = fib6_backtrack(fn, &fl6->saddr);
+ if (fn)
+ goto redo_rt6_select;
+ else
+ goto out;
+ }
+
+ if (rt->rt6i_flags & RTF_CACHE)
goto out;
dst_hold(&rt->dst);
@@ -967,12 +958,12 @@ restart:
* released someone could insert this route. Relookup.
*/
ip6_rt_put(rt);
- goto relookup;
+ goto redo_fib6_lookup_lock;
out:
if (reachable) {
reachable = 0;
- goto restart_2;
+ goto redo_fib6_lookup;
}
dst_hold(&rt->dst);
read_unlock_bh(&table->tb6_lock);
@@ -1235,10 +1226,12 @@ restart:
rt = net->ipv6.ip6_null_entry;
else if (rt->dst.error) {
rt = net->ipv6.ip6_null_entry;
- goto out;
+ } else if (rt == net->ipv6.ip6_null_entry) {
+ fn = fib6_backtrack(fn, &fl6->saddr);
+ if (fn)
+ goto restart;
}
- BACKTRACK(net, &fl6->saddr);
-out:
+
dst_hold(&rt->dst);
read_unlock_bh(&table->tb6_lock);
--
1.8.1
^ permalink raw reply related
* [PATCH RFC v4 net 3/3] ipv6: Avoid redo-ing fib6_lookup() with reachable = 0 by saving fn
From: Martin KaFai Lau @ 2014-10-10 18:48 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Hannes Frederic Sowa
In-Reply-To: <1412966888-31384-1-git-send-email-kafai@fb.com>
This patch saves the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.
Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable.
After this patch, "failing ip6_ins_rt()" is the only case that
requires a redo of fib6_lookup().
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
net/ipv6/route.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5469ecf..a8ecb9f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -900,31 +900,40 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
struct flowi6 *fl6, int flags)
{
- struct fib6_node *fn;
+ struct fib6_node *fn, *saved_fn;
struct rt6_info *rt, *nrt;
int strict = 0;
int attempts = 3;
int err;
- int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
strict |= flags & RT6_LOOKUP_F_IFACE;
+ if (net->ipv6.devconf_all->forwarding == 0)
+ strict |= RT6_LOOKUP_F_REACHABLE;
redo_fib6_lookup_lock:
read_lock_bh(&table->tb6_lock);
-redo_fib6_lookup:
fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+ saved_fn = fn;
redo_rt6_select:
- rt = rt6_select(fn, oif, strict | reachable);
+ rt = rt6_select(fn, oif, strict);
if (rt->rt6i_nsiblings)
- rt = rt6_multipath_select(rt, fl6, oif, strict | reachable);
+ rt = rt6_multipath_select(rt, fl6, oif, strict);
if (rt == net->ipv6.ip6_null_entry) {
fn = fib6_backtrack(fn, &fl6->saddr);
if (fn)
goto redo_rt6_select;
- else
- goto out;
+ else if (strict & RT6_LOOKUP_F_REACHABLE) {
+ /* also consider unreachable route */
+ strict &= ~RT6_LOOKUP_F_REACHABLE;
+ fn = saved_fn;
+ goto redo_rt6_select;
+ } else {
+ dst_hold(&rt->dst);
+ read_unlock_bh(&table->tb6_lock);
+ goto out2;
+ }
}
dst_hold(&rt->dst);
@@ -960,13 +969,6 @@ redo_rt6_select:
ip6_rt_put(rt);
goto redo_fib6_lookup_lock;
-out:
- if (reachable) {
- reachable = 0;
- goto redo_fib6_lookup;
- }
- dst_hold(&rt->dst);
- read_unlock_bh(&table->tb6_lock);
out2:
rt->dst.lastuse = jiffies;
rt->dst.__use++;
--
1.8.1
^ permalink raw reply related
* [PATCH RFC v4 net 2/3] ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case
From: Martin KaFai Lau @ 2014-10-10 18:48 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Hannes Frederic Sowa
In-Reply-To: <1412966888-31384-1-git-send-email-kafai@fb.com>
When there is a RTF_CACHE hit, no need to redo fib6_lookup()
with reachable=0.
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
net/ipv6/route.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 797a6e7..5469ecf 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -927,12 +927,12 @@ redo_rt6_select:
goto out;
}
- if (rt->rt6i_flags & RTF_CACHE)
- goto out;
-
dst_hold(&rt->dst);
read_unlock_bh(&table->tb6_lock);
+ if (rt->rt6i_flags & RTF_CACHE)
+ goto out2;
+
if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
else if (!(rt->dst.flags & DST_HOST))
--
1.8.1
^ permalink raw reply related
* [PATCH RFC v4 net 0/3]: ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route()
From: Martin KaFai Lau @ 2014-10-10 18:48 UTC (permalink / raw)
To: netdev
Hi,
This patch set is trying to reduce the number of fib6_lookup()
calls from ip6_pol_route().
I have adapted davem's udpflood and kbench_mod test
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
support IPv6 and here is the result:
Before:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done
real 0m34.190s
user 0m3.047s
sys 0m31.108s
real 0m34.635s
user 0m3.125s
sys 0m31.475s
real 0m34.517s
user 0m3.034s
sys 0m31.449s
[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[ 660.160976] ip6_route_kbench: ip6_route_output tdiff: 933
[ 660.207261] ip6_route_kbench: ip6_route_output tdiff: 988
[ 660.253492] ip6_route_kbench: ip6_route_output tdiff: 896
[ 660.298862] ip6_route_kbench: ip6_route_output tdiff: 898
After:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done
real 0m32.695s
user 0m2.925s
sys 0m29.737s
real 0m32.636s
user 0m3.007s
sys 0m29.596s
real 0m32.797s
user 0m2.866s
sys 0m29.898s
[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[ 881.220793] ip6_route_kbench: ip6_route_output tdiff: 684
[ 881.253477] ip6_route_kbench: ip6_route_output tdiff: 640
[ 881.286867] ip6_route_kbench: ip6_route_output tdiff: 630
[ 881.320749] ip6_route_kbench: ip6_route_output tdiff: 653
/****************************** udpflood.c ******************************/
/* It is an adaptation of the Eric Dumazet's and David Miller's
* udpflood tool, by adding IPv6 support.
*/
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <malloc.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <stdint.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define _GNU_SOURCE
#include <getopt.h>
typedef uint32_t u32;
static int debug =3D 0;
/* Allow -fstrict-aliasing */
typedef union sa_u {
struct sockaddr_storage a46;
struct sockaddr_in a4;
struct sockaddr_in6 a6;
} sa_u;
static int usage(void)
{
printf("usage: udpflood [ -l count ] [ -m message_size ] [ -c num_ip_addrs=
] IP_ADDRESS\n");
return -1;
}
static u32 get_last32h(const sa_u *sa)
{
if (sa->a46.ss_family =3D=3D PF_INET)
return ntohl(sa->a4.sin_addr.s_addr);
else
return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
}
static void set_last32h(sa_u *sa, u32 last32h)
{
if (sa->a46.ss_family =3D=3D PF_INET)
sa->a4.sin_addr.s_addr =3D htonl(last32h);
else
sa->a6.sin6_addr.s6_addr32[3] =3D htonl(last32h);
}
static void print_saddr(const sa_u *sa, const char *msg)
{
char buf[64];
if (!debug)
return;
switch (sa->a46.ss_family) {
case PF_INET:
inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf,
sizeof(buf));
break;
case PF_INET6:
inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
break;
}
printf("%s: %s\n", msg, buf);
}
static int send_packets(const sa_u *sa, size_t num_addrs, int count, int ms=
g_sz)
{
char *msg =3D malloc(msg_sz);
sa_u saddr;
u32 start_addr32h, end_addr32h, cur_addr32h;
int fd, i, err;
if (!msg)
return -ENOMEM;
memset(msg, 0, msg_sz);
memcpy(&saddr, sa, sizeof(saddr));
cur_addr32h =3D start_addr32h =3D get_last32h(&saddr);
end_addr32h =3D start_addr32h + num_addrs;
fd =3D socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
if (fd < 0) {
perror("socket");
err =3D fd;
goto out_nofd;
}
/* connect to avoid the kernel spending time in figuring
* out the source address (i.e pin the src address)
*/
err =3D connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
if (err < 0) {
perror("connect");
goto out;
}
print_saddr(&saddr, "start_addr");
for (i =3D 0; i < count; i++) {
print_saddr(&saddr, "sendto");
err =3D sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr,
sizeof(saddr));
if (err < 0) {
perror("sendto");
goto out;
}
if (++cur_addr32h >=3D end_addr32h)
cur_addr32h =3D start_addr32h;
set_last32h(&saddr, cur_addr32h);
}
err =3D 0;
out:
close(fd);
out_nofd:
free(msg);
return err;
}
int main(int argc, char **argv, char **envp)
{
int port, msg_sz, count, num_addrs, ret;
sa_u start_addr;
port =3D 6000;
msg_sz =3D 32;
count =3D 10000000;
num_addrs =3D 1;
while ((ret =3D getopt(argc, argv, "dl:s:p:c:")) >=3D 0) {
switch (ret) {
case 'l':
sscanf(optarg, "%d", &count);
break;
case 's':
sscanf(optarg, "%d", &msg_sz);
break;
case 'p':
sscanf(optarg, "%d", &port);
break;
case 'c':
sscanf(optarg, "%d", &num_addrs);
break;
case 'd':
debug =3D 1;
break;
case '?':
return usage();
}
}
if (num_addrs < 1)
return usage();
if (!argv[optind])
return usage();
start_addr.a4.sin_port =3D htons(port);
if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr))
start_addr.a46.ss_family =3D PF_INET;
else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr.s6_add=
r))
start_addr.a46.ss_family =3D PF_INET6;
else
return usage();
return send_packets(&start_addr, num_addrs, count, msg_sz);
}
/****************** ip6_route_kbench_mod.c ******************/
#define pr_fmt(fmt) "ip6_route_kbench: " fmt
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/inet.h>
#include <linux/in6.h>
#include <net/route.h>
#include <net/ip6_route.h>
#include <linux/timex.h>
#include <uapi/linux/icmpv6.h>
/* We can't just use "get_cycles()" as on some platforms, such
* as sparc64, that gives system cycles rather than cpu clock
* cycles.
*/
#ifdef CONFIG_SPARC64
static inline unsigned long long get_tick(void)
{
unsigned long long t;
__asm__ __volatile__("rd %%tick, %0" : "=r" (t));
return t;
}
#elif defined(CONFIG_X86)
static inline unsigned long long get_tick(void)
{
unsigned long long t;
rdtscll(t);
return t;
}
#elif defined(CONFIG_POWERPC)
static inline unsigned long long get_tick(void)
{
return get_cycles();
}
#else
#error Unsupported architecture, please implement get_tick()
#endif
#define DEFAULT_WARMUP_COUNT 100000
#define DEFAULT_DST_IP_ADDR 0x4a800001
#define DEFAULT_SRC_IP_ADDR 0x00000000
#define DEFAULT_OIF 0
#define DEFAULT_IIF 0
#define DEFAULT_MARK 0x00000000
#define DEFAULT_TOS 0x00
static int flow_oif = DEFAULT_OIF;
static int flow_iif = DEFAULT_IIF;
static u32 flow_mark = DEFAULT_MARK;
static struct in6_addr flow_dst_ip_addr;
static struct in6_addr flow_src_ip_addr;
static int flow_tos = DEFAULT_TOS;
static char dst_string[64];
static char src_string[64];
module_param_string(dst, dst_string, sizeof(dst_string), 0);
module_param_string(src, src_string, sizeof(src_string), 0);
static int __init flow_setup(void)
{
if (dst_string[0] &&
!in6_pton(dst_string, -1, &flow_dst_ip_addr.s6_addr[0], -1, NULL)) {
pr_info("cannot parse \"%s\"\n", dst_string);
return -1;
}
if (src_string[0] &&
!in6_pton(src_string, -1, &flow_src_ip_addr.s6_addr[0], -1, NULL)) {
pr_info("cannot parse \"%s\"\n", dst_string);
return -1;
}
return 0;
}
module_param_named(oif, flow_oif, int, 0);
module_param_named(iif, flow_iif, int, 0);
module_param_named(mark, flow_mark, uint, 0);
module_param_named(tos, flow_tos, int, 0);
static int warmup_count = DEFAULT_WARMUP_COUNT;
module_param_named(count, warmup_count, int, 0);
static void flow_init(struct flowi6 *fl6)
{
memset(fl6, 0, sizeof(*fl6));
fl6->flowi6_proto = IPPROTO_ICMPV6;
fl6->flowi6_oif = flow_oif;
fl6->flowi6_iif = flow_iif;
fl6->flowi6_mark = flow_mark;
fl6->flowi6_tos = flow_tos;
fl6->daddr = flow_dst_ip_addr;
fl6->saddr = flow_src_ip_addr;
}
static struct sk_buff * fake_skb_get(void)
{
struct ipv6hdr *hdr;
struct sk_buff *skb;
skb = alloc_skb(4096, GFP_KERNEL);
if (!skb) {
pr_info("Cannot alloc SKB for test\n");
return NULL;
}
skb->dev = __dev_get_by_index(&init_net, flow_iif);
if (skb->dev == NULL) {
pr_info("Input device (%d) does not exist\n", flow_iif);
goto err;
}
skb_reset_mac_header(skb);
skb_reset_network_header(skb);
skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr));
hdr = ipv6_hdr(skb);
hdr->priority = 0;
hdr->version = 6;
memset(hdr->flow_lbl, 0, sizeof(hdr->flow_lbl));
hdr->payload_len = htons(sizeof(struct icmp6hdr));
hdr->nexthdr = IPPROTO_ICMPV6;
hdr->saddr = flow_src_ip_addr;
hdr->daddr = flow_dst_ip_addr;
skb->protocol = htons(ETH_P_IPV6);
skb->mark = flow_mark;
return skb;
err:
kfree_skb(skb);
return NULL;
}
static void do_full_output_lookup_bench(void)
{
unsigned long long t1, t2, tdiff;
struct rt6_info *rt;
struct flowi6 fl6;
int i;
rt = NULL;
for (i = 0; i < warmup_count; i++) {
flow_init(&fl6);
rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
if (IS_ERR(rt))
break;
ip6_rt_put(rt);
}
if (IS_ERR(rt)) {
pr_info("ip_route_output_key: err=%ld\n", PTR_ERR(rt));
return;
}
flow_init(&fl6);
t1 = get_tick();
rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
t2 = get_tick();
if (!IS_ERR(rt))
ip6_rt_put(rt);
tdiff = t2 - t1;
pr_info("ip6_route_output tdiff: %llu\n", tdiff);
}
static void do_full_input_lookup_bench(void)
{
unsigned long long t1, t2, tdiff;
struct sk_buff *skb;
struct rt6_info *rt;
int err, i;
skb = fake_skb_get();
if (skb == NULL)
goto out_free;
err = 0;
local_bh_disable();
for (i = 0; i < warmup_count; i++) {
ip6_route_input(skb);
rt = (struct rt6_info *)skb_dst(skb);
err = (!rt || rt == init_net.ipv6.ip6_null_entry);
skb_dst_drop(skb);
if (err)
break;
}
local_bh_enable();
if (err) {
pr_info("Input route lookup fails\n");
goto out_free;
}
local_bh_disable();
t1 = get_tick();
ip6_route_input(skb);
t2 = get_tick();
local_bh_enable();
rt = (struct rt6_info *)skb_dst(skb);
err = (!rt || rt == init_net.ipv6.ip6_null_entry);
skb_dst_drop(skb);
if (err) {
pr_info("Input route lookup fails\n");
goto out_free;
}
tdiff = t2 - t1;
pr_info("ip6_route_input tdiff: %llu\n", tdiff);
out_free:
kfree_skb(skb);
}
static void do_full_lookup_bench(void)
{
if (!flow_iif)
do_full_output_lookup_bench();
else
do_full_input_lookup_bench();
}
static void do_bench(void)
{
do_full_lookup_bench();
do_full_lookup_bench();
do_full_lookup_bench();
do_full_lookup_bench();
}
static int __init kbench_init(void)
{
if (flow_setup())
return -EINVAL;
pr_info("flow [IIF(%d),OIF(%d),MARK(0x%08x),D("IP6_FMT"),"
"S("IP6_FMT"),TOS(0x%02x)]\n",
flow_iif, flow_oif, flow_mark,
IP6_PRT(flow_dst_ip_addr),
IP6_PRT(flow_src_ip_addr),
flow_tos);
#if defined(CONFIG_X86)
if (!cpu_has_tsc) {
pr_err("X86 TSC is required, but is unavailable.\n");
return -EINVAL;
}
#endif
pr_info("sizeof(struct rt6_info)==%zu\n", sizeof(struct rt6_info));
do_bench();
return -ENODEV;
}
static void __exit kbench_exit(void)
{
}
module_init(kbench_init);
module_exit(kbench_exit);
MODULE_LICENSE("GPL");
^ permalink raw reply
* Re: [PATCH] net: pxa168_eth: PXA168_ETH should depend on HAS_DMA
From: David Miller @ 2014-10-10 18:46 UTC (permalink / raw)
To: geert; +Cc: netdev, linux-kernel
In-Reply-To: <1412864142-21468-1-git-send-email-geert@linux-m68k.org>
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Thu, 9 Oct 2014 16:15:42 +0200
> If NO_DMA=y:
...
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Applied, thank you.
^ permalink raw reply
* Re: eth_get_headlen() and unaligned accesses...
From: Tom Herbert @ 2014-10-10 18:41 UTC (permalink / raw)
To: David Miller; +Cc: Linux Netdev List, Alexander Duyck
In-Reply-To: <20141009.201248.1210454965155680255.davem@davemloft.net>
On Thu, Oct 9, 2014 at 5:12 PM, David Miller <davem@davemloft.net> wrote:
>
> So, we have a bit of a problem, this is on sparc64:
>
> [423475.740836] Kernel unaligned access at TPC[81d330] __skb_flow_get_ports+0x70/0xe0
> [423475.755756] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0+ #2
> [423475.767854] Call Trace:
> [423475.772877] [0000000000433288] kernel_unaligned_trap+0x368/0x5c0
> [423475.785203] [000000000042a824] sun4v_do_mna+0x84/0xa0
> [423475.795624] [0000000000406cd0] sun4v_mna+0x5c/0x68
> [423475.805521] [000000000081d330] __skb_flow_get_ports+0x70/0xe0
> [423475.817323] [000000000081d6ac] __skb_flow_dissect+0x1ac/0x460
> [423475.829128] [0000000000843c98] eth_get_headlen+0x38/0xa0
> [423475.840083] [0000000010064d54] igb_poll+0x8d4/0xf60 [igb]
> [423475.851184] [00000000008243c8] net_rx_action+0xa8/0x1c0
>
> The chip DMA's to the beginning of a frag page and (unless timestamps
> are enabled) that's where the ethernet header begins.
>
> So any larger than 16-bit access to the IP and later headers will be
> unaligned.
>
Will this also be a problem for GRE packet carrying Ethernet packet?
> We have various ways we can deal with this based upon the capabilities
> of the chips involved. Can we configure the IGB to put 2 "don't care"
> bytes at the beginning of the packet?
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox