* Re: [PATCH net] net: qmi_wwan: fix Oops while disconnecting
From: Oliver Neukum @ 2012-06-28 8:35 UTC
To: Ming Lei
Cc: Bjørn Mork, netdev, linux-usb, Marius Bjørnstad Kotsbak
In-Reply-To: <CACVXFVO5x9pePZekJPnywr7czXG_3_xCRj-rXEgf2KSQXnmPbA@mail.gmail.com>
On Tuesday, 26 June 2012, 09:23:19, Ming Lei wrote:
> On Mon, Jun 25, 2012 at 8:10 PM, Oliver Neukum <oliver@neukum.org> wrote:
> > At this point a minidriver must not follow the intfdata pointer,
> > because the interface may again be probed. So if here a minidriver
>
> IMO, probe is serialized strictly with driver unbind since both the parent
> lock and its own device lock have been held, so the probe may only be
> started after driver unbinding is completed.
Yes, but if you have a driver which claims multiple interfaces and uses
a subdriver, then you will have cases of intfdata being NULL before
disconnect() finishes.
> > still uses intfdata, locking will be needed. We want to catch those
> > cases.
>
> Suppose intfdata is used here somewhere, it is surely a bug because
> the usbnet instance pointed by intfdata will be freed soon.
Of course. That is the point.
> So looks putting the set to NULL after driver_info->unbind is good,
> doesn't it?
Again, of course. We could drop it (but not the check for NULL in usbnet).
It is a debugging aid.
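
A minimal userspace sketch of the debugging aid discussed here. The types `mock_intf` and `mock_usbnet` and all `mock_*` helpers are hypothetical stand-ins for `struct usb_interface`, `struct usbnet` and the intfdata accessors; this is not the usbnet code. It only illustrates the ordering under discussion: disconnect clears intfdata before the unbind step, so a minidriver that still follows the pointer has to cope with NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct usbnet and struct usb_interface. */
struct mock_usbnet { int unbind_calls; };
struct mock_intf { void *intfdata; };

static void *mock_get_intfdata(struct mock_intf *intf)
{
	return intf->intfdata;
}

static void mock_set_intfdata(struct mock_intf *intf, void *data)
{
	intf->intfdata = data;
}

/* Minidriver path that may still run while disconnect is in progress.
 * Returns 0 if the interface is still bound, -1 if intfdata is gone. */
static int mock_minidriver_use_intf(struct mock_intf *intf)
{
	struct mock_usbnet *dev = mock_get_intfdata(intf);

	if (!dev)
		return -1;	/* already unbound: do not follow the pointer */
	return 0;
}

/* Disconnect: clear intfdata first, then run the unbind step.
 * The return value is what a late minidriver access would observe. */
static int mock_disconnect(struct mock_intf *intf)
{
	struct mock_usbnet *dev = mock_get_intfdata(intf);

	if (!dev)
		return -1;
	mock_set_intfdata(intf, NULL);
	dev->unbind_calls++;		/* stands in for driver_info->unbind() */
	return mock_minidriver_use_intf(intf);
}
```

With this ordering a stale access is visible as a NULL pointer rather than as a use of memory that is about to be freed, which is the trade-off argued about in the thread.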
> > usb_kill_urb(dev->interrupt);
> > usb_free_urb(dev->interrupt);
> >
> > free_netdev(net);
> > usb_put_dev (xdev);
> > }
> >
> >> > Sure, it is a debugging aid. It has the drawback that minidrivers have
> >> > to be able to deal with intfdata being NULL. That is not hard to do.
> >>
> >> The check isn't needed if the set to NULL is put after driver_info->unbind
> >> in usbnet_disconnect.
> >
> > True, but we don't catch bugs.
>
> If the check is added, the bugs may be hidden, and no stack will be
> dumped, :-)
That is also true.
Bjørn,
do you use subdrivers with cdc-ether?
Regards
Oliver
--
- - -
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany
- - -
* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: David Miller @ 2012-06-28 8:30 UTC
To: jpirko; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <20120628063525.GA1520@minipsycho.orion>
From: Jiri Pirko <jpirko@redhat.com>
Date: Thu, 28 Jun 2012 08:35:25 +0200
> Thu, Jun 28, 2012 at 06:30:46AM CEST, davem@davemloft.net wrote:
>>It therefore probably makes sense to add a boolean arg which when true
>>elides the netif_running() check then fixup and audit every caller.
>
> I was thinking about this. Maybe probably __eth_mac_addr() which does
> not have netif_running() check and eth_mac_addr() calling
> netif_running() check and __eth_mac_addr() after that.
>
> What do you think?
Yes, sounds good.
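
A minimal userspace sketch of the split agreed on above, assuming a hypothetical `mock_net_device` in place of `struct net_device` and a `running` flag in place of `netif_running()`. The function names are illustrative only: `mock_eth_mac_addr_unchecked()` plays the role of the proposed `__eth_mac_addr()`, and `mock_eth_mac_addr()` the role of `eth_mac_addr()`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MOCK_ETH_ALEN 6

/* Hypothetical stand-in for struct net_device. */
struct mock_net_device {
	unsigned char dev_addr[MOCK_ETH_ALEN];
	int running;	/* stands in for netif_running(dev) */
};

/* Reject multicast addresses (low bit of first octet) and all zeros. */
static int mock_is_valid_ether_addr(const unsigned char *a)
{
	static const unsigned char zero[MOCK_ETH_ALEN];

	return !(a[0] & 1) && memcmp(a, zero, MOCK_ETH_ALEN) != 0;
}

/* Variant without the running check, for drivers that can change the
 * address while the interface is up. */
static int mock_eth_mac_addr_unchecked(struct mock_net_device *dev,
				       const unsigned char *addr)
{
	if (!mock_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	memcpy(dev->dev_addr, addr, MOCK_ETH_ALEN);
	return 0;
}

/* Default variant: refuse while running, then defer to the unchecked one. */
static int mock_eth_mac_addr(struct mock_net_device *dev,
			     const unsigned char *addr)
{
	if (dev->running)
		return -EBUSY;
	return mock_eth_mac_addr_unchecked(dev, addr);
}
```

Existing callers would keep the checked variant and need no audit; only a driver such as virtio_net would opt in to the unchecked one.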
* Re: [PATCH v2] l2tp: use per-cpu variables for u64_stats updates
From: Tom Parkin @ 2012-06-28 8:24 UTC
To: Eric Dumazet
Cc: Rick Jones, Ben Greear, Stephen Hemminger, netdev, David.Laight,
James Chapman
In-Reply-To: <1340859654.26242.201.camel@edumazet-glaptop>
On Thu, Jun 28, 2012 at 07:00:54AM +0200, Eric Dumazet wrote:
> [1] : LLTX drivers case
> since ndo_start_xmit() can be run concurrently by many cpus, safely
> updating an "unsigned long" requires additional hassle :
>
> 1) Use of a spinlock to protect the update.
> 2) Use atomic_long_t instead of "unsigned long"
> 3) Use percpu data
>
> 3) is overkill for devices with light traffic, because it consumes lot
> of RAM on machines with 2048 possible cpus, _and_ the reader must fold
> the data of all possible values.
Thanks Eric.
So am I right in thinking that a v3 patch which uses atomic_long_t for
the statistics would be the correct way forwards?
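
A minimal userspace sketch of option 2 from Eric's list, assuming C11 `atomic_long` as a stand-in for the kernel's `atomic_long_t` and a hypothetical `mock_l2tp_stats` structure; it is not the l2tp code. Each counter is updated with a single atomic add, so concurrent transmit paths need neither a spinlock nor per-cpu storage, and a reader loads one value instead of folding over all possible cpus.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical statistics block; atomic_long stands in for atomic_long_t. */
struct mock_l2tp_stats {
	atomic_long tx_packets;
	atomic_long tx_bytes;
};

static void mock_stats_init(struct mock_l2tp_stats *s)
{
	atomic_init(&s->tx_packets, 0);
	atomic_init(&s->tx_bytes, 0);
}

/* Safe to call concurrently from several threads (or cpus). */
static void mock_stats_tx(struct mock_l2tp_stats *s, long len)
{
	atomic_fetch_add_explicit(&s->tx_packets, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->tx_bytes, len, memory_order_relaxed);
}

/* Reader side: one load per counter, no per-cpu folding. */
static long mock_stats_read_tx_bytes(struct mock_l2tp_stats *s)
{
	return atomic_load_explicit(&s->tx_bytes, memory_order_relaxed);
}
```

The cost is one atomic read-modify-write per counter per packet, which is the trade-off against per-cpu data for devices with light traffic.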
--
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
* Re: [net-next PATCH 02/02] net/ipv4: VTI support new module for ip_vti.
From: Steffen Klassert @ 2012-06-28 8:04 UTC
To: Saurabh; +Cc: netdev
In-Reply-To: <20120628010218.GA4056@debian-saurabh-64.vyatta.com>
On Wed, Jun 27, 2012 at 06:02:18PM -0700, Saurabh wrote:
>
> +config NET_IPVTI
> + tristate "Virtual (secure) IP: tunneling"
> + select INET_TUNNEL
> + depends on INET_XFRM_MODE_TUNNEL
> + ---help---
> + Tunneling means encapsulating data of one protocol type within
> + another protocol and sending it over a channel that understands the
> + encapsulating protocol. This particular tunneling driver implements
> + encapsulation of IP within IP-ESP. This can be used with xfrm to give
This is not ESP specific anymore.
> + the notion of a secure tunnel and then use routing protocol on top.
> +
> + Saying Y to this option will produce one module ( = code which can
> + be inserted in and removed from the running kernel whenever you
> + want). Most people won't need this and can say N.
> +
Saying Y does not build a module; saying M builds a module. Also,
even if built as a module, you can't remove it whenever you want.
You can remove it only while it is unused.
...
> +static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> + struct ip_tunnel *tunnel = netdev_priv(dev);
> + struct pcpu_tstats *tstats;
> + struct net_device_stats *stats = &tunnel->dev->stats;
> + struct iphdr *tiph = &tunnel->parms.iph;
> + u8 tos = tunnel->parms.iph.tos;
> + struct rtable *rt; /* Route to the other host */
> + struct net_device *tdev; /* Device to other host */
> + struct iphdr *old_iph = ip_hdr(skb);
> + __be32 dst = tiph->daddr;
> + struct flowi4 fl4;
> +
> + if (skb->protocol != htons(ETH_P_IP))
> + goto tx_error;
> +
> + if (tos&1)
> + tos = old_iph->tos;
> +
> + if (!dst) {
> + /* NBMA tunnel */
> + rt = skb_rtable(skb);
> + if (rt == NULL) {
> + stats->tx_fifo_errors++;
> + goto tx_error;
> + }
> + dst = rt->rt_gateway;
> + if (dst == 0)
> + goto tx_error_icmp;
> + }
> +
> + memset(&fl4, 0, sizeof(fl4));
> + flowi4_init_output(&fl4, tunnel->parms.link,
> + htonl(tunnel->parms.i_key), RT_TOS(tos), RT_SCOPE_UNIVERSE,
> + IPPROTO_IPIP, 0,
> + dst, tiph->saddr, 0, 0);
> + rt = ip_route_output_key(dev_net(dev), &fl4);
> + if (IS_ERR(rt)) {
> + dev->stats.tx_carrier_errors++;
> + goto tx_error_icmp;
> + }
> +#ifdef CONFIG_XFRM
> + /* if there is no transform then this tunnel is not functional. */
> + if (!rt->dst.xfrm) {
What if this is a transport mode xfrm?
You should ensure that this is really a tunnel mode xfrm.
> + stats->tx_carrier_errors++;
> + goto tx_error_icmp;
> + }
> +#endif
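
A minimal sketch of the stricter check requested above, assuming hypothetical mock types in place of `struct dst_entry` and `struct xfrm_state`. The mode constants are stand-ins for `XFRM_MODE_TRANSPORT` and `XFRM_MODE_TUNNEL`, and the `props.mode` field mirrors where the kernel keeps the mode of an xfrm state; the helper name is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for XFRM_MODE_TRANSPORT / XFRM_MODE_TUNNEL. */
enum mock_xfrm_mode {
	MOCK_XFRM_MODE_TRANSPORT = 0,
	MOCK_XFRM_MODE_TUNNEL = 1,
};

/* Hypothetical, reduced versions of struct xfrm_state and struct dst_entry. */
struct mock_xfrm_state {
	struct { int mode; } props;
};

struct mock_dst {
	struct mock_xfrm_state *xfrm;
};

/* The tunnel is usable only if a transform exists AND it is tunnel mode. */
static int mock_vti_dst_usable(const struct mock_dst *dst)
{
	return dst->xfrm != NULL &&
	       dst->xfrm->props.mode == MOCK_XFRM_MODE_TUNNEL;
}
```

In the transmit path, a route for which this returns false would take the same `tx_error_icmp` exit as a route with no transform at all.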
* Re: [PATCH] xfrm_user: Propagate netlink error codes properly.
From: Thomas Graf @ 2012-06-28 7:40 UTC
To: David Miller; +Cc: netdev
In-Reply-To: <20120627.215728.137770920620706227.davem@davemloft.net>
On Wed, Jun 27, 2012 at 09:57:28PM -0700, David Miller wrote:
>
> Instead of using a fixed value of "-1" or "-EMSGSIZE", propagate what
> the nla_*() interfaces actually return.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
* [net-next.git 4/4 (v9)] phy: add the EEE support and the way to access to the MMD registers.
From: Giuseppe CAVALLARO @ 2012-06-28 7:14 UTC
To: netdev
Cc: eric.dumazet, rayagond, davem, yuvalmin, bhutchings,
Giuseppe Cavallaro
In-Reply-To: <1340867678-18375-1-git-send-email-peppe.cavallaro@st.com>
This patch adds the support for the Energy-Efficient Ethernet (EEE)
to the Physical Abstraction Layer.
To support EEE we have to access the MMD registers 3.20 and
7.60/61, so two new functions have been added to read/write the MMD
registers (clause 45).
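
A minimal userspace sketch of the indirect access sequence this refers to: clause 45 MMD registers reached through clause 22 registers 13 and 14. The `mock_phy` model and all `mock_*` names are hypothetical; the register numbers and the "data, no post increment" function code (0x4000) are the values this patch adds to `mii.h`. The model only covers 64 registers per MMD and ignores the post-increment modes.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MII_MMD_CTRL	0x0d	/* reg 13: function + DEVAD */
#define MOCK_MII_MMD_DATA	0x0e	/* reg 14: address or data */
#define MOCK_MMD_CTRL_NOINCR	0x4000	/* data, no post increment */

/* Hypothetical PHY model: 32 MMDs x 64 registers behind regs 13/14. */
struct mock_phy {
	uint16_t mmd[32][64];
	uint16_t ctrl;		/* last value written to reg 13 */
	uint16_t address;	/* register address latched via reg 14 */
};

static int mock_data_mode(const struct mock_phy *p)
{
	return (p->ctrl & 0xc000) == MOCK_MMD_CTRL_NOINCR;
}

static void mock_mdio_write(struct mock_phy *p, int reg, uint16_t val)
{
	if (reg == MOCK_MII_MMD_CTRL)
		p->ctrl = val;
	else if (reg == MOCK_MII_MMD_DATA && mock_data_mode(p))
		p->mmd[p->ctrl & 0x1f][p->address & 0x3f] = val;
	else if (reg == MOCK_MII_MMD_DATA)
		p->address = val;
}

static uint16_t mock_mdio_read(struct mock_phy *p, int reg)
{
	if (reg == MOCK_MII_MMD_DATA && mock_data_mode(p))
		return p->mmd[p->ctrl & 0x1f][p->address & 0x3f];
	return reg == MOCK_MII_MMD_DATA ? p->address : p->ctrl;
}

/* Steps 1-3: select DEVAD, latch the address, switch to data mode. */
static void mock_mmd_select(struct mock_phy *p, int prtad, int devad)
{
	mock_mdio_write(p, MOCK_MII_MMD_CTRL, devad);
	mock_mdio_write(p, MOCK_MII_MMD_DATA, prtad);
	mock_mdio_write(p, MOCK_MII_MMD_CTRL, devad | MOCK_MMD_CTRL_NOINCR);
}

/* Step 4 (read): fetch the selected MMD register through reg 14. */
static uint16_t mock_read_mmd(struct mock_phy *p, int prtad, int devad)
{
	mock_mmd_select(p, prtad, devad);
	return mock_mdio_read(p, MOCK_MII_MMD_DATA);
}

/* Step 4 (write): store into the selected MMD register through reg 14. */
static void mock_write_mmd(struct mock_phy *p, int prtad, int devad,
			   uint16_t data)
{
	mock_mmd_select(p, prtad, devad);
	mock_mdio_write(p, MOCK_MII_MMD_DATA, data);
}
```

Reading register 7.60, for example, is `mock_read_mmd(&phy, 60, 7)`: three writes to set up the access followed by one read of register 14.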
An Ethernet driver (I tested stmmac) can invoke phy_init_eee to
check whether EEE is supported by the PHY; it can also set the clock
stop enable bit in the 3.0 register.
phy_get_eee_err can be used to report the number of times
the PHY failed to complete its normal wake sequence.
Finally, this patch also adds EEE ethtool support by implementing:
o phy_ethtool_set_eee
o phy_ethtool_get_eee
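
A minimal sketch of the negotiation test that phy_init_eee is described as performing, assuming EEE is usable for a link mode only when both the local advertisement (7.60) and the link partner ability (7.61) carry the bit for that mode. The helper name is hypothetical; the two mask values are the 100TX and 1000T bits from the `mdio.h` hunk of this patch.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_EEE_100TX	0x0002	/* 100TX EEE capability bit */
#define MOCK_EEE_1000T	0x0004	/* 1000T EEE capability bit */

/* Returns 1 if EEE was negotiated for the given link mode, 0 otherwise.
 * eee_adv:   local advertisement, as read from register 7.60
 * eee_lp:    link partner ability, as read from register 7.61
 * mode_bit:  EEE bit matching the autonegotiated speed */
static int mock_eee_negotiated(uint16_t eee_adv, uint16_t eee_lp,
			       uint16_t mode_bit)
{
	return (eee_adv & eee_lp & mode_bit) != 0;
}
```

A caller would treat a 0 result as "EEE not supported on this link" and leave the clock stop bit untouched.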
v1: initial patch
v2: fixed some errors especially on naming convention
v3: renamed again the mmd read/write functions thanks to Ben's feedback
v4: moved file to phy.c and added the ethtool support.
v5: fixed phy_adv_to_eee, phy_eee_to_supported, phy_eee_to_adv return
values according to ethtool API (thanks to Ben's feedback).
Renamed some macros to avoid too long names.
v6: fixed kernel-doc comments to be properly parsed.
Fixed the phy_init_eee function: we need to check which link mode
was autonegotiated and then the corresponding bits in 7.60 and 7.61
registers.
v7: reviewed the way to get the negotiated settings.
v8: fixed a problem in the phy_init_eee return value erroneously added
when included the phy_read_status call.
v9: do not remove the MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T
and fixed the eee_{cap,lp,adv} declaration as "int" instead of u16.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/phy/phy.c | 281 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/mdio.h | 28 ++++-
include/linux/mii.h | 9 ++
include/linux/phy.h | 5 +
4 files changed, 319 insertions(+), 4 deletions(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 2e1c237..7ca2ff9 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -35,6 +35,7 @@
#include <linux/phy.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
+#include <linux/mdio.h>
#include <linux/atomic.h>
#include <asm/io.h>
@@ -967,3 +968,283 @@ void phy_state_machine(struct work_struct *work)
schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
}
+
+static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
+ int addr)
+{
+ /* Write the desired MMD Devad */
+ bus->write(bus, addr, MII_MMD_CTRL, devad);
+
+ /* Write the desired MMD register address */
+ bus->write(bus, addr, MII_MMD_DATA, prtad);
+
+ /* Select the Function : DATA with no post increment */
+ bus->write(bus, addr, MII_MMD_CTRL, (devad | MII_MMD_CTRL_NOINCR));
+}
+
+/**
+ * phy_read_mmd_indirect - reads data from the MMD registers
+ * @bus: the target MII bus
+ * @prtad: MMD Address
+ * @devad: MMD DEVAD
+ * @addr: PHY address on the MII bus
+ *
+ * Description: it reads data from the MMD registers (clause 22 to access to
+ * clause 45) of the specified phy address.
+ * To read these register we have:
+ * 1) Write reg 13 // DEVAD
+ * 2) Write reg 14 // MMD Address
+ * 3) Write reg 13 // MMD Data Command for MMD DEVAD
+ * 4) Read reg 14 // Read MMD data
+ */
+static int phy_read_mmd_indirect(struct mii_bus *bus, int prtad, int devad,
+ int addr)
+{
+ int ret;
+
+ mmd_phy_indirect(bus, prtad, devad, addr);
+
+ /* Read the content of the MMD's selected register */
+ ret = bus->read(bus, addr, MII_MMD_DATA);
+
+ return ret;
+}
+
+/**
+ * phy_write_mmd_indirect - writes data to the MMD registers
+ * @bus: the target MII bus
+ * @prtad: MMD Address
+ * @devad: MMD DEVAD
+ * @addr: PHY address on the MII bus
+ * @data: data to write in the MMD register
+ *
+ * Description: Write data to the MMD registers of the specified
+ * phy address.
+ * To write these register we have:
+ * 1) Write reg 13 // DEVAD
+ * 2) Write reg 14 // MMD Address
+ * 3) Write reg 13 // MMD Data Command for MMD DEVAD
+ * 4) Write reg 14 // Write MMD data
+ */
+static void phy_write_mmd_indirect(struct mii_bus *bus, int prtad, int devad,
+ int addr, u32 data)
+{
+ mmd_phy_indirect(bus, prtad, devad, addr);
+
+ /* Write the data into MMD's selected register */
+ bus->write(bus, addr, MII_MMD_DATA, data);
+}
+
+static u32 phy_eee_to_adv(u16 eee_adv)
+{
+ u32 adv = 0;
+
+ if (eee_adv & MDIO_EEE_100TX)
+ adv |= ADVERTISED_100baseT_Full;
+ if (eee_adv & MDIO_EEE_1000T)
+ adv |= ADVERTISED_1000baseT_Full;
+ if (eee_adv & MDIO_EEE_10GT)
+ adv |= ADVERTISED_10000baseT_Full;
+ if (eee_adv & MDIO_EEE_1000KX)
+ adv |= ADVERTISED_1000baseKX_Full;
+ if (eee_adv & MDIO_EEE_10GKX4)
+ adv |= ADVERTISED_10000baseKX4_Full;
+ if (eee_adv & MDIO_EEE_10GKR)
+ adv |= ADVERTISED_10000baseKR_Full;
+
+ return adv;
+}
+
+static u32 phy_eee_to_supported(u16 eee_cap)
+{
+ u32 supported = 0;
+
+ if (eee_cap & MDIO_EEE_100TX)
+ supported |= SUPPORTED_100baseT_Full;
+ if (eee_cap & MDIO_EEE_1000T)
+ supported |= SUPPORTED_1000baseT_Full;
+ if (eee_cap & MDIO_EEE_10GT)
+ supported |= SUPPORTED_10000baseT_Full;
+ if (eee_cap & MDIO_EEE_1000KX)
+ supported |= SUPPORTED_1000baseKX_Full;
+ if (eee_cap & MDIO_EEE_10GKX4)
+ supported |= SUPPORTED_10000baseKX4_Full;
+ if (eee_cap & MDIO_EEE_10GKR)
+ supported |= SUPPORTED_10000baseKR_Full;
+
+ return supported;
+}
+
+static u16 phy_adv_to_eee(u32 adv)
+{
+ u16 reg = 0;
+
+ if (adv & ADVERTISED_100baseT_Full)
+ reg |= MDIO_EEE_100TX;
+ if (adv & ADVERTISED_1000baseT_Full)
+ reg |= MDIO_EEE_1000T;
+ if (adv & ADVERTISED_10000baseT_Full)
+ reg |= MDIO_EEE_10GT;
+ if (adv & ADVERTISED_1000baseKX_Full)
+ reg |= MDIO_EEE_1000KX;
+ if (adv & ADVERTISED_10000baseKX4_Full)
+ reg |= MDIO_EEE_10GKX4;
+ if (adv & ADVERTISED_10000baseKR_Full)
+ reg |= MDIO_EEE_10GKR;
+
+ return reg;
+}
+
+/**
+ * phy_init_eee - init and check the EEE feature
+ * @phydev: target phy_device struct
+ * @clk_stop_enable: PHY may stop the clock during LPI
+ *
+ * Description: it checks if the Energy-Efficient Ethernet (EEE)
+ * is supported by looking at the MMD registers 3.20 and 7.60/61
+ * and it programs the MMD register 3.0 setting the "Clock stop enable"
+ * bit if required.
+ */
+int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable)
+{
+ int ret = -EPROTONOSUPPORT;
+
+ /* According to 802.3az, EEE is supported only in full-duplex mode.
+ * Also, the EEE feature is active when the core is operating with
+ * MII, GMII or RGMII.
+ */
+ if ((phydev->duplex == DUPLEX_FULL) &&
+ ((phydev->interface == PHY_INTERFACE_MODE_MII) ||
+ (phydev->interface == PHY_INTERFACE_MODE_GMII) ||
+ (phydev->interface == PHY_INTERFACE_MODE_RGMII))) {
+ int eee_lp, eee_cap, eee_adv;
+ u32 lp, cap, adv;
+ int idx, status;
+
+ /* Read phy status to properly get the right settings */
+ status = phy_read_status(phydev);
+ if (status)
+ return status;
+
+ /* First check if the EEE ability is supported */
+ eee_cap = phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_ABLE,
+ MDIO_MMD_PCS, phydev->addr);
+ if (eee_cap < 0)
+ return eee_cap;
+
+ cap = phy_eee_to_supported(eee_cap);
+ if (!cap)
+ goto eee_exit;
+
+ /* Check which link settings negotiated and verify it in
+ * the EEE advertising registers.
+ */
+ eee_lp = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_LPABLE,
+ MDIO_MMD_AN, phydev->addr);
+ if (eee_lp < 0)
+ return eee_lp;
+
+ eee_adv = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV,
+ MDIO_MMD_AN, phydev->addr);
+ if (eee_adv < 0)
+ return eee_adv;
+
+ adv = phy_eee_to_adv(eee_adv);
+ lp = phy_eee_to_adv(eee_lp);
+ idx = phy_find_setting(phydev->speed, phydev->duplex);
+ if (!(lp & adv & settings[idx].setting))
+ goto eee_exit;
+
+ if (clk_stop_enable) {
+ /* Configure the PHY to stop receiving xMII
+ * clock while it is signaling LPI.
+ */
+ int val = phy_read_mmd_indirect(phydev->bus, MDIO_CTRL1,
+ MDIO_MMD_PCS,
+ phydev->addr);
+ if (val < 0)
+ return val;
+
+ val |= MDIO_PCS_CTRL1_CLKSTOP_EN;
+ phy_write_mmd_indirect(phydev->bus, MDIO_CTRL1,
+ MDIO_MMD_PCS, phydev->addr, val);
+ }
+
+ ret = 0; /* EEE supported */
+ }
+
+eee_exit:
+ return ret;
+}
+EXPORT_SYMBOL(phy_init_eee);
+
+/**
+ * phy_get_eee_err - report the EEE wake error count
+ * @phydev: target phy_device struct
+ *
+ * Description: it reports the number of times the PHY
+ * failed to complete its normal wake sequence.
+ */
+int phy_get_eee_err(struct phy_device *phydev)
+{
+ return phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_WK_ERR,
+ MDIO_MMD_PCS, phydev->addr);
+
+}
+EXPORT_SYMBOL(phy_get_eee_err);
+
+/**
+ * phy_ethtool_get_eee - get EEE supported and status
+ * @phydev: target phy_device struct
+ * @data: ethtool_eee data
+ *
+ * Description: it reports the Supported/Advertisement/LP Advertisement
+ * capabilities.
+ */
+int phy_ethtool_get_eee(struct phy_device *phydev, struct ethtool_eee *data)
+{
+ int val;
+
+ /* Get Supported EEE */
+ val = phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_ABLE,
+ MDIO_MMD_PCS, phydev->addr);
+ if (val < 0)
+ return val;
+ data->supported = phy_eee_to_supported(val);
+
+ /* Get advertisement EEE */
+ val = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV,
+ MDIO_MMD_AN, phydev->addr);
+ if (val < 0)
+ return val;
+ data->advertised = phy_eee_to_adv(val);
+
+ /* Get LP advertisement EEE */
+ val = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_LPABLE,
+ MDIO_MMD_AN, phydev->addr);
+ if (val < 0)
+ return val;
+ data->lp_advertised = phy_eee_to_adv(val);
+
+ return 0;
+}
+EXPORT_SYMBOL(phy_ethtool_get_eee);
+
+/**
+ * phy_ethtool_set_eee - set EEE supported and status
+ * @phydev: target phy_device struct
+ * @data: ethtool_eee data
+ *
+ * Description: it programs the EEE Advertisement register.
+ */
+int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data)
+{
+ int val;
+
+ val = phy_adv_to_eee(data->advertised);
+ phy_write_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV, MDIO_MMD_AN,
+ phydev->addr, val);
+
+ return 0;
+}
+EXPORT_SYMBOL(phy_ethtool_set_eee);
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index dfb9479..7cccafe 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -43,7 +43,11 @@
#define MDIO_PKGID2 15
#define MDIO_AN_ADVERTISE 16 /* AN advertising (base page) */
#define MDIO_AN_LPA 19 /* AN LP abilities (base page) */
+#define MDIO_PCS_EEE_ABLE 20 /* EEE Capability register */
+#define MDIO_PCS_EEE_WK_ERR 22 /* EEE wake error counter */
#define MDIO_PHYXS_LNSTAT 24 /* PHY XGXS lane state */
+#define MDIO_AN_EEE_ADV 60 /* EEE advertisement */
+#define MDIO_AN_EEE_LPABLE 61 /* EEE link partner ability */
/* Media-dependent registers. */
#define MDIO_PMA_10GBT_SWAPPOL 130 /* 10GBASE-T pair swap & polarity */
@@ -56,7 +60,6 @@
#define MDIO_PCS_10GBRT_STAT2 33 /* 10GBASE-R/-T PCS status 2 */
#define MDIO_AN_10GBT_CTRL 32 /* 10GBASE-T auto-negotiation control */
#define MDIO_AN_10GBT_STAT 33 /* 10GBASE-T auto-negotiation status */
-#define MDIO_AN_EEE_ADV 60 /* EEE advertisement */
/* LASI (Link Alarm Status Interrupt) registers, defined by XENPAK MSA. */
#define MDIO_PMA_LASI_RXCTRL 0x9000 /* RX_ALARM control */
@@ -82,6 +85,7 @@
#define MDIO_AN_CTRL1_RESTART BMCR_ANRESTART
#define MDIO_AN_CTRL1_ENABLE BMCR_ANENABLE
#define MDIO_AN_CTRL1_XNP 0x2000 /* Enable extended next page */
+#define MDIO_PCS_CTRL1_CLKSTOP_EN 0x400 /* Stop the clock during LPI */
/* 10 Gb/s */
#define MDIO_CTRL1_SPEED10G (MDIO_CTRL1_SPEEDSELEXT | 0x00)
@@ -237,9 +241,25 @@
#define MDIO_AN_10GBT_STAT_MS 0x4000 /* Master/slave config */
#define MDIO_AN_10GBT_STAT_MSFLT 0x8000 /* Master/slave config fault */
-/* AN EEE Advertisement register. */
-#define MDIO_AN_EEE_ADV_100TX 0x0002 /* Advertise 100TX EEE cap */
-#define MDIO_AN_EEE_ADV_1000T 0x0004 /* Advertise 1000T EEE cap */
+/* EEE Supported/Advertisement/LP Advertisement registers.
+ *
+ * EEE capability Register (3.20), Advertisement (7.60) and
+ * Link partner ability (7.61) registers share the same
+ * bit masks.
+ */
+#define MDIO_AN_EEE_ADV_100TX 0x0002 /* Advertise 100TX EEE cap */
+#define MDIO_AN_EEE_ADV_1000T 0x0004 /* Advertise 1000T EEE cap */
+/* Note: the two defines above may be used by user-land, so they
+ * cannot be removed now.
+ * So, we define the new generic MDIO_EEE_100TX and MDIO_EEE_1000T macros
+ * using the previous ones (which can be considered obsolete).
+ */
+#define MDIO_EEE_100TX MDIO_AN_EEE_ADV_100TX /* 100TX EEE cap */
+#define MDIO_EEE_1000T MDIO_AN_EEE_ADV_1000T /* 1000T EEE cap */
+#define MDIO_EEE_10GT 0x0008 /* 10GT EEE cap */
+#define MDIO_EEE_1000KX 0x0010 /* 1000KX EEE cap */
+#define MDIO_EEE_10GKX4 0x0020 /* 10G KX4 EEE cap */
+#define MDIO_EEE_10GKR 0x0040 /* 10G KR EEE cap */
/* LASI RX_ALARM control/status registers. */
#define MDIO_PMA_LASI_RX_PHYXSLFLT 0x0001 /* PHY XS RX local fault */
diff --git a/include/linux/mii.h b/include/linux/mii.h
index 2783eca..8ef3a7a 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -21,6 +21,8 @@
#define MII_EXPANSION 0x06 /* Expansion register */
#define MII_CTRL1000 0x09 /* 1000BASE-T control */
#define MII_STAT1000 0x0a /* 1000BASE-T status */
+#define MII_MMD_CTRL 0x0d /* MMD Access Control Register */
+#define MII_MMD_DATA 0x0e /* MMD Access Data Register */
#define MII_ESTATUS 0x0f /* Extended Status */
#define MII_DCOUNTER 0x12 /* Disconnect counter */
#define MII_FCSCOUNTER 0x13 /* False carrier counter */
@@ -141,6 +143,13 @@
#define FLOW_CTRL_TX 0x01
#define FLOW_CTRL_RX 0x02
+/* MMD Access Control register fields */
+#define MII_MMD_CTRL_DEVAD_MASK 0x1f /* Mask MMD DEVAD*/
+#define MII_MMD_CTRL_ADDR 0x0000 /* Address */
+#define MII_MMD_CTRL_NOINCR 0x4000 /* no post increment */
+#define MII_MMD_CTRL_INCR_RDWT 0x8000 /* post increment on reads & writes */
+#define MII_MMD_CTRL_INCR_ON_WT 0xC000 /* post increment on writes only */
+
/* This structure is used in all SIOCxMIIxxx ioctl calls */
struct mii_ioctl_data {
__u16 phy_id;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7eac80a..c35299e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -554,6 +554,11 @@ int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
int phy_scan_fixups(struct phy_device *phydev);
+int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable);
+int phy_get_eee_err(struct phy_device *phydev);
+int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data);
+int phy_ethtool_get_eee(struct phy_device *phydev, struct ethtool_eee *data);
+
int __init mdio_bus_init(void);
void mdio_bus_exit(void);
--
1.7.4.4
* [net-next.git 3/4 (v5)] stmmac: add the Energy Efficient Ethernet support
From: Giuseppe CAVALLARO @ 2012-06-28 7:14 UTC
To: netdev
Cc: eric.dumazet, rayagond, davem, yuvalmin, bhutchings,
Giuseppe Cavallaro
In-Reply-To: <1340867678-18375-1-git-send-email-peppe.cavallaro@st.com>
This patch adds the Energy Efficient Ethernet support to the stmmac.
Please see the driver's documentation for further details about this support
in the driver.
Thanks also go to Rayagond Kokatanur for his first implementation.
Note:
To clearly manage and expose the LPI interrupt status and EEE ethtool
stats I've had to make some modifications to the driver's design, and I
found it really useful to move other parts of the code (e.g. mmc irq stat)
directly into the main code. This means that some core code has been
reworked to introduce EEE.
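
A minimal sketch of the reworked flow described in this note, assuming the core's `host_irq_status` hook returns a bitmask and the main code turns it into counters. The `mock_priv` structure and the function name are hypothetical; the bit values are copied from the `core_specific_irq_mask` enum added by this patch.

```c
#include <assert.h>

/* Bit values as in the core_specific_irq_mask enum of this patch. */
enum mock_core_irq {
	MOCK_TX_PATH_IN_LPI	= 16,
	MOCK_TX_PATH_EXIT_LPI	= 32,
	MOCK_RX_PATH_IN_LPI	= 64,
	MOCK_RX_PATH_EXIT_LPI	= 128,
};

/* Hypothetical, reduced driver private data with the EEE counters. */
struct mock_priv {
	unsigned long tx_in_lpi_n;
	unsigned long tx_exit_lpi_n;
	unsigned long rx_in_lpi_n;
	unsigned long rx_exit_lpi_n;
	int tx_path_in_lpi_mode;
};

/* Main-code side: account for the events reported by the core hook. */
static void mock_account_core_irq(struct mock_priv *priv, int status)
{
	if (status & MOCK_TX_PATH_IN_LPI) {
		priv->tx_in_lpi_n++;
		priv->tx_path_in_lpi_mode = 1;
	}
	if (status & MOCK_TX_PATH_EXIT_LPI) {
		priv->tx_exit_lpi_n++;
		priv->tx_path_in_lpi_mode = 0;
	}
	if (status & MOCK_RX_PATH_IN_LPI)
		priv->rx_in_lpi_n++;
	if (status & MOCK_RX_PATH_EXIT_LPI)
		priv->rx_exit_lpi_n++;
}
```

Keeping the counters in the main code means each core only has to report which events occurred, and the ethtool statistics stay in one place.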
v1: initial patch
v2: fixed some sparse issues (typos)
v3: erroneously sent the v2 renamed as v3
v4:
o Fixed the return value of the stmmac_eee_init as suggested by D.Miller
o Totally reviewed the ethtool support for EEE
o Added a new internal parameter to tune the SW timer for TX LPI.
v5: do not change any eee setting if stmmac_ethtool_op_set_eee fails
(it has to return -EOPNOTSUPP in that case).
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/ethernet/stmicro/stmmac/common.h | 31 ++++-
drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 20 +++
.../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 101 +++++++++++-
.../net/ethernet/stmicro/stmmac/dwmac100_core.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 8 +
.../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 57 +++++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 166 +++++++++++++++++++-
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +
9 files changed, 372 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index bcd54d6..e2d0832 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -95,6 +95,16 @@ struct stmmac_extra_stats {
unsigned long poll_n;
unsigned long sched_timer_n;
unsigned long normal_irq_n;
+ unsigned long mmc_tx_irq_n;
+ unsigned long mmc_rx_irq_n;
+ unsigned long mmc_rx_csum_offload_irq_n;
+ /* EEE */
+ unsigned long irq_receive_pmt_irq_n;
+ unsigned long irq_tx_path_in_lpi_mode_n;
+ unsigned long irq_tx_path_exit_lpi_mode_n;
+ unsigned long irq_rx_path_in_lpi_mode_n;
+ unsigned long irq_rx_path_exit_lpi_mode_n;
+ unsigned long phy_eee_wakeup_error_n;
};
/* CSR Frequency Access Defines*/
@@ -162,6 +172,17 @@ enum tx_dma_irq_status {
handle_tx_rx = 3,
};
+enum core_specific_irq_mask {
+ core_mmc_tx_irq = 1,
+ core_mmc_rx_irq = 2,
+ core_mmc_rx_csum_offload_irq = 4,
+ core_irq_receive_pmt_irq = 8,
+ core_irq_tx_path_in_lpi_mode = 16,
+ core_irq_tx_path_exit_lpi_mode = 32,
+ core_irq_rx_path_in_lpi_mode = 64,
+ core_irq_rx_path_exit_lpi_mode = 128,
+};
+
/* DMA HW capabilities */
struct dma_features {
unsigned int mbps_10_100;
@@ -208,6 +229,10 @@ struct dma_features {
#define MAC_ENABLE_TX 0x00000008 /* Transmitter Enable */
#define MAC_RNABLE_RX 0x00000004 /* Receiver Enable */
+/* Default LPI timers */
+#define STMMAC_DEFAULT_LIT_LS_TIMER 0x3E8
+#define STMMAC_DEFAULT_TWT_LS_TIMER 0x0
+
struct stmmac_desc_ops {
/* DMA RX descriptor ring initialization */
void (*init_rx_desc) (struct dma_desc *p, unsigned int ring_size,
@@ -278,7 +303,7 @@ struct stmmac_ops {
/* Dump MAC registers */
void (*dump_regs) (void __iomem *ioaddr);
/* Handle extra events on specific interrupts hw dependent */
- void (*host_irq_status) (void __iomem *ioaddr);
+ int (*host_irq_status) (void __iomem *ioaddr);
/* Multicast filter setting */
void (*set_filter) (struct net_device *dev, int id);
/* Flow control setting */
@@ -291,6 +316,10 @@ struct stmmac_ops {
unsigned int reg_n);
void (*get_umac_addr) (void __iomem *ioaddr, unsigned char *addr,
unsigned int reg_n);
+ void (*set_eee_mode) (void __iomem *ioaddr);
+ void (*reset_eee_mode) (void __iomem *ioaddr);
+ void (*set_eee_timer) (void __iomem *ioaddr, int ls, int tw);
+ void (*set_eee_pls) (void __iomem *ioaddr, int link);
};
struct mac_link {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index 23478bf..f90fcb5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -36,6 +36,7 @@
#define GMAC_INT_STATUS 0x00000038 /* interrupt status register */
enum dwmac1000_irq_status {
+ lpiis_irq = 0x400,
time_stamp_irq = 0x0200,
mmc_rx_csum_offload_irq = 0x0080,
mmc_tx_irq = 0x0040,
@@ -60,6 +61,25 @@ enum power_event {
power_down = 0x00000001,
};
+/* Energy Efficient Ethernet (EEE)
+ *
+ * LPI status, timer and control register offset
+ */
+#define LPI_CTRL_STATUS 0x0030
+#define LPI_TIMER_CTRL 0x0034
+
+/* LPI control and status defines */
+#define LPI_CTRL_STATUS_LPITXA 0x00080000 /* Enable LPI TX Automate */
+#define LPI_CTRL_STATUS_PLSEN 0x00040000 /* Enable PHY Link Status */
+#define LPI_CTRL_STATUS_PLS 0x00020000 /* PHY Link Status */
+#define LPI_CTRL_STATUS_LPIEN 0x00010000 /* LPI Enable */
+#define LPI_CTRL_STATUS_RLPIST 0x00000200 /* Receive LPI state */
+#define LPI_CTRL_STATUS_TLPIST 0x00000100 /* Transmit LPI state */
+#define LPI_CTRL_STATUS_RLPIEX 0x00000008 /* Receive LPI Exit */
+#define LPI_CTRL_STATUS_RLPIEN 0x00000004 /* Receive LPI Entry */
+#define LPI_CTRL_STATUS_TLPIEX 0x00000002 /* Transmit LPI Exit */
+#define LPI_CTRL_STATUS_TLPIEN 0x00000001 /* Transmit LPI Entry */
+
/* GMAC HW ADDR regs */
#define GMAC_ADDR_HIGH(reg) (((reg > 15) ? 0x00000800 : 0x00000040) + \
(reg * 8))
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b5e4d02..bfe0226 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -194,26 +194,107 @@ static void dwmac1000_pmt(void __iomem *ioaddr, unsigned long mode)
}
-static void dwmac1000_irq_status(void __iomem *ioaddr)
+static int dwmac1000_irq_status(void __iomem *ioaddr)
{
u32 intr_status = readl(ioaddr + GMAC_INT_STATUS);
+ int status = 0;
/* Not used events (e.g. MMC interrupts) are not handled. */
- if ((intr_status & mmc_tx_irq))
- CHIP_DBG(KERN_DEBUG "GMAC: MMC tx interrupt: 0x%08x\n",
+ if ((intr_status & mmc_tx_irq)) {
+ CHIP_DBG(KERN_INFO "GMAC: MMC tx interrupt: 0x%08x\n",
readl(ioaddr + GMAC_MMC_TX_INTR));
- if (unlikely(intr_status & mmc_rx_irq))
- CHIP_DBG(KERN_DEBUG "GMAC: MMC rx interrupt: 0x%08x\n",
+ status |= core_mmc_tx_irq;
+ }
+ if (unlikely(intr_status & mmc_rx_irq)) {
+ CHIP_DBG(KERN_INFO "GMAC: MMC rx interrupt: 0x%08x\n",
readl(ioaddr + GMAC_MMC_RX_INTR));
- if (unlikely(intr_status & mmc_rx_csum_offload_irq))
- CHIP_DBG(KERN_DEBUG "GMAC: MMC rx csum offload: 0x%08x\n",
+ status |= core_mmc_rx_irq;
+ }
+ if (unlikely(intr_status & mmc_rx_csum_offload_irq)) {
+ CHIP_DBG(KERN_INFO "GMAC: MMC rx csum offload: 0x%08x\n",
readl(ioaddr + GMAC_MMC_RX_CSUM_OFFLOAD));
+ status |= core_mmc_rx_csum_offload_irq;
+ }
if (unlikely(intr_status & pmt_irq)) {
- CHIP_DBG(KERN_DEBUG "GMAC: received Magic frame\n");
+ CHIP_DBG(KERN_INFO "GMAC: received Magic frame\n");
/* clear the PMT bits 5 and 6 by reading the PMT
* status register. */
readl(ioaddr + GMAC_PMT);
+ status |= core_irq_receive_pmt_irq;
}
+ /* MAC tx/rx EEE LPI entry/exit interrupts */
+ if (intr_status & lpiis_irq) {
+ /* Clear LPI interrupt by reading the Reg 12 */
+ u32 lpi_status = readl(ioaddr + LPI_CTRL_STATUS);
+
+ if (lpi_status & LPI_CTRL_STATUS_TLPIEN) {
+ CHIP_DBG(KERN_INFO "GMAC TX entered in LPI\n");
+ status |= core_irq_tx_path_in_lpi_mode;
+ }
+ if (lpi_status & LPI_CTRL_STATUS_TLPIEX) {
+ CHIP_DBG(KERN_INFO "GMAC TX exit from LPI\n");
+ status |= core_irq_tx_path_exit_lpi_mode;
+ }
+ if (lpi_status & LPI_CTRL_STATUS_RLPIEN) {
+ CHIP_DBG(KERN_INFO "GMAC RX entered in LPI\n");
+ status |= core_irq_rx_path_in_lpi_mode;
+ }
+ if (lpi_status & LPI_CTRL_STATUS_RLPIEX) {
+ CHIP_DBG(KERN_INFO "GMAC RX exit from LPI\n");
+ status |= core_irq_rx_path_exit_lpi_mode;
+ }
+ }
+
+ return status;
+}
+
+static void dwmac1000_set_eee_mode(void __iomem *ioaddr)
+{
+ u32 value;
+
+ /* Enable the link status receive on RGMII, SGMII or SMII
+ * receive path and instruct the transmit to enter in LPI
+ * state. */
+ value = readl(ioaddr + LPI_CTRL_STATUS);
+ value |= LPI_CTRL_STATUS_LPIEN | LPI_CTRL_STATUS_LPITXA;
+ writel(value, ioaddr + LPI_CTRL_STATUS);
+}
+
+static void dwmac1000_reset_eee_mode(void __iomem *ioaddr)
+{
+ u32 value;
+
+ value = readl(ioaddr + LPI_CTRL_STATUS);
+ value &= ~(LPI_CTRL_STATUS_LPIEN | LPI_CTRL_STATUS_LPITXA);
+ writel(value, ioaddr + LPI_CTRL_STATUS);
+}
+
+static void dwmac1000_set_eee_pls(void __iomem *ioaddr, int link)
+{
+ u32 value;
+
+ value = readl(ioaddr + LPI_CTRL_STATUS);
+
+ if (link)
+ value |= LPI_CTRL_STATUS_PLS;
+ else
+ value &= ~LPI_CTRL_STATUS_PLS;
+
+ writel(value, ioaddr + LPI_CTRL_STATUS);
+}
+
+static void dwmac1000_set_eee_timer(void __iomem *ioaddr, int ls, int tw)
+{
+ int value = ((tw & 0xffff)) | ((ls & 0x7ff) << 16);
+
+ /* Program the timers in the LPI timer control register:
+ * LS: minimum time (ms) for which the link
+ * status from PHY should be ok before transmitting
+ * the LPI pattern.
+ * TW: minimum time (us) for which the core waits
+ * after it has stopped transmitting the LPI pattern.
+ */
+ writel(value, ioaddr + LPI_TIMER_CTRL);
}
static const struct stmmac_ops dwmac1000_ops = {
@@ -226,6 +307,10 @@ static const struct stmmac_ops dwmac1000_ops = {
.pmt = dwmac1000_pmt,
.set_umac_addr = dwmac1000_set_umac_addr,
.get_umac_addr = dwmac1000_get_umac_addr,
+ .set_eee_mode = dwmac1000_set_eee_mode,
+ .reset_eee_mode = dwmac1000_reset_eee_mode,
+ .set_eee_timer = dwmac1000_set_eee_timer,
+ .set_eee_pls = dwmac1000_set_eee_pls,
};
struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 19e0f4e..f83210e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -72,9 +72,9 @@ static int dwmac100_rx_ipc_enable(void __iomem *ioaddr)
return 0;
}
-static void dwmac100_irq_status(void __iomem *ioaddr)
+static int dwmac100_irq_status(void __iomem *ioaddr)
{
- return;
+ return 0;
}
static void dwmac100_set_umac_addr(void __iomem *ioaddr, unsigned char *addr,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h
index 6e0360f..e678ce3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h
@@ -70,6 +70,7 @@
#define DMA_INTR_DEFAULT_MASK (DMA_INTR_NORMAL | DMA_INTR_ABNORMAL)
/* DMA Status register defines */
+#define DMA_STATUS_GLPII 0x40000000 /* GMAC LPI interrupt */
#define DMA_STATUS_GPI 0x10000000 /* PMT interrupt */
#define DMA_STATUS_GMI 0x08000000 /* MMC interrupt */
#define DMA_STATUS_GLI 0x04000000 /* GMAC Line interface int */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index dc20c56..ab4c376 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -87,6 +87,12 @@ struct stmmac_priv {
#endif
int clk_csr;
int synopsys_id;
+ struct timer_list eee_ctrl_timer;
+ bool tx_path_in_lpi_mode;
+ int lpi_irq;
+ int eee_enabled;
+ int eee_active;
+ int tx_lpi_timer;
};
extern int phyaddr;
@@ -104,6 +110,8 @@ int stmmac_dvr_remove(struct net_device *ndev);
struct stmmac_priv *stmmac_dvr_probe(struct device *device,
struct plat_stmmacenet_data *plat_dat,
void __iomem *addr);
+void stmmac_disable_eee_mode(struct stmmac_priv *priv);
+bool stmmac_eee_init(struct stmmac_priv *priv);
#ifdef CONFIG_HAVE_CLK
static inline int stmmac_clk_enable(struct stmmac_priv *priv)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index ce43184..76fd61a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -93,6 +93,16 @@ static const struct stmmac_stats stmmac_gstrings_stats[] = {
STMMAC_STAT(poll_n),
STMMAC_STAT(sched_timer_n),
STMMAC_STAT(normal_irq_n),
+ STMMAC_STAT(normal_irq_n),
+ STMMAC_STAT(mmc_tx_irq_n),
+ STMMAC_STAT(mmc_rx_irq_n),
+ STMMAC_STAT(mmc_rx_csum_offload_irq_n),
+ STMMAC_STAT(irq_receive_pmt_irq_n),
+ STMMAC_STAT(irq_tx_path_in_lpi_mode_n),
+ STMMAC_STAT(irq_tx_path_exit_lpi_mode_n),
+ STMMAC_STAT(irq_rx_path_in_lpi_mode_n),
+ STMMAC_STAT(irq_rx_path_exit_lpi_mode_n),
+ STMMAC_STAT(phy_eee_wakeup_error_n),
};
#define STMMAC_STATS_LEN ARRAY_SIZE(stmmac_gstrings_stats)
@@ -366,6 +376,11 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
(*(u32 *)p);
}
}
+ if (priv->eee_enabled) {
+ int val = phy_get_eee_err(priv->phydev);
+ if (val)
+ priv->xstats.phy_eee_wakeup_error_n = val;
+ }
}
for (i = 0; i < STMMAC_STATS_LEN; i++) {
char *p = (char *)priv + stmmac_gstrings_stats[i].stat_offset;
@@ -464,6 +479,46 @@ static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
return 0;
}
+static int stmmac_ethtool_op_get_eee(struct net_device *dev,
+ struct ethtool_eee *edata)
+{
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+ if (!priv->dma_cap.eee)
+ return -EOPNOTSUPP;
+
+ edata->eee_enabled = priv->eee_enabled;
+ edata->eee_active = priv->eee_active;
+ edata->tx_lpi_timer = priv->tx_lpi_timer;
+
+ return phy_ethtool_get_eee(priv->phydev, edata);
+}
+
+static int stmmac_ethtool_op_set_eee(struct net_device *dev,
+ struct ethtool_eee *edata)
+{
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+ priv->eee_enabled = edata->eee_enabled;
+
+ if (!priv->eee_enabled)
+ stmmac_disable_eee_mode(priv);
+ else {
+ /* We are asking for enabling the EEE but it is safe
+ * to verify all by invoking the eee_init function.
+ * In case of failure it will return an error.
+ */
+ priv->eee_enabled = stmmac_eee_init(priv);
+ if (!priv->eee_enabled)
+ return -EOPNOTSUPP;
+
+ /* Do not change tx_lpi_timer in case of failure */
+ priv->tx_lpi_timer = edata->tx_lpi_timer;
+ }
+
+ return phy_ethtool_set_eee(priv->phydev, edata);
+}
+
static const struct ethtool_ops stmmac_ethtool_ops = {
.begin = stmmac_check_if_running,
.get_drvinfo = stmmac_ethtool_getdrvinfo,
@@ -480,6 +535,8 @@ static const struct ethtool_ops stmmac_ethtool_ops = {
.get_strings = stmmac_get_strings,
.get_wol = stmmac_get_wol,
.set_wol = stmmac_set_wol,
+ .get_eee = stmmac_ethtool_op_get_eee,
+ .set_eee = stmmac_ethtool_op_set_eee,
.get_sset_count = stmmac_get_sset_count,
.get_ts_info = ethtool_op_get_ts_info,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index eba49cb..ea3bc09 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -133,6 +133,12 @@ static const u32 default_msg_level = (NETIF_MSG_DRV | NETIF_MSG_PROBE |
NETIF_MSG_LINK | NETIF_MSG_IFUP |
NETIF_MSG_IFDOWN | NETIF_MSG_TIMER);
+#define STMMAC_DEFAULT_LPI_TIMER 1000
+static int eee_timer = STMMAC_DEFAULT_LPI_TIMER;
+module_param(eee_timer, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(eee_timer, "LPI tx expiration time in msec");
+#define STMMAC_LPI_TIMER(x) (jiffies + msecs_to_jiffies(x))
+
static irqreturn_t stmmac_interrupt(int irq, void *dev_id);
#ifdef CONFIG_STMMAC_DEBUG_FS
@@ -161,6 +167,8 @@ static void stmmac_verify_args(void)
flow_ctrl = FLOW_OFF;
if (unlikely((pause < 0) || (pause > 0xffff)))
pause = PAUSE_TIME;
+ if (eee_timer < 0)
+ eee_timer = STMMAC_DEFAULT_LPI_TIMER;
}
static void stmmac_clk_csr_set(struct stmmac_priv *priv)
@@ -229,6 +237,85 @@ static inline void stmmac_hw_fix_mac_speed(struct stmmac_priv *priv)
phydev->speed);
}
+static void stmmac_enable_eee_mode(struct stmmac_priv *priv)
+{
+ /* Check and enter in LPI mode */
+ if ((priv->dirty_tx == priv->cur_tx) &&
+ (priv->tx_path_in_lpi_mode == false))
+ priv->hw->mac->set_eee_mode(priv->ioaddr);
+}
+
+void stmmac_disable_eee_mode(struct stmmac_priv *priv)
+{
+ /* Exit and disable EEE in case we are in LPI state. */
+ priv->hw->mac->reset_eee_mode(priv->ioaddr);
+ del_timer_sync(&priv->eee_ctrl_timer);
+ priv->tx_path_in_lpi_mode = false;
+}
+
+/**
+ * stmmac_eee_ctrl_timer
+ * @arg : data hook
+ * Description:
+ * If there is no data transfer and if we are not in LPI state,
+ * then MAC Transmitter can be moved to LPI state.
+ */
+static void stmmac_eee_ctrl_timer(unsigned long arg)
+{
+ struct stmmac_priv *priv = (struct stmmac_priv *)arg;
+
+ stmmac_enable_eee_mode(priv);
+ mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_TIMER(eee_timer));
+}
+
+/**
+ * stmmac_eee_init
+ * @priv: private device pointer
+ * Description:
+ * If the EEE support has been enabled while configuring the driver,
+ * the GMAC actually supports EEE (from the HW cap reg) and the phy
+ * can also manage EEE, then enable the LPI state and start the timer
+ * to verify if the tx path can enter in LPI state.
+ */
+bool stmmac_eee_init(struct stmmac_priv *priv)
+{
+ bool ret = false;
+
+ /* MAC core supports the EEE feature. */
+ if (priv->dma_cap.eee) {
+ /* Check if the PHY supports EEE */
+ if (phy_init_eee(priv->phydev, 1))
+ goto out;
+
+ priv->eee_active = 1;
+ init_timer(&priv->eee_ctrl_timer);
+ priv->eee_ctrl_timer.function = stmmac_eee_ctrl_timer;
+ priv->eee_ctrl_timer.data = (unsigned long)priv;
+ priv->eee_ctrl_timer.expires = STMMAC_LPI_TIMER(eee_timer);
+ add_timer(&priv->eee_ctrl_timer);
+
+ priv->hw->mac->set_eee_timer(priv->ioaddr,
+ STMMAC_DEFAULT_LIT_LS_TIMER,
+ priv->tx_lpi_timer);
+
+ pr_info("stmmac: Energy-Efficient Ethernet initialized\n");
+
+ ret = true;
+ }
+out:
+ return ret;
+}
+
+static void stmmac_eee_adjust(struct stmmac_priv *priv)
+{
+ /* When the EEE has been already initialised we have to
+ * modify the PLS bit in the LPI ctrl & status reg according
+ * to the PHY link status.
+ */
+ if (priv->eee_enabled)
+ priv->hw->mac->set_eee_pls(priv->ioaddr, priv->phydev->link);
+}
+
/**
* stmmac_adjust_link
* @dev: net device structure
@@ -249,6 +336,7 @@ static void stmmac_adjust_link(struct net_device *dev)
phydev->addr, phydev->link);
spin_lock_irqsave(&priv->lock, flags);
+
if (phydev->link) {
u32 ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
@@ -315,6 +403,8 @@ static void stmmac_adjust_link(struct net_device *dev)
if (new_state && netif_msg_link(priv))
phy_print_status(phydev);
+ stmmac_eee_adjust(priv);
+
spin_unlock_irqrestore(&priv->lock, flags);
DBG(probe, DEBUG, "stmmac_adjust_link: exiting\n");
@@ -332,7 +422,7 @@ static int stmmac_init_phy(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
struct phy_device *phydev;
- char phy_id[MII_BUS_ID_SIZE + 3];
+ char phy_id_fmt[MII_BUS_ID_SIZE + 3];
char bus_id[MII_BUS_ID_SIZE];
int interface = priv->plat->interface;
priv->oldlink = 0;
@@ -346,11 +436,12 @@ static int stmmac_init_phy(struct net_device *dev)
snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
priv->plat->bus_id);
- snprintf(phy_id, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
+ snprintf(phy_id_fmt, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
priv->plat->phy_addr);
- pr_debug("stmmac_init_phy: trying to attach to %s\n", phy_id);
+ pr_debug("stmmac_init_phy: trying to attach to %s\n", phy_id_fmt);
- phydev = phy_connect(dev, phy_id, &stmmac_adjust_link, 0, interface);
+ phydev = phy_connect(dev, phy_id_fmt, &stmmac_adjust_link, 0,
+ interface);
if (IS_ERR(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
@@ -689,6 +780,11 @@ static void stmmac_tx(struct stmmac_priv *priv)
}
netif_tx_unlock(priv->dev);
}
+
+ if ((priv->eee_enabled) && (!priv->tx_path_in_lpi_mode)) {
+ stmmac_enable_eee_mode(priv);
+ mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_TIMER(eee_timer));
+ }
spin_unlock(&priv->tx_lock);
}
@@ -1027,6 +1123,17 @@ static int stmmac_open(struct net_device *dev)
}
}
+ /* Request the IRQ lines */
+ if (priv->lpi_irq != -ENXIO) {
+ ret = request_irq(priv->lpi_irq, stmmac_interrupt, IRQF_SHARED,
+ dev->name, dev);
+ if (unlikely(ret < 0)) {
+ pr_err("%s: ERROR: allocating the LPI IRQ %d (%d)\n",
+ __func__, priv->lpi_irq, ret);
+ goto open_error_lpiirq;
+ }
+ }
+
/* Enable the MAC Rx/Tx */
stmmac_set_mac(priv->ioaddr, true);
@@ -1062,12 +1169,19 @@ static int stmmac_open(struct net_device *dev)
if (priv->phydev)
phy_start(priv->phydev);
+ priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS_TIMER;
+ priv->eee_enabled = stmmac_eee_init(priv);
+
napi_enable(&priv->napi);
skb_queue_head_init(&priv->rx_recycle);
netif_start_queue(dev);
return 0;
+open_error_lpiirq:
+ if (priv->wol_irq != dev->irq)
+ free_irq(priv->wol_irq, dev);
+
open_error_wolirq:
free_irq(dev->irq, dev);
@@ -1093,6 +1207,9 @@ static int stmmac_release(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
+ if (priv->eee_enabled)
+ del_timer_sync(&priv->eee_ctrl_timer);
+
/* Stop and disconnect the PHY */
if (priv->phydev) {
phy_stop(priv->phydev);
@@ -1115,6 +1232,8 @@ static int stmmac_release(struct net_device *dev)
free_irq(dev->irq, dev);
if (priv->wol_irq != dev->irq)
free_irq(priv->wol_irq, dev);
+ if (priv->lpi_irq != -ENXIO)
+ free_irq(priv->lpi_irq, dev);
/* Stop TX/RX DMA and clear the descriptors */
priv->hw->dma->stop_tx(priv->ioaddr);
@@ -1164,6 +1283,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
spin_lock(&priv->tx_lock);
+ if (priv->tx_path_in_lpi_mode)
+ stmmac_disable_eee_mode(priv);
+
entry = priv->cur_tx % txsize;
#ifdef STMMAC_XMIT_DEBUG
@@ -1540,10 +1662,37 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
return IRQ_NONE;
}
- if (priv->plat->has_gmac)
- /* To handle GMAC own interrupts */
- priv->hw->mac->host_irq_status((void __iomem *) dev->base_addr);
+ /* To handle GMAC own interrupts */
+ if (priv->plat->has_gmac) {
+ int status = priv->hw->mac->host_irq_status((void __iomem *)
+ dev->base_addr);
+ if (unlikely(status)) {
+ if (status & core_mmc_tx_irq)
+ priv->xstats.mmc_tx_irq_n++;
+ if (status & core_mmc_rx_irq)
+ priv->xstats.mmc_rx_irq_n++;
+ if (status & core_mmc_rx_csum_offload_irq)
+ priv->xstats.mmc_rx_csum_offload_irq_n++;
+ if (status & core_irq_receive_pmt_irq)
+ priv->xstats.irq_receive_pmt_irq_n++;
+
+ /* For LPI we need to save the tx status */
+ if (status & core_irq_tx_path_in_lpi_mode) {
+ priv->xstats.irq_tx_path_in_lpi_mode_n++;
+ priv->tx_path_in_lpi_mode = true;
+ }
+ if (status & core_irq_tx_path_exit_lpi_mode) {
+ priv->xstats.irq_tx_path_exit_lpi_mode_n++;
+ priv->tx_path_in_lpi_mode = false;
+ }
+ if (status & core_irq_rx_path_in_lpi_mode)
+ priv->xstats.irq_rx_path_in_lpi_mode_n++;
+ if (status & core_irq_rx_path_exit_lpi_mode)
+ priv->xstats.irq_rx_path_exit_lpi_mode_n++;
+ }
+ }
+ /* To handle DMA interrupts */
stmmac_dma_interrupt(priv);
return IRQ_HANDLED;
@@ -2155,6 +2304,9 @@ static int __init stmmac_cmdline_opt(char *str)
} else if (!strncmp(opt, "pause:", 6)) {
if (kstrtoint(opt + 6, 0, &pause))
goto err;
+ } else if (!strncmp(opt, "eee_timer:", 10)) {
+ if (kstrtoint(opt + 10, 0, &eee_timer))
+ goto err;
#ifdef CONFIG_STMMAC_TIMER
} else if (!strncmp(opt, "tmrate:", 7)) {
if (kstrtoint(opt + 7, 0, &tmrate))
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 20eb502..7d36163 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -156,6 +156,8 @@ static int stmmac_pltfr_probe(struct platform_device *pdev)
if (priv->wol_irq == -ENXIO)
priv->wol_irq = priv->dev->irq;
+ priv->lpi_irq = platform_get_irq_byname(pdev, "eth_lpi");
+
platform_set_drvdata(pdev, priv->dev);
pr_debug("STMMAC platform driver registration completed");
--
1.7.4.4
* [net-next.git 2/4] stmmac: update the driver Documentation and add EEE
From: Giuseppe CAVALLARO @ 2012-06-28 7:14 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, rayagond, davem, yuvalmin, bhutchings,
Giuseppe Cavallaro
In-Reply-To: <1340867678-18375-1-git-send-email-peppe.cavallaro@st.com>
This patch updates the stmmac documentation, adding
some missing files to the section that describes the
driver's internal structure.
The patch also adds a new section describing the EEE support.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
Documentation/networking/stmmac.txt | 36 +++++++++++++++++++++++++++++-----
1 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 5cb9a19..c676b9c 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -257,9 +257,11 @@ reset procedure etc).
o Makefile
o stmmac_main.c: main network device driver;
o stmmac_mdio.c: mdio functions;
+ o stmmac_pci.c: PCI driver;
+ o stmmac_platform.c: platform driver;
o stmmac_ethtool.c: ethtool support;
o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
- Only tested on ST40 platforms based.
+ (only tested on ST40-based platforms);
o stmmac.h: private driver structure;
o common.h: common definitions and VFTs;
o descs.h: descriptor structure definitions;
@@ -269,9 +271,11 @@ reset procedure etc).
o dwmac100_core: MAC 100 core and dma code;
o dwmac100_dma.c: dma funtions for the MAC chip;
o dwmac1000.h: specific header file for the MAC;
- o dwmac_lib.c: generic DMA functions shared among chips
- o enh_desc.c: functions for handling enhanced descriptors
- o norm_desc.c: functions for handling normal descriptors
+ o dwmac_lib.c: generic DMA functions shared among chips;
+ o enh_desc.c: functions for handling enhanced descriptors;
+ o norm_desc.c: functions for handling normal descriptors;
+ o chain_mode.c/ring_mode.c: functions to manage RING/CHAINED modes;
+ o mmc_core.c/mmc.h: Management MAC Counters;
5) Debug Information
@@ -304,7 +308,27 @@ All these are only useful during the developing stage
and should never enabled inside the code for general usage.
In fact, these can generate an huge amount of debug messages.
-6) TODO:
+6) Energy Efficient Ethernet
+
+Energy Efficient Ethernet (EEE) enables the IEEE 802.3 MAC sublayer,
+along with a family of physical layers, to operate in Low Power Idle
+(LPI) mode. EEE supports IEEE 802.3 MAC operation at 100Mbps,
+1000Mbps & 10Gbps.
+
+The LPI mode allows power saving by switching off parts of the
+communication device functionality when there is no data to be
+transmitted & received. The systems on both sides of the link can
+disable some functionality & save power during periods of low link
+utilization. The MAC controls whether the system should enter or exit
+the LPI mode & communicates this to the PHY.
+
+As soon as the interface is opened, the driver verifies whether EEE
+can be supported. This is done by looking at both the DMA HW capability
+register and the PHY device's MMD registers.
+To enter Tx LPI mode the driver needs a software timer that enables
+and disables the LPI mode when there is nothing to be
+transmitted.
+
+7) TODO:
o XGMAC is not supported.
- o Add the EEE - Energy Efficient Ethernet
o Add the PTP - precision time protocol
--
1.7.4.4
* [net-next.git 0/4] EEE for PAL and stmmac (V7)
From: Giuseppe CAVALLARO @ 2012-06-28 7:14 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, rayagond, davem, yuvalmin, bhutchings,
Giuseppe Cavallaro
These patches add EEE support to the stmmac device driver,
reviving work I started some months ago but did not
complete in time.
I've tested everything on an ST STB with the IC+ 101G PHY device,
which has this feature.
The initial EEE support for the stmmac was written by Rayagond,
but I have reworked all his code, adding new parts and, above all,
testing on real hardware. Thx Rayagond!
In these patches, the stmmac supports EEE only if the DMA HW
capability register says that this feature is actually
available. In that case, the driver can enter Tx LPI mode
by using a timer, as recommended by Synopsys.
Note that EEE is supported in new chip generations; in particular
I used the 3.61a.
At any rate, further information about how the driver treats the EEE
can be found in the stmmac.txt file (there is a patch for that).
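The timer-driven entry into Tx LPI described above can be sketched in plain C. This is only a model under stated assumptions: the field names follow struct stmmac_priv from the patch, but struct lpi_model and the three lpi_* helpers are hypothetical, no hardware is touched, and the LPI flag is flipped immediately, whereas the driver waits for the LPI interrupt to report the state change.

```c
#include <stdbool.h>

/* Hypothetical model of the Tx LPI bookkeeping. Field names follow
 * struct stmmac_priv in the patch; nothing here touches hardware. */
struct lpi_model {
    unsigned int cur_tx;      /* next descriptor to queue */
    unsigned int dirty_tx;    /* next descriptor to reclaim */
    bool eee_enabled;
    bool tx_path_in_lpi_mode; /* in the driver this is set from the LPI irq */
};

/* Timer expiry: request LPI only when EEE is on, the Tx ring is drained
 * and the transmitter is not already in LPI. */
static void lpi_timer_expired(struct lpi_model *m)
{
    if (m->eee_enabled && m->dirty_tx == m->cur_tx &&
        !m->tx_path_in_lpi_mode)
        m->tx_path_in_lpi_mode = true;
}

/* Transmit path: leave LPI before queuing a new frame. */
static void lpi_xmit(struct lpi_model *m)
{
    if (m->tx_path_in_lpi_mode)
        m->tx_path_in_lpi_mode = false;
    m->cur_tx++;
}

/* Tx completion: every queued descriptor has been reclaimed. */
static void lpi_tx_clean(struct lpi_model *m)
{
    m->dirty_tx = m->cur_tx;
}
```

With a frame still pending, lpi_timer_expired() leaves the model out of LPI; once lpi_tx_clean() has drained the ring, the next expiry enters LPI, and the next lpi_xmit() leaves it again.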
Another patch is for the PHY Abstraction Layer, which is now able to
manage the MMD registers (clause 45); it also provides the ethtool
support to manage the supported/advertised/lp advertised features.
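As a rough illustration of the MMD access mentioned above, the sketch below models the indirect scheme that reaches clause 45 MMD registers through clause 22 registers 13 and 14. It is not the code from the phy patch: struct fake_phy and the fake_mdio_* and mmd_* helpers are hypothetical stand-ins for a real MDIO bus, and the register numbers and bit values are given as I understand them from IEEE 802.3 and include/linux/mdio.h.

```c
#include <stdint.h>

/* Clause 22 registers used for indirect MMD access. */
#define MII_MMD_CTRL          0x0d
#define MII_MMD_DATA          0x0e
#define MII_MMD_CTRL_NOINCR   0x4000 /* function: data, no post increment */

#define MDIO_MMD_AN           7      /* auto-negotiation MMD */
#define MDIO_AN_EEE_ADV       60     /* local EEE advertisement */
#define MDIO_AN_EEE_LPABLE    61     /* link partner EEE ability */
#define MDIO_AN_EEE_ADV_100TX 0x0002
#define MDIO_AN_EEE_ADV_1000T 0x0004

/* Hypothetical in-memory PHY standing in for a real MDIO bus. */
struct fake_phy {
    uint16_t mmd[32][128]; /* [devad][register], small window only */
    uint16_t ctrl;         /* last value written to register 13 */
    uint16_t addr;         /* latched MMD register address */
};

static void fake_mdio_write(struct fake_phy *phy, int reg, uint16_t val)
{
    if (reg == MII_MMD_CTRL)
        phy->ctrl = val;
    else if (reg == MII_MMD_DATA && (phy->ctrl & MII_MMD_CTRL_NOINCR))
        phy->mmd[phy->ctrl & 0x1f][phy->addr % 128] = val;
    else if (reg == MII_MMD_DATA)
        phy->addr = val; /* address function selected */
}

static uint16_t fake_mdio_read(struct fake_phy *phy, int reg)
{
    if (reg == MII_MMD_DATA && (phy->ctrl & MII_MMD_CTRL_NOINCR))
        return phy->mmd[phy->ctrl & 0x1f][phy->addr % 128];
    return 0;
}

/* Select devad, latch the register address, then switch to data mode. */
static void mmd_select(struct fake_phy *phy, int devad, int regnum)
{
    fake_mdio_write(phy, MII_MMD_CTRL, devad & 0x1f);
    fake_mdio_write(phy, MII_MMD_DATA, regnum);
    fake_mdio_write(phy, MII_MMD_CTRL, MII_MMD_CTRL_NOINCR | (devad & 0x1f));
}

static uint16_t mmd_read(struct fake_phy *phy, int devad, int regnum)
{
    mmd_select(phy, devad, regnum);
    return fake_mdio_read(phy, MII_MMD_DATA);
}

static void mmd_write(struct fake_phy *phy, int devad, int regnum,
                      uint16_t val)
{
    mmd_select(phy, devad, regnum);
    fake_mdio_write(phy, MII_MMD_DATA, val);
}

/* EEE modes both link ends advertise: local adv AND link partner ability. */
static uint16_t eee_common_modes(struct fake_phy *phy)
{
    return mmd_read(phy, MDIO_MMD_AN, MDIO_AN_EEE_ADV) &
           mmd_read(phy, MDIO_MMD_AN, MDIO_AN_EEE_LPABLE);
}
```

If the local side advertises 100TX and 1000T while the partner reports only 100TX, eee_common_modes() yields MDIO_AN_EEE_ADV_100TX, which is the kind of adv/lp match that the v4 changelog entry refers to.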
v3: fixed the "stmmac: do not use strict_strtoul but kstrtoint"
to use the kstrtoint.
v4: fixed the function that enables EEE and added a check that verifies
that the auto-negotiated link matches the bits in the adv and lp
registers.
v5: reviewed the way to get the negotiated settings
v6: fixed a broken return value in the phy_eee_init function
v7: do not remove the MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T
and fixed the eee_{cap,lp,adv} declaration as "int" instead of u16.
Giuseppe Cavallaro (4):
stmmac: do not use strict_strtoul but kstrtoint
stmmac: update the driver Documentation and add EEE
stmmac: add the Energy Efficient Ethernet support
phy: add the EEE support and the way to access to the MMD registers.
Documentation/networking/stmmac.txt | 36 ++-
drivers/net/ethernet/stmicro/stmmac/common.h | 31 ++-
drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 20 ++
.../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 101 +++++++-
.../net/ethernet/stmicro/stmmac/dwmac100_core.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 8 +
.../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 57 ++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 193 ++++++++++++--
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +
drivers/net/phy/phy.c | 281 ++++++++++++++++++++
include/linux/mdio.h | 28 ++-
include/linux/mii.h | 9 +
include/linux/phy.h | 5 +
14 files changed, 731 insertions(+), 45 deletions(-)
--
1.7.4.4
* [net-next.git 1/4 (v3)] stmmac: do not use strict_strtoul but kstrtoint
From: Giuseppe CAVALLARO @ 2012-06-28 7:14 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, rayagond, davem, yuvalmin, bhutchings,
Giuseppe Cavallaro
In-Reply-To: <1340867678-18375-1-git-send-email-peppe.cavallaro@st.com>
This patch replaces the obsolete strict_strtoul with kstrtoint.
v2: also removed casting on kstrtoul.
v3: use kstrtoint instead of kstrtoul because all the variables are int;
thanks to E. Dumazet.
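For readers outside the kernel, here is a userspace sketch of the idea behind the change: parse straight into an int with range checking rather than handing an int's address to a parser that stores an unsigned long. Both parse_int_strict() and opt_value() are hypothetical helpers built on strtol(), not kernel APIs, and they differ from kstrtoint() in details (strtol() skips leading whitespace, and no trailing newline is accepted here).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for kstrtoint(s, 0, res).
 * Returns 0 on success, -EINVAL on junk, -ERANGE if it does not fit. */
static int parse_int_strict(const char *s, int *res)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 0);
    if (end == s || *end != '\0')
        return -EINVAL;
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return -ERANGE;

    *res = (int)val;
    return 0;
}

/* Return the value part of "name:value" if opt starts with prefix,
 * else NULL. Deriving the length from the prefix keeps the strncmp()
 * count and the value offset in step. */
static const char *opt_value(const char *opt, const char *prefix)
{
    size_t len = strlen(prefix);

    return strncmp(opt, prefix, len) ? NULL : opt + len;
}
```

A caller could test opt_value(opt, "pause:") and pass the result to parse_int_strict(); since the prefix length is computed rather than counted by hand, the comparison length and the offset cannot drift apart.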
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 27 +++++++-------------
1 files changed, 10 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 590e95b..eba49cb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2129,42 +2129,35 @@ static int __init stmmac_cmdline_opt(char *str)
return -EINVAL;
while ((opt = strsep(&str, ",")) != NULL) {
if (!strncmp(opt, "debug:", 6)) {
- if (strict_strtoul(opt + 6, 0, (unsigned long *)&debug))
+ if (kstrtoint(opt + 6, 0, &debug))
goto err;
} else if (!strncmp(opt, "phyaddr:", 8)) {
- if (strict_strtoul(opt + 8, 0,
- (unsigned long *)&phyaddr))
+ if (kstrtoint(opt + 8, 0, &phyaddr))
goto err;
} else if (!strncmp(opt, "dma_txsize:", 11)) {
- if (strict_strtoul(opt + 11, 0,
- (unsigned long *)&dma_txsize))
+ if (kstrtoint(opt + 11, 0, &dma_txsize))
goto err;
} else if (!strncmp(opt, "dma_rxsize:", 11)) {
- if (strict_strtoul(opt + 11, 0,
- (unsigned long *)&dma_rxsize))
+ if (kstrtoint(opt + 11, 0, &dma_rxsize))
goto err;
} else if (!strncmp(opt, "buf_sz:", 7)) {
- if (strict_strtoul(opt + 7, 0,
- (unsigned long *)&buf_sz))
+ if (kstrtoint(opt + 7, 0, &buf_sz))
goto err;
} else if (!strncmp(opt, "tc:", 3)) {
- if (strict_strtoul(opt + 3, 0, (unsigned long *)&tc))
+ if (kstrtoint(opt + 3, 0, &tc))
goto err;
} else if (!strncmp(opt, "watchdog:", 9)) {
- if (strict_strtoul(opt + 9, 0,
- (unsigned long *)&watchdog))
+ if (kstrtoint(opt + 9, 0, &watchdog))
goto err;
} else if (!strncmp(opt, "flow_ctrl:", 10)) {
- if (strict_strtoul(opt + 10, 0,
- (unsigned long *)&flow_ctrl))
+ if (kstrtoint(opt + 10, 0, &flow_ctrl))
goto err;
} else if (!strncmp(opt, "pause:", 6)) {
- if (strict_strtoul(opt + 6, 0, (unsigned long *)&pause))
+ if (kstrtoint(opt + 6, 0, &pause))
goto err;
#ifdef CONFIG_STMMAC_TIMER
} else if (!strncmp(opt, "tmrate:", 7)) {
- if (strict_strtoul(opt + 7, 0,
- (unsigned long *)&tmrate))
+ if (kstrtoint(opt + 7, 0, &tmrate))
goto err;
#endif
}
--
1.7.4.4
* Re: LOCKDEP complaints in l2tp_xmit_skb()
From: Eric Dumazet @ 2012-06-28 6:56 UTC (permalink / raw)
To: Tom Parkin; +Cc: netdev
In-Reply-To: <20120627111152.GA2531@raven>
On Wed, 2012-06-27 at 12:11 +0100, Tom Parkin wrote:
> In testing L2TP ethernet pseudowires I have observed some complaints
> from lockdep due to circular/recursive locking in l2tp_xmit_skb().
>
> I'm testing the -net tree, which includes Eric's recent patches to
> squash another lockdep error by converting l2tp to LLTX. Git hash
> d7ffde35e31a811.
>
> My test setup consists of two AMD64 boxes, both running 32bit kernels.
> One box is SMP, the other UP. My test procedure consists of creating
> an L2TP tunnel containing N ethernet pseudowires. I then run N iperf
> sessions across the N pseudowires. The simplest configuration is:
>
>
> [On HOST A]
> ip l2tp add tunnel \
> tunnel_id 1 \
> peer_tunnel_id 1 \
> local <HOST A ip> \
> remote <HOST B ip> \
> udp_sport 9999 \
> udp_dport 9999
> ip l2tp add session \
> tunnel_id 1 \
> session_id 1 \
> peer_session_id 1
> ip addr add 172.16.0.1 \
> peer 172.16.0.2/24 \
> broadcast 172.16.0.255 \
> dev l2tpeth0
> ip link set l2tpeth0 up
> iperf -s -B 172.16.0.1
>
> [On HOST B]
> ip l2tp add tunnel \
> tunnel_id 1 \
> peer_tunnel_id 1 \
> local <HOST B ip> \
> remote <HOST A ip> \
> udp_sport 9999 \
> udp_dport 9999
> ip l2tp add session \
> tunnel_id 1 \
> session_id 1 \
> peer_session_id 1
> ip addr add 172.16.0.2 \
> peer 172.16.0.1/24 \
> broadcast 172.16.0.255 \
> dev l2tpeth0
> ip link set l2tpeth0 up
> iperf -c 172.16.0.1
>
>
> If I run four concurrent iperf sessions across four pseudowires I
> see lockdep complaints on both SMP and UP boxes.
>
> Lockdep output for the AMD64 SMP machine:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.5.0-rc2-net-lockdep-u64-sync-006-+ #2 Not tainted
> -------------------------------------------------------
> swapper/1/0 is trying to acquire lock:
> (slock-AF_INET){+.-...}, at: [<f85f5bff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
>
> but task is already holding lock:
> (&(&sch->busylock)->rlock){+.-...}, at: [<c14fb1b2>] dev_queue_xmit+0xb42/0xbd0
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&sch->busylock)->rlock){+.-...}:
> [<c10a8b48>] lock_acquire+0x88/0x120
> [<c16157bb>] _raw_spin_lock+0x3b/0x70
> [<c1535cf8>] __inet_hash_nolisten+0xb8/0x140
> [<c1536b77>] __inet_hash_connect+0x267/0x2c0
> [<c1536c10>] inet_hash_connect+0x40/0x50
> [<c154e4d4>] tcp_v4_connect+0x2c4/0x510
> [<c156293f>] inet_stream_connect+0x1ff/0x380
> [<c14e30c1>] sys_connect+0xc1/0xe0
> [<c14e3d13>] sys_socketcall+0xe3/0x2e0
> [<c161d89f>] sysenter_do_call+0x12/0x38
>
> -> #0 (slock-AF_INET){+.-...}:
> [<c10a78cc>] __lock_acquire+0xaec/0x17d0
> [<c10a8b48>] lock_acquire+0x88/0x120
> [<c16157bb>] _raw_spin_lock+0x3b/0x70
> [<f85f5bff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<f851432d>] l2tp_eth_dev_xmit+0x2d/0x40 [l2tp_eth]
> [<c14fa32f>] dev_hard_start_xmit+0x49f/0x7e0
> [<c1515819>] sch_direct_xmit+0xa9/0x250
> [<c14fa835>] dev_queue_xmit+0x1c5/0xbd0
> [<c159442c>] ip6_finish_output2+0x11c/0x620
> [<c159813f>] ip6_finish_output+0x7f/0x1e0
> [<c15982ea>] ip6_output+0x4a/0x1f0
> [<c15bbddc>] mld_sendpack+0x21c/0x530
> [<c15bc817>] mld_ifc_timer_expire+0x187/0x260
> [<c1055d10>] run_timer_softirq+0x140/0x370
> [<c104da27>] __do_softirq+0x97/0x1f0
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&(&sch->busylock)->rlock);
> lock(slock-AF_INET);
> lock(&(&sch->busylock)->rlock);
> lock(slock-AF_INET);
>
> *** DEADLOCK ***
>
> 5 locks held by swapper/1/0:
> #0: (&idev->mc_ifc_timer){+.-...}, at: [<c1055c88>] run_timer_softirq+0xb8/0x370
> #1: (rcu_read_lock){.+.+..}, at: [<c15bbbc0>] mld_sendpack+0x0/0x530
> #2: (rcu_read_lock){.+.+..}, at: [<c159434f>] ip6_finish_output2+0x3f/0x620
> #3: (rcu_read_lock_bh){.+....}, at: [<c14fa670>] dev_queue_xmit+0x0/0xbd0
> #4: (&(&sch->busylock)->rlock){+.-...}, at: [<c14fb1b2>] dev_queue_xmit+0xb42/0xbd0
>
> stack backtrace:
> Pid: 0, comm: swapper/1 Not tainted 3.5.0-rc2-net-lockdep-u64-sync-006-+ #2
> Call Trace:
> [<c160b540>] print_circular_bug+0x1b4/0x1be
> [<c10a78cc>] __lock_acquire+0xaec/0x17d0
> [<c10a8b48>] lock_acquire+0x88/0x120
> [<f85f5bff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<c16157bb>] _raw_spin_lock+0x3b/0x70
> [<f85f5bff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<f85f5bff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<f851432d>] l2tp_eth_dev_xmit+0x2d/0x40 [l2tp_eth]
> [<c14fa32f>] dev_hard_start_xmit+0x49f/0x7e0
> [<c14f9ee1>] ? dev_hard_start_xmit+0x51/0x7e0
> [<c1515819>] sch_direct_xmit+0xa9/0x250
> [<c16157e1>] ? _raw_spin_lock+0x61/0x70
> [<c14fa835>] dev_queue_xmit+0x1c5/0xbd0
> [<c14fa670>] ? dev_hard_start_xmit+0x7e0/0x7e0
> [<c159442c>] ip6_finish_output2+0x11c/0x620
> [<c159434f>] ? ip6_finish_output2+0x3f/0x620
> [<c159813f>] ip6_finish_output+0x7f/0x1e0
> [<c15982ea>] ip6_output+0x4a/0x1f0
> [<c15a6ae0>] ? ip6_blackhole_route+0x2c0/0x2c0
> [<c15bbddc>] mld_sendpack+0x21c/0x530
> [<c15bbbc0>] ? igmp6_group_added+0x170/0x170
> [<c15bc817>] mld_ifc_timer_expire+0x187/0x260
> [<c1055d10>] run_timer_softirq+0x140/0x370
> [<c1055c88>] ? run_timer_softirq+0xb8/0x370
> [<c1085776>] ? rebalance_domains+0x1b6/0x2a0
> [<c15bc690>] ? igmp6_timer_handler+0x80/0x80
> [<c104da27>] __do_softirq+0x97/0x1f0
> [<c104d990>] ? local_bh_enable_ip+0xd0/0xd0
> <IRQ> [<c104ddce>] ? irq_exit+0x7e/0xa0
> [<c161e0f9>] ? smp_apic_timer_interrupt+0x59/0x88
> [<c12fb498>] ? trace_hardirqs_off_thunk+0xc/0x14
> [<c1616882>] ? apic_timer_interrupt+0x36/0x3c
> [<c10380d5>] ? native_safe_halt+0x5/0x10
> [<c1018bdf>] ? default_idle+0x4f/0x1e0
> [<c1018dc1>] ? amd_e400_idle+0x51/0x100
> [<c10199c9>] ? cpu_idle+0xb9/0xe0
> [<c16038fc>] ? start_secondary+0x1ea/0x1f0
>
>
>
> And on AMD64 UP machine:
>
> ============================================
> [ INFO: possible recursive locking detected ]
> 3.5.0-rc2-net-lockdep-u64-sync-006-+ #2 Not tainted
> --------------------------------------------
> swapper/0/0 is trying to acquire lock:
> (slock-AF_INET){+.-...}, at: [<f864fbff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
>
> but task is already holding lock:
> (slock-AF_INET){+.-...}, at: [<c154c177>] tcp_delack_timer+0x17/0x1e0
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(slock-AF_INET);
> lock(slock-AF_INET);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 5 locks held by swapper/0/0:
> #0: (&icsk->icsk_delack_timer){+.-...}, at: [<c1055c88>] run_timer_softirq+0xb8/0x370
> #1: (slock-AF_INET){+.-...}, at: [<c154c177>] tcp_delack_timer+0x17/0x1e0
> #2: (rcu_read_lock){.+.+..}, at: [<c1531bf0>] ip_queue_xmit+0x0/0x610
> #3: (rcu_read_lock){.+.+..}, at: [<c1531456>] ip_finish_output+0x106/0x710
> #4: (rcu_read_lock_bh){.+....}, at: [<c14fa670>] dev_queue_xmit+0x0/0xbd0
>
> stack backtrace:
> Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc2-net-lockdep-u64-sync-006-+ #2
> Call Trace:
> [<c10a7b32>] __lock_acquire+0xd52/0x17d0
> [<c1017ba8>] ? sched_clock+0x8/0x10
> [<c107edbb>] ? sched_clock_local+0xcb/0x1c0
> [<c10a8b48>] lock_acquire+0x88/0x120
> [<f864fbff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<c16157bb>] _raw_spin_lock+0x3b/0x70
> [<f864fbff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<f864fbff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
> [<f853032d>] l2tp_eth_dev_xmit+0x2d/0x40 [l2tp_eth]
> [<c14fa32f>] dev_hard_start_xmit+0x49f/0x7e0
> [<c14f9ee1>] ? dev_hard_start_xmit+0x51/0x7e0
> [<c1515819>] sch_direct_xmit+0xa9/0x250
> [<c16157e1>] ? _raw_spin_lock+0x61/0x70
> [<c14fa835>] dev_queue_xmit+0x1c5/0xbd0
> [<c14fa670>] ? dev_hard_start_xmit+0x7e0/0x7e0
> [<c15065f7>] neigh_resolve_output+0x117/0x230
> [<c1514880>] ? eth_rebuild_header+0x80/0x80
> [<c1531612>] ip_finish_output+0x2c2/0x710
> [<c1531456>] ? ip_finish_output+0x106/0x710
> [<c1532770>] ? ip_output+0x60/0x120
> [<c10a585b>] ? trace_hardirqs_on+0xb/0x10
> [<c153278b>] ip_output+0x7b/0x120
> [<c1531b95>] ip_local_out+0x25/0x80
> [<c1531d73>] ip_queue_xmit+0x183/0x610
> [<c1531bf0>] ? ip_local_out+0x80/0x80
> [<c154ecb5>] ? tcp_md5_do_lookup+0x125/0x170
> [<c15498c6>] tcp_transmit_skb+0x396/0x970
> [<c154bb12>] ? tcp_send_ack+0x32/0x100
> [<c154bb9d>] tcp_send_ack+0xbd/0x100
> [<c154c271>] tcp_delack_timer+0x111/0x1e0
> [<c1055d10>] run_timer_softirq+0x140/0x370
> [<c1055c88>] ? run_timer_softirq+0xb8/0x370
> [<c154c160>] ? tcp_out_of_resources+0xb0/0xb0
> [<c14f88cc>] ? net_rx_action+0x10c/0x210
> [<c104da27>] __do_softirq+0x97/0x1f0
> [<c104d990>] ? local_bh_enable_ip+0xd0/0xd0
> <IRQ> [<c104ddce>] ? irq_exit+0x7e/0xa0
> [<c161e02b>] ? do_IRQ+0x4b/0xc0
> [<c161de75>] ? common_interrupt+0x35/0x3c
> [<c10380d5>] ? native_safe_halt+0x5/0x10
> [<c1018bdf>] ? default_idle+0x4f/0x1e0
> [<c1018dc1>] ? amd_e400_idle+0x51/0x100
> [<c10199c9>] ? cpu_idle+0xb9/0xe0
> [<c15eab3e>] ? rest_init+0x112/0x124
> [<c15eaa2c>] ? __read_lock_failed+0x14/0x14
> [<c1907a11>] ? start_kernel+0x376/0x37c
> [<c19074d6>] ? repair_env_string+0x51/0x51
> [<c19072f8>] ? i386_start_kernel+0x9b/0xa2
>
Hi Tom,
Could you please test the following patch?
Thanks!
[PATCH] net: Qdisc busylock gets its own lockdep class
Tom Parkin reported the following LOCKDEP splat:
======================================================
[ INFO: possible circular locking dependency detected ]
3.5.0-rc2-net-lockdep-u64-sync-006-+ #2 Not tainted
-------------------------------------------------------
swapper/1/0 is trying to acquire lock:
(slock-AF_INET){+.-...}, at: [<f85f5bff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
but task is already holding lock:
(&(&sch->busylock)->rlock){+.-...}, at: [<c14fb1b2>] dev_queue_xmit+0xb42/0xbd0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&sch->busylock)->rlock){+.-...}:
[<c10a8b48>] lock_acquire+0x88/0x120
[<c16157bb>] _raw_spin_lock+0x3b/0x70
[<c1535cf8>] __inet_hash_nolisten+0xb8/0x140
[<c1536b77>] __inet_hash_connect+0x267/0x2c0
[<c1536c10>] inet_hash_connect+0x40/0x50
[<c154e4d4>] tcp_v4_connect+0x2c4/0x510
[<c156293f>] inet_stream_connect+0x1ff/0x380
[<c14e30c1>] sys_connect+0xc1/0xe0
[<c14e3d13>] sys_socketcall+0xe3/0x2e0
[<c161d89f>] sysenter_do_call+0x12/0x38
-> #0 (slock-AF_INET){+.-...}:
[<c10a78cc>] __lock_acquire+0xaec/0x17d0
[<c10a8b48>] lock_acquire+0x88/0x120
[<c16157bb>] _raw_spin_lock+0x3b/0x70
[<f85f5bff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
[<f851432d>] l2tp_eth_dev_xmit+0x2d/0x40 [l2tp_eth]
[<c14fa32f>] dev_hard_start_xmit+0x49f/0x7e0
[<c1515819>] sch_direct_xmit+0xa9/0x250
[<c14fa835>] dev_queue_xmit+0x1c5/0xbd0
[<c159442c>] ip6_finish_output2+0x11c/0x620
[<c159813f>] ip6_finish_output+0x7f/0x1e0
[<c15982ea>] ip6_output+0x4a/0x1f0
[<c15bbddc>] mld_sendpack+0x21c/0x530
[<c15bc817>] mld_ifc_timer_expire+0x187/0x260
[<c1055d10>] run_timer_softirq+0x140/0x370
[<c104da27>] __do_softirq+0x97/0x1f0
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&(&sch->busylock)->rlock);
                               lock(slock-AF_INET);
                               lock(&(&sch->busylock)->rlock);
  lock(slock-AF_INET);
*** DEADLOCK ***
5 locks held by swapper/1/0:
#0: (&idev->mc_ifc_timer){+.-...}, at: [<c1055c88>] run_timer_softirq+0xb8/0x370
#1: (rcu_read_lock){.+.+..}, at: [<c15bbbc0>] mld_sendpack+0x0/0x530
#2: (rcu_read_lock){.+.+..}, at: [<c159434f>] ip6_finish_output2+0x3f/0x620
#3: (rcu_read_lock_bh){.+....}, at: [<c14fa670>] dev_queue_xmit+0x0/0xbd0
#4: (&(&sch->busylock)->rlock){+.-...}, at: [<c14fb1b2>] dev_queue_xmit+0xb42/0xbd0
stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.5.0-rc2-net-lockdep-u64-sync-006-+ #2
Call Trace:
[<c160b540>] print_circular_bug+0x1b4/0x1be
[<c10a78cc>] __lock_acquire+0xaec/0x17d0
[<c10a8b48>] lock_acquire+0x88/0x120
[<f85f5bff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
[<c16157bb>] _raw_spin_lock+0x3b/0x70
[<f85f5bff>] ? l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
[<f85f5bff>] l2tp_xmit_skb+0x13f/0x8e0 [l2tp_core]
[<f851432d>] l2tp_eth_dev_xmit+0x2d/0x40 [l2tp_eth]
[<c14fa32f>] dev_hard_start_xmit+0x49f/0x7e0
[<c14f9ee1>] ? dev_hard_start_xmit+0x51/0x7e0
[<c1515819>] sch_direct_xmit+0xa9/0x250
[<c16157e1>] ? _raw_spin_lock+0x61/0x70
[<c14fa835>] dev_queue_xmit+0x1c5/0xbd0
[<c14fa670>] ? dev_hard_start_xmit+0x7e0/0x7e0
[<c159442c>] ip6_finish_output2+0x11c/0x620
[<c159434f>] ? ip6_finish_output2+0x3f/0x620
[<c159813f>] ip6_finish_output+0x7f/0x1e0
[<c15982ea>] ip6_output+0x4a/0x1f0
[<c15a6ae0>] ? ip6_blackhole_route+0x2c0/0x2c0
[<c15bbddc>] mld_sendpack+0x21c/0x530
[<c15bbbc0>] ? igmp6_group_added+0x170/0x170
[<c15bc817>] mld_ifc_timer_expire+0x187/0x260
[<c1055d10>] run_timer_softirq+0x140/0x370
[<c1055c88>] ? run_timer_softirq+0xb8/0x370
[<c1085776>] ? rebalance_domains+0x1b6/0x2a0
[<c15bc690>] ? igmp6_timer_handler+0x80/0x80
[<c104da27>] __do_softirq+0x97/0x1f0
[<c104d990>] ? local_bh_enable_ip+0xd0/0xd0
<IRQ> [<c104ddce>] ? irq_exit+0x7e/0xa0
[<c161e0f9>] ? smp_apic_timer_interrupt+0x59/0x88
[<c12fb498>] ? trace_hardirqs_off_thunk+0xc/0x14
[<c1616882>] ? apic_timer_interrupt+0x36/0x3c
[<c10380d5>] ? native_safe_halt+0x5/0x10
[<c1018bdf>] ? default_idle+0x4f/0x1e0
[<c1018dc1>] ? amd_e400_idle+0x51/0x100
[<c10199c9>] ? cpu_idle+0xb9/0xe0
[<c16038fc>] ? start_secondary+0x1ea/0x1f0
Instruct lockdep that each Qdisc busylock is independent, or else
bonding or various tunnels can trigger a splat.
Reported-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sch_generic.h | 1 +
net/sched/sch_generic.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9d7d54a..a45b501 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -83,6 +83,7 @@ struct Qdisc {
struct rcu_head rcu_head;
spinlock_t busylock;
u32 limit;
+ struct lock_class_key busylock_key;
};
static inline bool qdisc_is_running(const struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 511323e..3357c6d 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -572,6 +572,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
INIT_LIST_HEAD(&sch->list);
skb_queue_head_init(&sch->q);
spin_lock_init(&sch->busylock);
+ lockdep_set_class(&sch->busylock, &sch->busylock_key);
sch->ops = ops;
sch->enqueue = ops->enqueue;
sch->dequeue = ops->dequeue;
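
To illustrate why a per-Qdisc lock class silences this report, here is a
toy userspace model of class-based lock ordering. It is only a sketch and
not how lockdep is implemented: the toy_* names, the integer class ids and
the assumed nesting (tunnel busylock, then socket lock, then the lower
device's busylock) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical class ids; BUSY_SHARED models one class for every Qdisc. */
enum { SLOCK = 0, BUSY_SHARED = 1, BUSY_TUNNEL = 2, BUSY_LOWER = 3 };

/* edge[a][b] means "class b was taken while class a was held". */
static bool edge[MAX_CLASSES][MAX_CLASSES];

static void toy_reset(void)
{
	memset(edge, 0, sizeof(edge));
}

/* Depth-first search: is class 'to' reachable from class 'from'? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Record "held -> wanted"; return false if that would close a cycle. */
static bool toy_acquire(int held, int wanted)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(wanted >= 0 && wanted < MAX_CLASSES);
	if (reachable(wanted, held, seen))
		return false;
	edge[held][wanted] = true;
	return true;
}

/* Assumed nesting: tunnel busylock -> socket lock -> lower busylock. */
static bool tunnel_xmit_ok(int tunnel_busylock, int slock, int lower_busylock)
{
	return toy_acquire(tunnel_busylock, slock) &&
	       toy_acquire(slock, lower_busylock);
}
```

With a single shared class, tunnel_xmit_ok(BUSY_SHARED, SLOCK, BUSY_SHARED)
records busylock -> slock and then refuses slock -> busylock as a cycle,
although two different Qdisc instances are involved. With distinct classes,
tunnel_xmit_ok(BUSY_TUNNEL, SLOCK, BUSY_LOWER) sees no cycle.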
^ permalink raw reply related
* Re: [net-next.git 4/4 (v8)] phy: add the EEE support and the way to access to the MMD registers.
From: Giuseppe CAVALLARO @ 2012-06-28 6:54 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, eric.dumazet, rayagond, davem, yuvalmin
In-Reply-To: <1340813737.2591.5.camel@bwh-desktop.uk.solarflarecom.com>
On 6/27/2012 6:15 PM, Ben Hutchings wrote:
> On Thu, 2012-06-21 at 08:03 +0200, Giuseppe CAVALLARO wrote:
> [...]
>> v8: fixed a problem in the phy_init_eee return value erroneously added
>> when included the phy_read_status call.
>
> Almost there. :-/
no problem :-)
> [...]
>> +int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable)
>> +{
>> + int ret = -EPROTONOSUPPORT;
>> +
>> + /* According to 802.3az,the EEE is supported only in full duplex-mode.
>> + * Also EEE feature is active when core is operating with MII, GMII
>> + * or RGMII.
>> + */
>> + if ((phydev->duplex == DUPLEX_FULL) &&
>> + ((phydev->interface == PHY_INTERFACE_MODE_MII) ||
>> + (phydev->interface == PHY_INTERFACE_MODE_GMII) ||
>> + (phydev->interface == PHY_INTERFACE_MODE_RGMII))) {
>> + u16 eee_lp, eee_cap, eee_adv;
>> + u32 lp, cap, adv;
>> + int idx, status;
[snip]
>> +
>> + eee_adv = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV,
>> + MDIO_MMD_AN, phydev->addr);
>> + if (eee_adv < 0)
>> + return eee_adv;
>
> You check for eee_{cap,lp,adv} < 0 but that's impossible since the
> variables are declared unsigned (u16). (I wonder what compiler you are
> using, as I would expect this to result in a warning.) I think they
> need to be declared int.
IIRC I have compiled w/ no warnings (on arm and sh4 -- gcc 4.6.3) but
you are right and I'll fix that, no problem at all.
[snip]
>> + u32 val = phy_read_mmd_indirect(phydev->bus, MDIO_CTRL1,
>> + MDIO_MMD_PCS,
>> + phydev->addr);
>> + if (val < 0)
>> + return val;
>
> Same problem here.
yes
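
For reference, a small userspace sketch of the problem Ben describes. It
does not use the phylib API: fake_mmd_read() is a hypothetical stand-in for
a register read that can return a negative errno, and uint16_t/uint32_t
stand in for the kernel's u16/u32.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a read that fails with -EIO. */
static int fake_mmd_read(bool fail)
{
	return fail ? -EIO : 0x0006;
}

/* Buggy: -EIO is truncated to 65531, which is never below zero. */
static bool error_seen_with_u16(bool fail)
{
	uint16_t val = (uint16_t)fake_mmd_read(fail);

	return val < 0;
}

/* Buggy: -EIO wraps to a large unsigned value, never below zero. */
static bool error_seen_with_u32(bool fail)
{
	uint32_t val = (uint32_t)fake_mmd_read(fail);

	return val < 0;
}

/* Fixed: keep the result in an int so the sign survives. */
static bool error_seen_with_int(bool fail)
{
	int val = fake_mmd_read(fail);

	return val < 0;
}

/* Self-check of the behaviour described above. */
static void check_examples(void)
{
	assert(!error_seen_with_u16(true));
	assert(!error_seen_with_u32(true));
	assert(error_seen_with_int(true));
}
```

gcc stays silent about the two buggy comparisons unless -Wtype-limits
(part of -Wextra) is enabled, which may explain the clean builds
mentioned above.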
>
> [...]
>> --- a/include/linux/mdio.h
>> +++ b/include/linux/mdio.h
> [...]
>> @@ -237,9 +241,18 @@
>> #define MDIO_AN_10GBT_STAT_MS 0x4000 /* Master/slave config */
>> #define MDIO_AN_10GBT_STAT_MSFLT 0x8000 /* Master/slave config fault */
>>
>> -/* AN EEE Advertisement register. */
>> -#define MDIO_AN_EEE_ADV_100TX 0x0002 /* Advertise 100TX EEE cap */
>> -#define MDIO_AN_EEE_ADV_1000T 0x0004 /* Advertise 1000T EEE cap */
> [...]
>
> This header is exported to userland so I don't think these definitions
> can be removed. But you could comment that they're redundant with the
> following MDIO_EEE_* definitions.
Indeed I agree with you and I've already fixed this w/o removing them.
I'm sending the new patches now.
Thanks again for your feedback and help.
Peppe
>
> Ben.
>
^ permalink raw reply
* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: Jiri Pirko @ 2012-06-28 6:35 UTC (permalink / raw)
To: David Miller; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <20120627.213046.1244710404799995026.davem@davemloft.net>
Thu, Jun 28, 2012 at 06:30:46AM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Wed, 27 Jun 2012 17:27:46 +0200
>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>Applied, but this seriously makes eth_mac_addr() completely useless.
>
>Technically, every eth_mac_addr() user in a software/virtual device
>should behave the way virtio_net does now.
I guess so. But for some HW devices eth_mac_addr() is needed (when they
do not support "live" mac change).
>
>It therefore probably makes sense to add a boolean arg which when true
>elides the netif_running() check then fixup and audit every caller.
I was thinking about this. Maybe add __eth_mac_addr(), which does not
have the netif_running() check, and have eth_mac_addr() do the
netif_running() check and then call __eth_mac_addr().
What do you think?
Jirka
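
A rough userspace sketch of that split, to make the idea concrete. This is
not kernel code: struct toy_netdev and the toy_* helpers are hypothetical
stand-ins, and toy_eth_mac_addr_nocheck() plays the role of the proposed
__eth_mac_addr().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Toy stand-in for struct net_device; only what the sketch needs. */
struct toy_netdev {
	bool running;
	unsigned char dev_addr[ETH_ALEN];
};

/* Reject multicast (low bit of first octet set) and all-zero addresses. */
static bool toy_is_valid_ether_addr(const unsigned char *addr)
{
	static const unsigned char zero[ETH_ALEN];

	return !(addr[0] & 1) && memcmp(addr, zero, ETH_ALEN) != 0;
}

/* Unchecked variant: for devices that can change the MAC while up. */
static int toy_eth_mac_addr_nocheck(struct toy_netdev *dev,
				    const unsigned char *addr)
{
	assert(dev && addr);
	if (!toy_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	memcpy(dev->dev_addr, addr, ETH_ALEN);
	return 0;
}

/* Checked variant: refuse while running, then defer to the helper. */
static int toy_eth_mac_addr(struct toy_netdev *dev, const unsigned char *addr)
{
	assert(dev && addr);
	if (dev->running)
		return -EBUSY;
	return toy_eth_mac_addr_nocheck(dev, addr);
}
```

Software/virtual devices would call the unchecked helper directly, while
hardware that cannot change its address while up keeps the checked one.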
^ permalink raw reply
* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Jason Wang @ 2012-06-28 5:31 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: Michael S. Tsirkin, habanero, netdev, linux-kernel, krkumar2,
tahm, akong, davem, shemminger, mashirle
In-Reply-To: <4FEBE2FF.3010105@us.ibm.com>
On 06/28/2012 12:52 PM, Sridhar Samudrala wrote:
> On 6/27/2012 8:02 PM, Jason Wang wrote:
>> On 06/27/2012 04:44 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jun 27, 2012 at 01:16:30PM +0800, Jason Wang wrote:
>>>> On 06/26/2012 06:42 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Jun 26, 2012 at 11:42:17AM +0800, Jason Wang wrote:
>>>>>> On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
>>>>>>>> This patch adds multiqueue support for tap device. This is done
>>>>>>>> by abstracting
>>>>>>>> each queue as a file/socket and allowing multiple sockets to be
>>>>>>>> attached to the
>>>>>>>> tuntap device (an array of tun_file were stored in the
>>>>>>>> tun_struct). Userspace
>>>>>>>> could write and read from those files to do the parallel packet
>>>>>>>> sending/receiving.
>>>>>>>>
>>>>>>>> Unlike the previous single queue implementation, the socket and
>>>>>>>> device are loosely coupled, and either of them is allowed to go
>>>>>>>> away first. In order to make the tx path lockless,
>>>>>>>> netif_tx_lock_bh() is replaced by RCU/NETIF_F_LLTX to synchronize
>>>>>>>> between the data path and system calls.
>>>>>>> Don't use LLTX/RCU. It's not worth it.
>>>>>>> Use something like netif_set_real_num_tx_queues.
>>>>>>>
>>>>>>>> The tx queue is selected first based on the recorded rxq index
>>>>>>>> of an skb; if there is none, it is chosen based on rx hashing
>>>>>>>> (skb_get_rxhash()).
>>>>>>>>
>>>>>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>>>>> Interestingly macvtap switched to hashing first:
>>>>>>> ef0002b577b52941fb147128f30bd1ecfdd3ff6d
>>>>>>> (the commit log is corrupted but see what it
>>>>>>> does in the patch).
>>>>>>> Any idea why?
>>>>>> Yes, so tap should be changed to behave the same as macvtap. I remember
>>>>>> the reason we do that is to make sure the packets of a single flow are
>>>>>> queued to a fixed socket/virtqueue, as 10g cards like ixgbe
>>>>>> choose the rx queue for a flow based on the last tx queue that the
>>>>>> packets of that flow came from. So if we use the recorded rx queue in
>>>>>> macvtap, the queue index of a flow would change as the vhost thread
>>>>>> moves among processors.
>>>>> Hmm. OTOH if you override this, if TX is sent from VCPU0, RX might
>>>>> land
>>>>> on VCPU1 in the guest, which is not good, right?
>>>> Yes, but better than making the rx moves between vcpus when we use
>>>> recorded rx queue.
>>> Why isn't this a problem with native TCP?
>>> I think what happens is one of the following:
>>> - moving between CPUs is more expensive with tun
>>> because it can queue so much data on xmit
>>> - scheduler makes very bad decisions about VCPUs
>>> bouncing them around all the time
>>
>> For a usual native TCP/host process, as it reads and writes tcp
>> sockets, it makes sense to move rx to the processor where the
>> process moves. But vhost does not do tcp stuff and ixgbe would still
>> move rx when the vhost process moves, and we can't even make sure the
>> vhost process that handles rx is running on the processor that handles
>> the rx interrupt.
>
> We also saw this behavior with the default ixgbe configuration. If
> vhost is pinned to a CPU all
> packets for that VM are received on a single RX queue.
> So even if the VM is doing multiple TCP_RR sessions, packets for all
> the flows are received
> on a single RX queue. Without pinning, vhost moves around and so does
> the packets across
> the RX queues.
>
> I think
> ethtool -K ethX ntuple on
> will disable this behavior and it should be possible to program the
> flow director using ethtool -U.
> This way we can split the packets across the host NIC RX queues based
> on the flows, but it is not
> clear if this would help with the current model of single vhost per
> device.
> With per-cpu vhost, each RX queue can be handled by the matching
> vhost, but if we have only
> 1 queue in the VMs virtio-net device, that could become the bottleneck.
Yes, I've been thinking about this. Instead of using ethtool -U
(maybe possible for macvtap but hard for tuntap), we can 'teach'
ixgbe which rxq it should use for a flow, because ixgbe_select_queue()
first selects the txq based on the recorded rxq. So if we want a
flow to use a dedicated rxq, say N, we can record N as the rxq in tuntap
before we pass the skb to the bridge.
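
A toy model of that selection order (recorded rxq first, flow hash as the
fallback), just to make the idea concrete. It is not the ixgbe or tuntap
code: struct toy_skb and the toy_* helpers are hypothetical, and the plain
modulo is a simplification of how a real driver would fold the index or
the hash into the queue range.

```c
#include <assert.h>
#include <stdint.h>

/* Toy packet: rx_queue_plus_one == 0 means "no rx queue recorded". */
struct toy_skb {
	uint16_t rx_queue_plus_one;
	uint32_t flow_hash;
};

/* What tuntap would do before handing the packet to the bridge. */
static void toy_record_rx_queue(struct toy_skb *skb, uint16_t rxq)
{
	skb->rx_queue_plus_one = (uint16_t)(rxq + 1);
}

/* Prefer the recorded rx queue; otherwise fall back to the flow hash. */
static uint16_t toy_select_queue(const struct toy_skb *skb, uint16_t numqueues)
{
	assert(numqueues > 0);
	if (skb->rx_queue_plus_one) {
		uint16_t rxq = (uint16_t)(skb->rx_queue_plus_one - 1);

		/* Fold an out-of-range index back into the valid range. */
		return (uint16_t)(rxq % numqueues);
	}
	return (uint16_t)(skb->flow_hash % numqueues);
}
```

With 4 queues, a packet with flow hash 10 and no recorded rxq lands on
queue 2; once rxq 3 has been recorded for it, queue 3 is chosen regardless
of the hash.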
> Multi-queue virtio-net should help here, but we need the same number
> of queues in VM's virtio-net
> device as the host's NIC so that each vhost can handle the
> corresponding virtio queue.
> But if the VM has only 2 vcpus, i think it is not efficient to have 8
> virtio-net queues.(to match a host
> with 8 physical cpus and 8 RX queues in the NIC).
Ideally, if we have 2 queues in the guest, it's better to only use 2 queues
in the host to avoid extra contention.
>
> Thanks
> Sridhar
>
>>
>>> Could we isolate which it is? Does the problem
>>> still happen if you pin VCPUs to host cpus?
>>> If not it's the queue depth.
>>
>> It may not help, as tun does not record the vcpu/queue that sent the
>> stream, so it can't transmit the packets back to the same vcpu/queue.
>>>> Flow steering is needed to make sure the tx and
>>>> rx on the same vcpu.
>>> That involves IPI between processes, so it might be
>>> very expensive for kvm.
>>>
>>>>>> But while testing tun/tap, one interesting thing I found is that even
>>>>>> though ixgbe has recorded the queue index during rx, it seems to be
>>>>>> lost when tap tries to transmit skbs to userspace.
>>>>> dev_pick_tx does this I think but ndo_select_queue
>>>>> should be able to get it without trouble.
>>>>>
>>>>>
>
^ permalink raw reply
* Re: [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: David Miller @ 2012-06-28 5:22 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, hans.schillstrom
In-Reply-To: <1340860399.26242.206.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Jun 2012 07:13:19 +0200
> On Thu, 2012-06-28 at 07:08 +0200, Eric Dumazet wrote:
>
>> The initial idea was to perform this only for SYN packets received on a
>> listener in SYNCOOKIE mode. I'll resend the patch when fully
>> implemented, instead of a forward patch.
>>
>
> s/forward/followup/
>
> ;)
Ok :-)
^ permalink raw reply
* [GIT] Networking
From: David Miller @ 2012-06-28 5:21 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) Pairing and deadlock fixes in bluetooth from Johan Hedberg.
2) Add device IDs for AR3011 and AR3012 bluetooth chips. From
Giancarlo Formicuccia and Marek Vasut.
3) Fix wireless regulatory deadlock, from Eliad Peller.
4) Fix full TX ring panic in bnx2x driver, from Eric Dumazet.
5) Revert the two commits that added skb_orphan_try(), it causes
erratic bonding behavior with UDP clients and the gains it
used to give are mostly no longer happening due to how BQL
works. From Eric Dumazet.
6) It took two tries, but Thomas Graf fixed a problem wherein we
registered ipv6 routing procfs files before their backend data
were initialized properly.
7) Fix max GSO size setting in be2net, from Sarveshwar Bandi.
8) PHY device id mask is wrong for KSZ9021 and KS8001 chips, fix
from Jason Wang.
9) Fix use of stale SKB data pointer after skb_linearize() call in
batman-adv, from Antonio Quartulli.
10) Fix memory leak in IXGBE due to missing __GFP_COMP, from Alexander
Duyck.
11) Fix probing of Gobi devices in qmi_wwan usbnet driver, from Bjørn Mork.
12) Fix suspend/resume and open failure handling in usbnet from Ming
Lei.
13) Attempt to fix device r8169 hangs for certain chips, from Francois
Romieu.
14) Fix advancement of RX dirty pointer in some situations in sh_eth
driver, from Yoshihiro Shimoda.
15) Attempt to fix restart of IPV6 routing table dumps when there is
an intervening table update. From Eric Dumazet.
16) Respect security_inet_conn_request() return value in ipv6 TCP. From
Neal Cardwell.
17) Add another iPAD device ID to ipheth driver, from Davide Gerhard.
18) Fix access to freed SKB in l2tp_eth_dev_xmit(), and fix l2tp lockdep
splats, from Eric Dumazet.
19) Make sure all bridge devices, regardless of whether they were created
via netlink or ioctls, have their rtnetlink ops hooked up. From
Thomas Graf and Stephen Hemminger.
Please pull, thanks a lot!
The following changes since commit 424d54d2dca03805942055e5b19926d33a7d1e31:
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm (2012-06-14 15:46:59 +0300)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master
for you to fetch changes up to a969dd139cc2f2bccdcb11894f0695517cf84d4d:
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can (2012-06-27 15:27:24 -0700)
----------------------------------------------------------------
Alexander Duyck (2):
ixgbe: Fix memory leak in ixgbe when receiving traffic on DDP enabled rings
ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
Amerigo Wang (1):
bonding: show all the link status of slaves
Andrei Emeltchenko (1):
Bluetooth: btmrvl: Do not send vendor events to bluetooth stack
Antonio Quartulli (2):
batman-adv: fix skb->data assignment
batman-adv: fix race condition in TT full-table replacement
Ashok Nagarajan (1):
mac80211: add missing kernel-doc
Avinash Patil (2):
mwifiex: fix incorrect privacy setting in beacon and probe response
mwifiex: fix uAP TX packet timeout issue
Bing Zhao (1):
mwifiex: fix wrong return values in add_virtual_intf() error cases
Bjørn Mork (1):
net: qmi_wwan: fix Gobi device probing
Bob Copeland (1):
ath5k: remove _bh from inner locks
Carolyn Wyborny (2):
igb: Fix incorrect RAR address entries for i210/i211 device.
Kconfig: Fix Kconfig for Intel ixgbe and igb PTP support.
Dan Carpenter (4):
can: c_can: precedence error in c_can_chip_config()
qlcnic: off by one in qlcnic_init_pci_info()
airo: copying wrong data in airo_get_aplist()
9p: fix min_t() casting in p9pdu_vwritef()
Daniel Halperin (1):
sctp: fix warning when compiling without IPv6
David S. Miller (2):
Revert "ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route"
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can
David Spinadel (1):
mac80211: stop polling in disassociation
Davide Gerhard (1):
ipheth: add support for iPad
Eliad Peller (2):
cfg80211: fix potential deadlock in regulatory
mac80211: check sdata_running on ieee80211_set_bitrate_mask
Eric Dumazet (5):
bnx2x: fix panic when TX ring is full
net: remove skb_orphan_try()
ipv6: fib: fix fib dump restart
net: l2tp_eth: fix l2tp_eth_dev_xmit race
net: l2tp_eth: use LLTX to avoid LOCKDEP splats
Felix Fietkau (2):
ath9k: fix a tx rate duration calculation bug
ath9k: fix invalid pointer access in the tx path
Giancarlo Formicuccia (1):
Bluetooth: add support for atheros 0930:0219
Grazvydas Ignotas (3):
wl1251: fix TSF calculation
wl1251: always report beacon loss to the stack
wl1251: Fix memory leaks in SPI initialization
Hui Wang (1):
can: flexcan: use be32_to_cpup to handle the value of dt entry
Ian Campbell (1):
xen/netfront: teardown the device before unregistering it.
Jacob Keller (1):
ixgbe: Fix PHC loophole allowing misconfiguration of increment register
Jason Wang (1):
phy/micrel: change phy_id_mask for KSZ9021 and KS8001
Jens Freimann (1):
vhost: use USER_DS in vhost_worker thread
Johan Hedberg (4):
Bluetooth: Fix SMP pairing method selection
Bluetooth: Fix deadlock and crash when SMP pairing times out
Bluetooth: Fix SMP security elevation from medium to high
Bluetooth: Add support for encryption key refresh
Johannes Berg (2):
mac80211: add some missing kernel-doc
iwlwifi: remove log_event debugfs file debugging is disabled
John W. Linville (5):
Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211
Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem
Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem
Jussi Kivilinna (1):
rndis_wlan: fix matching bssid check in rndis_check_bssid_list()
Marek Lindner (1):
batman-adv: only drop packets of known wifi clients
Marek Vasut (1):
Bluetooth: Support AR3011 in Acer Iconia Tab W500
Michal Kazior (1):
cfg80211: check iface combinations only when iface is running
Ming Lei (3):
usbnet: clear OPEN flag in failure path
usbnet: decrease suspend count if returning -EBUSY for runtime suspend
usbnet: handle remote wakeup asap
Mohammed Shafi Shajakhan (4):
ath9k: Fix a WARNING on suspend/resume with IBSS
ath9k: remove incompatible IBSS interface check in change_iface
ath9k: Fix softlockup in AR9485
ath9k_hw: avoid possible infinite loop in ar9003_get_pll_sqsum_dvc
Neal Cardwell (1):
tcp: heed result of security_inet_conn_request() in tcp_v6_conn_request()
Per Ellefsen (1):
caif-hsi: Bugfix - Piggyback'ed embedded CAIF frame lost
Phil Sutter (1):
usbnet: sanitise overlong driver information strings
Rajkumar Manoharan (1):
ath9k_htc: configure bssid on ASSOC/IBSS change
Rémi Denis-Courmont (1):
net: remove my future former mail address
Sarveshwar Bandi (1):
be2net: reduce gso_max_size setting to account for ethernet header.
Sjur Brændeland (2):
caif: Clear shutdown mask to zero at reconnect.
caif-hsi: Add missing return in error path
Szymon Janc (1):
Bluetooth: Fix using uninitialized option in RFCMode
Thomas Graf (2):
ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route
ipv6: Move ipv6 proc file registration to end of init order
Vasundhara Volam (2):
be2net: Modify error message to incorporate subsystem
be2net: Increase statistics structure size for skyhawk.
Vishal Agarwal (2):
Bluetooth: Fix LE pairing completion on connection failure
Bluetooth: Fix sending HCI_Disconnect only when connected
Yevgeny Petrilin (3):
net/mlx4_en: Set correct port parameters during device initialization
net/mlx4: Use single completion vector after NOP failure
net/mlx4_en: Release QP range in free_resources
Yoshihiro Shimoda (1):
net: sh_eth: fix the condition to fix the cur_tx/dirty_rx
Yuval Mintz (2):
bnx2x: fix I2C non-respondent issue
bnx2x: fix link for BCM57711 with 84823 phy
alex.bluesman.smirnov@gmail.com (1):
mac802154: add missed braces
françois romieu (1):
r8169: RxConfig hack for the 8168evl.
stephen hemminger (1):
bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2)
drivers/bluetooth/ath3k.c | 3 ++
drivers/bluetooth/btmrvl_drv.h | 2 +-
drivers/bluetooth/btmrvl_main.c | 14 +++++++--
drivers/bluetooth/btmrvl_sdio.c | 8 +++--
drivers/bluetooth/btusb.c | 2 ++
drivers/net/bonding/bond_procfs.c | 15 +++++++--
drivers/net/caif/caif_hsi.c | 5 +--
drivers/net/can/c_can/c_can.c | 4 +--
drivers/net/can/flexcan.c | 4 +--
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 8 ++---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 54 ++++++++++++++++++--------------
drivers/net/ethernet/emulex/benet/be_cmds.c | 12 +++----
drivers/net/ethernet/emulex/benet/be_cmds.h | 2 +-
drivers/net/ethernet/emulex/benet/be_main.c | 2 +-
drivers/net/ethernet/intel/Kconfig | 10 ++++--
drivers/net/ethernet/intel/igb/e1000_82575.c | 2 --
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 4 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++---
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 13 ++++++--
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 18 +++++++----
drivers/net/ethernet/mellanox/mlx4/main.c | 2 ++
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 2 +-
drivers/net/ethernet/realtek/r8169.c | 1 +
drivers/net/ethernet/renesas/sh_eth.c | 12 ++++---
drivers/net/phy/micrel.c | 8 ++---
drivers/net/usb/ipheth.c | 5 +++
drivers/net/usb/qmi_wwan.c | 83 ++++++++++++++++++++++++-------------------------
drivers/net/usb/usbnet.c | 53 +++++++++++++++++++------------
drivers/net/wireless/airo.c | 4 +--
drivers/net/wireless/ath/ath5k/base.c | 4 +--
drivers/net/wireless/ath/ath9k/ath9k.h | 1 +
drivers/net/wireless/ath/ath9k/htc_drv_main.c | 5 ++-
drivers/net/wireless/ath/ath9k/hw.c | 14 ++++++++-
drivers/net/wireless/ath/ath9k/main.c | 27 ++++++----------
drivers/net/wireless/ath/ath9k/xmit.c | 31 ++++++++++--------
drivers/net/wireless/iwlwifi/iwl-debugfs.c | 6 ++++
drivers/net/wireless/mwifiex/cfg80211.c | 25 +++++++--------
drivers/net/wireless/mwifiex/txrx.c | 10 ++----
drivers/net/wireless/mwifiex/uap_cmd.c | 11 +++++++
drivers/net/wireless/rndis_wlan.c | 2 +-
drivers/net/wireless/ti/wl1251/acx.c | 2 +-
drivers/net/wireless/ti/wl1251/event.c | 3 +-
drivers/net/wireless/ti/wl1251/spi.c | 4 +++
drivers/net/xen-netfront.c | 8 ++---
drivers/vhost/vhost.c | 3 ++
include/linux/skbuff.h | 7 ++---
include/net/bluetooth/hci.h | 6 ++++
include/net/mac80211.h | 6 ++++
include/net/phonet/gprs.h | 2 +-
net/9p/protocol.c | 2 +-
net/batman-adv/routing.c | 2 ++
net/batman-adv/translation-table.c | 12 +++----
net/bluetooth/hci_event.c | 48 ++++++++++++++++++++++++++++
net/bluetooth/l2cap_core.c | 21 ++++++++-----
net/bluetooth/mgmt.c | 20 +++++++++++-
net/bluetooth/smp.c | 11 ++++---
net/bridge/br_if.c | 1 +
net/bridge/br_netlink.c | 2 +-
net/bridge/br_private.h | 1 +
net/caif/caif_dev.c | 3 +-
net/caif/caif_socket.c | 1 +
net/can/raw.c | 3 --
net/core/dev.c | 23 +-------------
net/ipv6/ip6_fib.c | 4 +--
net/ipv6/route.c | 41 ++++++++++++++++++------
net/ipv6/tcp_ipv6.c | 3 +-
net/iucv/af_iucv.c | 1 -
net/l2tp/l2tp_eth.c | 45 ++++++++++++++++++++-------
net/mac80211/cfg.c | 3 ++
net/mac80211/mlme.c | 4 +--
net/mac80211/sta_info.h | 5 +++
net/mac802154/tx.c | 3 +-
net/phonet/af_phonet.c | 4 +--
net/phonet/datagram.c | 4 +--
net/phonet/pep-gprs.c | 2 +-
net/phonet/pep.c | 2 +-
net/phonet/pn_dev.c | 4 +--
net/phonet/pn_netlink.c | 4 +--
net/phonet/socket.c | 4 +--
net/phonet/sysctl.c | 2 +-
net/sctp/protocol.c | 2 ++
net/wireless/reg.c | 2 +-
net/wireless/util.c | 2 +-
85 files changed, 529 insertions(+), 310 deletions(-)
^ permalink raw reply
* Re: [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: Eric Dumazet @ 2012-06-28 5:13 UTC (permalink / raw)
To: David Miller; +Cc: netdev, hans.schillstrom
In-Reply-To: <1340860102.26242.203.camel@edumazet-glaptop>
On Thu, 2012-06-28 at 07:08 +0200, Eric Dumazet wrote:
> The initial idea was to perform this only for SYN packets received on a
> listener in SYNCOOKIE mode. I'll resend the patch when fully
> implemented, instead of a forward patch.
>
s/forward/followup/
;)
^ permalink raw reply
* Re: [PATCH] Build fix in drivers/net/wireless/ath/ath9k/main.c
From: Mohammed Shafi @ 2012-06-28 5:09 UTC (permalink / raw)
To: Arvydas Sidorenko
Cc: mcgrof-A+ZNKFmMK5xy9aJCnZT0Uw, jouni-A+ZNKFmMK5xy9aJCnZT0Uw,
vthiagar-A+ZNKFmMK5xy9aJCnZT0Uw, senthilb-A+ZNKFmMK5xy9aJCnZT0Uw,
linville-2XuSBdqkA4R54TAoqtyWWQ,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
ath9k-devel-xDcbHBWguxHbcTqmT+pZeQ, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1340824779-5157-1-git-send-email-asido4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Hi,
On Thu, Jun 28, 2012 at 12:49 AM, Arvydas Sidorenko <asido4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Commit fad29cd2f59949581050a937786c2c9bc78b2f04 broke the build if
> no CONFIG_ATH9K_BTCOEX_SUPPORT is enabled.
>
> Signed-off-by: Arvydas Sidorenko <asido4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> drivers/net/wireless/ath/ath9k/main.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
> index c14cf5a..e4e73f0 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -151,8 +151,10 @@ static void __ath_cancel_work(struct ath_softc *sc)
> cancel_delayed_work_sync(&sc->tx_complete_work);
> cancel_delayed_work_sync(&sc->hw_pll_work);
>
> +#ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
> if (ath9k_hw_mci_is_enabled(sc->sc_ah))
> cancel_work_sync(&sc->mci_work);
> +#endif
> }
>
> static void ath_cancel_work(struct ath_softc *sc)
> --
> 1.7.8.6
Thanks for the patch, but the same fix was already sent a short while ago:
http://www.spinics.net/lists/linux-wireless/msg93078.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
thanks,
shafi
^ permalink raw reply
* Re: [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: Eric Dumazet @ 2012-06-28 5:08 UTC (permalink / raw)
To: David Miller; +Cc: netdev, hans.schillstrom
In-Reply-To: <20120627.170830.811332455348620174.davem@davemloft.net>
On Wed, 2012-06-27 at 17:08 -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 27 Jun 2012 17:01:01 -0700 (PDT)
>
> > There are quite a number of unwanted side effects from this change, so
> > I think we'll have to revert unless you can fix up all of the relevant
> > cases quickly.
>
> Actually I've decided to revert it now.
>
> Whilst this was a swell idea, there is no way for you to know if
> we should really create a cached route or not.
>
> Even if you could, there is a lot of logic you'll need to code up
> so that, f.e., once we determine that we've got a DST_NOCACHE route
> when we move to established state, we can insert it into the routing
> cache and not mark it DST_NOCACHE any longer.
>
> But even if we did that, we're going to eat 2 uncached route lookups
> for every new incoming legitimate connection.
The initial idea was to perform this only for SYN packets received on a
listener in SYNCOOKIE mode. I'll resend the patch when fully
implemented, instead of a forward patch.
Thanks
^ permalink raw reply
* [PATCH] ipv4: Kill early demux method return value.
From: David Miller @ 2012-06-28 5:07 UTC (permalink / raw)
To: netdev
It's completely unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/protocol.h | 2 +-
include/net/tcp.h | 2 +-
net/ipv4/ip_input.c | 42 +++++++++++++++++++-----------------------
net/ipv4/tcp_ipv4.c | 19 ++++++-------------
4 files changed, 27 insertions(+), 38 deletions(-)
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 967b926..057f2d3 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,7 +37,7 @@
/* This is used to register protocols. */
struct net_protocol {
- int (*early_demux)(struct sk_buff *skb);
+ void (*early_demux)(struct sk_buff *skb);
int (*handler)(struct sk_buff *skb);
void (*err_handler)(struct sk_buff *skb, u32 info);
int (*gso_send_check)(struct sk_buff *skb);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6660ffc..53fb7d8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -325,7 +325,7 @@ extern void tcp_v4_err(struct sk_buff *skb, u32);
extern void tcp_shutdown (struct sock *sk, int how);
-extern int tcp_v4_early_demux(struct sk_buff *skb);
+extern void tcp_v4_early_demux(struct sk_buff *skb);
extern int tcp_v4_rcv(struct sk_buff *skb);
extern struct inet_peer *tcp_v4_get_peer(struct sock *sk);
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 2a39204..b27d444 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -320,33 +320,29 @@ static int ip_rcv_finish(struct sk_buff *skb)
const struct iphdr *iph = ip_hdr(skb);
struct rtable *rt;
+ if (sysctl_ip_early_demux && !skb_dst(skb)) {
+ const struct net_protocol *ipprot;
+ int protocol = iph->protocol;
+
+ rcu_read_lock();
+ ipprot = rcu_dereference(inet_protos[protocol]);
+ if (ipprot && ipprot->early_demux)
+ ipprot->early_demux(skb);
+ rcu_read_unlock();
+ }
+
/*
* Initialise the virtual path cache for the packet. It describes
* how the packet travels inside Linux networking.
*/
- if (skb_dst(skb) == NULL) {
- int err = -ENOENT;
-
- if (sysctl_ip_early_demux) {
- const struct net_protocol *ipprot;
- int protocol = iph->protocol;
-
- rcu_read_lock();
- ipprot = rcu_dereference(inet_protos[protocol]);
- if (ipprot && ipprot->early_demux)
- err = ipprot->early_demux(skb);
- rcu_read_unlock();
- }
-
- if (err) {
- err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
- iph->tos, skb->dev);
- if (unlikely(err)) {
- if (err == -EXDEV)
- NET_INC_STATS_BH(dev_net(skb->dev),
- LINUX_MIB_IPRPFILTER);
- goto drop;
- }
+ if (!skb_dst(skb)) {
+ int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+ iph->tos, skb->dev);
+ if (unlikely(err)) {
+ if (err == -EXDEV)
+ NET_INC_STATS_BH(dev_net(skb->dev),
+ LINUX_MIB_IPRPFILTER);
+ goto drop;
}
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1781dc6..b4ae1c1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1673,30 +1673,28 @@ csum_err:
}
EXPORT_SYMBOL(tcp_v4_do_rcv);
-int tcp_v4_early_demux(struct sk_buff *skb)
+void tcp_v4_early_demux(struct sk_buff *skb)
{
struct net *net = dev_net(skb->dev);
const struct iphdr *iph;
const struct tcphdr *th;
struct net_device *dev;
struct sock *sk;
- int err;
- err = -ENOENT;
if (skb->pkt_type != PACKET_HOST)
- goto out_err;
+ return;
if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct tcphdr)))
- goto out_err;
+ return;
iph = ip_hdr(skb);
th = (struct tcphdr *) ((char *)iph + ip_hdrlen(skb));
if (th->doff < sizeof(struct tcphdr) / 4)
- goto out_err;
+ return;
if (!pskb_may_pull(skb, ip_hdrlen(skb) + th->doff * 4))
- goto out_err;
+ return;
dev = skb->dev;
sk = __inet_lookup_established(net, &tcp_hashinfo,
@@ -1713,16 +1711,11 @@ int tcp_v4_early_demux(struct sk_buff *skb)
if (dst) {
struct rtable *rt = (struct rtable *) dst;
- if (rt->rt_iif == dev->ifindex) {
+ if (rt->rt_iif == dev->ifindex)
skb_dst_set_noref(skb, dst);
- err = 0;
- }
}
}
}
-
-out_err:
- return err;
}
/*
--
1.7.10
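
The resulting control flow can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code: `mock_skb`, `mock_dst`, `mock_early_demux()`, `mock_route_input()` and `mock_rcv_finish()` are hypothetical stand-ins. The point it tries to show is that the caller does not need a return value from the early demux hook, because it can simply check afterwards whether a route was attached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and its attached route. */
struct mock_dst { int ifindex; };
struct mock_skb {
	int ifindex;           /* interface the packet arrived on */
	struct mock_dst *dst;  /* NULL until a route is attached */
};

static struct mock_dst cached_route = { .ifindex = 2 };
static struct mock_dst slow_route;
static int slow_lookups;       /* counts full lookups, for illustration */

/* Early demux: attach the cached route if it matches, else do nothing. */
static void mock_early_demux(struct mock_skb *skb)
{
	if (cached_route.ifindex == skb->ifindex)
		skb->dst = &cached_route;
}

/* Stand-in for the full route lookup; returns 0 or a negative error. */
static int mock_route_input(struct mock_skb *skb)
{
	slow_lookups++;
	slow_route.ifindex = skb->ifindex;
	skb->dst = &slow_route;
	return 0;
}

static int mock_rcv_finish(struct mock_skb *skb, int early_demux_enabled)
{
	assert(skb != NULL);

	if (early_demux_enabled && !skb->dst)
		mock_early_demux(skb);

	/* The attached route, not a return code, tells us what happened. */
	if (!skb->dst) {
		int err = mock_route_input(skb);
		if (err)
			return err;
	}
	return 0;
}
```

In this sketch a packet on interface 2 never reaches the full lookup, while any other packet falls through to it.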
^ permalink raw reply related
* Re: [PATCH v2] l2tp: use per-cpu variables for u64_stats updates
From: Eric Dumazet @ 2012-06-28 5:00 UTC (permalink / raw)
To: Rick Jones
Cc: Ben Greear, Stephen Hemminger, Tom Parkin, netdev, David.Laight,
James Chapman
In-Reply-To: <4FEB90C3.9050607@hp.com>
On Wed, 2012-06-27 at 16:01 -0700, Rick Jones wrote:
> Today, sure, generalizing to packet counters in general, that bloat is
> likely on its way. At 100 Gbit/s Ethernet, that is upwards of 147
> million packets per second each way. At 1 GbE it is 125 million octets
> per second. So, if 32 bit octet counters were insufficient for 1 GbE,
> 32 bit packet counters likely will be insufficient for 100GbE.
>
> Or, I suppose, 3 or more bonded 40 GbEs or 10 or more bonded 10 GbEs
> (unlikely though that last one may be) assuming there is stats
> aggregation in the bond interface.
Note that I am all for 64bit counters on 64bit kernels because they are
almost[1] free, since they fit in a machine word (unsigned long).
tx_dropped is the count of dropped _packets_.
If more than 32bits are needed, and someone must run this 100GbE on a
32bit machine of the last century, he really has a big problem.
[1] The LLTX driver case:
since ndo_start_xmit() can run concurrently on many cpus, safely
updating an "unsigned long" requires one of the following:
1) Use of a spinlock to protect the update.
2) Use atomic_long_t instead of "unsigned long"
3) Use percpu data
3) is overkill for devices with light traffic, because it consumes a lot
of RAM on machines with 2048 possible cpus, _and_ the reader must fold
the data of all possible cpus.
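
Options 2) and 3) can be contrasted with a small userspace sketch. It is a simplified illustration only: C11 atomics stand in for atomic_long_t, a plain array indexed by a cpu number stands in for percpu data, and `MOCK_NR_CPUS`, `drop_atomic()`, `drop_percpu()` and `fold_percpu()` are hypothetical names. The sketch is single-threaded and ignores the preemption and memory-ordering details that real per-cpu code has to handle.

```c
#include <assert.h>
#include <stdatomic.h>

#define MOCK_NR_CPUS 4  /* hypothetical; stands in for the possible-cpu count */

/* Option 2: one shared counter, every writer pays for an atomic update. */
static atomic_ulong tx_dropped_atomic;

static void drop_atomic(void)
{
	atomic_fetch_add_explicit(&tx_dropped_atomic, 1, memory_order_relaxed);
}

/* Option 3: one slot per cpu, writers touch only their own slot. */
static unsigned long tx_dropped_percpu[MOCK_NR_CPUS];

static void drop_percpu(unsigned int cpu)
{
	assert(cpu < MOCK_NR_CPUS);
	tx_dropped_percpu[cpu]++;
}

/* The reader has to walk every slot: memory and read cost grow with cpus. */
static unsigned long fold_percpu(void)
{
	unsigned long sum = 0;

	for (unsigned int cpu = 0; cpu < MOCK_NR_CPUS; cpu++)
		sum += tx_dropped_percpu[cpu];
	return sum;
}
```

The per-cpu variant keeps the write path cheap at the price of one slot per possible cpu and a loop in the reader, which is the trade-off described above.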
^ permalink raw reply
* [PATCH] xfrm_user: Propagate netlink error codes properly.
From: David Miller @ 2012-06-28 4:57 UTC (permalink / raw)
To: netdev
Instead of using a fixed value of "-1" or "-EMSGSIZE", propagate what
the nla_*() interfaces actually return.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/xfrm.h | 10 +-
net/xfrm/xfrm_user.c | 394 ++++++++++++++++++++++++++------------------------
2 files changed, 208 insertions(+), 196 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e0a55df..17acbc9 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1682,13 +1682,11 @@ static inline int xfrm_mark_get(struct nlattr **attrs, struct xfrm_mark *m)
static inline int xfrm_mark_put(struct sk_buff *skb, const struct xfrm_mark *m)
{
- if ((m->m | m->v) &&
- nla_put(skb, XFRMA_MARK, sizeof(struct xfrm_mark), m))
- goto nla_put_failure;
- return 0;
+ int ret = 0;
-nla_put_failure:
- return -1;
+ if (m->m | m->v)
+ ret = nla_put(skb, XFRMA_MARK, sizeof(struct xfrm_mark), m);
+ return ret;
}
#endif /* _NET_XFRM_H */
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 44293b3..5407627 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -754,58 +754,67 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
struct xfrm_usersa_info *p,
struct sk_buff *skb)
{
- copy_to_user_state(x, p);
-
- if (x->coaddr &&
- nla_put(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr))
- goto nla_put_failure;
-
- if (x->lastused &&
- nla_put_u64(skb, XFRMA_LASTUSED, x->lastused))
- goto nla_put_failure;
-
- if (x->aead &&
- nla_put(skb, XFRMA_ALG_AEAD, aead_len(x->aead), x->aead))
- goto nla_put_failure;
-
- if (x->aalg &&
- (copy_to_user_auth(x->aalg, skb) ||
- nla_put(skb, XFRMA_ALG_AUTH_TRUNC,
- xfrm_alg_auth_len(x->aalg), x->aalg)))
- goto nla_put_failure;
-
- if (x->ealg &&
- nla_put(skb, XFRMA_ALG_CRYPT, xfrm_alg_len(x->ealg), x->ealg))
- goto nla_put_failure;
-
- if (x->calg &&
- nla_put(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg))
- goto nla_put_failure;
-
- if (x->encap &&
- nla_put(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap))
- goto nla_put_failure;
+ int ret = 0;
- if (x->tfcpad &&
- nla_put_u32(skb, XFRMA_TFCPAD, x->tfcpad))
- goto nla_put_failure;
-
- if (xfrm_mark_put(skb, &x->mark))
- goto nla_put_failure;
-
- if (x->replay_esn &&
- nla_put(skb, XFRMA_REPLAY_ESN_VAL,
- xfrm_replay_state_esn_len(x->replay_esn),
- x->replay_esn))
- goto nla_put_failure;
-
- if (x->security && copy_sec_ctx(x->security, skb))
- goto nla_put_failure;
-
- return 0;
+ copy_to_user_state(x, p);
-nla_put_failure:
- return -EMSGSIZE;
+ if (x->coaddr) {
+ ret = nla_put(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
+ if (ret)
+ goto out;
+ }
+ if (x->lastused) {
+ ret = nla_put_u64(skb, XFRMA_LASTUSED, x->lastused);
+ if (ret)
+ goto out;
+ }
+ if (x->aead) {
+ ret = nla_put(skb, XFRMA_ALG_AEAD, aead_len(x->aead), x->aead);
+ if (ret)
+ goto out;
+ }
+ if (x->aalg) {
+ ret = copy_to_user_auth(x->aalg, skb);
+ if (!ret)
+ ret = nla_put(skb, XFRMA_ALG_AUTH_TRUNC,
+ xfrm_alg_auth_len(x->aalg), x->aalg);
+ if (ret)
+ goto out;
+ }
+ if (x->ealg) {
+ ret = nla_put(skb, XFRMA_ALG_CRYPT, xfrm_alg_len(x->ealg), x->ealg);
+ if (ret)
+ goto out;
+ }
+ if (x->calg) {
+ ret = nla_put(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
+ if (ret)
+ goto out;
+ }
+ if (x->encap) {
+ ret = nla_put(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
+ if (ret)
+ goto out;
+ }
+ if (x->tfcpad) {
+ ret = nla_put_u32(skb, XFRMA_TFCPAD, x->tfcpad);
+ if (ret)
+ goto out;
+ }
+ ret = xfrm_mark_put(skb, &x->mark);
+ if (ret)
+ goto out;
+ if (x->replay_esn) {
+ ret = nla_put(skb, XFRMA_REPLAY_ESN_VAL,
+ xfrm_replay_state_esn_len(x->replay_esn),
+ x->replay_esn);
+ if (ret)
+ goto out;
+ }
+ if (x->security)
+ ret = copy_sec_ctx(x->security, skb);
+out:
+ return ret;
}
static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
@@ -825,15 +834,12 @@ static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
p = nlmsg_data(nlh);
err = copy_to_user_state_extra(x, p, skb);
- if (err)
- goto nla_put_failure;
-
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
nlmsg_end(skb, nlh);
return 0;
-
-nla_put_failure:
- nlmsg_cancel(skb, nlh);
- return err;
}
static int xfrm_dump_sa_done(struct netlink_callback *cb)
@@ -904,6 +910,7 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
struct xfrmu_spdinfo spc;
struct xfrmu_spdhinfo sph;
struct nlmsghdr *nlh;
+ int err;
u32 *f;
nlh = nlmsg_put(skb, pid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0);
@@ -922,15 +929,15 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
sph.spdhcnt = si.spdhcnt;
sph.spdhmcnt = si.spdhmcnt;
- if (nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc) ||
- nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph))
- goto nla_put_failure;
+ err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc);
+ if (!err)
+ err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph);
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
return nlmsg_end(skb, nlh);
-
-nla_put_failure:
- nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
}
static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -965,6 +972,7 @@ static int build_sadinfo(struct sk_buff *skb, struct net *net,
struct xfrmk_sadinfo si;
struct xfrmu_sadhinfo sh;
struct nlmsghdr *nlh;
+ int err;
u32 *f;
nlh = nlmsg_put(skb, pid, seq, XFRM_MSG_NEWSADINFO, sizeof(u32), 0);
@@ -978,15 +986,15 @@ static int build_sadinfo(struct sk_buff *skb, struct net *net,
sh.sadhmcnt = si.sadhmcnt;
sh.sadhcnt = si.sadhcnt;
- if (nla_put_u32(skb, XFRMA_SAD_CNT, si.sadcnt) ||
- nla_put(skb, XFRMA_SAD_HINFO, sizeof(sh), &sh))
- goto nla_put_failure;
+ err = nla_put_u32(skb, XFRMA_SAD_CNT, si.sadcnt);
+ if (!err)
+ err = nla_put(skb, XFRMA_SAD_HINFO, sizeof(sh), &sh);
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
return nlmsg_end(skb, nlh);
-
-nla_put_failure:
- nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
}
static int xfrm_get_sadinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1439,9 +1447,8 @@ static inline int copy_to_user_state_sec_ctx(struct xfrm_state *x, struct sk_buf
static inline int copy_to_user_sec_ctx(struct xfrm_policy *xp, struct sk_buff *skb)
{
- if (xp->security) {
+ if (xp->security)
return copy_sec_ctx(xp->security, skb);
- }
return 0;
}
static inline size_t userpolicy_type_attrsize(void)
@@ -1477,6 +1484,7 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
struct sk_buff *in_skb = sp->in_skb;
struct sk_buff *skb = sp->out_skb;
struct nlmsghdr *nlh;
+ int err;
nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp->nlmsg_seq,
XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
@@ -1485,22 +1493,19 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
p = nlmsg_data(nlh);
copy_to_user_policy(xp, p, dir);
- if (copy_to_user_tmpl(xp, skb) < 0)
- goto nlmsg_failure;
- if (copy_to_user_sec_ctx(xp, skb))
- goto nlmsg_failure;
- if (copy_to_user_policy_type(xp->type, skb) < 0)
- goto nlmsg_failure;
- if (xfrm_mark_put(skb, &xp->mark))
- goto nla_put_failure;
-
+ err = copy_to_user_tmpl(xp, skb);
+ if (!err)
+ err = copy_to_user_sec_ctx(xp, skb);
+ if (!err)
+ err = copy_to_user_policy_type(xp->type, skb);
+ if (!err)
+ err = xfrm_mark_put(skb, &xp->mark);
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
nlmsg_end(skb, nlh);
return 0;
-
-nla_put_failure:
-nlmsg_failure:
- nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
}
static int xfrm_dump_policy_done(struct netlink_callback *cb)
@@ -1688,6 +1693,7 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
{
struct xfrm_aevent_id *id;
struct nlmsghdr *nlh;
+ int err;
nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
if (nlh == NULL)
@@ -1703,35 +1709,39 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
id->flags = c->data.aevent;
if (x->replay_esn) {
- if (nla_put(skb, XFRMA_REPLAY_ESN_VAL,
- xfrm_replay_state_esn_len(x->replay_esn),
- x->replay_esn))
- goto nla_put_failure;
+ err = nla_put(skb, XFRMA_REPLAY_ESN_VAL,
+ xfrm_replay_state_esn_len(x->replay_esn),
+ x->replay_esn);
} else {
- if (nla_put(skb, XFRMA_REPLAY_VAL, sizeof(x->replay),
- &x->replay))
- goto nla_put_failure;
+ err = nla_put(skb, XFRMA_REPLAY_VAL, sizeof(x->replay),
+ &x->replay);
}
- if (nla_put(skb, XFRMA_LTIME_VAL, sizeof(x->curlft), &x->curlft))
- goto nla_put_failure;
-
- if ((id->flags & XFRM_AE_RTHR) &&
- nla_put_u32(skb, XFRMA_REPLAY_THRESH, x->replay_maxdiff))
- goto nla_put_failure;
-
- if ((id->flags & XFRM_AE_ETHR) &&
- nla_put_u32(skb, XFRMA_ETIMER_THRESH,
- x->replay_maxage * 10 / HZ))
- goto nla_put_failure;
+ if (err)
+ goto out_cancel;
+ err = nla_put(skb, XFRMA_LTIME_VAL, sizeof(x->curlft), &x->curlft);
+ if (err)
+ goto out_cancel;
- if (xfrm_mark_put(skb, &x->mark))
- goto nla_put_failure;
+ if (id->flags & XFRM_AE_RTHR) {
+ err = nla_put_u32(skb, XFRMA_REPLAY_THRESH, x->replay_maxdiff);
+ if (err)
+ goto out_cancel;
+ }
+ if (id->flags & XFRM_AE_ETHR) {
+ err = nla_put_u32(skb, XFRMA_ETIMER_THRESH,
+ x->replay_maxage * 10 / HZ);
+ if (err)
+ goto out_cancel;
+ }
+ err = xfrm_mark_put(skb, &x->mark);
+ if (err)
+ goto out_cancel;
return nlmsg_end(skb, nlh);
-nla_put_failure:
+out_cancel:
nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
+ return err;
}
static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2155,7 +2165,7 @@ static int build_migrate(struct sk_buff *skb, const struct xfrm_migrate *m,
const struct xfrm_migrate *mp;
struct xfrm_userpolicy_id *pol_id;
struct nlmsghdr *nlh;
- int i;
+ int i, err;
nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
if (nlh == NULL)
@@ -2167,21 +2177,25 @@ static int build_migrate(struct sk_buff *skb, const struct xfrm_migrate *m,
memcpy(&pol_id->sel, sel, sizeof(pol_id->sel));
pol_id->dir = dir;
- if (k != NULL && (copy_to_user_kmaddress(k, skb) < 0))
- goto nlmsg_failure;
-
- if (copy_to_user_policy_type(type, skb) < 0)
- goto nlmsg_failure;
-
+ if (k != NULL) {
+ err = copy_to_user_kmaddress(k, skb);
+ if (err)
+ goto out_cancel;
+ }
+ err = copy_to_user_policy_type(type, skb);
+ if (err)
+ goto out_cancel;
for (i = 0, mp = m ; i < num_migrate; i++, mp++) {
- if (copy_to_user_migrate(mp, skb) < 0)
- goto nlmsg_failure;
+ err = copy_to_user_migrate(mp, skb);
+ if (err)
+ goto out_cancel;
}
return nlmsg_end(skb, nlh);
-nlmsg_failure:
+
+out_cancel:
nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
+ return err;
}
static int xfrm_send_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2354,6 +2368,7 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
{
struct xfrm_user_expire *ue;
struct nlmsghdr *nlh;
+ int err;
nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
if (nlh == NULL)
@@ -2363,13 +2378,11 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
copy_to_user_state(x, &ue->state);
ue->hard = (c->data.hard != 0) ? 1 : 0;
- if (xfrm_mark_put(skb, &x->mark))
- goto nla_put_failure;
+ err = xfrm_mark_put(skb, &x->mark);
+ if (err)
+ return err;
return nlmsg_end(skb, nlh);
-
-nla_put_failure:
- return -EMSGSIZE;
}
static int xfrm_exp_state_notify(struct xfrm_state *x, const struct km_event *c)
@@ -2470,7 +2483,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
struct nlmsghdr *nlh;
struct sk_buff *skb;
int len = xfrm_sa_len(x);
- int headlen;
+ int headlen, err;
headlen = sizeof(*p);
if (c->event == XFRM_MSG_DELSA) {
@@ -2485,8 +2498,9 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
return -ENOMEM;
nlh = nlmsg_put(skb, c->pid, c->seq, c->event, headlen, 0);
+ err = -EMSGSIZE;
if (nlh == NULL)
- goto nla_put_failure;
+ goto out_free_skb;
p = nlmsg_data(nlh);
if (c->event == XFRM_MSG_DELSA) {
@@ -2499,24 +2513,23 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
id->proto = x->id.proto;
attr = nla_reserve(skb, XFRMA_SA, sizeof(*p));
+ err = -EMSGSIZE;
if (attr == NULL)
- goto nla_put_failure;
+ goto out_free_skb;
p = nla_data(attr);
}
-
- if (copy_to_user_state_extra(x, p, skb))
- goto nla_put_failure;
+ err = copy_to_user_state_extra(x, p, skb);
+ if (err)
+ goto out_free_skb;
nlmsg_end(skb, nlh);
return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
-nla_put_failure:
- /* Somebody screwed up with xfrm_sa_len! */
- WARN_ON(1);
+out_free_skb:
kfree_skb(skb);
- return -1;
+ return err;
}
static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c)
@@ -2557,9 +2570,10 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
struct xfrm_tmpl *xt, struct xfrm_policy *xp,
int dir)
{
+ __u32 seq = xfrm_get_acqseq();
struct xfrm_user_acquire *ua;
struct nlmsghdr *nlh;
- __u32 seq = xfrm_get_acqseq();
+ int err;
nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_ACQUIRE, sizeof(*ua), 0);
if (nlh == NULL)
@@ -2575,21 +2589,19 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
ua->calgos = xt->calgos;
ua->seq = x->km.seq = seq;
- if (copy_to_user_tmpl(xp, skb) < 0)
- goto nlmsg_failure;
- if (copy_to_user_state_sec_ctx(x, skb))
- goto nlmsg_failure;
- if (copy_to_user_policy_type(xp->type, skb) < 0)
- goto nlmsg_failure;
- if (xfrm_mark_put(skb, &xp->mark))
- goto nla_put_failure;
+ err = copy_to_user_tmpl(xp, skb);
+ if (!err)
+ err = copy_to_user_state_sec_ctx(x, skb);
+ if (!err)
+ err = copy_to_user_policy_type(xp->type, skb);
+ if (!err)
+ err = xfrm_mark_put(skb, &xp->mark);
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-nlmsg_failure:
- nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
}
static int xfrm_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *xt,
@@ -2681,8 +2693,9 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
int dir, const struct km_event *c)
{
struct xfrm_user_polexpire *upe;
- struct nlmsghdr *nlh;
int hard = c->data.hard;
+ struct nlmsghdr *nlh;
+ int err;
nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe), 0);
if (nlh == NULL)
@@ -2690,22 +2703,20 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
upe = nlmsg_data(nlh);
copy_to_user_policy(xp, &upe->pol, dir);
- if (copy_to_user_tmpl(xp, skb) < 0)
- goto nlmsg_failure;
- if (copy_to_user_sec_ctx(xp, skb))
- goto nlmsg_failure;
- if (copy_to_user_policy_type(xp->type, skb) < 0)
- goto nlmsg_failure;
- if (xfrm_mark_put(skb, &xp->mark))
- goto nla_put_failure;
+ err = copy_to_user_tmpl(xp, skb);
+ if (!err)
+ err = copy_to_user_sec_ctx(xp, skb);
+ if (!err)
+ err = copy_to_user_policy_type(xp->type, skb);
+ if (!err)
+ err = xfrm_mark_put(skb, &xp->mark);
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
upe->hard = !!hard;
return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-nlmsg_failure:
- nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
}
static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct km_event *c)
@@ -2725,13 +2736,13 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_event *c)
{
+ int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
struct net *net = xp_net(xp);
struct xfrm_userpolicy_info *p;
struct xfrm_userpolicy_id *id;
struct nlmsghdr *nlh;
struct sk_buff *skb;
- int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
- int headlen;
+ int headlen, err;
headlen = sizeof(*p);
if (c->event == XFRM_MSG_DELPOLICY) {
@@ -2747,8 +2758,9 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
return -ENOMEM;
nlh = nlmsg_put(skb, c->pid, c->seq, c->event, headlen, 0);
+ err = -EMSGSIZE;
if (nlh == NULL)
- goto nlmsg_failure;
+ goto out_free_skb;
p = nlmsg_data(nlh);
if (c->event == XFRM_MSG_DELPOLICY) {
@@ -2763,29 +2775,29 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
memcpy(&id->sel, &xp->selector, sizeof(id->sel));
attr = nla_reserve(skb, XFRMA_POLICY, sizeof(*p));
+ err = -EMSGSIZE;
if (attr == NULL)
- goto nlmsg_failure;
+ goto out_free_skb;
p = nla_data(attr);
}
copy_to_user_policy(xp, p, dir);
- if (copy_to_user_tmpl(xp, skb) < 0)
- goto nlmsg_failure;
- if (copy_to_user_policy_type(xp->type, skb) < 0)
- goto nlmsg_failure;
-
- if (xfrm_mark_put(skb, &xp->mark))
- goto nla_put_failure;
+ err = copy_to_user_tmpl(xp, skb);
+ if (!err)
+ err = copy_to_user_policy_type(xp->type, skb);
+ if (!err)
+ err = xfrm_mark_put(skb, &xp->mark);
+ if (err)
+ goto out_free_skb;
nlmsg_end(skb, nlh);
return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_POLICY, GFP_ATOMIC);
-nla_put_failure:
-nlmsg_failure:
+out_free_skb:
kfree_skb(skb);
- return -1;
+ return err;
}
static int xfrm_notify_policy_flush(const struct km_event *c)
@@ -2793,24 +2805,27 @@ static int xfrm_notify_policy_flush(const struct km_event *c)
struct net *net = c->net;
struct nlmsghdr *nlh;
struct sk_buff *skb;
+ int err;
skb = nlmsg_new(userpolicy_type_attrsize(), GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0, 0);
+ err = -EMSGSIZE;
if (nlh == NULL)
- goto nlmsg_failure;
- if (copy_to_user_policy_type(c->data.type, skb) < 0)
- goto nlmsg_failure;
+ goto out_free_skb;
+ err = copy_to_user_policy_type(c->data.type, skb);
+ if (err)
+ goto out_free_skb;
nlmsg_end(skb, nlh);
return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_POLICY, GFP_ATOMIC);
-nlmsg_failure:
+out_free_skb:
kfree_skb(skb);
- return -1;
+ return err;
}
static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, const struct km_event *c)
@@ -2853,15 +2868,14 @@ static int build_report(struct sk_buff *skb, u8 proto,
ur->proto = proto;
memcpy(&ur->sel, sel, sizeof(ur->sel));
- if (addr &&
- nla_put(skb, XFRMA_COADDR, sizeof(*addr), addr))
- goto nla_put_failure;
-
+ if (addr) {
+ int err = nla_put(skb, XFRMA_COADDR, sizeof(*addr), addr);
+ if (err) {
+ nlmsg_cancel(skb, nlh);
+ return err;
+ }
+ }
return nlmsg_end(skb, nlh);
-
-nla_put_failure:
- nlmsg_cancel(skb, nlh);
- return -EMSGSIZE;
}
static int xfrm_send_report(struct net *net, u8 proto,
--
1.7.10.2
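
The error-propagation pattern used throughout this patch can be sketched in userspace C. Everything here is a hypothetical stand-in: `mock_msg` plays the role of the skb, `mock_put()` the role of the nla_put() family, and `mock_cancel()` the role of nlmsg_cancel(). The sketch only aims to show the shape of the code: each step runs only if the previous one succeeded, and the first error code is returned unchanged instead of being replaced by a fixed value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message buffer with limited room, standing in for an skb. */
struct mock_msg {
	size_t used;
	size_t capacity;
};

/* Stand-in for an attribute put: returns 0 or a negative errno. */
static int mock_put(struct mock_msg *msg, size_t len)
{
	assert(msg != NULL);
	if (len > msg->capacity - msg->used)
		return -EMSGSIZE;
	msg->used += len;
	return 0;
}

/* Stand-in for cancelling a partly built message: roll back to the mark. */
static void mock_cancel(struct mock_msg *msg, size_t mark)
{
	msg->used = mark;
}

static int mock_build(struct mock_msg *msg, size_t a, size_t b, size_t c)
{
	size_t mark = msg->used;
	int err;

	err = mock_put(msg, a);
	if (!err)
		err = mock_put(msg, b);
	if (!err)
		err = mock_put(msg, c);
	if (err) {
		mock_cancel(msg, mark);
		return err;  /* whatever the failing step reported */
	}
	return 0;
}
```

A caller of `mock_build()` sees the error of the step that failed, and the buffer is left as it was before the call.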
^ permalink raw reply related
* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Sridhar Samudrala @ 2012-06-28 4:52 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S. Tsirkin, habanero, netdev, linux-kernel, krkumar2,
tahm, akong, davem, shemminger, mashirle
In-Reply-To: <4FEBC936.3080001@redhat.com>
On 6/27/2012 8:02 PM, Jason Wang wrote:
> On 06/27/2012 04:44 PM, Michael S. Tsirkin wrote:
>> On Wed, Jun 27, 2012 at 01:16:30PM +0800, Jason Wang wrote:
>>> On 06/26/2012 06:42 PM, Michael S. Tsirkin wrote:
>>>> On Tue, Jun 26, 2012 at 11:42:17AM +0800, Jason Wang wrote:
>>>>> On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
>>>>>>> This patch adds multiqueue support for the tap device. This is
>>>>>>> done by abstracting each queue as a file/socket and allowing
>>>>>>> multiple sockets to be attached to the tuntap device (an array
>>>>>>> of tun_file is stored in the tun_struct). Userspace can write to
>>>>>>> and read from those files to send and receive packets in
>>>>>>> parallel.
>>>>>>>
>>>>>>> Unlike the previous single queue implementation, the socket and
>>>>>>> device are loosely coupled, and either of them is allowed to go
>>>>>>> away first. To make the tx path lockless, netif_tx_lock_bh() is
>>>>>>> replaced by RCU/NETIF_F_LLTX to synchronize between the data
>>>>>>> path and system calls.
>>>>>> Don't use LLTX/RCU. It's not worth it.
>>>>>> Use something like netif_set_real_num_tx_queues.
>>>>>>
>>>>>>> The tx queue is selected first based on the recorded rxq index
>>>>>>> of an skb; if there is none, it is chosen based on rx hashing
>>>>>>> (skb_get_rxhash()).
>>>>>>>
>>>>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>>>> Interestingly macvtap switched to hashing first:
>>>>>> ef0002b577b52941fb147128f30bd1ecfdd3ff6d
>>>>>> (the commit log is corrupted but see what it
>>>>>> does in the patch).
>>>>>> Any idea why?
>>>>> Yes, so tap should be changed to behave the same as macvtap. I
>>>>> remember the reason we do that is to make sure the packets of a
>>>>> single flow are queued to a fixed socket/virtqueue, since 10g
>>>>> cards like ixgbe choose the rx queue for a flow based on the last
>>>>> tx queue that the packets of that flow came from. So if we use the
>>>>> recorded rx queue in macvtap, the queue index of a flow would
>>>>> change as the vhost thread moves among processors.
>>>> Hmm. OTOH if you override this, if TX is sent from VCPU0, RX might
>>>> land on VCPU1 in the guest, which is not good, right?
>>> Yes, but that is better than making the rx move between vcpus when
>>> we use the recorded rx queue.
>> Why isn't this a problem with native TCP?
>> I think what happens is one of the following:
>> - moving between CPUs is more expensive with tun
>> because it can queue so much data on xmit
>> - scheduler makes very bad decisions about VCPUs
>> bouncing them around all the time
>
> For a usual native TCP/host process, since it reads and writes tcp
> sockets, it may make sense to move rx to the processor the process
> moves to. But vhost does not do any tcp processing, and ixgbe would
> still move rx when the vhost process moves, and we can't even make
> sure the vhost process that handles rx is running on the processor
> that handles the rx interrupt.
We also saw this behavior with the default ixgbe configuration. If vhost
is pinned to a CPU, all packets for that VM are received on a single RX
queue. So even if the VM is running multiple TCP_RR sessions, packets for
all the flows are received on a single RX queue. Without pinning, vhost
moves around, and so do the packets across the RX queues.
I think
    ethtool -K ethX ntuple on
will disable this behavior, and it should be possible to program the flow
director using ethtool -U.
This way we can split the packets across the host NIC RX queues based on
the flows, but it is not clear if this would help with the current model
of a single vhost per device.
With per-cpu vhost, each RX queue can be handled by the matching vhost,
but if we have only 1 queue in the VM's virtio-net device, that could
become the bottleneck.
Multi-queue virtio-net should help here, but we need the same number of
queues in the VM's virtio-net device as in the host's NIC so that each
vhost can handle the corresponding virtio queue.
But if the VM has only 2 vcpus, I think it is not efficient to have 8
virtio-net queues (to match a host with 8 physical cpus and 8 RX queues
in the NIC).
Thanks
Sridhar
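
The idea of splitting packets across RX queues per flow can be sketched in userspace C. This is a generic illustration, not how ixgbe or its flow director actually selects a queue: `mock_flow`, `mock_flow_hash()` and `mock_pick_rxq()` are hypothetical, and the hash is a simple multiplicative mix chosen only for brevity. What it tries to show is that the queue depends only on the flow's addresses and ports, so a flow stays on one queue no matter where the consuming thread runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the usual address/port 4-tuple. */
struct mock_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Simple multiplicative mix over the key; not a real NIC hash. */
static uint32_t mock_flow_hash(const struct mock_flow *f)
{
	uint32_t words[3] = {
		f->saddr,
		f->daddr,
		((uint32_t)f->sport << 16) | f->dport,
	};
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

/* The queue is a pure function of the flow, so it never migrates. */
static unsigned int mock_pick_rxq(const struct mock_flow *f,
				  unsigned int nqueues)
{
	assert(nqueues > 0);
	return mock_flow_hash(f) % nqueues;
}
```

With 8 host queues but a single queue in the guest device, every flow still collapses onto queue 0 on the guest side, which is the bottleneck concern raised above.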
>
>> Could we isolate which it is? Does the problem
>> still happen if you pin VCPUs to host cpus?
>> If not it's the queue depth.
>
> It may not help, as tun does not record the vcpu/queue that sends the
> stream, so it can't transmit the packets back to the same vcpu/queue.
>>> Flow steering is needed to make sure the tx and
>>> rx are on the same vcpu.
>> That involves IPI between processes, so it might be
>> very expensive for kvm.
>>
>>>>> But while testing tun/tap, one interesting thing I found is that
>>>>> even though ixgbe has recorded the queue index during rx, it seems
>>>>> to be lost when tap tries to transmit skbs to userspace.
>>>> dev_pick_tx does this I think but ndo_select_queue
>>>> should be able to get it without trouble.
>>>>
>>>>
^ permalink raw reply
* Re: [PATCH v2 0/5] fec driver updates
From: David Miller @ 2012-06-28 4:30 UTC (permalink / raw)
To: shawn.guo; +Cc: LW, florian, netdev, linux-arm-kernel, devicetree-discuss
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>
From: Shawn Guo <shawn.guo@linaro.org>
Date: Wed, 27 Jun 2012 21:45:19 +0800
> Changes since v1:
> * Add one patch to use devm_gpio_request_one
> * Have a separate patch to fix phy-reset-gpios property in binding
> document
> * Change phy-reset-interval to phy-reset-duration
> * Add a sanity check on phy-reset-duration value
All applied.
^ permalink raw reply
* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: David Miller @ 2012-06-28 4:30 UTC (permalink / raw)
To: jpirko; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <1340810866-1017-1-git-send-email-jpirko@redhat.com>
From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 27 Jun 2012 17:27:46 +0200
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Applied, but this seriously makes eth_mac_addr() completely useless.
Technically, every eth_mac_addr() user in a software/virtual device
should behave the way virtio_net does now.
It therefore probably makes sense to add a boolean arg which, when true,
elides the netif_running() check, and then fix up and audit every caller.
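
One possible shape for such a helper, sketched in userspace C. This is not the kernel's eth_mac_addr(): `mock_netdev`, `mock_is_valid_ether_addr()`, `mock_eth_mac_addr()` and the `allow_running` parameter are hypothetical, and the error codes are assumptions of this sketch. It only illustrates how a boolean argument could let software/virtual devices skip the running check while other callers keep it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MOCK_ETH_ALEN 6

/* Hypothetical device: just the state this sketch needs. */
struct mock_netdev {
	bool running;
	unsigned char dev_addr[MOCK_ETH_ALEN];
};

/* Reject multicast (low bit of first octet set) and all-zero addresses. */
static bool mock_is_valid_ether_addr(const unsigned char *addr)
{
	static const unsigned char zero[MOCK_ETH_ALEN];

	return !(addr[0] & 1) && memcmp(addr, zero, MOCK_ETH_ALEN) != 0;
}

static int mock_eth_mac_addr(struct mock_netdev *dev,
			     const unsigned char *addr,
			     bool allow_running)
{
	assert(dev != NULL && addr != NULL);

	/* Callers passing true opt out of the "interface is up" check. */
	if (!allow_running && dev->running)
		return -EBUSY;
	if (!mock_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;

	memcpy(dev->dev_addr, addr, MOCK_ETH_ALEN);
	return 0;
}
```

A hardware driver would keep passing false, while a virtual device could pass true, so each caller's choice is visible at the call site when auditing.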
^ permalink raw reply