Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Alexandre Courbot @ 2014-01-21 14:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Linus Walleij, devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Heikki Krogerus, netdev, linux-wireless, linux-sunxi,
	linux-kernel, Maxime Ripard, Chen-Yu Tsai, Johannes Berg,
	Mika Westerberg, David S. Miller,
	linux-gpio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <201401211335.16885.arnd-r2nGTMty4D4@public.gmane.org>

On Tue, Jan 21, 2014 at 9:35 PM, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org> wrote:
> On Tuesday 21 January 2014, Linus Walleij wrote:
>> On Tue, Jan 21, 2014 at 4:11 AM, Alexandre Courbot <gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> > On Sat, Jan 18, 2014 at 8:11 AM, Linus Walleij <linus.walleij-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >
>> > I agree that's how it should be be done with the current API if your
>> > driver can obtain GPIOs from both ACPI and DT. This is a potential
>> > issue, as drivers are not supposed to make assumptions about who is
>> > going to be their GPIO provider. Let's say you started a driver with
>> > only DT in mind, and used gpio_get(dev, con_id) to get your GPIOs. DT
>> > bindings are thus of the form "con_id-gpio = <phandle>", and set in
>> > stone. Then later, someone wants to use your driver with ACPI. How do
>> > you handle that gracefully?
>>
>> Short answer is you can't. You have to pour backward-compatibility
>> code into the driver first checking for that property and then falling
>> back to the new binding if it doesn't exist.
>
> With the ACPI named properties extension, it should be possible to have
> something akin to a "gpio-names" list that can be attached to an indexed
> array of gpio descriptors. I assume that Intel is going to need this
> for named irqs, clocks, regulators, dmas as well, so I think it will
> eventually get there. It's not something that can be done today though,
> or that is standardized in APCI-5.0.

Good to know. I'm not sure this would help with the named GPIO
resolution issue however. I assume that contrary to DT we will have no
control over the naming of ACPI-defined GPIOs, and thus there is
little chance that a GPIO serving a function will end up having the
same name as the one we chose for the DT binding...

>
> My guess is that named GPIOs are going to make more sense on x86 embedded
> than on arm64 server.
>
>> > I'm starting to wonder, now that ACPI is a first-class GPIO provider,
>> > whether we should not start to encourage the deprecation of the
>> > "con_id-gpio = <phandle>" binding form in DT and only use a single
>> > indexed GPIO property per device.
>>
>> You have a valid point.
>
> Independent of ACPI, I prefer indexed "gpios" properties over "con_id-gpio"
> properties anyway, because it's more consistent with some of the other
> subsystems. I don't have an opinion though on whether we should also
> allow a "gpios"/"gpio-names" pair, or whether we should keep the indexed
> "gpios" list for the anonymous case.
>
>> > The con_id parameter would then only
>> > be used as a label, which would also have the nice side-effect that
>> > all GPIOs used for a given function will be reported under the same
>> > name no matter what the GPIO provider is.
>>
>> As discussed earlier in this thread I'm not sure the con_id is
>> suitable for labelling GPIOs. It'd be better to have a proper name
>> specified in DT/ACPI instead.
>
> +1

I wonder why you guys prefer to have the name defined in the GPIO
mapping. Having the driver decide the label makes it easier to look up
which GPIO does what in debugfs, whereas nothing prevents people to
name GPIOs whatever inadequate name they want in the device DT node.
What am I overlooking here?

>
>> > From an aesthetic point of view, I definitely prefer using con_id to
>> > identify GPIOs instead of indexes, but I don't see how we can make it
>> > play nice with ACPI. Thoughts?
>>
>> Let's ask the DT maintainers...
>>
>> I'm a bit sceptic to the whole ACPI-DT-API-should-be-unified
>> just-one-function-call business, as this was just a very simple example
>> of what can happen to something as simple as
>> devm_gpiod_get[_index]().
>
> I think a unified kernel API makes more sense for some subsystems than
> others, and it depends a bit on the rate of adoption of APCI for drivers
> that already have a DT binding (or vice versa, if that happens).
>
> GPIO might actually be in the first category since it's commonly used
> for off-chip components that will get shared across ARM and x86 (as
> well as everything else), while a common kernel API would be less
> important for things that are internal to an SoC where Intel is the
> only company needing ACPI support.

I am afraid I don't have a good enough view of the ACPI landscape to
understand how often drivers might be reused on both ACPI and DT. But
I suppose nothing speaks against that, technically speaking. Maybe
Mika would have comments to make here?

The good (or bad, rather) thing about DT is that we can do whatever we
please with the new bindings: decide which name or which index
(doesn't matter here) a GPIO should have. However we don't have this
control over ACPI, where nothing guarantees that the same index will
be used for the same GPIO function. And there goes our unified GPIO
mapping. Workarounds would imply additional layers of mapping, or
using different probe functions depending on whether we rely on DT or
ACPI (I don't want to imagine there will be systems that use *both*).
Considering that we already have drivers using that trick (e.g. to
choose between SPI or I2C), the latter might be acceptable.

Alex.

^ permalink raw reply

* Re: [PATCH net-next] net: filter: let bpf_tell_extensions return SKF_AD_MAX
From: Michal Sekletar @ 2014-01-21 14:25 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Eric Dumazet
In-Reply-To: <1390259977-28770-1-git-send-email-dborkman@redhat.com>

On Tue, Jan 21, 2014 at 12:19:37AM +0100, Daniel Borkmann wrote:
> Michal Sekletar added in commit ea02f9411d9f ("net: introduce
> SO_BPF_EXTENSIONS") a facility where user space can enquire
> the BPF ancillary instruction set, which is imho a step into
> the right direction for letting user space high-level to BPF
> optimizers make an informed decision for possibly using these
> extensions.
> 
> The original rationale was to return through a getsockopt(2)
> a bitfield of which instructions are supported and which
> are not, as of right now, we just return 0 to indicate a
> base support for SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET.
> Limitations of this approach are that this API which we need
> to maintain for a long time can only support a maximum of 32
> extensions, and needs to be additionally maintained/updated
> when each new extension that comes in.
> 
> I thought about this a bit more and what we can do here to
> overcome this is to just return SKF_AD_MAX. Since we never
> remove any extension since we cannot break user space and
> always linearly increase SKF_AD_MAX on each newly added
> extension, user space can make a decision on what extensions
> are supported in the whole set of extensions and which aren't,
> by just checking which of them from the whole set have an
> offset < SKF_AD_MAX of the underlying kernel.
> 
> Since SKF_AD_MAX must be updated each time we add new ones,
> we don't need to introduce an additional enum and got
> maintenance for free. At some point in time when
> SO_BPF_EXTENSIONS becomes ubiquitous for most kernels, then
> an application can simply make use of this and easily be run
> on newer or older underlying kernels without needing to be
> recompiled, of course. Since that is for 3.14, it's not too
> late to do this change.
> 
> Cc: Michal Sekletar <msekleta@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  include/linux/filter.h | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 1a95a2d..e568c8e 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -85,13 +85,7 @@ static inline void bpf_jit_free(struct sk_filter *fp)
>  
>  static inline int bpf_tell_extensions(void)
>  {
> -	/* When adding new BPF extension it is necessary to enumerate
> -	 * it here, so userspace software which wants to know what is
> -	 * supported can do so by inspecting return value of this
> -	 * function
> -	 */
> -
> -	return 0;
> +	return SKF_AD_MAX;
>  }
>  
>  enum {
> -- 
> 1.8.3.1
> 
Acked-by: Michal Sekletar <msekleta@redhat.com>

Thanks!

^ permalink raw reply

* Re: [Xen-devel] [PATCH v3] xen/grant-table: Avoid m2p_override during mapping
From: Stefano Stabellini @ 2014-01-21 14:22 UTC (permalink / raw)
  To: David Vrabel
  Cc: Stefano Stabellini, Zoltan Kiss, jonathan.davies, wei.liu2,
	ian.campbell, netdev, linux-kernel, xen-devel
In-Reply-To: <52DE78BF.2070909@citrix.com>

On Tue, 21 Jan 2014, David Vrabel wrote:
> On 21/01/14 12:26, Stefano Stabellini wrote:
> > On Mon, 20 Jan 2014, Zoltan Kiss wrote:
> >
> >> -		ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> >> -				       &kmap_ops[i] : NULL);
> >> +		if (m2p_override)
> >> +			ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> >> +					       &kmap_ops[i] : NULL);
> >> +		else {
> >> +			unsigned long pfn = page_to_pfn(pages[i]);
> >> +			WARN_ON(PagePrivate(pages[i]));
> >> +			SetPagePrivate(pages[i]);
> >> +			set_page_private(pages[i], mfn);
> >> +			pages[i]->index = pfn_to_mfn(pfn);
> >> +			if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
> >> +				return -ENOMEM;
> > 
> > What happens if the page is PageHighMem?
> > 
> > This looks like a subset of m2p_add_override, but it is missing some
> > relevant bits, like the PageHighMem check, or the p2m(m2p(mfn)) == mfn
> > check.  Maybe we can find a way to avoid duplicating the code.
> > We could split m2p_add_override in two functions or add yet another
> > parameter to it.
> 
> The PageHighMem() check isn't relevant as we're not mapping anything
> here.  Also, a page for a kernel grant mapping only cannot be highmem.
> 
> The check for a local mfn and the additional set_phys_to_machine() is
> only necessary if something tries an mfn_to_pfn() on the local mfn.  We
> can only omit adding an m2p override if we know thing will try
> mfn_to_pfn(), therefore the check and set_phys_to_machine() is unnecessary.

OK, you convinced me that the two checks are superfluous for this case.

Can we still avoid the code duplication by removing the corresponding
code from m2p_add_override and m2p_remove_override and doing the
set_page_private thing uniquely in grant-table.c?

^ permalink raw reply

* Re: [PATCH net-next 0/7] sctp: remove some macro locking wrappers
From: Neil Horman @ 2014-01-21 14:10 UTC (permalink / raw)
  To: Wang Weidong; +Cc: davem, vyasevich, dborkman, netdev, linux-sctp
In-Reply-To: <1390290252-16080-1-git-send-email-wangweidong1@huawei.com>

On Tue, Jan 21, 2014 at 03:44:05PM +0800, Wang Weidong wrote:
> In sctp.h we can find some macro locking wrappers. As Neil point out that:
> 
> "Its because in the origional implementation of the sctp protocol, there was a
> user space test harness which built the kernel module for userspace execution to
> cary our some unit testing on the code.  It did so by redefining some of those
> locking macros to user space friendly code.  IIRC we haven't use those unit
> tests in years, and so should be removing them, not adding them to other
> locations."
> 
> So I remove them.
> 
> Wang Weidong (7):
>   sctp: remove macro sctp_spin_[un]lock_irqrestore
>   sctp: remove macro sctp_local_bh_{disable|enable}
>   sctp: remove macro sctp_spin_[un]lock
>   sctp: remove macros sctp_write_[un]_lock
>   sctp: remove macros sctp_read_[un]lock
>   sctp: remove macros sctp_{lock|release}_sock
>   sctp: remove macros sctp_bh_[un]lock_sock
> 
>  include/net/sctp/sctp.h  | 27 ++-----------
>  net/sctp/endpointola.c   |  4 +-
>  net/sctp/input.c         | 54 +++++++++++++-------------
>  net/sctp/proc.c          | 12 +++---
>  net/sctp/protocol.c      |  4 +-
>  net/sctp/sm_sideeffect.c | 16 ++++----
>  net/sctp/socket.c        | 98 ++++++++++++++++++++++++------------------------
>  7 files changed, 98 insertions(+), 117 deletions(-)
> 
> -- 
> 1.7.12
> 
> 
> 
I would have preferred that this all be done as one patch, so that if anyone was
still using this macro set they would have a single change to bisect to, but
given that we already had several non-macrotized uses of these locks, its not
overly relevant

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* [PATCH v3 3/7] net: moxa: connect to PHY
From: Jonas Jensen @ 2014-01-21 13:50 UTC (permalink / raw)
  To: netdev
  Cc: linux-arm-kernel, linux-kernel, davem, f.fainelli, bhutchings,
	robherring2, Jonas Jensen
In-Reply-To: <CAL_JsqKs-snsRaD1JQRVUrg+WBDd9cYxPhtfKf9m5HeJ6df3Pw@mail.gmail.com>

The kernel now has a MDIO bus driver and a phy_driver (RTL8201CP),
connect to this PHY using OF.

Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
---

Notes:
    Thanks for reviewing,
    
    This time I'll avoid sending out the entire set, attaching it here
    directly to my reply.
    
    Because the DT example is directly imported from files in arch/arm/
    those must be updated too. I'll submit those in a separate fixup patch.
    
    Changes since v3:
    
    1. drop device_type property from DT example
    2. replace "mac@" with "ethernet@" in DT example
    
    Applies to next-20140121

 .../devicetree/bindings/net/moxa,moxart-mac.txt    | 49 +++++++++++-
 drivers/net/ethernet/moxa/moxart_ether.c           | 92 +++++++++++++++++++++-
 drivers/net/ethernet/moxa/moxart_ether.h           |  2 +
 3 files changed, 138 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt b/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
index 583418b..36d5185 100644
--- a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
+++ b/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
@@ -1,21 +1,62 @@
 MOXA ART Ethernet Controller
 
+Integrated MDIO bus node:
+
+- compatible: "moxa,moxart-mdio"
+- Inherits from MDIO bus node binding[1]
+
+[1] Documentation/devicetree/bindings/net/phy.txt
+
+
+Ethernet node:
+
 Required properties:
 
 - compatible : Must be "moxa,moxart-mac"
 - reg : Should contain register location and length
 - interrupts : Should contain the mac interrupt number
 
+Optional Properties:
+
+- phy-handle : the phandle to a PHY node
+
+
 Example:
 
-	mac0: mac@90900000 {
+	mdio0: mdio@90900090 {
+		compatible = "moxa,moxart-mdio";
+		reg = <0x90900090 0x8>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		ethphy0: ethernet-phy@1 {
+			compatible = "moxa,moxart-rtl8201cp", "ethernet-phy-ieee802.3-c22";
+			reg = <1>;
+		};
+	};
+
+	mdio1: mdio@92000090 {
+		compatible = "moxa,moxart-mdio";
+		reg = <0x92000090 0x8>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		ethphy1: ethernet-phy@1 {
+			compatible = "moxa,moxart-rtl8201cp", "ethernet-phy-ieee802.3-c22";
+			reg = <1>;
+		};
+	};
+
+	mac0: ethernet@90900000 {
 		compatible = "moxa,moxart-mac";
-		reg =	<0x90900000 0x100>;
+		reg = <0x90900000 0x90>;
 		interrupts = <25 0>;
+		phy-handle = <&ethphy0>;
 	};
 
-	mac1: mac@92000000 {
+	mac1: ethernet@92000000 {
 		compatible = "moxa,moxart-mac";
-		reg =	<0x92000000 0x100>;
+		reg = <0x92000000 0x90>;
 		interrupts = <27 0>;
+		phy-handle = <&ethphy1>;
 	};
diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
index 17c9f0e..c19bff2 100644
--- a/drivers/net/ethernet/moxa/moxart_ether.c
+++ b/drivers/net/ethernet/moxa/moxart_ether.c
@@ -25,6 +25,9 @@
 #include <linux/of_irq.h>
 #include <linux/crc32.h>
 #include <linux/crc32c.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
 
 #include "moxart_ether.h"
 
@@ -60,6 +63,16 @@ static int moxart_set_mac_address(struct net_device *ndev, void *addr)
 	return 0;
 }
 
+static int moxart_do_ioctl(struct net_device *ndev, struct ifreq *ir, int cmd)
+{
+	struct moxart_mac_priv_t *priv = netdev_priv(ndev);
+
+	if (!netif_running(ndev))
+		return -EINVAL;
+
+	return phy_mii_ioctl(priv->phy_dev, ir, cmd);
+}
+
 static void moxart_mac_free_memory(struct net_device *ndev)
 {
 	struct moxart_mac_priv_t *priv = netdev_priv(ndev);
@@ -109,6 +122,19 @@ static void moxart_mac_enable(struct net_device *ndev)
 	writel(priv->reg_maccr, priv->base + REG_MAC_CTRL);
 }
 
+static void moxart_mac_update_duplex(struct net_device *ndev)
+{
+	struct moxart_mac_priv_t *priv = netdev_priv(ndev);
+
+	priv->reg_maccr &= ~(FULLDUP | ENRX_IN_HALFTX);
+	if (priv->duplex)
+		priv->reg_maccr |= FULLDUP;
+	else
+		priv->reg_maccr |= ENRX_IN_HALFTX;
+
+	writel(priv->reg_maccr, priv->base + REG_MAC_CTRL);
+}
+
 static void moxart_mac_setup_desc_ring(struct net_device *ndev)
 {
 	struct moxart_mac_priv_t *priv = netdev_priv(ndev);
@@ -168,6 +194,9 @@ static int moxart_mac_open(struct net_device *ndev)
 	moxart_update_mac_address(ndev);
 	moxart_mac_setup_desc_ring(ndev);
 	moxart_mac_enable(ndev);
+
+	phy_start(priv->phy_dev);
+
 	netif_start_queue(ndev);
 
 	netdev_dbg(ndev, "%s: IMR=0x%x, MACCR=0x%x\n",
@@ -183,6 +212,8 @@ static int moxart_mac_stop(struct net_device *ndev)
 
 	napi_disable(&priv->napi);
 
+	phy_stop(priv->phy_dev);
+
 	netif_stop_queue(ndev);
 
 	/* disable all interrupts */
@@ -435,12 +466,49 @@ static struct net_device_ops moxart_netdev_ops = {
 	.ndo_set_mac_address	= moxart_set_mac_address,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= eth_change_mtu,
+	.ndo_do_ioctl		= moxart_do_ioctl,
 };
 
+static void moxart_adjust_link(struct net_device *ndev)
+{
+	struct moxart_mac_priv_t *priv = netdev_priv(ndev);
+	unsigned long flags;
+	int status_change = 0;
+
+	if (priv->phy_dev->link) {
+		if (priv->speed != priv->phy_dev->speed) {
+			priv->speed = priv->phy_dev->speed;
+			status_change = 1;
+		}
+
+		if (priv->duplex != priv->phy_dev->duplex) {
+			spin_lock_irqsave(&priv->txlock, flags);
+
+			priv->duplex = priv->phy_dev->duplex;
+			moxart_mac_update_duplex(ndev);
+
+			spin_unlock_irqrestore(&priv->txlock, flags);
+			status_change = 1;
+		}
+	}
+
+	if (priv->link != priv->phy_dev->link) {
+		if (!priv->phy_dev->link) {
+			priv->speed = 0;
+			priv->duplex = -1;
+		}
+		priv->link = priv->phy_dev->link;
+		status_change = 1;
+	}
+
+	if (status_change)
+		phy_print_status(priv->phy_dev);
+}
+
 static int moxart_mac_probe(struct platform_device *pdev)
 {
 	struct device *p_dev = &pdev->dev;
-	struct device_node *node = p_dev->of_node;
+	struct device_node *node = p_dev->of_node, *phy_node;
 	struct net_device *ndev;
 	struct moxart_mac_priv_t *priv;
 	struct resource *res;
@@ -461,6 +529,28 @@ static int moxart_mac_probe(struct platform_device *pdev)
 	priv = netdev_priv(ndev);
 	priv->ndev = ndev;
 
+	priv->link = 0;
+	priv->speed = 0;
+	priv->duplex = -1;
+
+	phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (!phy_node) {
+		dev_err(p_dev, "of_parse_phandle failed\n");
+		ret = -ENODEV;
+		goto init_fail;
+	}
+
+	if (phy_node) {
+		priv->phy_dev = of_phy_connect(priv->ndev, phy_node,
+					       &moxart_adjust_link,
+					       0, of_get_phy_mode(node));
+		if (!priv->phy_dev) {
+			dev_err(p_dev, "of_phy_connect failed\n");
+			ret = -ENODEV;
+			goto init_fail;
+		}
+	}
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	ndev->base_addr = res->start;
 	priv->base = devm_ioremap_resource(p_dev, res);
diff --git a/drivers/net/ethernet/moxa/moxart_ether.h b/drivers/net/ethernet/moxa/moxart_ether.h
index 2be9280..b8877bf 100644
--- a/drivers/net/ethernet/moxa/moxart_ether.h
+++ b/drivers/net/ethernet/moxa/moxart_ether.h
@@ -297,6 +297,8 @@ struct moxart_mac_priv_t {
 	unsigned int reg_imr;
 	struct napi_struct napi;
 	struct net_device *ndev;
+	struct phy_device *phy_dev;
+	int speed, duplex, link;
 
 	dma_addr_t rx_base;
 	dma_addr_t rx_mapping[RX_DESC_NUM];
-- 
1.8.2.1

^ permalink raw reply related

* Re: [Xen-devel] [PATCH v3] xen/grant-table: Avoid m2p_override during mapping
From: David Vrabel @ 2014-01-21 13:40 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Zoltan Kiss, jonathan.davies, wei.liu2, ian.campbell, netdev,
	linux-kernel, xen-devel
In-Reply-To: <alpine.DEB.2.02.1401211217510.21510@kaball.uk.xensource.com>

On 21/01/14 12:26, Stefano Stabellini wrote:
> On Mon, 20 Jan 2014, Zoltan Kiss wrote:
>
>> -		ret = m2p_add_override(mfn, pages[i], kmap_ops ?
>> -				       &kmap_ops[i] : NULL);
>> +		if (m2p_override)
>> +			ret = m2p_add_override(mfn, pages[i], kmap_ops ?
>> +					       &kmap_ops[i] : NULL);
>> +		else {
>> +			unsigned long pfn = page_to_pfn(pages[i]);
>> +			WARN_ON(PagePrivate(pages[i]));
>> +			SetPagePrivate(pages[i]);
>> +			set_page_private(pages[i], mfn);
>> +			pages[i]->index = pfn_to_mfn(pfn);
>> +			if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
>> +				return -ENOMEM;
> 
> What happens if the page is PageHighMem?
> 
> This looks like a subset of m2p_add_override, but it is missing some
> relevant bits, like the PageHighMem check, or the p2m(m2p(mfn)) == mfn
> check.  Maybe we can find a way to avoid duplicating the code.
> We could split m2p_add_override in two functions or add yet another
> parameter to it.

The PageHighMem() check isn't relevant as we're not mapping anything
here.  Also, a page for a kernel grant mapping only cannot be highmem.

The check for a local mfn and the additional set_phys_to_machine() is
only necessary if something tries an mfn_to_pfn() on the local mfn.  We
can only omit adding an m2p override if we know thing will try
mfn_to_pfn(), therefore the check and set_phys_to_machine() is unnecessary.

David

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] bonding: add netlink attributes to slave link dev
From: Jiri Pirko @ 2014-01-21 13:34 UTC (permalink / raw)
  To: Scott Feldman; +Cc: vfalico, fubar, andy, netdev, roopa, shm, dingtianhong
In-Reply-To: <20140117065756.3194.70179.stgit@monster-03.cumulusnetworks.com>

Fri, Jan 17, 2014 at 07:57:56AM CET, sfeldma@cumulusnetworks.com wrote:
>If link is IFF_SLAVE, extend link dev netlink attributes to include
>slave attributes with new IFLA_SLAVE nest.  Add netlink notification
>(RTM_NEWLINK) when slave status changes from backup to active, or
>visa-versa.
>
>Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
>attributes.  Currently only used by bonding driver, but could be
>used by other aggregating devices with slaves.
>
>Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
>---
> drivers/net/bonding/bond_main.c    |    1 +
> drivers/net/bonding/bond_netlink.c |   36 ++++++++++++++++++++++++
> drivers/net/bonding/bonding.h      |   11 ++++++-
> include/linux/netdevice.h          |    5 +++
> include/uapi/linux/if_link.h       |   13 +++++++++
> net/core/rtnetlink.c               |   54 ++++++++++++++++++++++++++++++++++++
> 6 files changed, 118 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index df85cec..3220b48 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3883,6 +3883,7 @@ static const struct net_device_ops bond_netdev_ops = {
> #endif
> 	.ndo_add_slave		= bond_enslave,
> 	.ndo_del_slave		= bond_release,
>+	.ndo_get_slave		= bond_get_slave,
> 	.ndo_fix_features	= bond_fix_features,
> };
> 
>diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
>index 555c783..21c6488 100644
>--- a/drivers/net/bonding/bond_netlink.c
>+++ b/drivers/net/bonding/bond_netlink.c
>@@ -22,6 +22,42 @@
> #include <linux/reciprocal_div.h>
> #include "bonding.h"
> 
>+int bond_get_slave(struct net_device *slave_dev, struct sk_buff *skb)
>+{
>+	struct slave *slave = bond_slave_get_rtnl(slave_dev);
>+	const struct aggregator *agg;
>+
>+	if (nla_put_u8(skb, IFLA_SLAVE_STATE, bond_slave_state(slave)))
>+		goto nla_put_failure;
>+
>+	if (nla_put_u8(skb, IFLA_SLAVE_MII_STATUS, slave->link))
>+		goto nla_put_failure;
>+
>+	if (nla_put_u32(skb, IFLA_SLAVE_LINK_FAILURE_COUNT,
>+			slave->link_failure_count))
>+		goto nla_put_failure;
>+
>+	if (nla_put(skb, IFLA_SLAVE_PERM_HWADDR,
>+		    slave_dev->addr_len, slave->perm_hwaddr))
>+		goto nla_put_failure;
>+
>+	if (nla_put_u16(skb, IFLA_SLAVE_QUEUE_ID, slave->queue_id))
>+		goto nla_put_failure;
>+
>+	if (slave->bond->params.mode == BOND_MODE_8023AD) {
>+		agg = SLAVE_AD_INFO(slave).port.aggregator;
>+		if (agg)
>+			if (nla_put_u16(skb, IFLA_SLAVE_AD_AGGREGATOR_ID,
>+					agg->aggregator_identifier))
>+				goto nla_put_failure;
>+	}
>+
>+	return 0;
>+
>+nla_put_failure:
>+	return -EMSGSIZE;
>+}
>+
> static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
> 	[IFLA_BOND_MODE]		= { .type = NLA_U8 },
> 	[IFLA_BOND_ACTIVE_SLAVE]	= { .type = NLA_U32 },
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 309757d..8a935f8 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -285,12 +285,18 @@ static inline bool bond_is_lb(const struct bonding *bond)
> 
> static inline void bond_set_active_slave(struct slave *slave)
> {
>-	slave->backup = 0;
>+	if (slave->backup) {
>+		slave->backup = 0;
>+		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_KERNEL);
>+	}
> }
> 
> static inline void bond_set_backup_slave(struct slave *slave)
> {
>-	slave->backup = 1;
>+	if (!slave->backup) {
>+		slave->backup = 1;
>+		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_KERNEL);
>+	}
> }
> 
> static inline int bond_slave_state(struct slave *slave)
>@@ -426,6 +432,7 @@ int bond_sysfs_slave_add(struct slave *slave);
> void bond_sysfs_slave_del(struct slave *slave);
> int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev);
> int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
>+int bond_get_slave(struct net_device *slave_dev, struct sk_buff *skb);
> int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
> int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
> int bond_parm_tbl_lookup(int mode, const struct bond_parm_tbl *tbl);
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index d7668b88..22c8698 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -908,6 +908,9 @@ struct netdev_phys_port_id {
>  * int (*ndo_del_slave)(struct net_device *dev, struct net_device *slave_dev);
>  *	Called to release previously enslaved netdev.
>  *
>+ * int (*ndo_get_slave)(struct net_device *slave_dev, struct sk_buff *skb);
>+ *	Called to fill netlink skb with slave info.
>+ *
>  *      Feature/offload setting functions.
>  * netdev_features_t (*ndo_fix_features)(struct net_device *dev,
>  *		netdev_features_t features);
>@@ -1080,6 +1083,8 @@ struct net_device_ops {
> 						 struct net_device *slave_dev);
> 	int			(*ndo_del_slave)(struct net_device *dev,
> 						 struct net_device *slave_dev);
>+	int			(*ndo_get_slave)(struct net_device *slave_dev,
>+						 struct sk_buff *skb);
> 	netdev_features_t	(*ndo_fix_features)(struct net_device *dev,
> 						    netdev_features_t features);
> 	int			(*ndo_set_features)(struct net_device *dev,
>diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>index 3e6bd3c..ba2f3bf 100644
>--- a/include/uapi/linux/if_link.h
>+++ b/include/uapi/linux/if_link.h
>@@ -144,6 +144,7 @@ enum {
> 	IFLA_NUM_RX_QUEUES,
> 	IFLA_CARRIER,
> 	IFLA_PHYS_PORT_ID,
>+	IFLA_SLAVE,
> 	__IFLA_MAX
> };
> 
>@@ -368,6 +369,18 @@ enum {
> 
> #define IFLA_BOND_AD_INFO_MAX	(__IFLA_BOND_AD_INFO_MAX - 1)
> 
>+enum {
>+	IFLA_SLAVE_STATE,
>+	IFLA_SLAVE_MII_STATUS,
>+	IFLA_SLAVE_LINK_FAILURE_COUNT,
>+	IFLA_SLAVE_PERM_HWADDR,
>+	IFLA_SLAVE_QUEUE_ID,
>+	IFLA_SLAVE_AD_AGGREGATOR_ID,
>+	__IFLA_SLAVE_MAX,
>+};


The names are not right in my opinion. "BOND" should be mentioned there.

>+
>+#define IFLA_SLAVE_MAX	(__IFLA_SLAVE_MAX - 1)
>+
> /* SR-IOV virtual function management section */
> 
> enum {
>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>index e6e7d58..4f85de7 100644
>--- a/net/core/rtnetlink.c
>+++ b/net/core/rtnetlink.c
>@@ -721,6 +721,28 @@ static size_t rtnl_port_size(const struct net_device *dev)
> 		return port_self_size;
> }
> 
>+static size_t rtnl_bond_slave_size(const struct net_device *dev)
>+{
>+	struct net_device *bond;
>+	size_t slave_size =
>+		nla_total_size(sizeof(struct nlattr)) +	/* IFLA_SLAVE */
>+		nla_total_size(1) +	/* IFLA_SLAVE_STATE */
>+		nla_total_size(1) +	/* IFLA_SLAVE_MII_STATUS */
>+		nla_total_size(4) +	/* IFLA_SLAVE_LINK_FAILURE_COUNT */
>+		nla_total_size(MAX_ADDR_LEN) +	/* IFLA_SLAVE_PERM_HWADDR */
>+		nla_total_size(2) +	/* IFLA_SLAVE_QUEUE_ID */
>+		nla_total_size(2) +	/* IFLA_SLAVE_AD_AGGREGATOR_ID */
>+		0;
>+
>+	if (netif_is_bond_slave((struct net_device *)dev)) {
>+		bond = netdev_master_upper_dev_get((struct net_device *)dev);
>+		if (bond && bond->netdev_ops->ndo_get_slave)
>+			return slave_size;
>+	}
>+
>+	return 0;
>+}
>+
> static noinline size_t if_nlmsg_size(const struct net_device *dev,
> 				     u32 ext_filter_mask)
> {
>@@ -750,6 +772,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
> 	       + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
> 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
> 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
>+	       + rtnl_bond_slave_size(dev) /* IFLA_SLAVE */
> 	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
> }
> 
>@@ -847,6 +870,34 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
> 	return 0;
> }
> 
>+static size_t rtnl_bond_slave_fill(struct sk_buff *skb, struct net_device *dev)
>+{
>+	struct net_device *bond;
>+	struct nlattr *nest;
>+	int err;
>+
>+	if (!netif_is_bond_slave(dev))
>+		return 0;
>+
>+	bond = netdev_master_upper_dev_get(dev);
>+	if (!bond || !bond->netdev_ops->ndo_get_slave)
>+		return 0;
>+
>+	nest = nla_nest_start(skb, IFLA_SLAVE);
>+	if (!nest)
>+		return -EMSGSIZE;
>+
>+	err = bond->netdev_ops->ndo_get_slave(dev, skb);
>+	if (err) {
>+		nla_nest_cancel(skb, nest);
>+		return (err == -EMSGSIZE) ? err : 0;
>+	}
>+
>+	nla_nest_end(skb, nest);
>+
>+	return 0;
>+}
>+
> static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
> 			    int type, u32 pid, u32 seq, u32 change,
> 			    unsigned int flags, u32 ext_filter_mask)
>@@ -1001,6 +1052,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
> 	if (rtnl_port_fill(skb, dev))
> 		goto nla_put_failure;
> 
>+	if (rtnl_bond_slave_fill(skb, dev))
>+		goto nla_put_failure;
>+

I must say I do not like this at all. This should be done in a generic
way. By a callback registered by bonding and possibly other master-slave
device types.

I have something in mind, will try to prepare patch soon.

Jiri


> 	if (dev->rtnl_link_ops) {
> 		if (rtnl_link_fill(skb, dev) < 0)
> 			goto nla_put_failure;
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] bonding: add sysfs /slave dir for bond slave devices.
From: Jiri Pirko @ 2014-01-21 13:29 UTC (permalink / raw)
  To: Scott Feldman; +Cc: vfalico, fubar, andy, netdev, roopa, shm, dingtianhong
In-Reply-To: <20140117065749.3194.440.stgit@monster-03.cumulusnetworks.com>

Fri, Jan 17, 2014 at 07:57:49AM CET, sfeldma@cumulusnetworks.com wrote:
>Add sub-directory under /sys/class/net/<interface>/slave with
>read-only attributes for slave.  Directory only appears when
><interface> is a slave.
>
>$ tree /sys/class/net/eth2/slave/

This should be either "bonding_slave" or

>/sys/class/net/eth2/slave/
>├── ad_aggregator_id
>├── link_failure_count
>├── mii_status
>├── perm_hwaddr
>├── queue_id
>└── state

^ these should be prefixed by "bonding_"


>
>$ cat /sys/class/net/eth2/slave/*
>2
>0
>up
>40:02:10:ef:06:01
>0
>active
>
>Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
>---
> drivers/net/bonding/Makefile           |    2 
> drivers/net/bonding/bond_main.c        |   27 ++++++
> drivers/net/bonding/bond_procfs.c      |   12 ---
> drivers/net/bonding/bond_sysfs_slave.c |  144 ++++++++++++++++++++++++++++++++
> drivers/net/bonding/bonding.h          |    4 +
> 5 files changed, 176 insertions(+), 13 deletions(-)
> create mode 100644 drivers/net/bonding/bond_sysfs_slave.c
>
>diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
>index 5a5d720..6f4e808 100644
>--- a/drivers/net/bonding/Makefile
>+++ b/drivers/net/bonding/Makefile
>@@ -4,7 +4,7 @@
> 
> obj-$(CONFIG_BONDING) += bonding.o
> 
>-bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_debugfs.o bond_netlink.o bond_options.o
>+bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_slave.o bond_debugfs.o bond_netlink.o bond_options.o
> 
> proc-$(CONFIG_PROC_FS) += bond_procfs.o
> bonding-objs += $(proc-y)
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f00dd45..df85cec 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -466,6 +466,22 @@ static void bond_update_speed_duplex(struct slave *slave)
> 	return;
> }
> 
>+const char *bond_slave_link_status(s8 link)
>+{
>+	switch (link) {
>+	case BOND_LINK_UP:
>+		return "up";
>+	case BOND_LINK_FAIL:
>+		return "going down";
>+	case BOND_LINK_DOWN:
>+		return "down";
>+	case BOND_LINK_BACK:
>+		return "going back";
>+	default:
>+		return "unknown";
>+	}
>+}
>+
> /*
>  * if <dev> supports MII link status reporting, check its link status.
>  *
>@@ -1576,6 +1592,12 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 		goto err_unregister;
> 	}
> 
>+	res = bond_sysfs_slave_add(new_slave);
>+	if (res) {
>+		pr_debug("Error %d calling bond_sysfs_slave_add\n", res);
>+		goto err_upper_unlink;
>+	}
>+
> 	bond->slave_cnt++;
> 	bond_compute_features(bond);
> 	bond_set_carrier(bond);
>@@ -1595,6 +1617,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 	return 0;
> 
> /* Undo stages on error */
>+err_upper_unlink:
>+	bond_upper_dev_unlink(bond_dev, slave_dev);
>+
> err_unregister:
> 	netdev_rx_handler_unregister(slave_dev);
> 
>@@ -1687,6 +1712,8 @@ static int __bond_release_one(struct net_device *bond_dev,
> 	/* release the slave from its bond */
> 	bond->slave_cnt--;
> 
>+	bond_sysfs_slave_del(slave);
>+
> 	bond_upper_dev_unlink(bond_dev, slave_dev);
> 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
> 	 * for this slave anymore.
>diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
>index fb868d6..8515b344 100644
>--- a/drivers/net/bonding/bond_procfs.c
>+++ b/drivers/net/bonding/bond_procfs.c
>@@ -159,18 +159,6 @@ static void bond_info_show_master(struct seq_file *seq)
> 	}
> }
> 
>-static const char *bond_slave_link_status(s8 link)
>-{
>-	static const char * const status[] = {
>-		[BOND_LINK_UP] = "up",
>-		[BOND_LINK_FAIL] = "going down",
>-		[BOND_LINK_DOWN] = "down",
>-		[BOND_LINK_BACK] = "going back",
>-	};
>-
>-	return status[link];
>-}
>-
> static void bond_info_show_slave(struct seq_file *seq,
> 				 const struct slave *slave)
> {
>diff --git a/drivers/net/bonding/bond_sysfs_slave.c b/drivers/net/bonding/bond_sysfs_slave.c
>new file mode 100644
>index 0000000..7cb97de
>--- /dev/null
>+++ b/drivers/net/bonding/bond_sysfs_slave.c
>@@ -0,0 +1,144 @@
>+/*	Sysfs attributes of bond slaves
>+ *
>+ *      Copyright (c) 2014 Scott Feldman <sfeldma@cumulusnetworks.com>
>+ *
>+ *	This program is free software; you can redistribute it and/or
>+ *	modify it under the terms of the GNU General Public License
>+ *	as published by the Free Software Foundation; either version
>+ *	2 of the License, or (at your option) any later version.
>+ */
>+
>+#include <linux/capability.h>
>+#include <linux/kernel.h>
>+#include <linux/netdevice.h>
>+
>+#include "bonding.h"
>+
>+struct slave_attribute {
>+	struct attribute attr;
>+	ssize_t (*show)(struct slave *, char *);
>+};
>+
>+#define SLAVE_ATTR(_name, _mode, _show)				\
>+const struct slave_attribute slave_attr_##_name = {		\
>+	.attr = {.name = __stringify(_name),			\
>+		 .mode = _mode },				\
>+	.show	= _show,					\
>+};
>+#define SLAVE_ATTR_RO(_name) \
>+	SLAVE_ATTR(_name, S_IRUGO, _name##_show)
>+
>+static ssize_t state_show(struct slave *slave, char *buf)
>+{
>+	switch (bond_slave_state(slave)) {
>+	case BOND_STATE_ACTIVE:
>+		return sprintf(buf, "active\n");
>+	case BOND_STATE_BACKUP:
>+		return sprintf(buf, "backup\n");
>+	default:
>+		return sprintf(buf, "UNKONWN\n");
>+	}
>+}
>+static SLAVE_ATTR_RO(state);
>+
>+static ssize_t mii_status_show(struct slave *slave, char *buf)
>+{
>+	return sprintf(buf, "%s\n", bond_slave_link_status(slave->link));
>+}
>+static SLAVE_ATTR_RO(mii_status);
>+
>+static ssize_t link_failure_count_show(struct slave *slave, char *buf)
>+{
>+	return sprintf(buf, "%d\n", slave->link_failure_count);
>+}
>+static SLAVE_ATTR_RO(link_failure_count);
>+
>+static ssize_t perm_hwaddr_show(struct slave *slave, char *buf)
>+{
>+	return sprintf(buf, "%pM\n", slave->perm_hwaddr);
>+}
>+static SLAVE_ATTR_RO(perm_hwaddr);
>+
>+static ssize_t queue_id_show(struct slave *slave, char *buf)
>+{
>+	return sprintf(buf, "%d\n", slave->queue_id);
>+}
>+static SLAVE_ATTR_RO(queue_id);
>+
>+static ssize_t ad_aggregator_id_show(struct slave *slave, char *buf)
>+{
>+	const struct aggregator *agg;
>+
>+	if (slave->bond->params.mode == BOND_MODE_8023AD) {
>+		agg = SLAVE_AD_INFO(slave).port.aggregator;
>+		if (agg)
>+			return sprintf(buf, "%d\n",
>+				       agg->aggregator_identifier);
>+	}
>+
>+	return sprintf(buf, "N/A\n");
>+}
>+static SLAVE_ATTR_RO(ad_aggregator_id);
>+
>+static const struct slave_attribute *slave_attrs[] = {
>+	&slave_attr_state,
>+	&slave_attr_mii_status,
>+	&slave_attr_link_failure_count,
>+	&slave_attr_perm_hwaddr,
>+	&slave_attr_queue_id,
>+	&slave_attr_ad_aggregator_id,
>+	NULL
>+};
>+
>+#define to_slave_attr(_at) container_of(_at, struct slave_attribute, attr)
>+#define to_slave(obj)	container_of(obj, struct slave, kobj)
>+
>+static ssize_t slave_show(struct kobject *kobj,
>+			  struct attribute *attr, char *buf)
>+{
>+	struct slave_attribute *slave_attr = to_slave_attr(attr);
>+	struct slave *slave = to_slave(kobj);
>+
>+	return slave_attr->show(slave, buf);
>+}
>+
>+const struct sysfs_ops slave_sysfs_ops = {
>+	.show = slave_show,
>+};
>+
>+static struct kobj_type slave_ktype = {
>+#ifdef CONFIG_SYSFS
>+	.sysfs_ops = &slave_sysfs_ops,
>+#endif
>+};
>+
>+int bond_sysfs_slave_add(struct slave *slave)
>+{
>+	const struct slave_attribute **a;
>+	int err;
>+
>+	err = kobject_init_and_add(&slave->kobj, &slave_ktype,
>+				   &(slave->dev->dev.kobj), "slave");
>+	if (err)
>+		return err;
>+
>+	for (a = slave_attrs; *a; ++a) {
>+		err = sysfs_create_file(&slave->kobj, &((*a)->attr));
>+		if (err) {
>+			kobject_del(&slave->kobj);
>+			return err;
>+		}
>+	}
>+
>+	return 0;
>+}
>+
>+void bond_sysfs_slave_del(struct slave *slave)
>+{
>+	const struct slave_attribute **a;
>+
>+	for (a = slave_attrs; *a; ++a)
>+		sysfs_remove_file(&slave->kobj, &((*a)->attr));
>+
>+	kobject_del(&slave->kobj);
>+}
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 955dc48..309757d 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -203,6 +203,7 @@ struct slave {
> #ifdef CONFIG_NET_POLL_CONTROLLER
> 	struct netpoll *np;
> #endif
>+	struct kobject kobj;
> };
> 
> /*
>@@ -421,6 +422,8 @@ int bond_create(struct net *net, const char *name);
> int bond_create_sysfs(struct bond_net *net);
> void bond_destroy_sysfs(struct bond_net *net);
> void bond_prepare_sysfs_group(struct bonding *bond);
>+int bond_sysfs_slave_add(struct slave *slave);
>+void bond_sysfs_slave_del(struct slave *slave);
> int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev);
> int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
> int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
>@@ -469,6 +472,7 @@ int bond_option_lacp_rate_set(struct bonding *bond, int lacp_rate);
> int bond_option_ad_select_set(struct bonding *bond, int ad_select);
> struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond);
> struct net_device *bond_option_active_slave_get(struct bonding *bond);
>+const char *bond_slave_link_status(s8 link);
> 
> struct bond_net {
> 	struct net *		net;	/* Associated network namespace */
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH iproute2] Add IFLA_SLAVE support.
From: Jiri Pirko @ 2014-01-21 13:26 UTC (permalink / raw)
  To: Scott Feldman; +Cc: stephen, netdev, roopa, shm
In-Reply-To: <20140120222505.6206.85084.stgit@debian>

Mon, Jan 20, 2014 at 11:25:05PM CET, sfeldma@cumulusnetworks.com wrote:
>Show slave details for link when slave has IFLA_SLAVE attributes, e.g.:
>
>ip -d link show eth4
>6: eth4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond1 state UP mode DEFAULT group default qlen 1000
>    link/ether 00:02:00:00:04:03 brd ff:ff:ff:ff:ff:ff promiscuity 1
>    slave state ACTIVE mii_status UP link_failure_count 0 perm_hwaddr 00:02:00:00:04:03 queue_id 0 ad_aggregator_id 1
>---
> include/linux/if_link.h |   13 +++++++++
> ip/ipaddress.c          |   71 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 84 insertions(+)
>
>diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>index 098be3d..7f956f6 100644
>--- a/include/linux/if_link.h
>+++ b/include/linux/if_link.h
>@@ -144,6 +144,7 @@ enum {
> 	IFLA_NUM_RX_QUEUES,
> 	IFLA_CARRIER,
> 	IFLA_PHYS_PORT_ID,
>+	IFLA_SLAVE,
> 	__IFLA_MAX
> };
> 
>@@ -366,6 +367,18 @@ enum {
> 
> #define IFLA_BOND_AD_INFO_MAX   (__IFLA_BOND_AD_INFO_MAX - 1)
> 
>+enum {
>+	IFLA_SLAVE_STATE,
>+	IFLA_SLAVE_MII_STATUS,
>+	IFLA_SLAVE_LINK_FAILURE_COUNT,
>+	IFLA_SLAVE_PERM_HWADDR,
>+	IFLA_SLAVE_QUEUE_ID,
>+	IFLA_SLAVE_AD_AGGREGATOR_ID,
>+	__IFLA_SLAVE_MAX,
>+};

I don't like this. Its not clear that this is related to bond only...
misleading

>+
>+#define IFLA_SLAVE_MAX  (__IFLA_SLAVE_MAX - 1)
>+
> /* SR-IOV virtual function management section */
> 
> enum {
>diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>index d02eaaf..a0d3ab8 100644
>--- a/ip/ipaddress.c
>+++ b/ip/ipaddress.c
>@@ -27,6 +27,7 @@
> 
> #include <linux/netdevice.h>
> #include <linux/if_arp.h>
>+#include <linux/if_bonding.h>
> #include <linux/sockios.h>
> 
> #include "rt_names.h"
>@@ -223,6 +224,73 @@ static void print_linktype(FILE *fp, struct rtattr *tb)
> 	}
> }
> 
>+static const char *slave_states[] = {
>+	[BOND_STATE_ACTIVE] = "ACTIVE",
>+	[BOND_STATE_BACKUP] = "BACKUP",
>+};
>+
>+static void print_slave_state(FILE *f, struct rtattr *tb)
>+{
>+	unsigned int state = rta_getattr_u8(tb);
>+
>+	if (state >= sizeof(slave_states) / sizeof(slave_states[0]))
>+		fprintf(f, "state %d ", state);
>+	else
>+		fprintf(f, "state %s ", slave_states[state]);
>+}
>+
>+static const char *slave_mii_status[] = {
>+	[BOND_LINK_UP] = "UP",
>+	[BOND_LINK_FAIL] = "GOING_DOWN",
>+	[BOND_LINK_DOWN] = "DOWN",
>+	[BOND_LINK_BACK] = "GOING_BACK",
>+};
>+
>+static void print_slave_mii_status(FILE *f, struct rtattr *tb)
>+{
>+	unsigned int status = rta_getattr_u8(tb);
>+
>+	if (status >= sizeof(slave_mii_status) / sizeof(slave_mii_status[0]))
>+		fprintf(f, "mii_status %d ", status);
>+	else
>+		fprintf(f, "mii_status %s ", slave_mii_status[status]);
>+}
>+
>+static void print_slave(FILE *fp, struct rtattr *tb)
>+{
>+	struct rtattr *slave[IFLA_SLAVE_MAX+1];
>+	SPRINT_BUF(b1);
>+
>+	parse_rtattr_nested(slave, IFLA_SLAVE_MAX, tb);
>+
>+	if (!slave[IFLA_SLAVE_STATE])
>+		return;
>+
>+	fprintf(fp, "%s    slave ", _SL_);
>+	print_slave_state(fp, slave[IFLA_SLAVE_STATE]);
>+
>+	if (slave[IFLA_SLAVE_MII_STATUS])
>+		print_slave_mii_status(fp, slave[IFLA_SLAVE_MII_STATUS]);
>+
>+	if (slave[IFLA_SLAVE_LINK_FAILURE_COUNT])
>+		fprintf(fp, "link_failure_count %d ",
>+			rta_getattr_u32(slave[IFLA_SLAVE_LINK_FAILURE_COUNT]));
>+
>+	if (slave[IFLA_SLAVE_PERM_HWADDR])
>+		fprintf(fp, "perm_hwaddr %s ",
>+			ll_addr_n2a(RTA_DATA(slave[IFLA_SLAVE_PERM_HWADDR]),
>+				    RTA_PAYLOAD(slave[IFLA_SLAVE_PERM_HWADDR]),
>+				    0, b1, sizeof(b1)));
>+
>+	if (slave[IFLA_SLAVE_QUEUE_ID])
>+		fprintf(fp, "queue_id %d ",
>+			rta_getattr_u16(slave[IFLA_SLAVE_QUEUE_ID]));
>+
>+	if (slave[IFLA_SLAVE_AD_AGGREGATOR_ID])
>+		fprintf(fp, "ad_aggregator_id %d ",
>+			rta_getattr_u16(slave[IFLA_SLAVE_AD_AGGREGATOR_ID]));
>+}
>+
> static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
> {
> 	struct ifla_vf_mac *vf_mac;
>@@ -516,6 +584,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
> 			print_vfinfo(fp, i);
> 	}
> 
>+	if (do_link && tb[IFLA_SLAVE] && show_details)
>+		print_slave(fp, tb[IFLA_SLAVE]);
>+
> 	fprintf(fp, "\n");
> 	fflush(fp);
> 	return 0;
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Arnd Bergmann @ 2014-01-21 12:35 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Linus Walleij, Alexandre Courbot,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Heikki Krogerus, netdev, linux-wireless, linux-sunxi,
	linux-kernel, Maxime Ripard, Chen-Yu Tsai, Johannes Berg,
	Mika Westerberg, David S. Miller
In-Reply-To: <CACRpkdYiy+sya6NqRfAmsrFOXvaa3qX=qjRuTDW1vZVSaG1+Gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tuesday 21 January 2014, Linus Walleij wrote:
> On Tue, Jan 21, 2014 at 4:11 AM, Alexandre Courbot <gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Sat, Jan 18, 2014 at 8:11 AM, Linus Walleij <linus.walleij-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >
> > I agree that's how it should be be done with the current API if your
> > driver can obtain GPIOs from both ACPI and DT. This is a potential
> > issue, as drivers are not supposed to make assumptions about who is
> > going to be their GPIO provider. Let's say you started a driver with
> > only DT in mind, and used gpio_get(dev, con_id) to get your GPIOs. DT
> > bindings are thus of the form "con_id-gpio = <phandle>", and set in
> > stone. Then later, someone wants to use your driver with ACPI. How do
> > you handle that gracefully?
> 
> Short answer is you can't. You have to pour backward-compatibility
> code into the driver first checking for that property and then falling
> back to the new binding if it doesn't exist.

With the ACPI named properties extension, it should be possible to have
something akin to a "gpio-names" list that can be attached to an indexed
array of gpio descriptors. I assume that Intel is going to need this
for named irqs, clocks, regulators, dmas as well, so I think it will
eventually get there. It's not something that can be done today though,
or that is standardized in APCI-5.0.

My guess is that named GPIOs are going to make more sense on x86 embedded
than on arm64 server.

> > I'm starting to wonder, now that ACPI is a first-class GPIO provider,
> > whether we should not start to encourage the deprecation of the
> > "con_id-gpio = <phandle>" binding form in DT and only use a single
> > indexed GPIO property per device.
> 
> You have a valid point.

Independent of ACPI, I prefer indexed "gpios" properties over "con_id-gpio"
properties anyway, because it's more consistent with some of the other
subsystems. I don't have an opinion though on whether we should also
allow a "gpios"/"gpio-names" pair, or whether we should keep the indexed
"gpios" list for the anonymous case.

> > The con_id parameter would then only
> > be used as a label, which would also have the nice side-effect that
> > all GPIOs used for a given function will be reported under the same
> > name no matter what the GPIO provider is.
> 
> As discussed earlier in this thread I'm not sure the con_id is
> suitable for labelling GPIOs. It'd be better to have a proper name
> specified in DT/ACPI instead.

+1

> > From an aesthetic point of view, I definitely prefer using con_id to
> > identify GPIOs instead of indexes, but I don't see how we can make it
> > play nice with ACPI. Thoughts?
> 
> Let's ask the DT maintainers...
> 
> I'm a bit sceptic to the whole ACPI-DT-API-should-be-unified
> just-one-function-call business, as this was just a very simple example
> of what can happen to something as simple as
> devm_gpiod_get[_index]().

I think a unified kernel API makes more sense for some subsystems than
others, and it depends a bit on the rate of adoption of APCI for drivers
that already have a DT binding (or vice versa, if that happens).

GPIO might actually be in the first category since it's commonly used
for off-chip components that will get shared across ARM and x86 (as
well as everything else), while a common kernel API would be less
important for things that are internal to an SoC where Intel is the
only company needing ACPI support.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] tcp: metrics: Fix rcu-race when deleting multiple entries
From: Christoph Paasch @ 2014-01-21 12:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

In bbf852b96ebdc6d1 I introduced the tmlist, which allows to delete
multiple entries from the cache that match a specified destination if no
source-IP is specified.

However, as the cache is an RCU-list, we should not create this tmlist, as
it will change the tcpm_next pointer of the element that will be deleted
and so a thread iterating over the cache's entries while holding the
RCU-lock might get "redirected" to this tmlist.

This patch fixes this, by reverting back to the old behavior prior to
bbf852b96ebdc6d1, which means that we simply change the tcpm_next
pointer of the previous element (pp) to jump over the one we are
deleting.
The difference is that we call kfree_rcu() directly on the cache entry,
which allows us to delete multiple entries from the list.

Fixes: bbf852b96ebdc6d1 (tcp: metrics: Delete all entries matching a certain destination)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 net/ipv4/tcp_metrics.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index fa950941de65..9ae48b4a37d1 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1019,13 +1019,13 @@ static int tcp_metrics_flush_all(struct net *net)
 static int tcp_metrics_nl_cmd_del(struct sk_buff *skb, struct genl_info *info)
 {
 	struct tcpm_hash_bucket *hb;
-	struct tcp_metrics_block *tm, *tmlist = NULL;
+	struct tcp_metrics_block *tm;
 	struct tcp_metrics_block __rcu **pp;
 	struct inetpeer_addr saddr, daddr;
 	unsigned int hash;
 	struct net *net = genl_info_net(info);
 	int ret;
-	bool src = true;
+	bool src = true, found = false;
 
 	ret = parse_nl_addr(info, &daddr, &hash, 1);
 	if (ret < 0)
@@ -1044,19 +1044,15 @@ static int tcp_metrics_nl_cmd_del(struct sk_buff *skb, struct genl_info *info)
 		if (addr_same(&tm->tcpm_daddr, &daddr) &&
 		    (!src || addr_same(&tm->tcpm_saddr, &saddr))) {
 			*pp = tm->tcpm_next;
-			tm->tcpm_next = tmlist;
-			tmlist = tm;
+			kfree_rcu(tm, rcu_head);
+			found = true;
 		} else {
 			pp = &tm->tcpm_next;
 		}
 	}
 	spin_unlock_bh(&tcp_metrics_lock);
-	if (!tmlist)
+	if (!found)
 		return -ESRCH;
-	for (tm = tmlist; tm; tm = tmlist) {
-		tmlist = tm->tcpm_next;
-		kfree_rcu(tm, rcu_head);
-	}
 	return 0;
 }
 
-- 
1.8.3.2

^ permalink raw reply related

* Re: [Xen-devel] [PATCH v3] xen/grant-table: Avoid m2p_override during mapping
From: Stefano Stabellini @ 2014-01-21 12:26 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <1390244288-3186-1-git-send-email-zoltan.kiss@citrix.com>

On Mon, 20 Jan 2014, Zoltan Kiss wrote:
> The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
> for blkback and future netback patches it just cause a lock contention, as
> those pages never go to userspace. Therefore this series does the following:
> - the original functions were renamed to __gnttab_[un]map_refs, with a new
>   parameter m2p_override
> - based on m2p_override either they follow the original behaviour, or just set
>   the private flag and call set_phys_to_machine
> - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
>   m2p_override false
> - a new function gnttab_[un]map_refs_userspace provides the old behaviour
> 
> v2:
> - move the storing of the old mfn in page->index to gnttab_map_refs
> - move the function header update to a separate patch
> 
> v3:
> - a new approach to retain old behaviour where it needed
> - squash the patches into one
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> Suggested-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/block/xen-blkback/blkback.c |   15 +++----
>  drivers/xen/gntdev.c                |   13 +++---
>  drivers/xen/grant-table.c           |   81 +++++++++++++++++++++++++++++------
>  include/xen/grant_table.h           |    8 +++-
>  4 files changed, 87 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index aa846a4..87ded60 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -880,15 +880,17 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
>  }
>  EXPORT_SYMBOL_GPL(gnttab_batch_copy);
>  
> -int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> +int __gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  		    struct gnttab_map_grant_ref *kmap_ops,
> -		    struct page **pages, unsigned int count)
> +		    struct page **pages, unsigned int count,
> +		    bool m2p_override)
>  {
>  	int i, ret;
>  	bool lazy = false;
>  	pte_t *pte;
>  	unsigned long mfn;
>  
> +	BUG_ON(kmap_ops && !m2p_override);
>  	ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map_ops, count);
>  	if (ret)
>  		return ret;
> @@ -907,10 +909,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  			set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
>  					map_ops[i].dev_bus_addr >> PAGE_SHIFT);
>  		}
> -		return ret;
> +		return 0;
>  	}
>  
> -	if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
> +	if (m2p_override &&
> +	    !in_interrupt() &&
> +	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
>  		arch_enter_lazy_mmu_mode();
>  		lazy = true;
>  	}
> @@ -927,8 +931,18 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  		} else {
>  			mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
>  		}
> -		ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> -				       &kmap_ops[i] : NULL);
> +		if (m2p_override)
> +			ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> +					       &kmap_ops[i] : NULL);
> +		else {
> +			unsigned long pfn = page_to_pfn(pages[i]);
> +			WARN_ON(PagePrivate(pages[i]));
> +			SetPagePrivate(pages[i]);
> +			set_page_private(pages[i], mfn);
> +			pages[i]->index = pfn_to_mfn(pfn);
> +			if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
> +				return -ENOMEM;

What happens if the page is PageHighMem?

This looks like a subset of m2p_add_override, but it is missing some
relevant bits, like the PageHighMem check, or the p2m(m2p(mfn)) == mfn
check.  Maybe we can find a way to avoid duplicating the code.
We could split m2p_add_override in two functions or add yet another
parameter to it.


> +		}
>  		if (ret)
>  			goto out;
>  	}
> @@ -937,17 +951,33 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  	if (lazy)
>  		arch_leave_lazy_mmu_mode();
>  
> -	return ret;
> +	return 0;
> +}
> +
> +int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> +		    struct page **pages, unsigned int count)
> +{
> +	return __gnttab_map_refs(map_ops, NULL, pages, count, false);
>  }
>  EXPORT_SYMBOL_GPL(gnttab_map_refs);
>  
> -int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> +int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
> +			      struct gnttab_map_grant_ref *kmap_ops,
> +			      struct page **pages, unsigned int count)
> +{
> +	return __gnttab_map_refs(map_ops, kmap_ops, pages, count, true);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_map_refs_userspace);
> +
> +int __gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
>  		      struct gnttab_map_grant_ref *kmap_ops,
> -		      struct page **pages, unsigned int count)
> +		      struct page **pages, unsigned int count,
> +		      bool m2p_override)
>  {
>  	int i, ret;
>  	bool lazy = false;
>  
> +	BUG_ON(kmap_ops && !m2p_override);
>  	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
>  	if (ret)
>  		return ret;
> @@ -958,17 +988,26 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
>  			set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
>  					INVALID_P2M_ENTRY);
>  		}
> -		return ret;
> +		return 0;
>  	}
>  
> -	if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
> +	if (m2p_override &&
> +	    !in_interrupt() &&
> +	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
>  		arch_enter_lazy_mmu_mode();
>  		lazy = true;
>  	}
>  
>  	for (i = 0; i < count; i++) {
> -		ret = m2p_remove_override(pages[i], kmap_ops ?
> -				       &kmap_ops[i] : NULL);
> +		if (m2p_override)
> +			ret = m2p_remove_override(pages[i], kmap_ops ?
> +						  &kmap_ops[i] : NULL);
> +		else {
> +			unsigned long pfn = page_to_pfn(pages[i]);
> +			WARN_ON(!PagePrivate(pages[i]));
> +			ClearPagePrivate(pages[i]);
> +			set_phys_to_machine(pfn, pages[i]->index);

same here: let's try to avoid code duplication


> +		}
>  		if (ret)
>  			goto out;
>  	}
> @@ -977,10 +1016,24 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
>  	if (lazy)
>  		arch_leave_lazy_mmu_mode();
>  
> -	return ret;
> +	return 0;
> +}
> +
> +int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *map_ops,
> +		    struct page **pages, unsigned int count)
> +{
> +	return __gnttab_unmap_refs(map_ops, NULL, pages, count, false);
>  }
>  EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
>  
> +int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *map_ops,
> +				struct gnttab_map_grant_ref *kmap_ops,
> +				struct page **pages, unsigned int count)
> +{
> +	return __gnttab_unmap_refs(map_ops, kmap_ops, pages, count, true);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_unmap_refs_userspace);
> +
>  static unsigned nr_status_frames(unsigned nr_grant_frames)
>  {
>  	BUG_ON(grefs_per_grant_frame == 0);

^ permalink raw reply

* [PATCH 3/4] ipv4: use SNMP macro assuming softirq context in ip_forward().
From: Sergey Popovich @ 2014-01-21 11:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1390304505.git.popovich_sergei@mail.ru>

ip_forward() runs entirely in softirq context, we could use
version of SNMP macro which updates stats counters assuming that.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
---
 net/ipv4/ip_forward.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index e9f1217..fd7eeec 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -93,7 +93,7 @@ int ip_forward(struct sk_buff *skb)
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
 	if (unlikely(skb->len > mtu && !skb_is_gso(skb) &&
 		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
-		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
+		IP_INC_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(mtu));
 		goto drop;
-- 
1.8.3.4

^ permalink raw reply related

* [PATCH 1/4] ipv4: don't disable interface if last ipv4 address is removed
From: Sergey Popovich @ 2014-01-21 11:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1390304505.git.popovich_sergei@mail.ru>

Since

  commit 876fd05ddbae03166e7037fca957b55bb3be6594
  Author: Hannes Frederic Sowa <>
  Date:   Mon Jun 24 00:22:20 2013 +0200

    ipv6: don't disable interface if last ipv6 address is removed

we have disabled this behavior for IPv6. Adjust behavior for IPv4.

There is at least one additional reason to do the same for IPv4.
Suppose we have following configuration:

  # ip link add dev dummy1 type dummy
  # ip -4 addr add 192.168.1.254/24 dev dummy1
  # ip link set up dev dummy1

  # ip link add dev veth1a type veth peer name veth1b
  # ip link set up dev veth1a
  # ip -4 route add 192.168.1.1/32 dev veth1a src 192.168.1.254 proto static

  # ip netns add pc1
  # ip link set netns pc1 dev veth1b
  # ip netns exec pc1 ip -4 addr add 192.168.1.1/24 dev veth1b
  # ip netns exec pc1 ip link set up dev veth1b
  # ip netns exec pc1 ip link set up dev lo

  # ip -4 addr add 10.0.1.1/30 dev veth1a
  # ip -4 addr sh dev veth1a
  32: veth1a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.1.1/30 scope global veth1a
       valid_lft forever preferred_lft forever
  # ip -4 route show dev veth1a
  10.0.1.0/30  proto kernel  scope link  src 10.0.1.1
  192.168.1.1  proto static  scope link  src 192.168.1.254

  # ping -I192.168.1.254 192.168.1.1 -c1
  PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
  64 bytes from 192.168.1.1: icmp_req=1 ttl=64 time=0.039 ms

  --- 192.168.1.1 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms
  # ip -4 neigh sh dev veth1a
  192.168.1.1 lladdr 9a:7e:a5:aa:1b:a0 REACHABLE

  and after removing last ipv4 address from veth1a
  ------------------------------------------------

  # ip -4 addr del 10.0.1.1/30 dev veth1a
  # ip -4 neigh sh dev veth1a
  # ip -4 route show dev veth1a

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
---
 net/ipv4/fib_frontend.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index d846304..ae5f35f 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1021,14 +1021,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 	case NETDEV_DOWN:
 		fib_del_ifaddr(ifa, NULL);
 		atomic_inc(&net->ipv4.dev_addr_genid);
-		if (ifa->ifa_dev->ifa_list == NULL) {
-			/* Last address was deleted from this interface.
-			 * Disable IP.
-			 */
-			fib_disable_ip(dev, 1);
-		} else {
-			rt_cache_flush(dev_net(dev));
-		}
+		rt_cache_flush(dev_net(dev));
 		break;
 	}
 	return NOTIFY_DONE;
-- 
1.8.3.4

^ permalink raw reply related

* [PATCH 4/4] ipv4: mark nexthop as dead when it's subnet becomes unreachable
From: Sergey Popovich @ 2014-01-21 11:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1390304505.git.popovich_sergei@mail.ru>

Removing ip address and it's subnet route using fib_del_ifaddr() does
not purge routes with nexthop in such subnet.

This could be easily reproduced with the following config:

  # ip link add dev dummy1 type dummy
  # ip link set up dev dummy1
  # ip -4 addr add 10.0.10.1/24 dev dummy1
  # ip -4 addr add 10.0.20.1/24 dev dummy1

  # ip -4 route add 172.16.0.0/12 proto static via 10.0.10.5
  # ip -4 route show exact 172.16.0.0/12
  172.16.0.0/12 via 10.0.10.5 dev dummy1  proto static

  # ip -4 addr del 10.0.10.1/24 dev dummy1

  # ip -4 route show exact 172.16.0.0/12
  172.16.0.0/12 via 10.0.10.5 dev dummy1  proto static

Add interface address (ifa) parameter to fib_sync_down_dev()
and use it to match nexthop against it's subnet.

Use fib_sync_down_dev() in fib_del_ifaddr() among with fib_sync_down_addr()
to mark as dead routes with nexthop in ifa.

Also optimize loop a bit:
  move handling of force > 1 out of the change_nexthops() loop.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
---
 net/ipv4/fib_frontend.c  |  5 +++--
 net/ipv4/fib_semantics.c | 36 ++++++++++++++++++++----------------
 2 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ae5f35f..fd3445e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -907,7 +907,8 @@ void fib_del_ifaddr(struct in_ifaddr *ifa, struct in_ifaddr *iprim)
 			 * First of all, we scan fib_info list searching
 			 * for stray nexthop entries, then ignite fib_flush.
 			 */
-			if (fib_sync_down_addr(dev_net(dev), ifa->ifa_local))
+			if (fib_sync_down_addr(dev_net(dev), ifa->ifa_local) +
+			    fib_sync_down_dev(dev, ifa, 0))
 				fib_flush(dev_net(dev));
 		}
 	}
@@ -997,7 +998,7 @@ static void nl_fib_lookup_exit(struct net *net)
 
 static void fib_disable_ip(struct net_device *dev, int force)
 {
-	if (fib_sync_down_dev(dev, force))
+	if (fib_sync_down_dev(dev, NULL, force))
 		fib_flush(dev_net(dev));
 	rt_cache_flush(dev_net(dev));
 	arp_ifdown(dev);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 9d43468..ca920df 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1112,7 +1112,7 @@ int fib_sync_down_addr(struct net *net, __be32 local)
 	return ret;
 }
 
-int fib_sync_down_dev(struct net_device *dev, int force)
+int fib_sync_down_dev(struct net_device *dev, struct in_ifaddr *ifa, int force)
 {
 	int ret = 0;
 	int scope = RT_SCOPE_NOWHERE;
@@ -1132,29 +1132,33 @@ int fib_sync_down_dev(struct net_device *dev, int force)
 		if (nh->nh_dev != dev || fi == prev_fi)
 			continue;
 		prev_fi = fi;
-		dead = 0;
-		change_nexthops(fi) {
-			if (nexthop_nh->nh_flags & RTNH_F_DEAD)
-				dead++;
-			else if (nexthop_nh->nh_dev == dev &&
-				 nexthop_nh->nh_scope != scope) {
-				nexthop_nh->nh_flags |= RTNH_F_DEAD;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-				spin_lock_bh(&fib_multipath_lock);
-				fi->fib_power -= nexthop_nh->nh_power;
-				nexthop_nh->nh_power = 0;
-				spin_unlock_bh(&fib_multipath_lock);
+		if (force > 1)
+			goto fi_dead;
 #endif
+		dead = 0;
+		change_nexthops(fi) {
+			if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
 				dead++;
+				continue;
 			}
+			if (nexthop_nh->nh_dev != dev ||
+			    nexthop_nh->nh_scope == scope ||
+			    (ifa && !inet_ifa_match(nexthop_nh->nh_gw, ifa)))
+				continue;
+			nexthop_nh->nh_flags |= RTNH_F_DEAD;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-			if (force > 1 && nexthop_nh->nh_dev == dev) {
-				dead = fi->fib_nhs;
-				break;
-			}
+			spin_lock_bh(&fib_multipath_lock);
+			fi->fib_power -= nexthop_nh->nh_power;
+			nexthop_nh->nh_power = 0;
+			spin_unlock_bh(&fib_multipath_lock);
 #endif
+			dead++;
 		} endfor_nexthops(fi)
 		if (dead == fi->fib_nhs) {
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+fi_dead:
+#endif
 			fi->fib_flags |= RTNH_F_DEAD;
 			ret++;
 		}
-- 
1.8.3.4

^ permalink raw reply related

* [PATCH 2/4] ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation
From: Sergey Popovich @ 2014-01-21 11:48 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1390304505.git.popovich_sergei@mail.ru>

Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
---
 net/ipv4/fib_semantics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index b53f0bf..9d43468 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -820,13 +820,13 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 	fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL);
 	if (fi == NULL)
 		goto failure;
+	fib_info_cnt++;
 	if (cfg->fc_mx) {
 		fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
 		if (!fi->fib_metrics)
 			goto failure;
 	} else
 		fi->fib_metrics = (u32 *) dst_default_metrics;
-	fib_info_cnt++;
 
 	fi->fib_net = hold_net(net);
 	fi->fib_protocol = cfg->fc_protocol;
-- 
1.8.3.4

^ permalink raw reply related

* [PATCH 0/4] ipv4: small set of fixes
From: Sergey Popovich @ 2014-01-21 11:48 UTC (permalink / raw)
  To: netdev

In this patch series I present my attempt to address
some issues in IPv4 implementation.

Please take time to review my patches.

Thanks.

Sergey Popovich (4):
  ipv4: don't disable interface if last ipv4 address is removed
  ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation
  ipv4: use SNMP macro assuming softirq context in ip_forward().
  ipv4: mark nexthop as dead when it's subnet becomes unreachable

 net/ipv4/fib_frontend.c  | 14 ++++----------
 net/ipv4/fib_semantics.c | 38 +++++++++++++++++++++-----------------
 net/ipv4/ip_forward.c    |  2 +-
 3 files changed, 26 insertions(+), 28 deletions(-)

-- 
1.8.3.4

^ permalink raw reply

* [PATCH net] be2net: Fix be_vlan_add/rem_vid() routines
From: Somnath Kotur @ 2014-01-21 10:20 UTC (permalink / raw)
  To: netdev; +Cc: davem, Somnath Kotur, Kalesh AP

The current logic to put interface into VLAN Promiscous mode is not correct.
We should increment "adapter->vlans_added" before calling be_vid_config().
Also removed some unwanted log messages.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index bf40fda..2fada24 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1096,8 +1096,6 @@ static int be_vid_config(struct be_adapter *adapter)
 				dev_info(&adapter->pdev->dev,
 					 "Disabling VLAN Promiscuous mode.\n");
 				adapter->flags &= ~BE_FLAGS_VLAN_PROMISC;
-				dev_info(&adapter->pdev->dev,
-					 "Re-Enabling HW VLAN filtering\n");
 			}
 		}
 	}
@@ -1105,12 +1103,12 @@ static int be_vid_config(struct be_adapter *adapter)
 	return status;
 
 set_vlan_promisc:
-	dev_warn(&adapter->pdev->dev, "Exhausted VLAN HW filters.\n");
+	if (adapter->flags & BE_FLAGS_VLAN_PROMISC)
+		return 0;
 
 	status = be_cmd_rx_filter(adapter, BE_FLAGS_VLAN_PROMISC, ON);
 	if (!status) {
 		dev_info(&adapter->pdev->dev, "Enable VLAN Promiscuous mode\n");
-		dev_info(&adapter->pdev->dev, "Disabling HW VLAN filtering\n");
 		adapter->flags |= BE_FLAGS_VLAN_PROMISC;
 	} else
 		dev_err(&adapter->pdev->dev,
@@ -1123,19 +1121,18 @@ static int be_vlan_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
 	struct be_adapter *adapter = netdev_priv(netdev);
 	int status = 0;
 
-
 	/* Packets with VID 0 are always received by Lancer by default */
 	if (lancer_chip(adapter) && vid == 0)
 		goto ret;
 
 	adapter->vlan_tag[vid] = 1;
-	if (adapter->vlans_added <= (be_max_vlans(adapter) + 1))
-		status = be_vid_config(adapter);
+	adapter->vlans_added++;
 
-	if (!status)
-		adapter->vlans_added++;
-	else
+	status = be_vid_config(adapter);
+	if (status) {
+		adapter->vlans_added--;
 		adapter->vlan_tag[vid] = 0;
+	}
 ret:
 	return status;
 }
@@ -1150,9 +1147,7 @@ static int be_vlan_rem_vid(struct net_device *netdev, __be16 proto, u16 vid)
 		goto ret;
 
 	adapter->vlan_tag[vid] = 0;
-	if (adapter->vlans_added <= be_max_vlans(adapter))
-		status = be_vid_config(adapter);
-
+	status = be_vid_config(adapter);
 	if (!status)
 		adapter->vlans_added--;
 	else
-- 
1.6.0.2

^ permalink raw reply related

* Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
From: shaobingqing @ 2014-01-21 10:08 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: bfields-H+wXaHxf7aLQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1390259843.2501.2.camel-5lNtUQgoD8Pfa3cDbr2K10B+6BGkLq7r@public.gmane.org>

2014/1/21 Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>:
> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
>> In current code, there only one struct rpc_rqst is prealloced. If one
>> callback request is received from two sk_buff, the xprt_alloc_bc_request
>> would be execute two times with the same transport->xid. The first time
>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the TCP_RCV_COPY_DATA
>> bit of transport->tcp_flags will not be cleared. The second time
>> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL
>> pointer will be returned, then xprt_force_disconnect occur. I think one
>> callback request can be allowed to be received from two sk_buff.
>>
>> Signed-off-by: shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org>
>> ---
>>  net/sunrpc/xprtsock.c |   11 +++++++++--
>>  1 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> index ee03d35..606950d 100644
>> --- a/net/sunrpc/xprtsock.c
>> +++ b/net/sunrpc/xprtsock.c
>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>>       struct sock_xprt *transport =
>>                               container_of(xprt, struct sock_xprt, xprt);
>>       struct rpc_rqst *req;
>> +     static struct rpc_rqst *req_partial;
>> +
>> +     if (req_partial == NULL)
>> +             req = xprt_alloc_bc_request(xprt);
>> +     else if (req_partial->rq_xid == transport->tcp_xid)
>> +             req = req_partial;
>
> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
> req will be undefined. Either way, you cannot use a static variable for
> storage here: that isn't re-entrant.

Because metadata sever only have one slot for backchannel request,
req_partial->rq_xid == transport->tcp_xid always happens, if the callback
request just being splited in two sk_buffs. But req_partial->rq_xid !=
transport->tcp_xid may also happens in some special cases, such as
retransmission occurs?
If one callback request is splited in two sk_buffs, xs_tcp_read_callback
will be execute two times. The req_partial should be a static variable,
because  the second execution of xs_tcp_read_callback should use
the rpc_rqst allocated for the first execution, which saves information
copies from the first sk_buff.
I think perhaps the code should be modified like bellows:

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 606950d..02dbb82 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1273,10 +1273,14 @@ static inline int xs_tcp_read_callback(struct
rpc_xprt *xprt,
        struct rpc_rqst *req;
        static struct rpc_rqst *req_partial;

-       if (req_partial == NULL)
+       if (req_partial == NULL) {
                req = xprt_alloc_bc_request(xprt);
-       else if (req_partial->rq_xid == transport->tcp_xid)
+       } else if (req_partial->rq_xid == transport->tcp_xid) {
                req = req_partial;
+       } else {
+               xprt_free_bc_request(req_partial);
+               req = xprt_alloc_bc_request(xprt);
+       }

        if (req == NULL) {
                printk(KERN_WARNING "Callback slot table overflowed\n");
@@ -1303,8 +1307,9 @@ static inline int xs_tcp_read_callback(struct
rpc_xprt *xprt,
                list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
                spin_unlock(&bc_serv->sv_cb_lock);
                wake_up(&bc_serv->sv_cb_waitq);
-       } else
+       } else {
                req_partial = req;
+       }

        req->rq_private_buf.len = transport->tcp_copied;


>
>> -     req = xprt_alloc_bc_request(xprt);
>>       if (req == NULL) {
>>               printk(KERN_WARNING "Callback slot table overflowed\n");
>>               xprt_force_disconnect(xprt);
>> @@ -1285,6 +1290,7 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>>
>>       if (!(transport->tcp_flags & TCP_RCV_COPY_DATA)) {
>>               struct svc_serv *bc_serv = xprt->bc_serv;
>> +             req_partial = NULL;
>>
>>               /*
>>                * Add callback request to callback list.  The callback
>> @@ -1297,7 +1303,8 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>>               list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
>>               spin_unlock(&bc_serv->sv_cb_lock);
>>               wake_up(&bc_serv->sv_cb_waitq);
>> -     }
>> +     } else
>> +             req_partial = req;
>>
>>       req->rq_private_buf.len = transport->tcp_copied;
>>
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net-next 5/5] bonding: remove the redundant judgements for bond_set_mac_address()
From: Ding Tianhong @ 2014-01-21  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Netdev

The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
so no need to check it in bond_set_mac_address().

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a318a78..638e1d5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3500,15 +3500,7 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 	 */
 
 	bond_for_each_slave(bond, slave, iter) {
-		const struct net_device_ops *slave_ops = slave->dev->netdev_ops;
 		pr_debug("slave %p %s\n", slave, slave->dev->name);
-
-		if (slave_ops->ndo_set_mac_address == NULL) {
-			res = -EOPNOTSUPP;
-			pr_debug("EOPNOTSUPP %s\n", slave->dev->name);
-			goto unwind;
-		}
-
 		res = dev_set_mac_address(slave->dev, addr);
 		if (res) {
 			/* TODO: consider downing the slave
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next 4/5] bonding: set fail_over_mac to active only in active-backup mode at enslavement
From: Ding Tianhong @ 2014-01-21  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Netdev

The fail_over_mac only affect active-backup mode, if the first slave
does not support setting the MAC address, we should set fail_over_mac
to active only in active-backup mode, otherwise the bonding could not set
all slaves to the master's address for other modes.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ecff04e..a318a78 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1336,7 +1336,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		if (!bond_has_slaves(bond)) {
 			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.",
 				   bond_dev->name);
-			bond->params.fail_over_mac = BOND_FOM_ACTIVE;
+			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+				bond->params.fail_over_mac = BOND_FOM_ACTIVE;
 		} else if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) {
 			pr_err("%s: Error: The slave device specified does not support setting the MAC address, but fail_over_mac is not set to active.\n",
 			       bond_dev->name);
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next 3/5] bonding: set fail_over_mac to none if new mode is not active-backup
From: Ding Tianhong @ 2014-01-21  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Netdev

The fail_over_mac only affects active backup mode, if it is set to active
or follow, the bonding could not set all slaves to the same MAC address
for other modes, just like RR, XOR, 802.3ad or BROADCAST, so when changing
bond to new mode and the new mode is not active-backup, we should check
the fail_over_mac and set it to 0.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_options.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 0996ab4..74d12ba 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -50,6 +50,12 @@ int bond_option_mode_set(struct bonding *bond, int mode)
 			bond->dev->name, bond->params.miimon);
 	}
 
+	if (mode != BOND_MODE_ACTIVEBACKUP && bond->params.fail_over_mac) {
+		pr_info("%s: fail_over_mac only affects active-backup mode, so set it to 0\n",
+			bond->dev->name);
+		bond->params.fail_over_mac = BOND_FOM_NONE;
+	}
+
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
 	bond->params.mode = mode;
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH v3 1/4] net_dma: simple removal
From: Dan Williams @ 2014-01-21  9:44 UTC (permalink / raw)
  To: saeed bishara
  Cc: dmaengine@vger.kernel.org, Alexander Duyck, Dave Jiang,
	Vinod Koul, netdev@vger.kernel.org, David Whipple, lkml,
	David S. Miller
In-Reply-To: <CAMAG_efOD03wLtetfNAfymsCs6kVEqoJx6e7qY2txBBMDgbmfQ@mail.gmail.com>

On Fri, Jan 17, 2014 at 12:16 PM, saeed bishara <saeed.bishara@gmail.com> wrote:
> Dan,
>
> isn't this issue similar to direct io case?
> can you please look at the following article
> http://lwn.net/Articles/322795/

I guess it's similar, but the NET_DMA dma api violation is more
blatant.  The same thread that requested DMA is also writing to those
same pages with the cpu.  The fix is either guaranteeing that only the
dma engine ever touches the gup'd pages or synchronizing dma before
every cpu fallback.

> regarding performance improvement using NET_DMA, I don't have concrete
> numbers, but it should be around 15-20%. my system is i/o coherent.

That sounds too high... is that throughput or cpu utilization?  It
sounds high because NET_DMA also makes the data cache cold while the
cpu copy warms the data before handing it to the application.

Can you measure relative numbers and share your testing details?  You
will need to fix the data corruption and verify that the performance
advantage is still there before proposing NET_DMA be restored.

I have a new dma_debug capability in Andrew's tree that can you help
you identify holes in the implementation.

http://ozlabs.org/~akpm/mmots/broken-out/dma-debug-introduce-debug_dma_assert_idle.patch

--
Dan

>
> saeed
>
> On Wed, Jan 15, 2014 at 11:33 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Wed, Jan 15, 2014 at 1:31 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Wed, Jan 15, 2014 at 1:20 PM, saeed bishara <saeed.bishara@gmail.com> wrote:
>>>> Hi Dan,
>>>>
>>>> I'm using net_dma on my system and I achieve meaningful performance
>>>> boost when running Iperf receive.
>>>>
>>>> As far as I know the net_dma is used by many embedded systems out
>>>> there and might effect their performance.
>>>> Can you please elaborate on the exact scenario that cause the memory corruption?
>>>>
>>>> Is the scenario mentioned here caused by "real life" application or
>>>> this is more of theoretical issue found through manual testing, I was
>>>> trying to find the thread describing the failing scenario and couldn't
>>>> find it, any pointer will be appreciated.
>>>
>>> Did you see the referenced commit?
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=77873803363c
>>>
>>> This is a real issue in that any app that forks() while receiving data
>>> can cause the dma data to be lost.  The problem is that the copy
>>> operation falls back to cpu at many locations.  Any one of those
>>> instance could touch a mapped page and trigger a copy-on-write event.
>>> The dma completes to the wrong location.
>>>
>>
>> Btw, do you have benchmark data showing that NET_DMA is beneficial on
>> these platforms?  I would have expected worse performance on platforms
>> without i/o coherent caches.

^ permalink raw reply

* [PATCH net-next 2/5] bonding: don't set fail_over_mac if the mode is not active backup
From: Ding Tianhong @ 2014-01-21  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev,
	Andy Gospodarek

The fail_over_mac could be set by sysfs in every mode now, it will
cause bonding not to set all slaves to the same MAC address, but in
rr, xor, broadcast, lacp mode ,the slaves should have the same MAC
address with the master, so don't set the fail_over_mac by sysfs
if the mode is not active-backup.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_options.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 945a666..0996ab4 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -550,6 +550,12 @@ int bond_option_primary_reselect_set(struct bonding *bond, int primary_reselect)
 
 int bond_option_fail_over_mac_set(struct bonding *bond, int fail_over_mac)
 {
+	if (bond->params.mode != BOND_MODE_ACTIVEBACKUP) {
+		pr_err("%s: fail_over_mac only affects active-backup mode\n",
+		       bond->dev->name);
+		return -EINVAL;
+	}
+
 	if (bond_parm_tbl_lookup(fail_over_mac, fail_over_mac_tbl) < 0) {
 		pr_err("%s: Ignoring invalid fail_over_mac value %d.\n",
 		       bond->dev->name, fail_over_mac);
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next 1/5] bonding: The fail_over_mac should be set only in ACTIVE_BACKUP mode
From: Ding Tianhong @ 2014-01-21  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev,
	Andy Gospodarek

According the bonding.txt, the option fail_over_mac only affect for
AB mode, but in currect code, the parameter could be set to active
or follow in every mode, this will cause bonding could not set all
slaves of an RR or XOR mode to the same MAC address at enslavement
time, so reset fail_over_mac to 0 if the mode is not ACTIVE_BACKUP.

Fix the wrong variables for pr_err().

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3220b48..ecff04e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4307,12 +4307,14 @@ static int bond_check_params(struct bond_params *params)
 						      fail_over_mac_tbl);
 		if (fail_over_mac_value == -1) {
 			pr_err("Error: invalid fail_over_mac \"%s\"\n",
-			       arp_validate == NULL ? "NULL" : arp_validate);
+			       fail_over_mac == NULL ? "NULL" : fail_over_mac);
 			return -EINVAL;
 		}
 
-		if (bond_mode != BOND_MODE_ACTIVEBACKUP)
-			pr_warning("Warning: fail_over_mac only affects active-backup mode.\n");
+		if (bond_mode != BOND_MODE_ACTIVEBACKUP) {
+			pr_warning("Warning: fail_over_mac only affects active-backup mode, set it to 0.\n");
+			fail_over_mac_value = BOND_FOM_NONE;
+		}
 	} else {
 		fail_over_mac_value = BOND_FOM_NONE;
 	}
-- 
1.8.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox