Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Simon Horman @ 2017-04-26 11:02 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jamal Hadi Salim, davem, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <20170425160445.GD1867@nanopsycho.orion>

On Tue, Apr 25, 2017 at 06:04:45PM +0200, Jiri Pirko wrote:
> Tue, Apr 25, 2017 at 03:01:22PM CEST, jhs@mojatatu.com wrote:
> >On 17-04-25 08:13 AM, Jiri Pirko wrote:
> >> Tue, Apr 25, 2017 at 01:54:06PM CEST, jhs@mojatatu.com wrote:
> >
> >
> >[..]
> >
> >> > -#define TCAA_MAX 1
> >> > +/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
> >> > + *
> >> > + * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO
> >> > + * actions in a dump. All dump responses will contain the number of actions
> >> > + * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
> >> > + *
> >> > + */
> >> > +#define TCA_FLAG_LARGE_DUMP_ON		(1 << 0)
> >> 
> >> BIT (I think I mentioned this before)
> >> 
> >
> >You did - but i took it out about two submissions back (per cover
> >letter) because it is no part of UAPI today. I noticed devlink was
> >using it but they defined  their own variant.
> >So if i added this, iproute2 doesnt compile. I could fix iproute2
> >to move it somewhere to a common header then restore this.
> 
> So fix iproute2. It is always first kernel, then iproute2.

Perhaps I am missing the point or somehow misguided but I would expect that
if the UAPI uses BIT() it also provides BIT().

...

^ permalink raw reply

* Re: [PATCH v4 net-next] mdio_bus: Issue GPIO RESET to PHYs.
From: Roger Quadros @ 2017-04-26 10:46 UTC (permalink / raw)
  To: Florian Fainelli, Lars-Peter Clausen, Andrew Lunn
  Cc: davem, tony, nsekhar, jsarha, netdev, linux-omap, linux-kernel
In-Reply-To: <7c443bf0-03ac-cc1c-3776-a36287801e96@gmail.com>

On 25/04/17 19:31, Florian Fainelli wrote:
> On 04/25/2017 09:22 AM, Lars-Peter Clausen wrote:
>> On 04/24/2017 11:04 AM, Roger Quadros wrote:
>>> On 24/04/17 02:35, Andrew Lunn wrote:
>>>> On Fri, Apr 21, 2017 at 03:31:09PM +0200, Lars-Peter Clausen wrote:
>>>>> On 04/21/2017 03:15 PM, Roger Quadros wrote:
>>>>>> diff --git a/Documentation/devicetree/bindings/net/mdio.txt b/Documentation/devicetree/bindings/net/mdio.txt
>>>>>> new file mode 100644
>>>>>> index 0000000..4ffbbac
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/devicetree/bindings/net/mdio.txt
>>>>>> @@ -0,0 +1,33 @@
>>>>>> +Common MDIO bus properties.
>>>>>> +
>>>>>> +These are generic properties that can apply to any MDIO bus.
>>>>>> +
>>>>>> +Optional properties:
>>>>>> +- reset-gpios: List of one or more GPIOs that control the RESET lines
>>>>>> +  of the PHYs on that MDIO bus.
>>>>>> +- reset-delay-us: RESET pulse width in microseconds as per PHY datasheet.
>>>>>> +
>>>>>> +A list of child nodes, one per device on the bus is expected. These
>>>>>> +should follow the generic phy.txt, or a device specific binding document.
>>>>>> +
>>>>>> +Example :
>>>>>> +This example shows these optional properties, plus other properties
>>>>>> +required for the TI Davinci MDIO driver.
>>>>>> +
>>>>>> +	davinci_mdio: ethernet@0x5c030000 {
>>>>>> +		compatible = "ti,davinci_mdio";
>>>>>> +		reg = <0x5c030000 0x1000>;
>>>>>> +		#address-cells = <1>;
>>>>>> +		#size-cells = <0>;
>>>>>> +
>>>>>> +		reset-gpios = <&gpio2 5 GPIO_ACTIVE_LOW>;
>>>>>> +		reset-delay-us = <2>;   /* PHY datasheet states 1us min */
>>>>>
>>>>> If this is the reset line of the PHY shouldn't it be a property of the PHY
>>>>> node rather than of the MDIO controller node (which might have a reset on
>>>>> its own)?
>>>>>> +
>>>>>> +		ethphy0: ethernet-phy@1 {
>>>>>> +			reg = <1>;
>>>>>> +		};
>>>>>> +
>>>>>> +		ethphy1: ethernet-phy@3 {
>>>>>> +			reg = <3>;
>>>>>> +		};
>>>>
>>>> Hi Lars-Peter
>>>>
>>>> We discussed this when the first proposal was made. There are two
>>>> cases, to consider.
>>>>
>>>> 1) Here, one GPIO line resets all PHYs on the same MDIO bus. In this
>>>> example, two PHYs.
>>>>
>>>> 2) There is one GPIO line per PHY. That is a separate case, and as you
>>>> say, the reset line should probably be considered a PHY property, not
>>>> an MDIO property. However, it can be messy, since in order to probe
>>>> the MDIO bus, you probably need to take the PHY out of reset.
>>>>
>>
>> But the DT binding documentation says something else "List of one or more
>> GPIOs that control the RESET lines of the PHYs on that MDIO bus".
> 
> I agree, it should be defined more strictly as:
> 
> "One GPIO that controls the reset line of *all* PHYs populated on that
> MDIO bus"

Patch is already in net-next. How can we get this fixed? Should I send a v5?

> 
> If there are separate lines, these automatically become properties of
> the PHY nodes.
> 
>>
>>>> Anyway, this patch addresses the first case, so should be accepted. If
>>>> anybody wants to address the second case, they are free to do so.
>>
>> I think we all know that that's not going to happen. Once there is a working
>> kludge there is no incentive to do a proper implementation anymore.
>>
>>
>>> Thanks for the explanation Andrew.
>>>
>>> For the second case, even if the RESET GPIO property is specified
>>> in the PHY node, the RESET *will* have to be done by the MDIO bus driver
>>> else the PHY might not be probed at all.
>>
>> I'm not arguing with that, just that the hardware description should be
>> truthful to the hardware topology and not to the software topology, i.e. the
>> implementation details of the Linux kernel in this case. Reset GPIOs are not
>> the only resource that is connected to the PHY that needs to be enabled
>> before they can be enumerated. E.g. clocks and regulators fall into the same
>> realm. And while you might argue that with a on-SoC phy controller node
>> there wont be any conflicts in regard to the reset-gpios property, this not
>> so very true for the clocks property.
> 
> Agreed, but with the exception of the unfortunate choice of words here
> (single vs. multiple) there is not a really a divergence in how the
> shared reset line is represented compared to other similar control
> busses, is there?
> 
>>
>> And MDIO is not really special in this regard, other discoverable buses
>> (like USB, SDIO, ULPI) have the very same issue. Having a standardized
>> binding approach where the resources are declared as part as the child child
>> is preferable in my opinion.
>>
>>>
>>> Whether we need additional code to just to make the DT look prettier is
>>> questionable and if required can come as a separate patch.
>>
>> Unfortunately not, once it is merged it can't be changed anymore.
> 
> There are no in tree users yet, so let's get the different things fixed
> right now.
> 

-- 
cheers,
-roger

^ permalink raw reply

* Re: [PATCH v4 net-next] mdio_bus: Issue GPIO RESET to PHYs.
From: Roger Quadros @ 2017-04-26 10:43 UTC (permalink / raw)
  To: Lars-Peter Clausen, Andrew Lunn
  Cc: davem, Florian Fainelli, tony, nsekhar, jsarha, netdev,
	linux-omap, linux-kernel
In-Reply-To: <5954130c-c08c-1555-4d34-83307ed68d92@metafoo.de>

On 25/04/17 19:22, Lars-Peter Clausen wrote:
> On 04/24/2017 11:04 AM, Roger Quadros wrote:
>> On 24/04/17 02:35, Andrew Lunn wrote:
>>> On Fri, Apr 21, 2017 at 03:31:09PM +0200, Lars-Peter Clausen wrote:
>>>> On 04/21/2017 03:15 PM, Roger Quadros wrote:
>>>>> diff --git a/Documentation/devicetree/bindings/net/mdio.txt b/Documentation/devicetree/bindings/net/mdio.txt
>>>>> new file mode 100644
>>>>> index 0000000..4ffbbac
>>>>> --- /dev/null
>>>>> +++ b/Documentation/devicetree/bindings/net/mdio.txt
>>>>> @@ -0,0 +1,33 @@
>>>>> +Common MDIO bus properties.
>>>>> +
>>>>> +These are generic properties that can apply to any MDIO bus.
>>>>> +
>>>>> +Optional properties:
>>>>> +- reset-gpios: List of one or more GPIOs that control the RESET lines
>>>>> +  of the PHYs on that MDIO bus.
>>>>> +- reset-delay-us: RESET pulse width in microseconds as per PHY datasheet.
>>>>> +
>>>>> +A list of child nodes, one per device on the bus is expected. These
>>>>> +should follow the generic phy.txt, or a device specific binding document.
>>>>> +
>>>>> +Example :
>>>>> +This example shows these optional properties, plus other properties
>>>>> +required for the TI Davinci MDIO driver.
>>>>> +
>>>>> +	davinci_mdio: ethernet@0x5c030000 {
>>>>> +		compatible = "ti,davinci_mdio";
>>>>> +		reg = <0x5c030000 0x1000>;
>>>>> +		#address-cells = <1>;
>>>>> +		#size-cells = <0>;
>>>>> +
>>>>> +		reset-gpios = <&gpio2 5 GPIO_ACTIVE_LOW>;
>>>>> +		reset-delay-us = <2>;   /* PHY datasheet states 1us min */
>>>>
>>>> If this is the reset line of the PHY shouldn't it be a property of the PHY
>>>> node rather than of the MDIO controller node (which might have a reset on
>>>> its own)?
>>>>> +
>>>>> +		ethphy0: ethernet-phy@1 {
>>>>> +			reg = <1>;
>>>>> +		};
>>>>> +
>>>>> +		ethphy1: ethernet-phy@3 {
>>>>> +			reg = <3>;
>>>>> +		};
>>>
>>> Hi Lars-Peter
>>>
>>> We discussed this when the first proposal was made. There are two
>>> cases, to consider.
>>>
>>> 1) Here, one GPIO line resets all PHYs on the same MDIO bus. In this
>>> example, two PHYs.
>>>
>>> 2) There is one GPIO line per PHY. That is a separate case, and as you
>>> say, the reset line should probably be considered a PHY property, not
>>> an MDIO property. However, it can be messy, since in order to probe
>>> the MDIO bus, you probably need to take the PHY out of reset.
>>>
> 
> But the DT binding documentation says something else "List of one or more
> GPIOs that control the RESET lines of the PHYs on that MDIO bus".
> 
>>> Anyway, this patch addresses the first case, so should be accepted. If
>>> anybody wants to address the second case, they are free to do so.
> 
> I think we all know that that's not going to happen. Once there is a working
> kludge there is no incentive to do a proper implementation anymore.
> 
> 
>> Thanks for the explanation Andrew.
>>
>> For the second case, even if the RESET GPIO property is specified
>> in the PHY node, the RESET *will* have to be done by the MDIO bus driver
>> else the PHY might not be probed at all.
> 
> I'm not arguing with that, just that the hardware description should be
> truthful to the hardware topology and not to the software topology, i.e. the
> implementation details of the Linux kernel in this case. Reset GPIOs are not
> the only resource that is connected to the PHY that needs to be enabled
> before they can be enumerated. E.g. clocks and regulators fall into the same
> realm. And while you might argue that with a on-SoC phy controller node
> there wont be any conflicts in regard to the reset-gpios property, this not
> so very true for the clocks property.
> 
> And MDIO is not really special in this regard, other discoverable buses
> (like USB, SDIO, ULPI) have the very same issue. Having a standardized
> binding approach where the resources are declared as part as the child child
> is preferable in my opinion.

Good point. I agree now that if PHYs have individual RESET lines, they
should be part of the PHY node.
> 
>>
>> Whether we need additional code to just to make the DT look prettier is
>> questionable and if required can come as a separate patch.
> 
> Unfortunately not, once it is merged it can't be changed anymore.
> 

cheers,
-roger

^ permalink raw reply

* Re: [PATCH v2 0/8] NFC: fix device allocation and nfcmrvl crashes
From: Johan Hovold @ 2017-04-26 10:36 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: Johan Hovold, linux-wireless, netdev, linux-kernel
In-Reply-To: <20170418232431.GB13724@zurbaran.ger.intel.com>

On Wed, Apr 19, 2017 at 01:24:31AM +0200, Samuel Ortiz wrote:

> On Tue, Apr 18, 2017 at 12:09:16PM +0200, Johan Hovold wrote:
> > On Thu, Mar 30, 2017 at 12:15:34PM +0200, Johan Hovold wrote:
> > > This started out with the observation that the nfcmrvl_uart driver
> > > unconditionally dereferenced the tty class device despite the fact that
> > > not every tty has an associated struct device (Unix98 ptys). Some
> > > further changes were needed in the common nfcmrvl code to fully address
> > > this, some of which also incidentally fixed a few related bugs (e.g.
> > > resource leaks in error paths).
> > > 
> > > While fixing this I stumbled over a regression in NFC core that lead to
> > > broken registration error paths and misnamed workqueues.
> > > 
> > > Note that this has only been tested by configuring the n_hci line
> > > discipline for different ttys without any actual NFC hardware connected.
> > 
> > > Johan Hovold (8):
> > >   NFC: fix broken device allocation
> > >   NFC: nfcmrvl_uart: add missing tty-device sanity check
> > >   NFC: nfcmrvl: do not use device-managed resources
> > >   NFC: nfcmrvl: use nfc-device for firmware download
> > >   NFC: nfcmrvl: fix firmware-management initialisation
> > >   NFC: nfcmrvl_uart: fix device-node leak during probe
> > >   NFC: nfcmrvl_usb: use interface as phy device
> > >   NFC: nfcmrvl: allow gpio 0 for reset signalling
> > 
> > Any chance of getting these into 4.12, Samuel?

> I have yours and Mark Greer patchset pending. I'll push them to nfc-next
> and see if I can send another pull request to Dave by the end of this
> week.

Any progress on this now that we got another week?

Thanks,
Johan

^ permalink raw reply

* Re: [PATCH v1] net: phy: fix auto-negotiation stall due to unavailable interrupt
From: Alexandre Belloni @ 2017-04-26 10:21 UTC (permalink / raw)
  To: Alexander Kochetkov
  Cc: David Miller, Florian Fainelli, netdev, LKML, Sergei Shtylyov,
	Roger Quadros, Madalin Bucur
In-Reply-To: <20170425200911.d7zfakqs5k5zwisg@piout.net>

On 25/04/2017 at 22:09:11 +0200, Alexandre Belloni wrote:
> Hi,
> 
> On 25/04/2017 at 18:25:30 +0300, Alexander Kochetkov wrote:
> > Hello David!
> > 
> > > 25 апр. 2017 г., в 17:36, David Miller <davem@davemloft.net> написал(а):
> > > 
> > > So... what are we doing here?
> > > 
> > > My understanding is that this should fix the same problem that commit
> > > 99f81afc139c6edd14d77a91ee91685a414a1c66 ("phy: micrel: Disable auto
> > > negotiation on startup") fixed and that this micrel commit should thus
> > > be reverted to improve MAC startup times which regressed.
> > > 
> > > Florian, any guidance?
> > 
> > Yes, this should be done.
> > 
> > I aksed Alexandre to check if 99f81afc139c6edd14d77a91ee91685a414a1c66 ("phy: micrel: Disable auto
> > negotiation on startup») can be reverted, and he answered what it may do that
> > sometime this/next week.
> > 
> 
> Yes, it can be safely reverted after Alexander's patch. I had to test on
> v4.7 because we are not using interrupts on those boards since v4.8
> (another issue to be fixed).
> 
> As Florian pointed out, at the time I sent my patch, I didn't have time
> to investigate whether this was affecting other phys, see
> https://lkml.org/lkml/2016/2/26/766
> 
> I can send the revert or you can do it.
> 

I've now tested on linux-next after fixing phy interrupts in the macb
driver and this also fixes the issue I was trying to solve with:
https://lkml.org/lkml/2016/4/15/786

-- 
Alexandre Belloni, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* [PATCH] net: macb: fix phy interrupt parsing
From: Alexandre Belloni @ 2017-04-26 10:06 UTC (permalink / raw)
  To: Nicolas Ferre, David S . Miller
  Cc: Bartosz Folta, netdev, linux-kernel, Alexandre Belloni

Since 83a77e9ec415, the phydev irq is explicitly set to PHY_POLL when
there is no pdata. It doesn't work on DT enabled platforms because the
phydev irq is already set by libphy before.

Fixes: 83a77e9ec415 ("net: macb: Added PCI wrapper for Platform Driver.")
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 drivers/net/ethernet/cadence/macb.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index e131ecd17ab9..004223a866fe 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -432,15 +432,17 @@ static int macb_mii_probe(struct net_device *dev)
 	}
 
 	pdata = dev_get_platdata(&bp->pdev->dev);
-	if (pdata && gpio_is_valid(pdata->phy_irq_pin)) {
-		ret = devm_gpio_request(&bp->pdev->dev, pdata->phy_irq_pin,
-					"phy int");
-		if (!ret) {
-			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
-			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
+	if (pdata) {
+		if (gpio_is_valid(pdata->phy_irq_pin)) {
+			ret = devm_gpio_request(&bp->pdev->dev,
+						pdata->phy_irq_pin, "phy int");
+			if (!ret) {
+				phy_irq = gpio_to_irq(pdata->phy_irq_pin);
+				phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
+			}
+		} else {
+			phydev->irq = PHY_POLL;
 		}
-	} else {
-		phydev->irq = PHY_POLL;
 	}
 
 	/* attach the mac to the phy */
-- 
2.11.0

^ permalink raw reply related

* Re: [iproute2] iplink: add support for IFLA_CARRIER attribute
From: Nikolay Aleksandrov @ 2017-04-26  9:59 UTC (permalink / raw)
  To: Zhang Shengju, netdev
In-Reply-To: <1493190519-5518-1-git-send-email-zhangshengju@cmss.chinamobile.com>

On 26/04/17 10:08, Zhang Shengju wrote:
> Add support to set IFLA_CARRIER attribute.
> 
> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
> ---
>  ip/iplink.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 

You should also update the ip-link man page with this new option.

> diff --git a/ip/iplink.c b/ip/iplink.c
> index 866ad72..263bfdd 100644
> --- a/ip/iplink.c
> +++ b/ip/iplink.c
> @@ -72,6 +72,7 @@ void iplink_usage(void)
>  		"	                  [ allmulticast { on | off } ]\n"
>  		"	                  [ promisc { on | off } ]\n"
>  		"	                  [ trailers { on | off } ]\n"
> +		"	                  [ carrier { on | off } ]\n"
>  		"	                  [ txqueuelen PACKETS ]\n"
>  		"	                  [ name NEWNAME ]\n"
>  		"	                  [ address LLADDR ]\n"
> @@ -673,6 +674,17 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
>  				req->i.ifi_flags |= IFF_NOARP;
>  			else
>  				return on_off("arp", *argv);
> +		} else if (strcmp(*argv, "carrier") == 0) {
> +			int carrier;

Please leave a blank line between the variable definition and the code.

> +			NEXT_ARG();
> +			if (strcmp(*argv, "on") == 0)
> +				carrier = 1;
> +			else if (strcmp(*argv, "off") == 0)
> +				carrier = 0;
> +			else
> +				return on_off("carrier", *argv);
> +
> +			addattr8(&req->n, sizeof(*req), IFLA_CARRIER, carrier);
>  		} else if (strcmp(*argv, "vf") == 0) {
>  			struct rtattr *vflist;
>  
> 

^ permalink raw reply

* [PATCH net-next] l2tp: remove useless device duplication test in l2tp_eth_create()
From: Guillaume Nault @ 2017-04-26  9:54 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman

There's no need to verify that cfg->ifname is unique at this point.
register_netdev() will return -EEXIST if asked to create a device with
a name that's alrealy in use.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_eth.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 59aba8aeac03..8b21af7321b9 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -280,12 +280,6 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
 	}
 
 	if (cfg->ifname) {
-		dev = dev_get_by_name(net, cfg->ifname);
-		if (dev) {
-			dev_put(dev);
-			rc = -EEXIST;
-			goto out;
-		}
 		strlcpy(name, cfg->ifname, IFNAMSIZ);
 		name_assign_type = NET_NAME_USER;
 	} else {
-- 
2.11.0

^ permalink raw reply related

* [net-next] net: remove unnecessary carrier status check
From: Zhang Shengju @ 2017-04-26  9:49 UTC (permalink / raw)
  To: davem, netdev

Since netif_carrier_on() will do nothing if device's carrier is already
on, so it's unnecessary to do carrier status check.

It's the same for netif_carrier_off().

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 net/core/dev.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index db6e315..1d02c5a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7104,13 +7104,10 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 	else
 		netif_dormant_off(dev);
 
-	if (netif_carrier_ok(rootdev)) {
-		if (!netif_carrier_ok(dev))
-			netif_carrier_on(dev);
-	} else {
-		if (netif_carrier_ok(dev))
-			netif_carrier_off(dev);
-	}
+	if (netif_carrier_ok(rootdev))
+		netif_carrier_on(dev);
+	else
+		netif_carrier_off(dev);
 }
 EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Ding Tianhong @ 2017-04-26  9:26 UTC (permalink / raw)
  To: Amir Ancel, David Laight, 'Gabriele Paoloni',
	davem@davemloft.net
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, LinuxArm,
	alexander.duyck@gmail.com, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, Robin Murphy,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <DB5PR05MB138288172B04713EB0C1DB55D3190@DB5PR05MB1382.eurprd05.prod.outlook.com>

Hi Amir:

It is really glad to hear that the mlx5 will support RO mode this year, if so, do you agree that enable it dynamic by ethtool -s xxx,
we have try it several month ago but there was only one drivers would use it at that time so the maintainer against it, it mlx5 would support RO,
we could try to restart this solution, what do you think about it. :)

Thanks
Ding

On 2017/4/19 4:17, Amir Ancel wrote:
> Hi,
> 
> mlx5 driver is planned to have RO support this year.
> 
> I believe drivers should be able to query whether the arch support it or not and enable it in the network adapter accordingly.
> 
>  
> 
> -Amir
> 
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *From:* netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> on behalf of David Laight <David.Laight@ACULAB.COM>
> *Sent:* Tuesday, April 18, 2017 4:25:44 PM
> *To:* 'Gabriele Paoloni'; davem@davemloft.net
> *Cc:* Catalin Marinas; Will Deacon; Mark Rutland; Robin Murphy; jeffrey.t.kirsher@intel.com; alexander.duyck@gmail.com; linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org; Dingtianhong; Linuxarm
> *Subject:* RE: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
>  
> From: Gabriele Paoloni
>> Sent: 13 April 2017 10:11
>> > > Till now only the Intel ixgbe could support enable
>> > > Relaxed ordering in the drivers for special architecture,
>> > > but the ARCH_WANT_RELAX_ORDER is looks like a general name
>> > > for all arch, so rename to a specific name for intel
>> > > card looks more appropriate.
>> > >
>> > > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> >
>> > This is not a driver specific facility.
>> >
>> > Any driver can test this symbol and act accordingly.
>> >
>> > Just because IXGBE is the first and only user, doesn't mean
>> > the facility is driver specific.
>> 
>> 
>> Please correct me if I am wrong but my understanding is that the standard
>> way to enable/disable relaxed ordering is to set/clear bit 4 of the Device
>> Control Register (PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed
>> ordering */).
>> Now I have looked up for all drivers either enabling or disabling relaxed
>> ordering and none of them seems to need a symbol to decide whether to
>> enable it or not.
>> Also it seems to me enabling/disabling relaxed ordering is never bound to the
>> host architecture.
>> 
>> So why this should be (or it is expected to be) a generic symbol?
>> Wouldn't it be more correct to have this as a driver specific symbol now and
>> move it to a generic one later once we have other drivers requiring it?
> 
> 'Relaxed ordering' is a bit in the TLP header.
> A device (like the ixgbe hardware) can set it for some transactions and
> still have the transactions 'ordered enough' for the driver to work.
> (If all transactions have 'relaxed ordering' set then I suspect it is
> almost impossible to write a working driver.)
> The host side could (probably does) have a bit to enable 'relaxed ordering',
> it could also enforce stronger ordering than required by the PCIe spec
> (eg keeping reads in order).
> 
> Clearly, on some sparc64 systems, ixgbe needs to use 'relaxed ordering'.
> To me this requires two separate bits be enabled:
> 1) to the ixgbe driver to generate the 'relaxed' TLP.
> 2) to the host to actually act on them.
> If the ixgbe driver works when both are enabled why have options to
> disable either (except for bug-finding)?
> 
>         David
> 

^ permalink raw reply

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: Jesper Dangaard Brouer @ 2017-04-26  9:11 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexei Starovoitov, Daniel Borkmann, Andy Gospodarek,
	Daniel Borkmann, Alexei Starovoitov, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, brouer
In-Reply-To: <59000EF6.1050204@gmail.com>

On Tue, 25 Apr 2017 20:07:34 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> On 17-04-25 05:26 PM, Alexei Starovoitov wrote:
> > On Tue, Apr 25, 2017 at 11:34:53AM +0200, Jesper Dangaard Brouer wrote:  
> >>> Note the very first bpf patchset years ago contained the port table
> >>> abstraction. ovs has concept of vports as well. These two very
> >>> different projects needed port table to provide a layer of
> >>> indirection between ifindex==netdev and virtual port number.
> >>> This is still the case and I'd like to see this port table to be
> >>> implemented for both cls_bpf and xdp. In that sense xdp is not
> >>> special.  
> >>
> >> Glad to hear you want to see this implemented, I will start coding on
> >> this then.  Good point with cls_bpf, I was planning to make this port
> >> table strongly connected to XDP, guess I should also think of cls_bpf.  
> > 
> > perfect.
> > I think we should try to make all additions to bpf networking world
> > to be usable for both tc and xdp, since both are actively used and
> > it wouldn't be great to have cool feature for one, but not the other.
> > I think port table is an excellent candidate that applies to both.  
> 
> +1
> 
> Jesper, I was working up the code for the redirect piece for ixgbe and
> virtio, please use this as a base for your virtual port number table. I'll
> push an update onto github tomorrow. I think the table should drop in fairly
> nicely.

Cool, I will do that. Then, I'll also have a redirect method to shape
this around, and I would have to benchmark/test your ixgbe redirect.

(John please let me know, what github tree we are talking about, and
what branch)


> One piece that isn't clear to me is how do you plan to instantiate and
> program this table. Is it a new static bpf map that is created any
> time we see the redirect command? I think this would be preferred.

(This is difficult to explain without us misunderstanding each-other)

As Alexei also mentioned before, ifindex vs port makes no real
difference seen from the bpf program side.  It is userspace's
responsibility to add ifindex/port's to the bpf-maps, according to how
the bpf program "policy" want to "connect" these ports.  The
port-table system add one extra step, of also adding this port to the
port-table (which lives inside the kernel). 

When loading the XDP program, we also need to pass along a port table
"id" this XDP program is associated with (and if it doesn't exists you
create it).  And your userspace "control-plane" application also need
to know this port table "id", when adding a new port.

The concept of having multiple port tables is key.  As this implies we
can have several simultaneous "data-planes" that is *isolated* from
each-other.  Think about how network-namespaces/containers want
isolation. A subtle thing I'm afraid to mention, is that oppose to the
ifindex model, a port table with mapping to a net_device pointer, would
allow (faster) delivery into the container's inner net_device, which
sort of violates the isolation, but I would argue it is not a problem
as this net_device pointer could only be added from a process within the
namespace.  I like this feature, but it could easily be disallowed via
port insertion-time validation.

   
> >> I'm not worried about the DROP case, I agree that is fine (as you
> >> also say).  The problem is unintentionally sending a packet to a
> >> wrong ifindex.  This is clearly an eBPF program error, BUT with
> >> XDP this becomes a very hard to debug program error.  With
> >> TC-redirect/cls_bpf we can tcpdump the packets, with XDP there is
> >> no visibility into this happening (the NSA is going to love this
> >> "feature").  Maybe we could add yet-another tracepoint to allow
> >> debugging this.  My proposal that we simply remove the possibility
> >> for such program errors, by as you say move the validation from
> >> run-time into static insertion-time, via a port table.  
> > 
> > I think lack of tcpdump-like debugging in xdp is a separate issue.
> > As I was saying in the other thread we have trivial 'xdpdump'
> > kern+user app that emits pcap file, but it's too specific to how we
> > use tail_calls+prog_array in our xdp setup. I'm working on the
> > program chaining that will be generic and allow us transparently
> > add multiple xdp or tc progs to the same attachment point and will
> > allow us to do 'xdpdump' at any point of this pipeline, so
> > debugging of what happened to the packet will be easier and done in
> > the same way for both tc and xdp.
> > btw in our experience working with both tc and xdp the tc+bpf was
> > actually harder to use and more bug prone.
> >   
> 
> Nice, the tcpdump-like debugging looks interesting.

Yes, this xdpdump sound like a very useful tool.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH RFC v2] ptr_ring: add ptr_ring_unconsume
From: Jason Wang @ 2017-04-26  9:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: netdev
In-Reply-To: <1493049492-3229-1-git-send-email-mst@redhat.com>



On 2017年04月25日 00:01, Michael S. Tsirkin wrote:
> Applications that consume a batch of entries in one go
> can benefit from ability to return some of them back
> into the ring.
>
> Add an API for that - assuming there's space. If there's no space
> naturally can't do this and have to drop entries, but this implies ring
> is full so we'd likely drop some anyway.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Jason, if you add this and unconsume the outstanding packets
> on backend disconnect, vhost close and reset, I think
> we should apply your patch even if we don't yet know 100%
> why it helps.
>
> changes from v1:
> - fix up coding style issues reported by Sergei Shtylyov
>
>
>   include/linux/ptr_ring.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 56 insertions(+)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 783e7f5..902afc2 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -457,6 +457,62 @@ static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
>   	return 0;
>   }
>   
> +/*
> + * Return entries into ring. Destroy entries that don't fit.
> + *
> + * Note: this is expected to be a rare slow path operation.
> + *
> + * Note: producer lock is nested within consumer lock, so if you
> + * resize you must make sure all uses nest correctly.
> + * In particular if you consume ring in interrupt or BH context, you must
> + * disable interrupts/BH when doing so.
> + */
> +static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
> +				      void (*destroy)(void *))
> +{
> +	unsigned long flags;
> +	int head;
> +
> +	spin_lock_irqsave(&r->consumer_lock, flags);
> +	spin_lock(&r->producer_lock);
> +
> +	if (!r->size)
> +		goto done;
> +
> +	/*
> +	 * Clean out buffered entries (for simplicity). This way following code
> +	 * can test entries for NULL and if not assume they are valid.
> +	 */
> +	head = r->consumer_head - 1;
> +	while (likely(head >= r->consumer_tail))
> +		r->queue[head--] = NULL;
> +	r->consumer_tail = r->consumer_head;
> +
> +	/*
> +	 * Go over entries in batch, start moving head back and copy entries.
> +	 * Stop when we run into previously unconsumed entries.
> +	 */
> +	while (n--) {
> +		head = r->consumer_head - 1;
> +		if (head < 0)
> +			head = r->size - 1;
> +		if (r->queue[head]) {
> +			/* This batch entry will have to be destroyed. */
> +			++n;
> +			goto done;
> +		}
> +		r->queue[head] = batch[n];
> +		r->consumer_tail = r->consumer_head = head;

Looks like something wrong here (bad page state reported), uncomment the 
above while() solving the issue. But after staring it for a while I 
didn't find anything interesting, maybe you have some idea on this?

Thanks


> +	}
> +
> +done:
> +	/* Destroy all entries left in the batch. */
> +	while (n--)
> +		destroy(batch[n]);
> +	spin_unlock(&r->producer_lock);
> +	spin_unlock_irqrestore(&r->consumer_lock, flags);
> +}
> +
>   static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
>   					   int size, gfp_t gfp,
>   					   void (*destroy)(void *))

^ permalink raw reply

* Re: [v3] brcmfmac: Make skb header writable before use
From: Kalle Valo @ 2017-04-26  9:05 UTC (permalink / raw)
  To: James Hughes
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, linux-wireless,
	brcm80211-dev-list.pdl, netdev, James Hughes
In-Reply-To: <20170425091506.18224-1-james.hughes@raspberrypi.org>

James Hughes <james.hughes@raspberrypi.org> wrote:
> The driver was making changes to the skb_header without
> ensuring it was writable (i.e. uncloned).
> This patch also removes some boiler plate header size
> checking/adjustment code as that is also handled by the
> skb_cow_header function used to make header writable.
> 
> Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>

Patch applied to wireless-drivers-next.git, thanks.

9cc4b7cb86cb brcmfmac: Make skb header writable before use

-- 
https://patchwork.kernel.org/patch/9697763/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [v2] brcmfmac: Ensure pointer correctly set if skb data location changes
From: Kalle Valo @ 2017-04-26  9:04 UTC (permalink / raw)
  To: James Hughes
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, linux-wireless,
	brcm80211-dev-list.pdl, netdev, James Hughes
In-Reply-To: <20170424114050.24948-1-james.hughes@raspberrypi.org>

James Hughes <james.hughes@raspberrypi.org> wrote:
> The incoming skb header may be resized if header space is
> insufficient, which might change the data adddress in the skb.
> Ensure that a cached pointer to that data is correctly set by
> moving assignment to after any possible changes.
> 
> Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
> 
> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>

Patch applied to wireless-drivers-next.git, thanks.

455a1eb4654c brcmfmac: Ensure pointer correctly set if skb data location changes

-- 
https://patchwork.kernel.org/patch/9696045/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [1/1] rndis_wlan: add return value validation
From: Kalle Valo @ 2017-04-26  9:04 UTC (permalink / raw)
  To: Pan Bian; +Cc: Jussi Kivilinna, linux-wireless, netdev, linux-kernel, Pan Bian
In-Reply-To: <1492994428-16090-1-git-send-email-bianpan201603@163.com>

Pan Bian <bianpan201603@163.com> wrote:
> From: Pan Bian <bianpan2016@163.com>
> 
> Function create_singlethread_workqueue() will return a NULL pointer if
> there is no enough memory, and its return value should be validated
> before using. However, in function rndis_wlan_bind(), its return value
> is not checked. This may cause NULL dereference bugs. This patch fixes
> it.
> 
> Signed-off-by: Pan Bian <bianpan2016@163.com>

Patch applied to wireless-drivers-next.git, thanks.

9dc7efd3978a rndis_wlan: add return value validation

-- 
https://patchwork.kernel.org/patch/9695387/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [1/1] libertas: check return value of alloc_workqueue
From: Kalle Valo @ 2017-04-26  9:03 UTC (permalink / raw)
  To: Pan Bian
  Cc: Bhaktipriya Shridhar, Tejun Heo, libertas-dev, linux-wireless,
	netdev, linux-kernel, Pan Bian
In-Reply-To: <1492953578-387-1-git-send-email-bianpan201603@163.com>

Pan Bian <bianpan201603@163.com> wrote:
> From: Pan Bian <bianpan2016@163.com>
> 
> Function alloc_workqueue() will return a NULL pointer if there is no
> enough memory, and its return value should be validated before using.
> However, in function if_spi_probe(), its return value is not checked.
> This may result in a NULL dereference bug. This patch fixes the bug.
> 
> Signed-off-by: Pan Bian <bianpan2016@163.com>

Patch applied to wireless-drivers-next.git, thanks.

dc3f89c38a84 libertas: check return value of alloc_workqueue

-- 
https://patchwork.kernel.org/patch/9694827/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [1/1] mt7601u: check return value of alloc_skb
From: Kalle Valo @ 2017-04-26  9:03 UTC (permalink / raw)
  To: Pan Bian
  Cc: Jakub Kicinski, Matthias Brugger,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Pan Bian
In-Reply-To: <1492930823-17249-1-git-send-email-bianpan2016-9Onoh4P/yGk@public.gmane.org>

Pan Bian <bianpan2016-9Onoh4P/yGk@public.gmane.org> wrote:
> Function alloc_skb() will return a NULL pointer if there is no enough
> memory. However, in function mt7601u_mcu_msg_alloc(), its return value
> is not validated before it is used. This patch fixes it.
> 
> Signed-off-by: Pan Bian <bianpan2016-9Onoh4P/yGk@public.gmane.org>
> Acked-by: Jakub Kicinski <kubakici-5tc4TXWwyLM@public.gmane.org>

Patch applied to wireless-drivers-next.git, thanks.

5fb01e91daf8 mt7601u: check return value of alloc_skb

-- 
https://patchwork.kernel.org/patch/9694549/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [1/1] mt7601u: check return value of alloc_skb
From: Kalle Valo @ 2017-04-26  9:03 UTC (permalink / raw)
  To: Pan Bian
  Cc: Jakub Kicinski, netdev, Pan Bian, linux-wireless, linux-kernel,
	linux-mediatek, Matthias Brugger, linux-arm-kernel
In-Reply-To: <1492930823-17249-1-git-send-email-bianpan2016@163.com>

Pan Bian <bianpan2016@163.com> wrote:
> Function alloc_skb() will return a NULL pointer if there is no enough
> memory. However, in function mt7601u_mcu_msg_alloc(), its return value
> is not validated before it is used. This patch fixes it.
> 
> Signed-off-by: Pan Bian <bianpan2016@163.com>
> Acked-by: Jakub Kicinski <kubakici@wp.pl>

Patch applied to wireless-drivers-next.git, thanks.

5fb01e91daf8 mt7601u: check return value of alloc_skb

-- 
https://patchwork.kernel.org/patch/9694549/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: orinoco_usb: Fix buffer on stack
From: Kalle Valo @ 2017-04-26  9:01 UTC (permalink / raw)
  To: Maksim Salau
  Cc: David S . Miller, Wolfram Sang, Mugunthan V N, Florian Westphal,
	linux-wireless, netdev, Maksim Salau
In-Reply-To: <20170422170306.11668-1-maksim.salau@gmail.com>

Maksim Salau <maksim.salau@gmail.com> wrote:
> Allocate buffer on HEAP instead of STACK for a local variable
> that is to be sent using usb_control_msg().
> 
> Signed-off-by: Maksim Salau <maksim.salau@gmail.com>

Patch applied to wireless-drivers-next.git, thanks.

2f6ae79cb04b orinoco_usb: Fix buffer on stack

-- 
https://patchwork.kernel.org/patch/9694451/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
From: Christian König @ 2017-04-26  8:59 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	target-devel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Matthew Wilcox, Greg Kroah-Hartman, Christoph Hellwig
In-Reply-To: <1493144468-22493-2-git-send-email-logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>

Am 25.04.2017 um 20:20 schrieb Logan Gunthorpe:
> This patch introduces functions which kmap the pages inside an sgl.
> These functions replace a common pattern of kmap(sg_page(sg)) that is
> used in more than 50 places within the kernel.
>
> The motivation for this work is to eventually safely support sgls that
> contain io memory. In order for that to work, any access to the contents
> of an iomem SGL will need to be done with iomemcpy or hit some warning.
> (The exact details of how this will work have yet to be worked out.)
> Having all the kmaps in one place is just a first step in that
> direction. Additionally, seeing this helps cut down the users of sg_page,
> it should make any effort to go to struct-page-less DMAs a little
> easier (should that idea ever swing back into favour again).
>
> A flags option is added to select between a regular or atomic mapping so
> these functions can replace kmap(sg_page or kmap_atomic(sg_page.
> Future work may expand this to have flags for using page_address or
> vmap. We include a flag to require the function not to fail to
> support legacy code that has no easy error path. Much further in the
> future, there may be a flag to allocate memory and copy the data
> from/to iomem.
>
> We also add the semantic that sg_map can fail to create a mapping,
> despite the fact that the current code this is replacing is assumed to
> never fail and the current version of these functions cannot fail. This
> is to support iomem which may either have to fail to create the mapping or
> allocate memory as a bounce buffer which itself can fail.
>
> Also, in terms of cleanup, a few of the existing kmap(sg_page) users
> play things a bit loose in terms of whether they apply sg->offset
> so using these helper functions should help avoid such issues.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---

Good to know that somebody is working on this. Those problems troubled 
us as well.

Patch is Acked-by: Christian König <christian.koenig@amd.com>.

Regards,
Christian.

>   include/linux/scatterlist.h | 85 +++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 85 insertions(+)
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index cb3c8fe..fad170b 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -5,6 +5,7 @@
>   #include <linux/types.h>
>   #include <linux/bug.h>
>   #include <linux/mm.h>
> +#include <linux/highmem.h>
>   #include <asm/io.h>
>   
>   struct scatterlist {
> @@ -126,6 +127,90 @@ static inline struct page *sg_page(struct scatterlist *sg)
>   	return (struct page *)((sg)->page_link & ~0x3);
>   }
>   
> +#define SG_KMAP		     (1 << 0)	/* create a mapping with kmap */
> +#define SG_KMAP_ATOMIC	     (1 << 1)	/* create a mapping with kmap_atomic */
> +#define SG_MAP_MUST_NOT_FAIL (1 << 2)	/* indicate sg_map should not fail */
> +
> +/**
> + * sg_map - kmap a page inside an sgl
> + * @sg:		SG entry
> + * @offset:	Offset into entry
> + * @flags:	Flags for creating the mapping
> + *
> + * Description:
> + *   Use this function to map a page in the scatterlist at the specified
> + *   offset. sg->offset is already added for you. Note: the semantics of
> + *   this function are that it may fail. Thus, its output should be checked
> + *   with IS_ERR and PTR_ERR. Otherwise, a pointer to the specified offset
> + *   in the mapped page is returned.
> + *
> + *   Flags can be any of:
> + *	* SG_KMAP		- Use kmap to create the mapping
> + *	* SG_KMAP_ATOMIC	- Use kmap_atomic to map the page atommically.
> + *				  Thus, the rules of that function apply: the
> + *				  cpu may not sleep until it is unmaped.
> + *	* SG_MAP_MUST_NOT_FAIL	- Indicate that sg_map must not fail.
> + *				  If it does, it will issue a BUG_ON instead.
> + *				  This is intended for legacy code only, it
> + *				  is not to be used in new code.
> + *
> + *   Also, consider carefully whether this function is appropriate. It is
> + *   largely not recommended for new code and if the sgl came from another
> + *   subsystem and you don't know what kind of memory might be in the list
> + *   then you definitely should not call it. Non-mappable memory may be in
> + *   the sgl and thus this function may fail unexpectedly. Consider using
> + *   sg_copy_to_buffer instead.
> + **/
> +static inline void *sg_map(struct scatterlist *sg, size_t offset, int flags)
> +{
> +	struct page *pg;
> +	unsigned int pg_off;
> +	void *ret;
> +
> +	offset += sg->offset;
> +	pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
> +	pg_off = offset_in_page(offset);
> +
> +	if (flags & SG_KMAP_ATOMIC)
> +		ret = kmap_atomic(pg) + pg_off;
> +	else if (flags & SG_KMAP)
> +		ret = kmap(pg) + pg_off;
> +	else
> +		ret = ERR_PTR(-EINVAL);
> +
> +	/*
> +	 * In theory, this can't happen yet. Once we start adding
> +	 * unmapable memory, it also shouldn't happen unless developers
> +	 * start putting unmappable struct pages in sgls and passing
> +	 * it to code that doesn't support it.
> +	 */
> +	BUG_ON(flags & SG_MAP_MUST_NOT_FAIL && IS_ERR(ret));
> +
> +	return ret;
> +}
> +
> +/**
> + * sg_unmap - unmap a page that was mapped with sg_map_offset
> + * @sg:		SG entry
> + * @addr:	address returned by sg_map_offset
> + * @offset:	Offset into entry (same as specified for sg_map)
> + * @flags:	Flags, which are the same specified for sg_map
> + *
> + * Description:
> + *   Unmap the page that was mapped with sg_map_offset
> + **/
> +static inline void sg_unmap(struct scatterlist *sg, void *addr,
> +			    size_t offset, int flags)
> +{
> +	struct page *pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
> +	unsigned int pg_off = offset_in_page(offset);
> +
> +	if (flags & SG_KMAP_ATOMIC)
> +		kunmap_atomic(addr - sg->offset - pg_off);
> +	else if (flags & SG_KMAP)
> +		kunmap(pg);
> +}
> +
>   /**
>    * sg_set_buf - Set sg entry to point at given data
>    * @sg:		 SG entry


_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply

* Re: orinoco: fix spelling mistake: "Registerred" -> "Registered"
From: Kalle Valo @ 2017-04-26  8:59 UTC (permalink / raw)
  To: Colin Ian King
  Cc: David S . Miller, Tobias Klauser, Jarod Wilson,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20170422132149.10901-1-colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Colin Ian King <colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> From: Colin Ian King <colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> 
> trivial fix to spelling mistake in dbg_dbg message
> 
> Signed-off-by: Colin Ian King <colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Patch applied to wireless-drivers-next.git, thanks.

bea35f90dbb1 orinoco: fix spelling mistake: "Registerred" -> "Registered"

-- 
https://patchwork.kernel.org/patch/9694381/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH] ipv6: ensure message length for raw socket is at least sizeof(ipv6hdr)
From: Alexander Potapenko @ 2017-04-26  8:54 UTC (permalink / raw)
  To: David Miller
  Cc: Dmitriy Vyukov, Kostya Serebryany, Eric Dumazet, Alexey Kuznetsov,
	LKML, Networking
In-Reply-To: <20170425.135543.1706004185593424024.davem@davemloft.net>

On Tue, Apr 25, 2017 at 7:55 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Potapenko <glider@google.com>
> Date: Tue, 25 Apr 2017 15:18:27 +0200
>
>> rawv6_send_hdrinc() expects that the buffer copied from the userspace
>> contains the IPv6 header, so if too few bytes are copied parts of the
>> header may remain uninitialized.
>>
>> This bug has been detected with KMSAN.
>>
>> Signed-off-by: Alexander Potapenko <glider@google.com>
>
> Hmmm, ipv4 seems to lack this check as well.
>
> I think we need to be careful here and fully understand why KMSAN doesn't
> seem to be triggering in the ipv4 case but for ipv6 it is before I apply
> this.
Maybe I just couldn't come up with a decent test case for ipv4 yet.
syzkaller generated the equivalent of the following program for ipv6:

=======================================
#define _GNU_SOURCE

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <error.h>

int main()
{
  int sock = socket(PF_INET6, SOCK_RAW, IPPROTO_RAW);
  struct sockaddr_in6 dest_addr;
  memset(&dest_addr, 0, sizeof(dest_addr));
  dest_addr.sin6_family = AF_INET6;
  inet_pton(AF_INET6, "ff00::", &dest_addr.sin6_addr);
  int err = sendto(sock, 0, 0, 0, &dest_addr, sizeof(dest_addr));
  if (err == -1)
    perror("sendto");
  return 0;
}
=======================================

I attempted to replace INET6 and such with INET and provide a legal
IPv4 address to inet_pton(), but couldn't trigger the warning.
> Thanks.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply

* Re: more test_progs...
From: Daniel Borkmann @ 2017-04-26  8:14 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller; +Cc: netdev
In-Reply-To: <470871b2-c4c0-ad23-7fda-1a38c2736679@fb.com>

On 04/26/2017 05:42 AM, Alexei Starovoitov wrote:
> On 4/25/17 9:52 AM, David Miller wrote:
>>
>> 10: (15) if r3 == 0xdd86 goto pc+9
>>  R0=imm2,min_value=2,max_value=2 R1=pkt(id=0,off=0,r=14) R2=pkt_end R3=inv R4=pkt(id=0,off=14,r=14) R5=inv56 R10=fp
>>
>> Hmmm, endianness looks wrong here.  "-target bpf" defaults to the
>> endianness of whatever cpu that llvm was built for, right?
>
> yeah. so here the code comes from:
>          } else if (eth->h_proto == _htons(ETH_P_IPV6)) {
>
> and in the beginning of that .c file:
> #define _htons __builtin_bswap16
> ^^^ that's the bug.
> It should have been nop for sparc.
> We cannot use htons() from system header, since it uses x86 inline asm.

Ahh yes, you're right! Wouldn't it be much easier for the above
case to just include <asm/byteorder.h> and use __constant_htons()?
(In fact, all htons() users in this file are constants.)

>>  R0=imm2,min_value=2,max_value=2 R1=inv56,min_value=6,max_value=6 R2=pkt_end R3=imm3,min_value=3,max_value=3 R4=pkt(id=1,off=14,r=34) R5=pkt(id=1,off=34,r=34) R10=fp
>> 36: (07) r4 += 18
>> 37: (2d) if r4 > r2 goto pc+5
>>  R0=imm2,min_value=2,max_value=2 R1=inv56,min_value=6,max_value=6 R2=pkt_end R3=imm3,min_value=3,max_value=3 R4=pkt(id=1,off=32,r=34) R5=pkt(id=1,off=34,r=34) R10=fp
>> 38: (b7) r0 = 0
>> 39: (69) r1 = *(u16 *)(r4 +0)
>> Unknown alignment. Only byte-sized access allowed in packet access.
>>
>> And this seems to load the urgent pointer as a u16 which is what the verifier rejects.
>>
>> Oh I see, this is guarded by CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
>
> yes. exactly.
> Your analysis is correct and offending 16-bit load is this C code:
>                  if (tcp->urg_ptr == 123)
>
> the parsing code before that line deals with variable length ip header:
>                  tcp = (struct tcphdr *)((void *)(iph) + ihl_len);
>
> and at that point verifier cannot track the alignment of the pointer
> anymore. It knows 'iph' alignment, but as soon as some variable is added
> to it cannot assume good alignment anymore and from there on it
> allows >1 byte accesses only on HAVE_EFFICIENT_UNALIGNED_ACCESS
> architectures.
> That's the 'if' that you found:
>   'if (reg->id && size != 1) {'
>
> ref->id > 0 means that some variable was added to ptr_to_packet.
>
> That sucks for sparc, of course. I don't have good suggestions for
> workaround other than marking tcphdr as packed :(
>  From code efficiency point of view it still will be faster than
> LD_ABS insn.

Plus, ld_abs would also only work on skbs right now, and would
need JIT support for XDP. But it's also cumbersome to use with
f.e. IPv6, etc, so should not be encouraged, imo.

One other option for !HAVE_EFFICIENT_UNALIGNED_ACCESS archs could
be to provide a bpf_skb_load_bytes() helper equivalent for XDP,
where you eventually do the memcpy() inside. We could see to inline
the helper itself to optimize it slightly.

> Another idea is to try to track that ihl_len variable is also
> somehow multiple of 4, but it will be quite complicated on
> the verifier side in its existing architecture and maybe? worth
> waiting until the verifier has proper dataflow analysis.

^ permalink raw reply

* [PATCH net v1 3/3] tipc: close the connection if protocol messages contain errors
From: Parthasarathy Bhuvaragan @ 2017-04-26  8:05 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion
In-Reply-To: <1493193902-18332-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

When a socket is shutting down, we notify the peer node about the
connection termination by reusing an incoming message if possible.
If the last received message was a connection acknowledgment
message, we reverse this message and set the error code to
TIPC_ERR_NO_PORT and send it to peer.

In tipc_sk_proto_rcv(), we never check for message errors while
processing the connection acknowledgment or probe messages. Thus
this message performs the usual flow control accounting and leaves
the session hanging.

In this commit, we terminate the connection when we receive such
error messages.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 38c367f6ced4..bdce99f9407a 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -866,6 +866,14 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
 	if (!tsk_peer_msg(tsk, hdr))
 		goto exit;
 
+	if (unlikely(msg_errcode(hdr))) {
+		tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+		tipc_node_remove_conn(sock_net(sk), tsk_peer_node(tsk),
+				      tsk_peer_port(tsk));
+		sk->sk_state_change(sk);
+		goto exit;
+	}
+
 	tsk->probe_unacked = false;
 
 	if (mtyp == CONN_PROBE) {
-- 
2.1.4


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply related

* [PATCH net v1 2/3] tipc: improve error validations for sockets in CONNECTING state
From: Parthasarathy Bhuvaragan @ 2017-04-26  8:05 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion
In-Reply-To: <1493193902-18332-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

Until now, the checks for sockets in CONNECTING state was based on
the assumption that the incoming message was always from the
peer's accepted data socket.

However an application using a non-blocking socket sends an implicit
connect, this socket which is in CONNECTING state can receive error
messages from the peer's listening socket. As we discard these
messages, the application socket hangs as there due to inactivity.
In addition to this, there are other places where we process errors
but do not notify the user.

In this commit, we process such incoming error messages and notify
our users about them using sk_state_change().

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 3b8df510a80c..38c367f6ced4 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1259,7 +1259,10 @@ static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop)
 	struct sock *sk = sock->sk;
 	DEFINE_WAIT(wait);
 	long timeo = *timeop;
-	int err;
+	int err = sock_error(sk);
+
+	if (err)
+		return err;
 
 	for (;;) {
 		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
@@ -1281,6 +1284,10 @@ static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop)
 		err = sock_intr_errno(timeo);
 		if (signal_pending(current))
 			break;
+
+		err = sock_error(sk);
+		if (err)
+			break;
 	}
 	finish_wait(sk_sleep(sk), &wait);
 	*timeop = timeo;
@@ -1551,6 +1558,8 @@ static bool filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
 	struct sock *sk = &tsk->sk;
 	struct net *net = sock_net(sk);
 	struct tipc_msg *hdr = buf_msg(skb);
+	u32 pport = msg_origport(hdr);
+	u32 pnode = msg_orignode(hdr);
 
 	if (unlikely(msg_mcast(hdr)))
 		return false;
@@ -1558,18 +1567,28 @@ static bool filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
 	switch (sk->sk_state) {
 	case TIPC_CONNECTING:
 		/* Accept only ACK or NACK message */
-		if (unlikely(!msg_connected(hdr)))
-			return false;
+		if (unlikely(!msg_connected(hdr))) {
+			if (pport != tsk_peer_port(tsk) ||
+			    pnode != tsk_peer_node(tsk))
+				return false;
+
+			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+			sk->sk_err = ECONNREFUSED;
+			sk->sk_state_change(sk);
+			return true;
+		}
 
 		if (unlikely(msg_errcode(hdr))) {
 			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
 			sk->sk_err = ECONNREFUSED;
+			sk->sk_state_change(sk);
 			return true;
 		}
 
 		if (unlikely(!msg_isdata(hdr))) {
 			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
 			sk->sk_err = EINVAL;
+			sk->sk_state_change(sk);
 			return true;
 		}
 
-- 
2.1.4


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox