Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-27  7:25 UTC (permalink / raw)
  To: Bhadram Varka
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, netdev,
	linux-kernel, Jingju Hou
In-Reply-To: <73e21c83-f78a-8b22-a421-f179ef6adef1@nvidia.com>

On Thu, 26 Apr 2018 12:39:59 +0530 Bhadram Varka wrote:

> >>>>>
> >>>>> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> >>>>> index c22e8e383247..b6abe1cbc84b 100644
> >>>>> --- a/drivers/net/phy/marvell.c
> >>>>> +++ b/drivers/net/phy/marvell.c
> >>>>> @@ -115,6 +115,9 @@
> >>>>>    /* WOL Event Interrupt Enable */
> >>>>>    #define MII_88E1318S_PHY_CSIER_WOL_EIE BIT(7)
> >>>>>    +/* Copper Specific Interrupt Status Register */
> >>>>> +#define MII_88E1318S_PHY_CSISR                0x13
> >>>>> +
> >>>>>    /* LED Timer Control Register */
> >>>>>    #define MII_88E1318S_PHY_LED_TCR            0x12
> >>>>>    #define MII_88E1318S_PHY_LED_TCR_FORCE_INT BIT(15)
> >>>>> @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct 
> >>>>> phy_device *phydev,
> >>>>>            if (err < 0)
> >>>>>                goto error;
> >>>>>    +        /* If WOL event happened once, the LED[2] interrupt pin
> >>>>> +         * will not be cleared unless reading the CSISR register.
> >>>>> +         * So clear the WOL event first before enabling it.
> >>>>> +         */
> >>>>> +        phy_read(phydev, MII_88E1318S_PHY_CSISR);
> >>>>> +  
> >>>> Hi Jisheng
> >>>>
> >>>> The problem with this is, you could be clearing a real interrupt, link
> >>>> down/up etc. If interrupts are in use, i think the normal interrupt
> >>>> handling will clear the WOL interrupt? So can you make this read
> >>>> conditional on !phy_interrupt_is_valid()?  
> >>> So this will clear WoL interrupt bit from Copper Interrupt status 
> >>> register.
> >>>
> >>> How about clearing WoL status (Page 17, register 17) for every WOL 
> >>> event ?
> >>>  
> >> This is already properly done by setting 
> >> MII_88E1318S_PHY_WOL_CTRL_CLEAR_WOL_STATUS
> >> in m88e1318_set_wol()  
> > This part of the code executes only when we enable WOL through ethtool 
> > (ethtool -s eth0 wol g)
> >
> > Lets say once WOL enabled through magic packet - HW generates WOL 
> > interrupt once magic packet received.
> > The problem that I see here is that for the next immediate magic 
> > packet I don't see WOL interrupt generated by the HW.
> > I need to explicitly clear WOL status for HW to generate WOL interrupt.  
> 
> With the below patch I see WOL event interrupt for every magic packet 
> that HW receives...
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index ed8a67d..5d3d138 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -55,6 +55,7 @@
> 
>   #define MII_M1011_IEVENT               0x13
>   #define MII_M1011_IEVENT_CLEAR         0x0000
> +#define MII_M1011_IEVENT_WOL_EVENT     BIT(7)
> 
>   #define MII_M1011_IMASK                        0x12
> - #define MII_M1011_IMASK_INIT           0x6400
> + #define MII_M1011_IMASK_INIT           0x6480
> 
> @@ -195,13 +196,40 @@ struct marvell_priv {
>          bool copper;
>   };
> 
> +static int marvell_clear_wol_status(struct phy_device *phydev)
> +{
> +       int err, temp, oldpage;
> +
> +       oldpage = phy_read(phydev, MII_MARVELL_PHY_PAGE);
> +       if (oldpage < 0)
> +               return oldpage;
> +
> +       err = phy_write(phydev, MII_MARVELL_PHY_PAGE,
> +                        MII_88E1318S_PHY_WOL_PAGE);
> +       if (err < 0)
> +               return err;
> +
> +       /*
> +        * Clear WOL status so that for next WOL event
> +        * interrupt will be generated by HW
> +        */
> +       temp = phy_read(phydev, MII_88E1318S_PHY_WOL_CTRL);
> +       temp |= MII_88E1318S_PHY_WOL_CTRL_CLEAR_WOL_STATUS;
> +       err = phy_write(phydev, MII_88E1318S_PHY_WOL_CTRL, temp);

is it better to reuse __phy_write()?

> +       if (err < 0)
> +               return err;
> +
> +
> +       phy_write(phydev, MII_MARVELL_PHY_PAGE, oldpage);
> +
> +       return 0;
> +}
> +
>   static int marvell_ack_interrupt(struct phy_device *phydev)
>   {
>          int err;
> 
>          /* Clear the interrupts by reading the reg */
>          err = phy_read(phydev, MII_M1011_IEVENT);
> -
>          if (err < 0)
>                  return err;
> 
> @@ -1454,12 +1482,18 @@ static int marvell_aneg_done(struct phy_device 
> *phydev)
> 
>   static int m88e1121_did_interrupt(struct phy_device *phydev)
>   {
> -       int imask;
> +       int imask, err;
> 
>          imask = phy_read(phydev, MII_M1011_IEVENT);
> 
> -       if (imask & MII_M1011_IMASK_INIT)
> +       if (imask & MII_M1011_IMASK_INIT) {
> +               if (imask & MII_M1011_IEVENT_WOL_EVENT) {
> +                       err = marvell_clear_wol_status(phydev);
> +                       if (err < 0)
> +                               return 0;
> +               }
>                  return 1;
> +       }
> 
>          return 0;
>   }

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit
From: Pravin Shelar @ 2018-04-27  7:28 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAG1aQh+KXPxB3yvmrdxgV6eO63CiwDFSB_RGcsMhGd4rUvSxSQ@mail.gmail.com>

On Wed, Apr 25, 2018 at 2:51 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>>> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
>>> +#define OVS_CT_LIMIT_UNLIMITED 0
>>> +#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
>>> +#define CT_LIMIT_HASH_BUCKETS 512
>>> +
>> Can you use static key when the limit is not set.
>> This would avoid overhead in datapath when these limits are not used.
>>
> Thanks Parvin for the review.  I'm not sure about the 'static key'
> part, are you suggesting that say if we can have a flag to check if no
> one has ever set the ct_limit?   If the ct_limit feature is not used
> so far, then instead of look up the hash table for the zone limit, we
> just return the default unlimited value.  So that we can avoid the
> overhead of checking ct_limit.
>

Right, here documentaion of static keys:
https://www.kernel.org/doc/Documentation/static-keys.txt

>>> +struct ovs_ct_limit {
>>> +       /* Elements in ovs_ct_limit_info->limits hash table */
>>> +       struct hlist_node hlist_node;
>>> +       struct rcu_head rcu;
>>> +       u16 zone;
>>> +       u32 limit;
>>> +};
>>> +
>> ...
>
>
>>>  /* Lookup connection and confirm if unconfirmed. */
>>>  static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>>>                          const struct ovs_conntrack_info *info,
>>> @@ -1054,6 +1176,13 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>>>         if (!ct)
>>>                 return 0;
>>>
>>> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
>>> +       err = ovs_ct_check_limit(net, info,
>>> +                                &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
>>> +       if (err)
>>> +               return err;
>>> +#endif
>>> +
>>
>> This could be checked during flow install time, so that only permitted
>> flows would have 'ct commit' action, we can avoid per packet cost
>> checking the limit.
> It seems to me that it would be hard to check the # of connections of
> a flow in the flow installation stage.  For example, if we have a
> datapath flow that performs “ct commit” action on all UDP traffic from
> in_port 1, then it could be various combinations of 5-tuple that form
> various # of connections.  Therefore, it would be hard to do the
> admission control there.
>

Ok, I see the issue.
I think we could still avoid the lookup every time by checking the
limits for unconfirmed connections (ref: nf_ct_is_confirmed()).
So once connection confirmed and is under limit  we should allow all
packets for given connection.
This way we can avoid connection disruption when we are reaching near
connection limit.

>
>> returning error code form ovs_ct_commit() is lost in datapath and it
>> would be hard to debug packet lost in case of the limit is reached. So
>> another advantage of checking the limit in flow install be better
>> traceability. datapath would return error to usespace and it can log
>> the error code.
> Thanks for pointing out the error code from ovs_ct_commit() will be
> lost in the datapath.  In this case, shall we consider to report the
> packet drop by some rate_limit logging such as net_warn_ratelimited()
> or net_info_ratelimited()?
>

ok, that would be fine.

^ permalink raw reply

* [PATCH] ptp_pch: use helpers function for converting between ns and timespec
From: YueHaibing @ 2018-04-27  7:36 UTC (permalink / raw)
  To: richardcochran, davem; +Cc: netdev, linux-kernel, YueHaibing

use ns_to_timespec64() and timespec64_to_ns() instead of open coding

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/ptp/ptp_pch.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/ptp/ptp_pch.c b/drivers/ptp/ptp_pch.c
index b328517..78ccf93 100644
--- a/drivers/ptp/ptp_pch.c
+++ b/drivers/ptp/ptp_pch.c
@@ -452,7 +452,6 @@ static int ptp_pch_adjtime(struct ptp_clock_info *ptp, s64 delta)
 static int ptp_pch_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 {
 	u64 ns;
-	u32 remainder;
 	unsigned long flags;
 	struct pch_dev *pch_dev = container_of(ptp, struct pch_dev, caps);
 	struct pch_ts_regs __iomem *regs = pch_dev->regs;
@@ -461,8 +460,7 @@ static int ptp_pch_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 	ns = pch_systime_read(regs);
 	spin_unlock_irqrestore(&pch_dev->register_lock, flags);
 
-	ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
-	ts->tv_nsec = remainder;
+	*ts = ns_to_timespec64(ns);
 	return 0;
 }
 
@@ -474,8 +472,7 @@ static int ptp_pch_settime(struct ptp_clock_info *ptp,
 	struct pch_dev *pch_dev = container_of(ptp, struct pch_dev, caps);
 	struct pch_ts_regs __iomem *regs = pch_dev->regs;
 
-	ns = ts->tv_sec * 1000000000ULL;
-	ns += ts->tv_nsec;
+	ns = timespec64_to_ns(ts);
 
 	spin_lock_irqsave(&pch_dev->register_lock, flags);
 	pch_systime_write(regs, ns);
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next v2 5/7] MIPS: mscc: Add switch to ocelot
From: Alexandre Belloni @ 2018-04-27  7:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Allan Nielsen, razvan.stefanescu, po.liu,
	Thomas Petazzoni, Florian Fainelli, netdev, devicetree,
	linux-kernel, linux-mips, James Hogan
In-Reply-To: <20180426205113.GD23481@lunn.ch>

On 26/04/2018 22:51:13+0200, Andrew Lunn wrote:
> > +
> > +		mdio0: mdio@107009c {
> > +			#address-cells = <1>;
> > +			#size-cells = <0>;
> > +			compatible = "mscc,ocelot-miim";
> > +			reg = <0x107009c 0x36>, <0x10700f0 0x8>;
> > +			interrupts = <14>;
> > +			status = "disabled";
> > +
> > +			phy0: ethernet-phy@0 {
> > +				reg = <0>;
> > +			};
> > +			phy1: ethernet-phy@1 {
> > +				reg = <1>;
> > +			};
> > +			phy2: ethernet-phy@2 {
> > +				reg = <2>;
> > +			};
> > +			phy3: ethernet-phy@3 {
> > +				reg = <3>;
> > +			};
> 
> Hi Alexandre
> 
> These are internal PHYs? Is there an option to use external PHYs for
> the ports which have internal PHYs?
> 
> I'm just wondering if they should be linked together by default. Or a
> comment added to the commit message about why they are not linked
> together here.
> 

They are dual media ports so they are not necessarily using the
integrated PHY but can use SerDEs1G lanes.

I'll add that to the commit message.

-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH 3/3] can: xilinx: fix xcan_start_xmit()'s return type
From: Michal Simek @ 2018-04-27  7:49 UTC (permalink / raw)
  To: Luc Van Oostenryck, Marc Kleine-Budde
  Cc: Wolfgang Grandegger, Maxime Ripard, Chen-Yu Tsai, Michal Simek,
	open list:CAN NETWORK DRIVERS, open list:NETWORKING DRIVERS,
	open list, moderated list:ARM/Allwinner sunXi SoC support
In-Reply-To: <20180426211339.30821-4-luc.vanoostenryck@gmail.com>

On 26.4.2018 23:13, Luc Van Oostenryck wrote:
> The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
> which is a typedef for an enum type, but the implementation in this
> driver returns an 'int'.
> 
> Fix this by returning 'netdev_tx_t' in this driver too.
> 
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> ---
>  drivers/net/can/xilinx_can.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
> index 89aec07c2..a19648606 100644
> --- a/drivers/net/can/xilinx_can.c
> +++ b/drivers/net/can/xilinx_can.c
> @@ -386,7 +386,7 @@ static int xcan_do_set_mode(struct net_device *ndev, enum can_mode mode)
>   *
>   * Return: 0 on success and failure value on error
>   */
> -static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +static netdev_tx_t xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>  {
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	struct net_device_stats *stats = &ndev->stats;
> 

It was applied already but there should be also kernel-doc update too to
use enum values instead of 0.

M

^ permalink raw reply

* Re: [PATCH 3/3] can: xilinx: fix xcan_start_xmit()'s return type
From: Marc Kleine-Budde @ 2018-04-27  7:55 UTC (permalink / raw)
  To: Michal Simek, Luc Van Oostenryck
  Cc: Wolfgang Grandegger, Maxime Ripard, Chen-Yu Tsai,
	open list:CAN NETWORK DRIVERS, open list:NETWORKING DRIVERS,
	open list, moderated list:ARM/Allwinner sunXi SoC support
In-Reply-To: <79ea4ab6-a12a-21ad-b586-6a1dad176c0a@xilinx.com>


[-- Attachment #1.1: Type: text/plain, Size: 2245 bytes --]

On 04/27/2018 09:49 AM, Michal Simek wrote:
> On 26.4.2018 23:13, Luc Van Oostenryck wrote:
>> The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
>> which is a typedef for an enum type, but the implementation in this
>> driver returns an 'int'.
>>
>> Fix this by returning 'netdev_tx_t' in this driver too.
>>
>> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
>> ---
>>  drivers/net/can/xilinx_can.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
>> index 89aec07c2..a19648606 100644
>> --- a/drivers/net/can/xilinx_can.c
>> +++ b/drivers/net/can/xilinx_can.c
>> @@ -386,7 +386,7 @@ static int xcan_do_set_mode(struct net_device *ndev, enum can_mode mode)
>>   *
>>   * Return: 0 on success and failure value on error
>>   */
>> -static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +static netdev_tx_t xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>  {
>>  	struct xcan_priv *priv = netdev_priv(ndev);
>>  	struct net_device_stats *stats = &ndev->stats;
>>
> 
> It was applied already but there should be also kernel-doc update too to
> use enum values instead of 0.

Like this:

> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
> index f07ce4945356..d0ad1473f689 100644
> --- a/drivers/net/can/xilinx_can.c
> +++ b/drivers/net/can/xilinx_can.c
> @@ -398,7 +398,7 @@ static int xcan_do_set_mode(struct net_device *ndev, enum can_mode mode)
>   * function uses the next available free txbuff and populates their fields to
>   * start the transmission.
>   *
> - * Return: 0 on success and failure value on error
> + * Return: NETDEV_TX_OK on success and NETDEV_TX_BUSY when the tx queue is full
>   */
>  static netdev_tx_t xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>  {

I can squash in that change.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 3/3] can: xilinx: fix xcan_start_xmit()'s return type
From: Michal Simek @ 2018-04-27  8:03 UTC (permalink / raw)
  To: Marc Kleine-Budde, Michal Simek, Luc Van Oostenryck
  Cc: Wolfgang Grandegger, Maxime Ripard, Chen-Yu Tsai,
	open list:CAN NETWORK DRIVERS, open list:NETWORKING DRIVERS,
	open list, moderated list:ARM/Allwinner sunXi SoC support
In-Reply-To: <02e9be66-1345-2af7-c343-9f94d71e1d10@pengutronix.de>

On 27.4.2018 09:55, Marc Kleine-Budde wrote:
> On 04/27/2018 09:49 AM, Michal Simek wrote:
>> On 26.4.2018 23:13, Luc Van Oostenryck wrote:
>>> The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
>>> which is a typedef for an enum type, but the implementation in this
>>> driver returns an 'int'.
>>>
>>> Fix this by returning 'netdev_tx_t' in this driver too.
>>>
>>> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
>>> ---
>>>  drivers/net/can/xilinx_can.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
>>> index 89aec07c2..a19648606 100644
>>> --- a/drivers/net/can/xilinx_can.c
>>> +++ b/drivers/net/can/xilinx_can.c
>>> @@ -386,7 +386,7 @@ static int xcan_do_set_mode(struct net_device *ndev, enum can_mode mode)
>>>   *
>>>   * Return: 0 on success and failure value on error
>>>   */
>>> -static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>> +static netdev_tx_t xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>>  {
>>>  	struct xcan_priv *priv = netdev_priv(ndev);
>>>  	struct net_device_stats *stats = &ndev->stats;
>>>
>>
>> It was applied already but there should be also kernel-doc update too to
>> use enum values instead of 0.
> 
> Like this:
> 
>> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
>> index f07ce4945356..d0ad1473f689 100644
>> --- a/drivers/net/can/xilinx_can.c
>> +++ b/drivers/net/can/xilinx_can.c
>> @@ -398,7 +398,7 @@ static int xcan_do_set_mode(struct net_device *ndev, enum can_mode mode)
>>   * function uses the next available free txbuff and populates their fields to
>>   * start the transmission.
>>   *
>> - * Return: 0 on success and failure value on error
>> + * Return: NETDEV_TX_OK on success and NETDEV_TX_BUSY when the tx queue is full
>>   */
>>  static netdev_tx_t xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>  {
> 
> I can squash in that change.

looks good to me.
Acked-by: Michal Simek <michal.simek@xilinx.com>

Thanks,
Michal

^ permalink raw reply

* [PATCH] drivers: net: replace UINT64_MAX with U64_MAX
From: Jisheng Zhang @ 2018-04-27  8:18 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller
  Cc: netdev, linux-kernel

U64_MAX is well defined now while the UINT64_MAX is not, so we fall
back to drivers' own definition as below:

	#ifndef UINT64_MAX
	#define UINT64_MAX             (u64)(~((u64)0))
	#endif

I believe this is in one phy driver then copied and pasted to other phy
drivers.

Replace the UINT64_MAX with U64_MAX to clean up the source code.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 6 +++---
 drivers/net/dsa/mv88e6xxx/chip.h | 4 ----
 drivers/net/phy/bcm-phy-lib.c    | 6 +-----
 drivers/net/phy/marvell.c        | 5 +----
 drivers/net/phy/micrel.c         | 5 +----
 drivers/net/phy/smsc.c           | 5 +----
 6 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 3d2091099f7f..1a2dd340853f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -665,13 +665,13 @@ static uint64_t _mv88e6xxx_get_ethtool_stat(struct mv88e6xxx_chip *chip,
 	case STATS_TYPE_PORT:
 		err = mv88e6xxx_port_read(chip, port, s->reg, &reg);
 		if (err)
-			return UINT64_MAX;
+			return U64_MAX;
 
 		low = reg;
 		if (s->size == 4) {
 			err = mv88e6xxx_port_read(chip, port, s->reg + 1, &reg);
 			if (err)
-				return UINT64_MAX;
+				return U64_MAX;
 			high = reg;
 		}
 		break;
@@ -685,7 +685,7 @@ static uint64_t _mv88e6xxx_get_ethtool_stat(struct mv88e6xxx_chip *chip,
 			mv88e6xxx_g1_stats_read(chip, reg + 1, &high);
 		break;
 	default:
-		return UINT64_MAX;
+		return U64_MAX;
 	}
 	value = (((u64)high) << 16) | low;
 	return value;
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 80490f66bc06..4163c8099d0b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -21,10 +21,6 @@
 #include <linux/timecounter.h>
 #include <net/dsa.h>
 
-#ifndef UINT64_MAX
-#define UINT64_MAX		(u64)(~((u64)0))
-#endif
-
 #define SMI_CMD			0x00
 #define SMI_CMD_BUSY		BIT(15)
 #define SMI_CMD_CLAUSE_22	BIT(12)
diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
index 5ad130c3da43..0876aec7328c 100644
--- a/drivers/net/phy/bcm-phy-lib.c
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -346,10 +346,6 @@ void bcm_phy_get_strings(struct phy_device *phydev, u8 *data)
 }
 EXPORT_SYMBOL_GPL(bcm_phy_get_strings);
 
-#ifndef UINT64_MAX
-#define UINT64_MAX              (u64)(~((u64)0))
-#endif
-
 /* Caller is supposed to provide appropriate storage for the library code to
  * access the shadow copy
  */
@@ -362,7 +358,7 @@ static u64 bcm_phy_get_stat(struct phy_device *phydev, u64 *shadow,
 
 	val = phy_read(phydev, stat.reg);
 	if (val < 0) {
-		ret = UINT64_MAX;
+		ret = U64_MAX;
 	} else {
 		val >>= stat.shift;
 		val = val & ((1 << stat.bits) - 1);
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 25e2a099b71c..b8f57e9b9379 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1482,9 +1482,6 @@ static void marvell_get_strings(struct phy_device *phydev, u8 *data)
 	}
 }
 
-#ifndef UINT64_MAX
-#define UINT64_MAX		(u64)(~((u64)0))
-#endif
 static u64 marvell_get_stat(struct phy_device *phydev, int i)
 {
 	struct marvell_hw_stat stat = marvell_hw_stats[i];
@@ -1494,7 +1491,7 @@ static u64 marvell_get_stat(struct phy_device *phydev, int i)
 
 	val = phy_read_paged(phydev, stat.page, stat.reg);
 	if (val < 0) {
-		ret = UINT64_MAX;
+		ret = U64_MAX;
 	} else {
 		val = val & ((1 << stat.bits) - 1);
 		priv->stats[i] += val;
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index f41b224a9cdb..de31c5170a5b 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -650,9 +650,6 @@ static void kszphy_get_strings(struct phy_device *phydev, u8 *data)
 	}
 }
 
-#ifndef UINT64_MAX
-#define UINT64_MAX              (u64)(~((u64)0))
-#endif
 static u64 kszphy_get_stat(struct phy_device *phydev, int i)
 {
 	struct kszphy_hw_stat stat = kszphy_hw_stats[i];
@@ -662,7 +659,7 @@ static u64 kszphy_get_stat(struct phy_device *phydev, int i)
 
 	val = phy_read(phydev, stat.reg);
 	if (val < 0) {
-		ret = UINT64_MAX;
+		ret = U64_MAX;
 	} else {
 		val = val & ((1 << stat.bits) - 1);
 		priv->stats[i] += val;
diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index be399d645224..c328208388da 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -168,9 +168,6 @@ static void smsc_get_strings(struct phy_device *phydev, u8 *data)
 	}
 }
 
-#ifndef UINT64_MAX
-#define UINT64_MAX              (u64)(~((u64)0))
-#endif
 static u64 smsc_get_stat(struct phy_device *phydev, int i)
 {
 	struct smsc_hw_stat stat = smsc_hw_stats[i];
@@ -179,7 +176,7 @@ static u64 smsc_get_stat(struct phy_device *phydev, int i)
 
 	val = phy_read(phydev, stat.reg);
 	if (val < 0)
-		ret = UINT64_MAX;
+		ret = U64_MAX;
 	else
 		ret = val;
 
-- 
2.17.0

^ permalink raw reply related

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Michal Hocko @ 2018-04-27  8:25 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Michael S. Tsirkin, John Stoffel, James Bottomley, Michal,
	eric.dumazet, netdev, jasowang, Randy Dunlap, linux-kernel,
	Matthew Wilcox, linux-mm, dm-devel, Vlastimil Babka, Andrew,
	David Rientjes, Morton, virtualization, David Miller, edumazet
In-Reply-To: <alpine.LRH.2.02.1804261829190.30599@file01.intranet.prod.int.rdu2.redhat.com>

On Thu 26-04-18 18:52:05, Mikulas Patocka wrote:
> 
> 
> On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:
[...]
> >    But assuming it's important to control this kind of
> >    fault injection to be controlled from
> >    a dedicated menuconfig option, why not the rest of
> >    faults?
> 
> The injected faults cause damage to the user, so there's no point to 
> enable them by default. vmalloc fallback should not cause any damage 
> (assuming that the code is correctly written).

But you want to find those bugs which would BUG_ON easier, so there is a
risk of harm IIUC and this is not much different than other fault
injecting paths.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH net] bridge: check iface upper dev when setting master via ioctl
From: Nikolay Aleksandrov @ 2018-04-27  8:33 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, Dmitry Vyukov, syzbot, David Miller
In-Reply-To: <20180427013102.GJ20683@leo.usersys.redhat.com>

On 27/04/18 04:31, Hangbin Liu wrote:
> Hi Nikolay,
> 
> Thanks for the comments.
> On Thu, Apr 26, 2018 at 05:22:46PM +0300, Nikolay Aleksandrov wrote:
>>> Not all upper devs are masters. This can break some setups.
> 
> Ah, like vlan device.. So how about
> 
> +	if (netdev_master_upper_dev_get(dev))
>   		return -EBUSY;

That should be fine, yes.

> 
>>>
>>>
>>
>> Also it's not really a bug, the device begins to get initialized but it
>> will get removed at netdev_master_upper_dev_link() anyway if there's
>> already a master. Why would it be better ?
> 
>> It's clearly wrong to try and enslave a device that already has a master
>> via ioctl, rtnetlink already deals with that and the old ioctl interface
>> will get an error, yes it will initialize some structs but they'll get
>> freed later. This is common practice, check the bonding for example.
> 
> Bonding use netdev_is_rx_handler_busy(slave_dev) to check if the slave
> already has a master, which is another solution.

Some masters don't use rx_handlers and the bonding fails at linking them
as a master which is still fine, it cleans up after the error like the bridge.

>>
>> If anything do the check in the ioctl interface (add_del_if) only and
>> maybe target net-next, there's really no bug fix here. IMO it's not
> 
> What if someone do like
> 
> while true; do brctl addif br0 bond_slave &; done
> 
> I know this is stupid and almost no one will do that in real world.
> But syzbot run some similar test and get warn from kobject_add_internal()
> with -ENOMEM. That's why I think we should fix it before allocate any
> resource.
> 
> What do you think?

The bridge code is only a symptom of what happened, that warn was placed to
warn people against doing stupid things - it was literally in the commit message
of some kobject patch. As long as the resources involved are cleaned up and it's
returned to the bridge to cleanup after itself, it should be fine.
  
You can add the check if you feel like it, I don't have an
objection against failing earlier. My main concern was the netdev_has_any_upper
usage which can break some setups.

Cheers,
  Nik

> 
> [1] https://syzkaller.appspot.com/bug?id=3e0339080acd6a2a350a900bc6533b03f5498490
> 
> Thanks
> Hangbin
> 

^ permalink raw reply

* Re: [PATCH net-next 1/2 v2] netns: restrict uevents
From: Christian Brauner @ 2018-04-27  8:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <87vacdbi58.fsf@xmission.com>

On Thu, Apr 26, 2018 at 07:35:47PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
> 
> > On Thu, Apr 26, 2018 at 12:10:30PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> 
> >> > On Thu, Apr 26, 2018 at 11:47:19AM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> >> 
> >> >> > On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
> >> >> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> >> >> 
> >> >> >> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >> >> >
> >> >> >> >  Bah. This code is obviously correct and probably wrong.
> >> >> >> >
> >> >> >> >  How do we deliver uevents for network devices that are outside of the
> >> >> >> >  initial user namespace? The kernel still needs to deliver those.
> >> >> >> >
> >> >> >> >  The logic to figure out which network namespace a device needs to be
> >> >> >> >  delivered to is is present in kobj_bcast_filter. That logic will almost
> >> >> >> >  certainly need to be turned inside out. Sign not as easy as I would
> >> >> >> >  have hoped.
> >> >> >> >
> >> >> >> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
> >> >> >> > out and come up with a new patch.
> >> >> >> 
> >> >> >> I may have mis-understood.
> >> >> >> 
> >> >> >> I heard and am still hearing additional filtering to reduce the places
> >> >> >> the packet is delievered.
> >> >> >> 
> >> >> >> I am saying something needs to change to increase the number of places
> >> >> >> the packet is delivered.
> >> >> >> 
> >> >> >> For the special class of devices that kobj_bcast_filter would apply to
> >> >> >> those need to be delivered to netowrk namespaces  that are no longer on
> >> >> >> uevent_sock_list.
> >> >> >> 
> >> >> >> So the code fundamentally needs to split into two paths.  Ordinary
> >> >> >> devices that use uevent_sock_list.  Network devices that are just
> >> >> >> delivered in their own network namespace.
> >> >> >> 
> >> >> >> netlink_broadcast_filtered gets to go away completely.
> >> >> >
> >> >> > The split *might* make sense but I think you're wrong about removing the
> >> >> > kobj_bcast_filter. The current filter doesn't operate on the uevent
> >> >> > socket in uevent_sock_list itself it rather operates on the sockets in
> >> >> > mc_list. And if socket in mc_list can have a different network namespace
> >> >> > then the uevent_socket itself then your way won't work. That's why my
> >> >> > original patch added additional filtering in there. The way I see it we
> >> >> > need something like:
> >> >> 
> >> >> We already filter the sockets in the mc_list by network namespace.
> >> >
> >> > Oh really? That's good to know. I haven't found where in the code this
> >> > actually happens. I thought that when netlink_bind() is called anyone
> >> > could register themselves in mc_list.
> >> 
> >> The code in af_netlink.c does:
> >> > static void do_one_broadcast(struct sock *sk,
> >> > 				    struct netlink_broadcast_data *p)
> >> > {
> >> > 	struct netlink_sock *nlk = nlk_sk(sk);
> >> > 	int val;
> >> > 
> >> > 	if (p->exclude_sk == sk)
> >> > 		return;
> >> > 
> >> > 	if (nlk->portid == p->portid || p->group - 1 >= nlk->ngroups ||
> >> > 	    !test_bit(p->group - 1, nlk->groups))
> >> > 		return;
> >> > 
> >> > 	if (!net_eq(sock_net(sk), p->net)) {
> >>             ^^^^^^^^^^^^ Here
> >> > 		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> > 			return;
> >>                 ^^^^^^^^^^^ Here
> >> > 
> >> > 		if (!peernet_has_id(sock_net(sk), p->net))
> >> > 			return;
> >> > 
> >> > 		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> > 				     CAP_NET_BROADCAST))
> >> > 			return;
> >> > 	}
> >> 
> >> Which if you are not a magic NETLINK_F_LISTEN_ALL_NSID socket filters
> >> you out if you are the wrong network namespace.
> >> 
> >> 
> >> >> When a packet is transmitted with netlink_broadcast it is only
> >> >> transmitted within a single network namespace.
> >> >> 
> >> >> Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
> >> >> with it's source network namespace so no confusion will result, and the
> >> >> permission checks have been done to make it safe. So you can safely
> >> >> ignore that case.  Please ignore that case.  It only needs to be
> >> >> considered if refactoring af_netlink.c
> >> >> 
> >> >> When I added netlink_broadcast_filtered I imagined that we would need
> >> >> code that worked across network namespaces that worked for different
> >> >> namespaces.   So it looked like we would need the level of granularity
> >> >> that you can get with netlink_broadcast_filtered.  It turns out we don't
> >> >> and that it was a case of over design.  As the only split we care about
> >> >> is per network namespace there is no need for
> >> >> netlink_broadcast_filtered.
> >> >> 
> >> >> > init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
> >> >> > user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);
> >> >> >
> >> >> > The question that remains is whether we can rely on the network
> >> >> > namespace information we can gather from the kobject_ns_type_operations
> >> >> > to decide where we want to broadcast that event to. So something
> >> >> > *like*:
> >> >> 
> >> >> We can.  We already do.  That is what kobj_bcast_filter implements.
> >> >> 
> >> >> > 	ops = kobj_ns_ops(kobj);
> >> >> > 	if (!ops && kobj->kset) {
> >> >> > 		struct kobject *ksobj = &kobj->kset->kobj;
> >> >> > 		if (ksobj->parent != NULL)
> >> >> > 			ops = kobj_ns_ops(ksobj->parent);
> >> >> > 	}
> >> >> >
> >> >> > 	if (ops && ops->netlink_ns && kobj->ktype->namespace)
> >> >> > 		if (ops->type == KOBJ_NS_TYPE_NET)
> >> >> > 			net = kobj->ktype->namespace(kobj);
> >> >> 
> >> >> Please note the only entry in the enumeration in the kobj_ns_type
> >> >> enumeration other than KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET.  So the
> >> >> check for ops->type in this case is redundant.
> >> >
> >> > Yes, I know the reason for doing it explicitly is to block the case
> >> > where kobjects get tagged with other namespaces. So we'd need to be
> >> > vigilant should that ever happen but fine.
> >> 
> >> It is fine to keep the check.
> >> 
> >> I was intending to point out that it is much more likely that we remove
> >> the enumeration and remove some of the extra abstraction, than another
> >> namespace is implemented there.
> >> 
> >> >> That is something else that could be simplifed.  At the time it was the
> >> >> necessary to get the sysfs changes merged.
> >> >> 
> >> >> > 	if (!net || net->user_ns == &init_user_ns)
> >> >> > 		ret = init_user_ns_broadcast(env, action_string, devpath);
> >> >> > 	else
> >> >> > 		ret = user_ns_broadcast(net->uevent_sock->sk, env,
> >> >> > 					action_string, devpath);
> >> >> 
> >> >> Almost.
> >> >> 
> >> >> 	if (!net)
> >> >>         	kobject_uevent_net_broadcast(kobj, env, action_string,
> >> >>         					dev_path);
> >> >> 	else
> >> >>         	netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);
> >> >> 
> >> >> 
> >> >> I am handwaving to get the skb in the netlink_broadcast case but that
> >> >> should be enough for you to see what I am thinking.
> >> >
> >> > I have added a helper alloc_uevent_skb() that can be used in both cases.
> >> >
> >> > static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
> >> > 					const char *action_string,
> >> > 					const char *devpath)
> >> > {
> >> > 	struct sk_buff *skb = NULL;
> >> > 	char *scratch;
> >> > 	size_t len;
> >> >
> >> > 	/* allocate message with maximum possible size */
> >> > 	len = strlen(action_string) + strlen(devpath) + 2;
> >> > 	skb = alloc_skb(len + env->buflen, GFP_KERNEL);
> >> > 	if (!skb)
> >> > 		return NULL;
> >> >
> >> > 	/* add header */
> >> > 	scratch = skb_put(skb, len);
> >> > 	sprintf(scratch, "%s@%s", action_string, devpath);
> >> >
> >> > 	skb_put_data(skb, env->buf, env->buflen);
> >> >
> >> > 	NETLINK_CB(skb).dst_group = 1;
> >> >
> >> > 	return skb;
> >> > }
> >> >
> >> >> 
> >> >> My only concern with the above is that we almost certainly need to fix
> >> >> the credentials on the skb so that userspace does not drop the packet
> >
> > I guess we simply want:
> > 	if (user_ns != &init_user_ns) {
> > 		NETLINK_CB(skb).creds.uid = (kuid_t)0;
> > 		NETLINK_CB(skb).creds.gid = kgid_t)0;
> > 	}
> 
> I believe the above is what we already have.
> 
> > instead of the more complicated and - imho wrong:
> >
> > 	if (user_ns != &init_user_ns) {
> > 		/* fix credentials for udev running in user namespace */
> > 		kuid_t uid = NETLINK_CB(skb).creds.uid;
> > 		kgid_t gid = NETLINK_CB(skb).creds.gid;
> > 		NETLINK_CB(skb).creds.uid = from_kuid_munged(user_ns, uid);
> > 		NETLINK_CB(skb).creds.gid = from_kgid_munged(user_ns, gid);
> > 	}
> 
> The above is most definitely wrong as we store kuids and kgids in
> "NETLINK_CB(skb).creds".
> 
> I am pretty certain what we want is:
> 	kuid_t root_uid = make_kuid(net->user_ns, 0);
>         kgid_g root_gid = make_kgid(net->user_ns, 0);

Thanks! I looked at user_namespace.c which contained map_id_down() which
is the function that I wanted and remembered from a prior patchset of
mine but they weren't exported. I didn't spot make_k{g,u}id() which are
wrapping those. These are the droids^H^H^H^H^H^Hfunctions I was looking
for!

>         if (!uid_valid(root_uid))
>         	root_uid = GLOBAL_ROOT_UID;
> 	if (!gid_valid(root_gid))
>         	root_gid = GLOBAL_ROOT_GID;
> 	NETLINK_CB(skb).creds.uid = root_uid;
>         NETLINK_CB(skb).creds.gid = root_gid;
> 
> Let's be careful and only fix this for the networking uevents please.
> We want the other onces to just go away.

This is already handled by the

if (!net)
       handle_untagged_uevents()
else
       handle_taggged_uevents()

The else branch will only every contain network devices as to my
knowledge no other kernel devices are currently tagged.

Thanks!
Christian

> 
> The networking uevents we have to fix or they will be gone completely.
> 
> Eric

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
From: kbuild test robot @ 2018-04-27  8:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Andy Lutomirski,
	linux-kernel, linux-mm, Eric Dumazet, Eric Dumazet,
	Soheil Hassas Yeganeh
In-Reply-To: <20180425214307.159264-2-edumazet@google.com>

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

Hi Eric,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-TCP_ZEROCOPY_RECEIVE-support-for-zerocopy-receive/20180427-122234
config: sh-rsk7269_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   net/ipv4/tcp.o: In function `tcp_setsockopt':
>> tcp.c:(.text+0x3f80): undefined reference to `zap_page_range'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 10487 bytes --]

^ permalink raw reply

* Re: Page allocator bottleneck
From: Aaron Lu @ 2018-04-27  8:45 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Linux Kernel Network Developers, linux-mm, Mel Gorman,
	David Miller, Jesper Dangaard Brouer, Eric Dumazet,
	Alexei Starovoitov, Saeed Mahameed, Eran Ben Elisha,
	Andrew Morton, Michal Hocko
In-Reply-To: <20180423131033.GA13792@intel.com>

On Mon, Apr 23, 2018 at 09:10:33PM +0800, Aaron Lu wrote:
> On Mon, Apr 23, 2018 at 11:54:57AM +0300, Tariq Toukan wrote:
> > Hi,
> > 
> > I ran my tests with your patches.
> > Initial BW numbers are significantly higher than I documented back then in
> > this mail-thread.
> > For example, in driver #2 (see original mail thread), with 6 rings, I now
> > get 92Gbps (slightly less than linerate) in comparison to 64Gbps back then.
> > 
> > However, there were many kernel changes since then, I need to isolate your
> > changes. I am not sure I can finish this today, but I will surely get to it
> > next week after I'm back from vacation.
> > 
> > Still, when I increase the scale (more rings, i.e. more cpus), I see that
> > queued_spin_lock_slowpath gets to 60%+ cpu. Still high, but lower than it
> > used to be.
> 
> I wonder if it is on allocation path or free path?

Just FYI, I have pushed two more commits on top of the branch.
They should improve free path zone lock contention for MIGRATE_UNMOVABLE
pages(most kernel code alloc such pages), you may consider apply them if
free path contention is a problem.

^ permalink raw reply

* [PATCH] netfilter: ebtables: handle string from userspace with care
From: Paolo Abeni @ 2018-04-27  8:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: syzbot, fw, coreteam, syzkaller-bugs, netdev

strlcpy() can't be safely used on a user-space provided string,
as it can try to read beyond the buffer's end, if the latter is
not NULL terminated.

Leveraging the above, syzbot has been able to trigger the following
splat:

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
[inline]
BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
net/bridge/netfilter/ebtables.c:1957 [inline]
BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
net/bridge/netfilter/ebtables.c:2059 [inline]
BUG: KASAN: stack-out-of-bounds in size_entry_mwt
net/bridge/netfilter/ebtables.c:2155 [inline]
BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
net/bridge/netfilter/ebtables.c:2194
Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504

CPU: 0 PID: 4504 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  memcpy+0x37/0x50 mm/kasan/kasan.c:303
  strlcpy include/linux/string.h:300 [inline]
  compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline]
  ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline]
  size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline]
  compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194
  compat_do_replace+0x483/0x900 net/bridge/netfilter/ebtables.c:2285
  compat_do_ebt_set_ctl+0x2ac/0x324 net/bridge/netfilter/ebtables.c:2367
  compat_nf_sockopt net/netfilter/nf_sockopt.c:144 [inline]
  compat_nf_setsockopt+0x9b/0x140 net/netfilter/nf_sockopt.c:156
  compat_ip_setsockopt+0xff/0x140 net/ipv4/ip_sockglue.c:1279
  inet_csk_compat_setsockopt+0x97/0x120 net/ipv4/inet_connection_sock.c:1041
  compat_tcp_setsockopt+0x49/0x80 net/ipv4/tcp.c:2901
  compat_sock_common_setsockopt+0xb4/0x150 net/core/sock.c:3050
  __compat_sys_setsockopt+0x1ab/0x7c0 net/compat.c:403
  __do_compat_sys_setsockopt net/compat.c:416 [inline]
  __se_compat_sys_setsockopt net/compat.c:413 [inline]
  __ia32_compat_sys_setsockopt+0xbd/0x150 net/compat.c:413
  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fb3cb9
RSP: 002b:00000000fff0c26c EFLAGS: 00000282 ORIG_RAX: 000000000000016e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000080 RSI: 0000000020000300 RDI: 00000000000005f4
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0006c2afc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea0006c20101 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Fix the issue replacing the unsafe function with strscpy() and
taking care of possible errors.

Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-and-tested-by: syzbot+4e42a04e0bc33cb6c087@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Notes: others strlcpy() usage in ebtables.c look safe, as the source
string cames from the kernel.
---
 net/bridge/netfilter/ebtables.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 28a4c3490359..6ba639f6c51d 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1954,7 +1954,8 @@ static int compat_mtw_from_user(struct compat_ebt_entry_mwt *mwt,
 	int off, pad = 0;
 	unsigned int size_kern, match_size = mwt->match_size;
 
-	strlcpy(name, mwt->u.name, sizeof(name));
+	if (strscpy(name, mwt->u.name, sizeof(name)) < 0)
+		return -EINVAL;
 
 	if (state->buf_kern_start)
 		dst = state->buf_kern_start + state->buf_kern_offset;
-- 
2.14.3

^ permalink raw reply related

* Re: i.MX6S/DL and QCA8334 switch using DSA driver - CPU port not working
From: Michal Vokáč @ 2018-04-27  8:49 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Vivien Didelot, Florian Fainelli
In-Reply-To: <20180426140629.GB15370@lunn.ch>

On 26.4.2018 16:06, Andrew Lunn wrote:
> On Thu, Apr 26, 2018 at 03:37:33PM +0200, Michal Vokáč wrote:
>>
>>   - Linux 4.9.84 (Freescale 4.9-1.0.x-imx branch)
> 
> Hi Michal
> 
> Please use mainline, not the freescale fork. For DSA, there is nothing
> you need in the freescale fork. Once it works with mainline, you can
> then figure out what needs to be done to make the fork work.

Hi Andrew,
Thanks for such a quick reply!

OK, good point, I will go this way - mainline first, fork then.

>> The Freescale branch does not introduce any changes to the DSA nor to the QCA8K
>> drivers from mainline.
> 
> Does it have
> fbbeefdd2104 ("net: fec: Allow reception of frames bigger than 1522 bytes")

Yes, this one was backported to 4.9.74 and is in the freescale branch too.

>> To make the bridge work I need to enable forwarding across all the switch ports
>> at setup.
>>
>> --- a/drivers/net/dsa/qca8k.c
>> +++ b/drivers/net/dsa/qca8k.c
>> @@ -578,12 +578,12 @@ qca8k_setup(struct dsa_switch *ds)
>>     		if (ds->enabled_port_mask & BIT(i))
>>     			qca8k_port_set_status(priv, i, 0);
>> -	/* Forward all unknown frames to CPU port for Linux processing */
>> +	/* Forward all unknown frames to all pors */
>>     	qca8k_write(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
>>     		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_IGMP_DP_S |
>> -		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_BC_DP_S |
>> -		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_MC_DP_S |
>> -		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
>> +		    0x7f << QCA8K_GLOBAL_FW_CTRL1_BC_DP_S |
>> +		    0x7f << QCA8K_GLOBAL_FW_CTRL1_MC_DP_S |
>> +		    0x7f << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
>>     	/* Setup connection between CPU port & user ports */
>>     	for (i = 0; i < DSA_MAX_PORTS; i++) {
>> --
> 
> This is probably because you don't have a working CPU port.  If that
> worked, all unknown frames would be passed to the software bridge. It
> would then either flood them out all ports, or if it knows the
> destination MAC address, out one specific port. The should be enough
> to make the destination reply, at which point the switch learns the
> MAC address, and it is no longer unknown.
> 
> So lets leave this alone for the moment.

First attempt - pure mainline 4.9.84 without my patch.
CPU port not working, bridge not working.

>> But I am still not able to make work the CPU port though.
>>
>>   # udhcpc -i eth2
>>   Sending discover...
>>   [FOREVER]
>>
>> The same for eth1, eth2 and br0.
>>
>> I suspect the problem may be at different levels:
>>
>>   - The RGMII interface is not properly configured
>>    -- at the CPU side, or
>>    -- at the switch chip side.
>>   - Some setup that I have not done needs to be done (in userspace).
> 
> Your user space setup look O.K.
> 
> Try playing with RGMII delays. Set the phy-mode to rgmii-id.

OK, I will try some combinations and also to tune the numbers in the driver.
That part is actually quite confusing to me. phy-mode can be set for the fec
and for the port. For the fec I am now using rgmii as that is what we were
using before and it worked. Though from my understanding of the ethernet
binding doc it totally make sense to use rgmii-id for the fec.
I tried that and it did not help.

Using rgmii-id for the port is not valid as the qca8k driver does not support
that mode. It only supports rgmii and sgmii. I think this is actually not
correct. When phy-mode is set to rgmii for port the qca8k driver configures
internal delays in the switch. So it behaves like rgmii-id I think.

Should not it be:

--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -474,7 +474,7 @@ qca8k_set_pad_ctrl(struct qca8k_priv *priv, int port, int mode)
  	 * PHY or MAC.
  	 */
  	switch (mode) {
-	case PHY_INTERFACE_MODE_RGMII:
+	case PHY_INTERFACE_MODE_RGMII_ID:
  	qca8k_write(priv, reg,
		    QCA8K_PORT_PAD_RGMII_EN |
		    QCA8K_PORT_PAD_RGMII_TX_DELAY(3) |

^ permalink raw reply

* Re: [PATCHv3 3/3] tools bpftool: Display license GPL compatible in prog show/list
From: Jiri Olsa @ 2018-04-27  8:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jakub Kicinski, Jiri Olsa, Alexei Starovoitov, lkml, netdev,
	Quentin Monnet
In-Reply-To: <dbc91e08-4c99-e01f-eab5-ac235812cee7@iogearbox.net>

On Thu, Apr 26, 2018 at 10:49:25PM +0200, Daniel Borkmann wrote:
> On 04/26/2018 10:18 AM, Jiri Olsa wrote:
> [...]
> > v3 of the last patch attached, the branch is also updated
> > 
> > thanks,
> > jirka
> > 
> > 
> > ---
> > Display the license "gpl" string in bpftool prog command, like:
> > 
> >   # bpftool prog list
> >   5: tracepoint  name func  tag 57cd311f2e27366b  gpl
> >           loaded_at Apr 26/09:37  uid 0
> >           xlated 16B  not jited  memlock 4096B
> > 
> >   # bpftool --json --pretty prog show
> >   [{
> >           "id": 5,
> >           "type": "tracepoint",
> >           "name": "func",
> >           "tag": "57cd311f2e27366b",
> >           "gpl_compatible": true,
> >           "loaded_at": "Apr 26/09:37",
> >           "uid": 0,
> >           "bytes_xlated": 16,
> >           "jited": false,
> >           "bytes_memlock": 4096
> >       }
> >   ]
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> 
> Ok, v2 from prior two patches and v3 of this one applied to bpf-next. Please
> next time always submit a fresh new series at once, thanks Jiri.

noted, thanks a lot

jirka

^ permalink raw reply

* Re: [PATCH bpf-next v4 00/10] bpf: document eBPF helpers and add a script to generate man page
From: Daniel Borkmann @ 2018-04-27  9:08 UTC (permalink / raw)
  To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180425171701.11048-1-quentin.monnet@netronome.com>

On 04/25/2018 07:16 PM, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
> 
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
> 
>     $ ./scripts/bpf_helpers_doc.py \
>             --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>     $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>     $ man /tmp/bpf-helpers.7
> 
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions.
> 
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
> 
> Regarding the description of each helper, it comprises several items:
> 
> - The function prototype.
> - A description of the function and of its arguments (except for a
>   couple of cases, when there are no arguments and the return value
>   makes the function usage really obvious).
> - A description of return values (if not void).
> 
> Additional items such as the list of compatible eBPF program and map
> types for each helper, Linux kernel version that introduced the helper,
> GPL-only restriction, and commit hash could be added in the future, but
> it was decided on the mailing list to leave them aside for now.
> 
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel--Many thanks to
> their respective authors! Some sentences were also adapted from comments
> from the reviews, thanks to the reviewers as well. Descriptions were
> completed as much as possible, the objective being to have something easily
> accessible even for people just starting with eBPF. There is probably a bit
> more work to do in this direction for some helpers.
[...]

Applied yesterday night to bpf-next (and now in net-next), thanks Quentin!

^ permalink raw reply

* Re: [PATCHv2 bpf-next 0/2] BPF tunnel testsuite
From: Daniel Borkmann @ 2018-04-27  9:10 UTC (permalink / raw)
  To: William Tu, netdev
In-Reply-To: <1524776500-27030-1-git-send-email-u9012063@gmail.com>

On 04/26/2018 11:01 PM, William Tu wrote:
> The patch series provide end-to-end eBPF tunnel testsute.  A common topology
> is created below for all types of tunnels:
> 
> Topology:                                                                     
> ---------                                                                     
>      root namespace   |     at_ns0 namespace                                   
>                       |                                                        
>       -----------     |     -----------                                        
>       | tnl dev |     |     | tnl dev |  (overlay network)                     
>       -----------     |     -----------                                        
>       metadata-mode   |     native-mode                                        
>        with bpf       |                                                        
>                       |                                                        
>       ----------      |     ----------                                         
>       |  veth1  | --------- |  veth0  |  (underlay network)                    
>       ----------    peer    ----------                                         
> 	                                                                              
>                                                                                
> Device Configuration                                                          
> --------------------                                                          
>  Root namespace with metadata-mode tunnel + BPF                                
>  Device names and addresses:                                                   
>        veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)                         
>        tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)              
>                                                                                
>  Namespace at_ns0 with native tunnel                                           
>  Device names and addresses:                                                   
>        veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)                       
>        tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)              
>                                                                                
>                                                                                
> End-to-end ping packet flow                                                   
> ---------------------------                                                   
>  Most of the tests start by namespace creation, device configuration,          
>  then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'     
>  from root namespace, the following operations happen:                         
>  1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.       
>  2) Tnl device's egress BPF program is triggered and set the tunnel metadata,  
>     with remote_ip=172.16.1.200 and others.                                    
>  3) Outer tunnel header is prepended and route the packet to veth1's egress    
>  4) veth0's ingress queue receive the tunneled packet at namespace at_ns0      
>  5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet                   
>  6) Forward the packet to the overlay tnl dev                                  
> 
> Test Cases
> -----------------------------
>  Tunnel Type |  BPF Programs
> -----------------------------
>  GRE:          gre_set_tunnel, gre_get_tunnel
>  IP6GRE:       ip6gretap_set_tunnel, ip6gretap_get_tunnel
>  ERSPAN:       erspan_set_tunnel, erspan_get_tunnel
>  IP6ERSPAN:    ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
>  VXLAN:        vxlan_set_tunnel, vxlan_get_tunnel
>  IP6VXLAN:     ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
>  GENEVE:       geneve_set_tunnel, geneve_get_tunnel
>  IP6GENEVE:    ip6geneve_set_tunnel, ip6geneve_get_tunnel
>  IPIP:         ipip_set_tunnel, ipip_get_tunnel
>  IP6IP:        ipip6_set_tunnel, ipip6_get_tunnel,
>                ip6ip6_set_tunnel, ip6ip6_get_tunnel
>  XFRM:         xfrm_get_state

Applied yesterday night to bpf-next (and now in net-next), thanks William!

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: fix xdp_generic for bpf_adjust_tail usecase
From: Daniel Borkmann @ 2018-04-27  9:11 UTC (permalink / raw)
  To: Nikita V. Shirokov, Alexei Starovoitov, David S . Miller; +Cc: netdev
In-Reply-To: <20180425141503.25772-1-tehnerd@tehnerd.com>

On 04/25/2018 04:15 PM, Nikita V. Shirokov wrote:
>  when bpf_adjust_tail was introduced for generic xdp, it changed skb's tail
>  pointer, so it was pointing to the new  "end of the packet". however skb's
>  len field wasn't properly modified, so on the wire ethernet frame had
>  original (or even bigger, if adjust_head was used) size. this diff is fixing
>  this.
> 
> Fixes: 198d83bb3 (" bpf: make generic xdp compatible w/
> bpf_xdp_adjust_tail")
> 
> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>

Applied yesterday to bpf-next (and now in net-next), thanks Nikita!

^ permalink raw reply

* Re: [RFC v3 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-04-27  9:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel, netdev, wexu,
	jfreimann
In-Reply-To: <5c712aa2-f00e-b472-cdfc-48175aea790d@redhat.com>

On Fri, Apr 27, 2018 at 02:17:51PM +0800, Jason Wang wrote:
> On 2018年04月27日 12:18, Michael S. Tsirkin wrote:
> > On Fri, Apr 27, 2018 at 11:56:05AM +0800, Jason Wang wrote:
> > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > Hello everyone,
> > > > 
> > > > This RFC implements packed ring support in virtio driver.
> > > > 
> > > > Some simple functional tests have been done with Jason's
> > > > packed ring implementation in vhost:
> > > > 
> > > > https://lkml.org/lkml/2018/4/23/12
> > > > 
> > > > Both of ping and netperf worked as expected (with EVENT_IDX
> > > > disabled). But there are below known issues:
> > > > 
> > > > 1. Reloading the guest driver will break the Tx/Rx;
> > > Will have a look at this issue.
> > > 
> > > > 2. Zeroing the flags when detaching a used desc will
> > > >      break the guest -> host path.
> > > I still think zeroing flags is unnecessary or even a bug. At host, I track
> > > last observed avail wrap counter and detect avail like (what is suggested in
> > > the example code in the spec):
> > > 
> > > static bool desc_is_avail(struct vhost_virtqueue *vq, __virtio16 flags)
> > > {
> > >         bool avail = flags & cpu_to_vhost16(vq, DESC_AVAIL);
> > > 
> > >         return avail == vq->avail_wrap_counter;
> > > }
> > > 
> > > So zeroing wrap can not work with this obviously.
> > > 
> > > Thanks
> > I agree. I think what one should do is flip the available bit.
> > 
> 
> But is this flipping a must?
> 
> Thanks

Yeah, that's my question too. It seems to be a requirement
for driver that, the only change to the desc status that a
driver can do during running is to mark the desc as avail,
and any other changes to the desc status are not allowed.
Similarly, the device can only mark the desc as used, and
any other changes to the desc status are also not allowed.
So the question is, are there such requirements?

Based on below contents in the spec:

"""
Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different
for an available descriptor and equal for a used descriptor.

Note that this observation is mostly useful for sanity-checking
as these are necessary but not sufficient conditions
"""

It seems that, it's necessary for devices to check whether
the AVAIL bit and USED bit are different.

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [PATCH] [PATCH bpf-next] samples/bpf/bpf_load.c: remove redundant ret assignment in bpf_load_program()
From: Daniel Borkmann @ 2018-04-27  9:15 UTC (permalink / raw)
  To: Wang Sheng-Hui, ast, netdev
In-Reply-To: <20180425020713.1795-1-shhuiw@foxmail.com>

On 04/25/2018 04:07 AM, Wang Sheng-Hui wrote:
> 2 redundant ret assignments removded:
> * 'ret = 1' before the logic 'if (data_maps)', and if any errors jump to
>   label 'done'. No 'ret = 1' needed before the error jump.
> * After the '/* load programs */' part, if everything goes well, then
>   the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
>   If something goes wrong, 'ret' set to none-O, the redundant 'ret = 0'
>   after the for clause will make the error skipped.
>   For example, if some BPF code cannot provide supported program types
>   in ELF SEC("unknown"), the for clause will not call load_and_attach()
>   to load the BPF code. 1 should be returned to callees instead of 0.
> 
> Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>

Applied yesterday to bpf-next (and now in net-next), thanks Nikita!

^ permalink raw reply

* Re: [PATCH] netfilter: ebtables: handle string from userspace with care
From: Florian Westphal @ 2018-04-27  9:26 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netfilter-devel, syzbot, fw, coreteam, syzkaller-bugs, netdev
In-Reply-To: <8710122d42aa1f3e081812f2abf406973f834982.1524818458.git.pabeni@redhat.com>

Paolo Abeni <pabeni@redhat.com> wrote:
> strlcpy() can't be safely used on a user-space provided string,
> as it can try to read beyond the buffer's end, if the latter is
> not NULL terminated.

Yes.

> Leveraging the above, syzbot has been able to trigger the following
> splat:
> 
> BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
> [inline]
> BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
> net/bridge/netfilter/ebtables.c:1957 [inline]
> BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
> net/bridge/netfilter/ebtables.c:2059 [inline]
> BUG: KASAN: stack-out-of-bounds in size_entry_mwt
> net/bridge/netfilter/ebtables.c:2155 [inline]
> BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
> net/bridge/netfilter/ebtables.c:2194
> Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504

Which is weird, I don't understand this report.
The code IS wrong, but it should cause out-of-bounds read (strlen on
src), but not out-of-bounds write.

Yes, I sent a recent patch (dceb48d86b4871984b8ce9ad5057fb2c01aa33de in
nf.git) that would now allow to get rid of the strlcpy and use the
source directly.

^ permalink raw reply

* Re: [PATCH net-next] selftests: pmtu: Minimum MTU for vti6 is 68
From: Xin Long @ 2018-04-27  9:33 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David S . Miller, Steffen Klassert, Alexey Kodanev, Jarod Wilson,
	Sabrina Dubroca, network dev
In-Reply-To: <c2369c8f004006b33007bad40b63c35f50ff3c23.1524764073.git.sbrivio@redhat.com>

On Fri, Apr 27, 2018 at 1:41 AM, Stefano Brivio <sbrivio@redhat.com> wrote:
> A vti6 interface can carry IPv4 packets too.
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tools/testing/selftests/net/pmtu.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
> index 1e428781a625..7651fd4d86fe 100755
> --- a/tools/testing/selftests/net/pmtu.sh
> +++ b/tools/testing/selftests/net/pmtu.sh
> @@ -368,7 +368,7 @@ test_pmtu_vti6_link_add_mtu() {
>
>         fail=0
>
> -       min=1280
> +       min=68                  # vti6 can carry IPv4 packets too
>         max=$((65535 - 40))
>         # Check invalid values first
>         for v in $((min - 1)) $((max + 1)); do
> @@ -384,7 +384,7 @@ test_pmtu_vti6_link_add_mtu() {
>         done
>
>         # Now check valid values
> -       for v in 1280 1300 $((65535 - 40)); do
> +       for v in 68 1280 1300 $((65535 - 40)); do
>                 ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
>                 mtu="$(link_get_mtu "${ns_a}" vti6_a)"
>                 ${ns_a} ip link del vti6_a
> --
> 2.15.1
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27  9:44 UTC (permalink / raw)
  To: Leo Yan, Alexei Starovoitov, David S. Miller, Jonathan Corbet,
	netdev, linux-kernel, linux-doc
In-Reply-To: <1524709611-29437-1-git-send-email-leo.yan@linaro.org>

On 04/26/2018 04:26 AM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> for JIT opcode dumping; this patch is to update the doc for it.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  Documentation/networking/filter.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index fd55c7d..feddab9 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -483,6 +483,12 @@ Example output from dmesg:
>  [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
>  [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
>  
> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> +and it returns failure if change to any other value from proc node; this is
> +for security consideration to avoid leaking info to unprivileged users. In this
> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> +need to use bpf tool for the dumping.
> +

Could you change this doc text a bit, I think it's slightly misleading. From the first
sentence one could also interpret that value 0 would leaking info to unprivileged users
whereas here we're only talking about the case of value 2. Maybe something roughly like
this to make it more clear:

  When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
  setting any other value than that will return in failure. This is even the case for
  setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
  is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
  generally recommended approach instead.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH] netfilter: ebtables: handle string from userspace with care
From: Dmitry Vyukov @ 2018-04-27  9:46 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Paolo Abeni, netfilter-devel, syzbot, coreteam, syzkaller-bugs,
	netdev
In-Reply-To: <20180427092622.4ifhb4zjoncwawmi@breakpoint.cc>

On Fri, Apr 27, 2018 at 11:26 AM, Florian Westphal <fw@strlen.de> wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>> strlcpy() can't be safely used on a user-space provided string,
>> as it can try to read beyond the buffer's end, if the latter is
>> not NULL terminated.
>
> Yes.
>
>> Leveraging the above, syzbot has been able to trigger the following
>> splat:
>>
>> BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
>> [inline]
>> BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
>> net/bridge/netfilter/ebtables.c:1957 [inline]
>> BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
>> net/bridge/netfilter/ebtables.c:2059 [inline]
>> BUG: KASAN: stack-out-of-bounds in size_entry_mwt
>> net/bridge/netfilter/ebtables.c:2155 [inline]
>> BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
>> net/bridge/netfilter/ebtables.c:2194
>> Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504
>
> Which is weird, I don't understand this report.
> The code IS wrong, but it should cause out-of-bounds read (strlen on
> src), but not out-of-bounds write.

Please see this for explanation:
https://groups.google.com/d/msg/syzkaller-bugs/-Jyti8zBWjU/6n-fkmXeBAAJ
The stack overwrite actually happens here.

> Yes, I sent a recent patch (dceb48d86b4871984b8ce9ad5057fb2c01aa33de in
> nf.git) that would now allow to get rid of the strlcpy and use the
> source directly.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox