Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 4/4] net: phy: Add generic support for 2.5GBaseT and 5GBaseT
From: Heiner Kallweit @ 2019-02-12 19:14 UTC (permalink / raw)
  To: Maxime Chevallier, davem
  Cc: netdev, linux-kernel, Andrew Lunn, Florian Fainelli, Russell King,
	linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-5-maxime.chevallier@bootlin.com>

On 11.02.2019 15:25, Maxime Chevallier wrote:
> The 802.3bz specification, based on previous by the NBASET alliance,
> defines the 2.5GBaseT and 5GBaseT link modes for ethernet traffic on
> cat5e, cat6 and cat7 cables.
> 
> These mode integrate with the already defined C45 MDIO PMA/PMD registers
> set that added 10G support, by defining some previously reserved bits,
> and adding a new register (2.5G/5G Extended abilities).
> 
> This commit adds the required definitions in include/uapi/linux/mdio.h
> to support these modes, and detect when a link-partner advertises them.
> 
> It also adds support for these mode in the generic C45 PHY
> infrastructure.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---
>  drivers/net/phy/phy-c45.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/mdio.h | 16 ++++++++++++++++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
> index 6f028de4dae1..7af5fa81daf6 100644
> --- a/drivers/net/phy/phy-c45.c
> +++ b/drivers/net/phy/phy-c45.c
> @@ -47,6 +47,16 @@ int genphy_c45_pma_setup_forced(struct phy_device *phydev)
>  		/* Assume 1000base-T */
>  		ctrl2 |= MDIO_PMA_CTRL2_1000BT;
>  		break;
> +	case SPEED_2500:
> +		ctrl1 |= MDIO_CTRL1_SPEED2_5G;
> +		/* Assume 2.5Gbase-T */
> +		ctrl2 |= MDIO_PMA_CTRL2_2_5GBT;
> +		break;
> +	case SPEED_5000:
> +		ctrl1 |= MDIO_CTRL1_SPEED5G;
> +		/* Assume 5Gbase-T */
> +		ctrl2 |= MDIO_PMA_CTRL2_5GBT;
> +		break;
>  	case SPEED_10000:
>  		ctrl1 |= MDIO_CTRL1_SPEED10G;
>  		/* Assume 10Gbase-T */
> @@ -194,6 +204,12 @@ int genphy_c45_read_lpa(struct phy_device *phydev)
>  	if (val < 0)
>  		return val;
>  
> +	if (val & MDIO_AN_10GBT_STAT_LP2_5G)
> +		linkmode_set_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
> +				 phydev->lp_advertising);
> +	if (val & MDIO_AN_10GBT_STAT_LP5G)
> +		linkmode_set_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
> +				 phydev->lp_advertising);
>  	if (val & MDIO_AN_10GBT_STAT_LP10G)
>  		linkmode_set_bit(ETHTOOL_LINK_MODE_10000baseT_Full_BIT,
>  				 phydev->lp_advertising);
> @@ -224,6 +240,12 @@ int genphy_c45_read_pma(struct phy_device *phydev)
>  	case MDIO_PMA_CTRL1_SPEED1000:
>  		phydev->speed = SPEED_1000;
>  		break;
> +	case MDIO_CTRL1_SPEED2_5G:
> +		phydev->speed = SPEED_2500;
> +		break;
> +	case MDIO_CTRL1_SPEED5G:
> +		phydev->speed = SPEED_5000;
> +		break;
>  	case MDIO_CTRL1_SPEED10G:
>  		phydev->speed = SPEED_10000;
>  		break;
> @@ -339,6 +361,21 @@ int genphy_c45_pma_read_abilities(struct phy_device *phydev)
>  		linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
>  				 phydev->supported,
>  				 val & MDIO_PMA_EXTABLE_10BT);
> +
> +		if (val & MDIO_PMA_EXTABLE_NBT) {
> +			val = phy_read_mmd(phydev, MDIO_MMD_PMAPMD,
> +					   MDIO_PMA_NG_EXTABLE);
> +			if (val < 0)
> +				return val;
> +
> +			linkmode_mod_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
> +					 phydev->supported,
> +					 val & MDIO_PMA_NG_EXTABLE_2_5GBT);
> +
> +			linkmode_mod_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
> +					 phydev->supported,
> +					 val & MDIO_PMA_NG_EXTABLE_5GBT);
> +		}
>  	}
>  
>  	return 0;
> diff --git a/include/uapi/linux/mdio.h b/include/uapi/linux/mdio.h
> index 0e012b168e4d..0a552061ff1c 100644
> --- a/include/uapi/linux/mdio.h
> +++ b/include/uapi/linux/mdio.h
> @@ -45,6 +45,7 @@
>  #define MDIO_AN_ADVERTISE	16	/* AN advertising (base page) */
>  #define MDIO_AN_LPA		19	/* AN LP abilities (base page) */
>  #define MDIO_PCS_EEE_ABLE	20	/* EEE Capability register */
> +#define MDIO_PMA_NG_EXTABLE	21	/* 2.5G/5G PMA/PMD extended ability */
>  #define MDIO_PCS_EEE_WK_ERR	22	/* EEE wake error counter */
>  #define MDIO_PHYXS_LNSTAT	24	/* PHY XGXS lane state */
>  #define MDIO_AN_EEE_ADV		60	/* EEE advertisement */
> @@ -92,6 +93,10 @@
>  #define MDIO_CTRL1_SPEED10G		(MDIO_CTRL1_SPEEDSELEXT | 0x00)
>  /* 10PASS-TS/2BASE-TL */
>  #define MDIO_CTRL1_SPEED10P2B		(MDIO_CTRL1_SPEEDSELEXT | 0x04)
> +/* 2.5 Gb/s */
> +#define MDIO_CTRL1_SPEED2_5G		(MDIO_CTRL1_SPEEDSELEXT | 0x18)
> +/* 5 Gb/s */
> +#define MDIO_CTRL1_SPEED5G		(MDIO_CTRL1_SPEEDSELEXT | 0x1c)
>  
>  /* Status register 1. */
>  #define MDIO_STAT1_LPOWERABLE		0x0002	/* Low-power ability */
> @@ -145,6 +150,8 @@
>  #define MDIO_PMA_CTRL2_1000BKX		0x000d	/* 1000BASE-KX type */
>  #define MDIO_PMA_CTRL2_100BTX		0x000e	/* 100BASE-TX type */
>  #define MDIO_PMA_CTRL2_10BT		0x000f	/* 10BASE-T type */
> +#define MDIO_PMA_CTRL2_2_5GBT		0x0030  /* 2.5GBaseT type */
> +#define MDIO_PMA_CTRL2_5GBT		0x0031  /* 5GBaseT type */
>  #define MDIO_PCS_CTRL2_TYPE		0x0003	/* PCS type selection */
>  #define MDIO_PCS_CTRL2_10GBR		0x0000	/* 10GBASE-R type */
>  #define MDIO_PCS_CTRL2_10GBX		0x0001	/* 10GBASE-X type */
> @@ -198,6 +205,7 @@
>  #define MDIO_PMA_EXTABLE_1000BKX	0x0040	/* 1000BASE-KX ability */
>  #define MDIO_PMA_EXTABLE_100BTX		0x0080	/* 100BASE-TX ability */
>  #define MDIO_PMA_EXTABLE_10BT		0x0100	/* 10BASE-T ability */
> +#define MDIO_PMA_EXTABLE_NBT		0x4000  /* 2.5/5GBASE-T ability */
>  
>  /* PHY XGXS lane state register. */
>  #define MDIO_PHYXS_LNSTAT_SYNC0		0x0001
> @@ -234,9 +242,13 @@
>  #define MDIO_PCS_10GBRT_STAT2_BER	0x3f00
>  
>  /* AN 10GBASE-T control register. */
> +#define MDIO_AN_10GBT_CTRL_ADV2_5G	0x0080	/* Advertise 2.5GBASE-T */
> +#define MDIO_AN_10GBT_CTRL_ADV5G	0x0100	/* Advertise 5GBASE-T */
>  #define MDIO_AN_10GBT_CTRL_ADV10G	0x1000	/* Advertise 10GBASE-T */
>  
>  /* AN 10GBASE-T status register. */
> +#define MDIO_AN_10GBT_STAT_LP2_5G	0x0020  /* LP is 2.5GBT capable */
> +#define MDIO_AN_10GBT_STAT_LP5G		0x0040  /* LP is 5GBT capable */
>  #define MDIO_AN_10GBT_STAT_LPTRR	0x0200	/* LP training reset req. */
>  #define MDIO_AN_10GBT_STAT_LPLTABLE	0x0400	/* LP loop timing ability */
>  #define MDIO_AN_10GBT_STAT_LP10G	0x0800	/* LP is 10GBT capable */
> @@ -265,6 +277,10 @@
>  #define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
>  #define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */
>  
> +/* 2.5G/5G Extended abilities register. */
> +#define MDIO_PMA_NG_EXTABLE_2_5GBT	0x0001	/* 2.5GBASET ability */
> +#define MDIO_PMA_NG_EXTABLE_5GBT	0x0002	/* 5GBASET ability */
> +
>  /* LASI RX_ALARM control/status registers. */
>  #define MDIO_PMA_LASI_RX_PHYXSLFLT	0x0001	/* PHY XS RX local fault */
>  #define MDIO_PMA_LASI_RX_PCSLFLT	0x0008	/* PCS RX local fault */
> 
Looks good to me.

Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>

Heiner

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_outq_tail
From: Marcelo Ricardo Leitner @ 2019-02-12 19:19 UTC (permalink / raw)
  To: syzbot
  Cc: davem, linux-kernel, linux-sctp, netdev, nhorman, syzkaller-bugs,
	vyasevich
In-Reply-To: <0000000000002015db0581b71858@google.com>

On Tue, Feb 12, 2019 at 11:04:27AM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    d4104460aec1 Add linux-next specific files for 20190211
> git tree:       linux-next

I can't find this commit. How can I know for sure which commits are in?
Recent patches are involved with this code, but not sure what was
included on the test.

  Marcelo

^ permalink raw reply

* Re: [Patch net] team: avoid complex list operations in team_nl_cmd_options_set()
From: David Miller @ 2019-02-12 19:21 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: netdev, syzbot+4d4af685432dc0e56c91, syzbot+68ee510075cf64260cc4,
	jiri, pabeni
In-Reply-To: <20190212055951.6712-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 11 Feb 2019 21:59:51 -0800

> The current opt_inst_list operations inside team_nl_cmd_options_set()
> is too complex to track:
> 
>     LIST_HEAD(opt_inst_list);
>     nla_for_each_nested(...) {
>         list_for_each_entry(opt_inst, &team->option_inst_list, list) {
>             if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
>                 continue;
>             list_add(&opt_inst->tmp_list, &opt_inst_list);
>         }
>     }
>     team_nl_send_event_options_get(team, &opt_inst_list);
> 
> as while we retrieve 'opt_inst' from team->option_inst_list, it could
> be added to the local 'opt_inst_list' for multiple times. The
> __team_option_inst_tmp_find() doesn't work, as the setter
> team_mode_option_set() still calls team->ops.exit() which uses
> ->tmp_list too in __team_options_change_check().
> 
> Simplify the list operations by moving the 'opt_inst_list' and
> team_nl_send_event_options_get() into the nla_for_each_nested() loop so
> that it can be guranteed that we won't insert a same list entry for
> multiple times. Therefore, __team_option_inst_tmp_find() can be removed
> too.
> 
> Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list")
> Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
> Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
> Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks Cong.

^ permalink raw reply

* Re: [PATCH net-next 1/2] devlink: Return right error code in case of errors for region read
From: David Miller @ 2019-02-12 19:24 UTC (permalink / raw)
  To: parav; +Cc: jiri, netdev
In-Reply-To: <1549955359-15786-1-git-send-email-parav@mellanox.com>

From: Parav Pandit <parav@mellanox.com>
Date: Tue, 12 Feb 2019 01:09:19 -0600

> devlink_nl_cmd_region_read_dumpit() misses to return right error code on
> most error conditions.
> Return the right error code on such errors.
> 
> Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

This has conflicts with the locking changes I applied earlier today.

Please respin, thanks.

^ permalink raw reply

* Re: [PATCH net] net: phy: fix interrupt handling in non-started states
From: Andrew Lunn @ 2019-02-12 19:37 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org,
	Russell King - ARM Linux
In-Reply-To: <25e86edc-0b88-8c03-b692-776e971331f2@gmail.com>

On Tue, Feb 12, 2019 at 07:56:15PM +0100, Heiner Kallweit wrote:
> phylib enables interrupts before phy_start() has been called, and if
> we receive an interrupt in a non-started state, the interrupt handler
> returns IRQ_NONE. This causes problems with at least one Marvell chip
> as reported by Andrew.
> Fix this by handling interrupts the same as in phy_mac_interrupt(),
> basically always running the phylib state machine. It knows when it
> has to do something and when not.
> This change allows to handle interrupts gracefully even if they
> occur in a non-started state.
> 
> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
> Reported-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/phy/phy.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 189cd2048c3a..ca5e0c0f018c 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -762,9 +762,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>  {
>  	struct phy_device *phydev = phy_dat;
>  
> -	if (!phy_is_started(phydev))
> -		return IRQ_NONE;		/* It can't be ours.  */
> -
>  	if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev))
>  		return IRQ_NONE;

Hi Heiner

Did you look at

ommit 3c3070d713d798f7f9e7ee3614e49b47655d14d8
Author: Maciej W. Rozycki <macro@linux-mips.org>
Date:   Tue Oct 3 16:18:35 2006 +0100

    [PATCH] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
    
     This patch fixes a couple of problems discovered with interrupt handling
    in the phylib core, namely:
    
    1. The driver uses timer and workqueue calls, but does not include
       <linux/timer.h> nor <linux/workqueue.h>.
    
    2. The driver uses schedule_work() for handling interrupts, but does not
       make sure any pending work scheduled thus has been completed before
       driver's structures get freed from memory.  This is especially
       important as interrupts may keep arriving if the line is shared with
       another PHY.
    
       The solution is to ignore phy_interrupt() calls if the reported device
       has already been halted and calling flush_scheduled_work() from
       phy_stop_interrupts() (but guarded with current_is_keventd() in case
       the function has been called through keventd from the MAC device's
       close call to avoid a deadlock on the netlink lock).
    
    Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
    
    patch-mips-2.6.18-20060920-phy-irq-16
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

There has been a lot of change since then, so maybe this is no longer
an issue?

   Thanks
	Andrew

^ permalink raw reply

* patch for ip6_input.c
From: Farrell.Woods @ 2019-02-12 19:53 UTC (permalink / raw)
  To: netdev

Folks,

I'm proposing the following patch for ip6_input.c:

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index c7ed2b6..5aba6a6 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -409,12 +409,10 @@ void ip6_protocol_deliver_rcu(struct net *net, 
struct sk_buff *skb, int nexthdr,
                 }
         } else {
                 if (!raw) {
-                       if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-                               __IP6_INC_STATS(net, idev,
- IPSTATS_MIB_INUNKNOWNPROTOS);
-                               icmpv6_send(skb, ICMPV6_PARAMPROB,
-                                           ICMPV6_UNK_NEXTHDR, nhoff);
-                       }
+                       __IP6_INC_STATS(net, idev,
+                               IPSTATS_MIB_INUNKNOWNPROTOS);
+                       icmpv6_send(skb, ICMPV6_PARAMPROB,
+                               ICMPV6_UNK_NEXTHDR, nhoff);
                         kfree_skb(skb);
                 } else {
                         __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDELIVERS);

The patch fixes an IPv6 conformance test failure (v6LC_1_2_03a in the 
UNH INTACT suite) that occurs specifically when IPsec is in use.  The 
test iterates through the set of unassigned protocol numbers (currently, 
143 through 252) and inserts these into the next header field of a 
Destination Options header.  The expected test result is that an ICMPv6 
Parameter Problem is sent back.  But if there's a policy in place that 
requires an active SA between the Test Node and the Device Under Test 
(and none exists), the inbound packet is quietly dropped.

This behavior is inconsistent with, for example, how unknown tlv's are 
handled in extension headers (see the tlv parsing code in 
ipv6/exthdrs.c) or for instance how misaligned fragment headers are 
handled.  These will always cause a Parameter Problem message to get 
sent back to the source.

I have verified that with the policy check removed, that the unit test 
passes.

FYI here's a trace of the test in question:

No.     Time           Source Destination           Protocol Length Info
       1 0.000000000    fe80::200:10ff:fe10:1080 
fe80::260:16ff:fe97:ebf2 IPv6     71 *Unknown IP Protocol: Unassigned (143)*

Frame 1: 71 bytes on wire (568 bits), 71 bytes captured (568 bits) on 
interface 0
     Interface id: 0 (unknown)
         Interface name: unknown
     Encapsulation type: Ethernet (1)
     Arrival Time: Feb  6, 2019 13:27:29.949609000 EST
     [Time shift for this packet: 0.000000000 seconds]
     Epoch Time: 1549477649.949609000 seconds
     [Time delta from previous captured frame: 0.000000000 seconds]
     [Time delta from previous displayed frame: 0.000000000 seconds]
     [Time since reference or first frame: 0.000000000 seconds]
     Frame Number: 1
     Frame Length: 71 bytes (568 bits)
     Capture Length: 71 bytes (568 bits)
     [Frame is marked: False]
     [Frame is ignored: False]
     [Protocols in frame: eth:ethertype:ipv6:ipv6.dstopts:data]
Ethernet II, Src: Sytek_10:10:80 (00:00:10:10:10:80), Dst: 
Clariion_97:eb:f2 (00:60:16:97:eb:f2)
     Destination: Clariion_97:eb:f2 (00:60:16:97:eb:f2)
         Address: Clariion_97:eb:f2 (00:60:16:97:eb:f2)
         .... ..0. .... .... .... .... = LG bit: Globally unique address 
(factory default)
         .... ...0 .... .... .... .... = IG bit: Individual address 
(unicast)
     Source: Sytek_10:10:80 (00:00:10:10:10:80)
         Address: Sytek_10:10:80 (00:00:10:10:10:80)
         .... ..0. .... .... .... .... = LG bit: Globally unique address 
(factory default)
         .... ...0 .... .... .... .... = IG bit: Individual address 
(unicast)
     Type: IPv6 (0x86dd)
Internet Protocol Version 6, Src: fe80::200:10ff:fe10:1080, Dst: 
fe80::260:16ff:fe97:ebf2
     0110 .... = Version: 6
     .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 
(DSCP: CS0, ECN: Not-ECT)
         .... 0000 00.. .... .... .... .... .... = Differentiated 
Services Codepoint: Default (0)
         .... .... ..00 .... .... .... .... .... = Explicit Congestion 
Notification: Not ECN-Capable Transport (0)
     .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
     Payload Length: 17
     Next Header: Destination Options for IPv6 (60)
     Hop Limit: 255
     Source: fe80::200:10ff:fe10:1080
     Destination: fe80::260:16ff:fe97:ebf2
     [Source SA MAC: Sytek_10:10:80 (00:00:10:10:10:80)]
     [Destination SA MAC: Clariion_97:eb:f2 (00:60:16:97:eb:f2)]
     Destination Options for IPv6
*Next Header: Unassigned (143**)*
         Length: 0
         [Length: 8 bytes]
         PadN
             Type: PadN (0x01)
                 00.. .... = Action: Skip and continue (0)
                 ..0. .... = May Change: No
                 ...0 0001 = Low-Order Bits: 0x01
             Length: 4
             PadN: 00000000
Data (9 bytes)

0000  80 00 5c eb 00 00 00 00 00 ..\......
     Data: 80005ceb0000000000
     [Length: 9]

I am working on a product that will ship with IPsec enabled and with a 
set of traffic selectors in place that will exclude most inbound 
traffic.  Since this is how it will ship to the customer, we must leave 
IPsec enabled when this goes to UNH for USGv6 certification.


Thanks for your consideration.

     -- Farrell

^ permalink raw reply related

* linux-next: Fixes tag needs some work in the net tree
From: Stephen Rothwell @ 2019-02-12 20:04 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Li RongQing

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

Hi all,

In commit

  d1f20798a119 ("ipv6: propagate genlmsg_reply return code")

Fixes tag

  Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6")

has these problem(s):

  - SHA1 should be at least 12 digits long
    Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
    or later) just making sure it is not set (or set to "auto")

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net] net: phy: fix interrupt handling in non-started states
From: Heiner Kallweit @ 2019-02-12 20:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org,
	Russell King - ARM Linux
In-Reply-To: <20190212193716.GD7527@lunn.ch>

On 12.02.2019 20:37, Andrew Lunn wrote:
> On Tue, Feb 12, 2019 at 07:56:15PM +0100, Heiner Kallweit wrote:
>> phylib enables interrupts before phy_start() has been called, and if
>> we receive an interrupt in a non-started state, the interrupt handler
>> returns IRQ_NONE. This causes problems with at least one Marvell chip
>> as reported by Andrew.
>> Fix this by handling interrupts the same as in phy_mac_interrupt(),
>> basically always running the phylib state machine. It knows when it
>> has to do something and when not.
>> This change allows to handle interrupts gracefully even if they
>> occur in a non-started state.
>>
>> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
>> Reported-by: Andrew Lunn <andrew@lunn.ch>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/phy/phy.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index 189cd2048c3a..ca5e0c0f018c 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -762,9 +762,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>>  {
>>  	struct phy_device *phydev = phy_dat;
>>  
>> -	if (!phy_is_started(phydev))
>> -		return IRQ_NONE;		/* It can't be ours.  */
>> -
>>  	if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev))
>>  		return IRQ_NONE;
> 
> Hi Heiner
> 
> Did you look at
> 
Thanks for this interesting old read.

> ommit 3c3070d713d798f7f9e7ee3614e49b47655d14d8
> Author: Maciej W. Rozycki <macro@linux-mips.org>
> Date:   Tue Oct 3 16:18:35 2006 +0100
> 
>     [PATCH] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
>     
>      This patch fixes a couple of problems discovered with interrupt handling
>     in the phylib core, namely:
>     
>     1. The driver uses timer and workqueue calls, but does not include
>        <linux/timer.h> nor <linux/workqueue.h>.
>     
>     2. The driver uses schedule_work() for handling interrupts, but does not
>        make sure any pending work scheduled thus has been completed before
>        driver's structures get freed from memory.  This is especially
>        important as interrupts may keep arriving if the line is shared with
>        another PHY.
>     
Since that time this have massively changed and I *think* this doesn't apply
any longer. cancel_delayed_work_sync() is called always before driver
structure is freed.

>        The solution is to ignore phy_interrupt() calls if the reported device
>        has already been halted and calling flush_scheduled_work() from
>        phy_stop_interrupts() (but guarded with current_is_keventd() in case
>        the function has been called through keventd from the MAC device's
>        close call to avoid a deadlock on the netlink lock).
>     
>     Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
>     
>     patch-mips-2.6.18-20060920-phy-irq-16
>     Signed-off-by: Jeff Garzik <jeff@garzik.org>
> 
> There has been a lot of change since then, so maybe this is no longer
> an issue?
> 
AFAICS, yes. I'll have a closer look at the scenario described by Russell,
this however should be independent of the patch here.

>    Thanks
> 	Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: John David Anglin @ 2019-02-12 20:09 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit
  Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190212125644.GA7527@lunn.ch>

On 2019-02-12 7:56 a.m., Andrew Lunn wrote:
> It is a little bit old, 5.0-rc1 net-next. I should rebase and
> retest. I'm testing on a ZII board which is not fully in mainline So i
> need some patches.
I attempted to test 5.0.0-rc5 on espressobin but it dies on boot:
Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.0.0-rc5+ (root@espressobin) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PREEMPT Tue Feb 12
10:02:30 EST 2019
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Board
[    0.000000] earlycon: ar3700_uart0 at MMIO 0x00000000d0012000 (options '')
[    0.000000] printk: bootconsole [ar3700_uart0] enabled
[    3.218865] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    3.224521] Modules linked in:
[    3.227661] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5+ #1
[    3.234107] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[    3.240740] pstate: 20000005 (nzCv daif -PAN -UAO)
[    3.245676] pc : dma_direct_map_page+0x48/0x1d8
[    3.250331] lr : mv_xor_channel_add+0x3b0/0xb28
[    3.254984] sp : ffffff8010033a60
[    3.258389] x29: ffffff8010033a60 x28: ffffffc03bf70480
[    3.263854] x27: ffffff8010e97068 x26: 0000000000000000
[    3.269320] x25: 0000000000000029 x24: 0000000000000083
[    3.274785] x23: 0000000000000000 x22: 0000000000000002
[    3.280251] x21: 0000000000000080 x20: ffffff8010ecd000
[    3.285716] x19: 0000000000000000 x18: ffffffffffffffff
[    3.291182] x17: 000000000000000c x16: 000000000000000a
[    3.296648] x15: ffffff8010ecd6c8 x14: ffffffc03bf46783
[    3.302113] x13: ffffffc03bf46782 x12: 0000000000000038
[    3.307579] x11: 0000000000001fff x10: 0000000000000001
[    3.313044] x9 : 0000000000000000 x8 : ffffff8010dbe000
[    3.318510] x7 : ffffff8010fbe000 x6 : ffffffbf00000000
[    3.323976] x5 : 0000000000000000 x4 : 0000000000000002
[    3.329441] x3 : 0000000000000002 x2 : 00000000000006ac
[    3.334907] x1 : ffffffbf00efdc00 x0 : 0000000000000000
[    3.340373] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[    3.347272] Call trace:
[    3.349784]  dma_direct_map_page+0x48/0x1d8
[    3.354084]  mv_xor_channel_add+0x3b0/0xb28
[    3.358384]  mv_xor_probe+0x20c/0x4b8
[    3.362150]  platform_drv_probe+0x50/0xb0
[    3.366269]  really_probe+0x1fc/0x2c0
[    3.370032]  driver_probe_device+0x58/0x100
[    3.374332]  __driver_attach+0xd8/0xe0
[    3.378188]  bus_for_each_dev+0x68/0xc8
[    3.382129]  driver_attach+0x20/0x28
[    3.385803]  bus_add_driver+0x108/0x228
[    3.389744]  driver_register+0x60/0x110
[    3.393687]  __platform_driver_register+0x44/0x50
[    3.398529]  mv_xor_driver_init+0x18/0x20
[    3.402648]  do_one_initcall+0x58/0x170
[    3.406591]  kernel_init_freeable+0x190/0x234
[    3.411072]  kernel_init+0x10/0x108
[    3.414653]  ret_from_fork+0x10/0x1c
[    3.418329] Code: 2a0403f6 934cfc00 aa0503f7 7100047f (f9412663)
[    3.424610] ---[ end trace f7751570455a07a0 ]---
[    3.429423] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.437232] SMP: stopping secondary CPUs
[    3.441265] Kernel Offset: disabled
[    3.444849] CPU features: 0x002,2000200c
[    3.448878] Memory Limit: none
[    3.452018] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: Heiner Kallweit @ 2019-02-12 20:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli,
	netdev
In-Reply-To: <20190212163017.lwstmgtyw76cwrd7@shell.armlinux.org.uk>

On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>> That change means we don't check the PHY device if it caused an
>>> interrupt when its state is less than UP.
>>>
>>> What i'm seeing is that the PHY is interrupting pretty early on after
>>> a reboot when the previous boot had the interface up.
>>>
>> So this means that when going down for reboot the interrupts are not
>> properly masked / disabled? Because (at least for net-next) we enable
>> interrupts in phy_start() only.
> 
> Looking at Linus' tree as opposed to net-next, things do look rather
> broken wrt interrupts:
> 
> +-phy_attach_direct
>   `-phydev->state = PHY_READY
> +-phy_prepare_link
> +-phy_start_machine
>   `-phy_trigger_machine()
> `-phy_start_interrupts
>   +-request_threaded_irq()
>   `-phy_enable_interrupts()
>     +-phy_clear_interrupt()
>     `-phy_config_interrupt(, PHY_INTERRUPT_ENABLED)
> 
> At this point, the PHY is then able to generate interrupts, which,
> because phy_start() has not been called and phy_interrupt() checks
> that phydev->state >= PHY_UP, get ignored by the interrupt handler
> exactly as Andrew is finding.
> 
> So it looks like 5.0-rc is already in need of this being fixed.
> 
> In looking at this, I came across this chunk of code:
> 
> static inline bool __phy_is_started(struct phy_device *phydev)
> {
>         WARN_ON(!mutex_is_locked(&phydev->lock));
> 
>         return phydev->state >= PHY_UP;
> }
> 
> /**
>  * phy_is_started - Convenience function to check whether PHY is started
>  * @phydev: The phy_device struct
>  */
> static inline bool phy_is_started(struct phy_device *phydev)
> {
>         bool started;
> 
>         mutex_lock(&phydev->lock);
>         started = __phy_is_started(phydev);
>         mutex_unlock(&phydev->lock);
> 
>         return started;
> }
> 
> which looks to me like over-complication.  The mutex locking there is
> completely pointless - what are you trying to achieve with it?
> 
Even though this code is new it's kind of heritage in phylib that each
access (read or write) to phydev->state is protected by this lock.
I also once wondered whether it's actually needed but didn't spend
effort so far on challenging this. Seems that now the time has come ..

> Let's go through this.  The above is exactly equivalent to:
> 
> bool phy_is_started(phydev)
> {
> 	int state;
> 
> 	mutex_lock(&phydev->lock);
> 	state = phydev->state;
> 	mutex_unlock(&phydev->lock);
> 
> 	return state >= PHY_UP;
> }
> 
> since when we do the test is irrelevant.  Architectures that Linux
> runs on are single-copy atomic, which means that reading phydev->state
> itself is an atomic operation.  So, the mutex locking around that
> doesn't add to the atomicity of the entire operation.
> 
> How, depending on what you do with the rest of this function depends
> whether the entire operation is safe or not.  For example, let's take
> this code at the end of phy_state_machine():
> 
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> 		thread 0			thread 1
> 						phy_disconnect()
> 						+-phy_is_started()
> 		phy_is_started()                |
> 						`-phy_stop()
> 						  +-phydev->state = PHY_HALTED
> 						  `-phy_stop_machine()
> 						    `-cancel_delayed_work_sync()
> 		phy_queue_state_machine()
> 		`-mod_delayed_work()
> 
Thanks for describing this scenario, I'll have a closer look at it.

> At this point, the phydev->state_queue() has been added back onto the
> system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() called on it.
> 
> The original code in 4.20 did not have this race condition.
> 
> Basically, the lock inside phy_is_started() does nothing useful, and
> I'd say is dangerously misleading.
> 

Heiner

^ permalink raw reply

* RE: [PATCH net-next 1/2] devlink: Return right error code in case of errors for region read
From: Parav Pandit @ 2019-02-12 20:22 UTC (permalink / raw)
  To: David Miller; +Cc: Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <20190212.112447.1565311464519442439.davem@davemloft.net>



> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Tuesday, February 12, 2019 1:25 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 1/2] devlink: Return right error code in case of
> errors for region read
> 
> From: Parav Pandit <parav@mellanox.com>
> Date: Tue, 12 Feb 2019 01:09:19 -0600
> 
> > devlink_nl_cmd_region_read_dumpit() misses to return right error code
> > on most error conditions.
> > Return the right error code on such errors.
> >
> > Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read
> > command")
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > Acked-by: Jiri Pirko <jiri@mellanox.com>
> 
> This has conflicts with the locking changes I applied earlier today.
> 
> Please respin, thanks.
Ok. sending rebased v2.

^ permalink raw reply

* [PATCHv2 net-next 0/2] devlink: 2 fixes for devlink region read
From: Parav Pandit @ 2019-02-12 20:22 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

This 2 patches consist of fixes for devlink region read handling.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
v0->v1:
 - Fixed typo from user to use
v1->v2:
 - Rebased

Parav Pandit (2):
  devlink: Return right error code in case of errors for region read
  devlink: Fix list access without lock while reading region

 net/core/devlink.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* [PATCHv2 net-next 1/2] devlink: Return right error code in case of errors for region read
From: Parav Pandit @ 2019-02-12 20:23 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

devlink_nl_cmd_region_read_dumpit() misses to return right error code on
most error conditions.
Return the right error code on such errors.

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
v1->v2:
 - Rebased
---
 net/core/devlink.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 283c3ed..312084f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3646,26 +3646,34 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out_free;
 
 	devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs);
-	if (IS_ERR(devlink))
+	if (IS_ERR(devlink)) {
+		err = PTR_ERR(devlink);
 		goto out_free;
+	}
 
 	mutex_lock(&devlink_mutex);
 	mutex_lock(&devlink->lock);
 
 	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+		err = -EINVAL;
 		goto out_unlock;
+	}
 
 	region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
 	region = devlink_region_get_by_name(devlink, region_name);
-	if (!region)
+	if (!region) {
+		err = -EINVAL;
 		goto out_unlock;
+	}
 
 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
 			  DEVLINK_CMD_REGION_READ);
-	if (!hdr)
+	if (!hdr) {
+		err = -EMSGSIZE;
 		goto out_unlock;
+	}
 
 	err = devlink_nl_put_handle(skb, devlink);
 	if (err)
@@ -3676,8 +3684,10 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto nla_put_failure;
 
 	chunks_attr = nla_nest_start(skb, DEVLINK_ATTR_REGION_CHUNKS);
-	if (!chunks_attr)
+	if (!chunks_attr) {
+		err = -EMSGSIZE;
 		goto nla_put_failure;
+	}
 
 	if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] &&
 	    attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]) {
@@ -3700,8 +3710,10 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto nla_put_failure;
 
 	/* Check if there was any progress done to prevent infinite loop */
-	if (ret_offset == start_offset)
+	if (ret_offset == start_offset) {
+		err = -EINVAL;
 		goto nla_put_failure;
+	}
 
 	*((u64 *)&cb->args[0]) = ret_offset;
 
@@ -3720,7 +3732,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	mutex_unlock(&devlink_mutex);
 out_free:
 	kfree(attrs);
-	return 0;
+	return err;
 }
 
 struct devlink_info_req {
-- 
1.8.3.1


^ permalink raw reply related

* [PATCHv2 net-next 2/2] devlink: Fix list access without lock while reading region
From: Parav Pandit @ 2019-02-12 20:24 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

While finding the devlink device during region reading,
devlink device list is accessed and devlink device is
returned without holding a lock. This could lead to use-after-free
accesses.

While at it, add lockdep assert to ensure that all future callers hold
the lock when calling devlink_get_from_attrs().

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
v0->v1:
 - Fixed typo from user to use
---
 net/core/devlink.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 312084f..1d7502a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -116,6 +116,8 @@ static struct devlink *devlink_get_from_attrs(struct net *net,
 	busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]);
 	devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]);
 
+	lockdep_assert_held(&devlink_mutex);
+
 	list_for_each_entry(devlink, &devlink_list, list) {
 		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
 		    strcmp(dev_name(devlink->dev), devname) == 0 &&
@@ -3645,13 +3647,13 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	if (err)
 		goto out_free;
 
+	mutex_lock(&devlink_mutex);
 	devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs);
 	if (IS_ERR(devlink)) {
 		err = PTR_ERR(devlink);
-		goto out_free;
+		goto out_dev;
 	}
 
-	mutex_lock(&devlink_mutex);
 	mutex_lock(&devlink->lock);
 
 	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
@@ -3729,6 +3731,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	genlmsg_cancel(skb, hdr);
 out_unlock:
 	mutex_unlock(&devlink->lock);
+out_dev:
 	mutex_unlock(&devlink_mutex);
 out_free:
 	kfree(attrs);
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net] net: fix possible overflow in __sk_mem_raise_allocated()
From: Eric Dumazet @ 2019-02-12 20:26 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.

They would increase their share of the memory, instead
of decreasing it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h | 2 +-
 net/core/sock.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2b229f7be8ebbc160706012f7ed03db85c5689d0..f43f935cb113b73c6fc0df35b5f43103ba131ab2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1277,7 +1277,7 @@ static inline void sk_sockets_allocated_inc(struct sock *sk)
 	percpu_counter_inc(sk->sk_prot->sockets_allocated);
 }
 
-static inline int
+static inline u64
 sk_sockets_allocated_read_positive(struct sock *sk)
 {
 	return percpu_counter_read_positive(sk->sk_prot->sockets_allocated);
diff --git a/net/core/sock.c b/net/core/sock.c
index 6aa2e7e0b4fbdbc29d43d6b61a53b8de2a7ba269..bc3512f230a304c97c519a82c69d1e86f115b651 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2380,7 +2380,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 	}
 
 	if (sk_has_memory_pressure(sk)) {
-		int alloc;
+		u64 alloc;
 
 		if (!sk_under_memory_pressure(sk))
 			return 1;
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related

* Re: [PATCH bpf-next 4/4] selftests/bpf: Test static data relocation
From: Joe Stringer @ 2019-02-12 20:43 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Joe Stringer, bpf, netdev, Daniel Borkmann, ast
In-Reply-To: <20190212050113.qkryxb34vpnst6w6@ast-mbp>

On Mon, Feb 11, 2019 at 9:01 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Feb 11, 2019 at 04:47:29PM -0800, Joe Stringer wrote:
> > Add tests for libbpf relocation of static variable references into the
> > .data and .bss sections of the ELF.
> >
> > Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ...
> > +#define __fetch(x) (__u32)(&(x))
> > +
> > +static __u32 static_bss = 0; /* Reloc reference to .bss section */
> > +static __u32 static_data = 42;       /* Reloc reference to .data section */
> > +
> > +/**
> > + * Load a u32 value from a static variable into a map, for the userland test
> > + * program to validate.
> > + */
> > +SEC("static_data_load")
> > +int load_static_data(struct __sk_buff *skb)
> > +{
> > +     __u32 key, value;
> > +
> > +     key = 0;
> > +     value = __fetch(static_bss);
>
> If we proceed with this approach we will not be able to add support
> for normal 'value = static_bss;' C code in the future.

Hmm, completely agree that breaking future use of standard code is a
non-starter.

Digging around a bit more, I think I could drop the .bss patch/code
here and still end up with something that will work for my use case.
Just need to ensure that all template values are non-zero when run
through the compiler.

> Let's figure out the way to do it right from the start.
> Support for global and static variables is must have feature to add asap,
> but let's not cut the corner like this.
> We did such hacks in the past and every time it came back to bite us.

Do you see any value in having incremental support in libbpf that
could be used as a fallback for older kernels like in patch #2 of this
series? I could imagine libbpf probing kernel support for
global/static variables and attempting to handle references to .data
via some more comprehensive mechanism in-kernel, or falling back to
this approach if it is not available.

^ permalink raw reply

* Re: [PATCH] net: stmmac: Add SMC support for EMAC System Manager register
From: Thor Thayer @ 2019-02-12 20:52 UTC (permalink / raw)
  To: Ooi, Joyce, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	David S. Miller, Maxime Coquelin
  Cc: netdev, linux-stm32, linux-arm-kernel, linux-kernel,
	See Chin Liang
In-Reply-To: <1549988692-4124-1-git-send-email-joyce.ooi@intel.com>

Hi Joyce,

On 2/12/19 10:24 AM, Ooi, Joyce wrote:
> As there is restriction to access to EMAC System Manager registers in
> the kernel for Intel Stratix10, the use of SMC calls are required and
> added in dwmac-socfpga driver.
> 
> Signed-off-by: Ooi, Joyce <joyce.ooi@intel.com>

<snip>

I have a pending patchset[1] that addresses this but can be used by 
other drivers as well.

[1] https://patchwork.kernel.org/cover/10612891/


^ permalink raw reply

* Re: [PATCH] rpc: properly check debugfs dentry before using it
From: Schumaker, Anna @ 2019-02-12 20:52 UTC (permalink / raw)
  To: bfields@fieldses.org, trond.myklebust@hammerspace.com,
	gregkh@linuxfoundation.org, jlayton@kernel.org
  Cc: netdev@vger.kernel.org, linux-nfs@vger.kernel.org,
	dhowells@redhat.com
In-Reply-To: <20190212182734.GA4509@kroah.com>

On Tue, 2019-02-12 at 19:27 +0100, Greg Kroah-Hartman wrote:
> debugfs can now report an error code if something went wrong instead of
> just NULL.  So if the return value is to be used as a "real" dentry, it
> needs to be checked if it is an error before dereferencing it.
> 
> This is now happening because of ff9fb72bc077 ("debugfs: return error
> values, not NULL"), but why debugfs files are not being created properly
> is an older issue, probably one that has always been there and should
> probably be looked at...
> 
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> Cc: Anna Schumaker <anna.schumaker@netapp.com>
> Cc: linux-nfs@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Reported-by: David Howells <dhowells@redhat.com>
> Tested-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> ---
>  net/sunrpc/debugfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I can take this through my tree if people don't object, or it can go
> through the NFS tree.  It does need to get merged before 5.0-final
> though.

I'm planning another bugfixes pull for 5.0, so I can take this patch and send it
with the others this week.

Thanks!
Anna

> 
> I also have a "larger" debugfs cleanup patch for this file, but that's
> not really 5.0-final material and I will send it out later.
> 
> thanks,
> 
> greg k-h
> 
> diff --git a/net/sunrpc/debugfs.c b/net/sunrpc/debugfs.c
> index 45a033329cd4..19bb356230ed 100644
> --- a/net/sunrpc/debugfs.c
> +++ b/net/sunrpc/debugfs.c
> @@ -146,7 +146,7 @@ rpc_clnt_debugfs_register(struct rpc_clnt *clnt)
>         rcu_read_lock();
>         xprt = rcu_dereference(clnt->cl_xprt);
>         /* no "debugfs" dentry? Don't bother with the symlink. */
> -       if (!xprt->debugfs) {
> +       if (IS_ERR_OR_NULL(xprt->debugfs)) {
>                 rcu_read_unlock();
>                 return;
>         }

^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: Heiner Kallweit @ 2019-02-12 20:54 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli,
	netdev
In-Reply-To: <20190212163017.lwstmgtyw76cwrd7@shell.armlinux.org.uk>

On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>> That change means we don't check the PHY device if it caused an
>>> interrupt when its state is less than UP.
>>>
>>> What i'm seeing is that the PHY is interrupting pretty early on after
>>> a reboot when the previous boot had the interface up.
>>>
>> So this means that when going down for reboot the interrupts are not
>> properly masked / disabled? Because (at least for net-next) we enable
>> interrupts in phy_start() only.
> 
[..]
> In looking at this, I came across this chunk of code:
> 
> static inline bool __phy_is_started(struct phy_device *phydev)
> {
>         WARN_ON(!mutex_is_locked(&phydev->lock));
> 
>         return phydev->state >= PHY_UP;
> }
> 
> /**
>  * phy_is_started - Convenience function to check whether PHY is started
>  * @phydev: The phy_device struct
>  */
> static inline bool phy_is_started(struct phy_device *phydev)
> {
>         bool started;
> 
>         mutex_lock(&phydev->lock);
>         started = __phy_is_started(phydev);
>         mutex_unlock(&phydev->lock);
> 
>         return started;
> }
> 
> which looks to me like over-complication.  The mutex locking there is
> completely pointless - what are you trying to achieve with it?
> 
> Let's go through this.  The above is exactly equivalent to:
> 
> bool phy_is_started(phydev)
> {
> 	int state;
> 
> 	mutex_lock(&phydev->lock);
> 	state = phydev->state;
> 	mutex_unlock(&phydev->lock);
> 
> 	return state >= PHY_UP;
> }
> 
> since when we do the test is irrelevant.  Architectures that Linux
> runs on are single-copy atomic, which means that reading phydev->state
> itself is an atomic operation.  So, the mutex locking around that
> doesn't add to the atomicity of the entire operation.
> 
> How, depending on what you do with the rest of this function depends
> whether the entire operation is safe or not.  For example, let's take
> this code at the end of phy_state_machine():
> 
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> 		thread 0			thread 1
> 						phy_disconnect()
> 						+-phy_is_started()
> 		phy_is_started()                |
> 						`-phy_stop()
> 						  +-phydev->state = PHY_HALTED
> 						  `-phy_stop_machine()
> 						    `-cancel_delayed_work_sync()
> 		phy_queue_state_machine()
> 		`-mod_delayed_work()
> 
> At this point, the phydev->state_queue() has been added back onto the
> system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() called on it.
> 
> The original code in 4.20 did not have this race condition.
> 
> Basically, the lock inside phy_is_started() does nothing useful, and
> I'd say is dangerously misleading.
> 
Then idea would be to first remove the locking from phy_is_started()
and in a second step do the following to prevent the described race
(phy_stop() takes phydev->lock too).

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c1ed03800..69dc64a4d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -957,8 +957,10 @@ void phy_state_machine(struct work_struct *work)
         * state machine would be pointless and possibly error prone when
         * called from phy_disconnect() synchronously.
         */
+       mutex_lock(&phydev->lock);
        if (phy_polling_mode(phydev) && phy_is_started(phydev))
                phy_queue_state_machine(phydev, PHY_STATE_TIME);
+       mutex_unlock(&phydev->lock);
 }

Heiner

^ permalink raw reply related

* Re: [PATCH net-next 3/4] net: phy: Extract genphy_c45_pma_read_abilities from marvell10g
From: Andrew Lunn @ 2019-02-12 21:11 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, netdev, linux-kernel, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-4-maxime.chevallier@bootlin.com>

On Mon, Feb 11, 2019 at 03:25:28PM +0100, Maxime Chevallier wrote:
> Marvell 10G PHY driver has a generic way of initializing the supported
> link modes by reading the PHY's C45 PMA abilities. This can be made
> generic, since these registers are part of the 802.3 specifications.
> 
> This commit extracts the config_init link_mode initialization code from
> marvell10g and uses it to introduce the genphy_c45_pma_read_abilities
> function.
> 
> Only PMA modes are read, it's still up to the caller to set the Pause
> parameters.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

> -	__set_bit(ETHTOOL_LINK_MODE_Pause_BIT, supported);
> -	__set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, supported);
> +	__set_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydev->supported);
> +	__set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydev->supported);

I think someone already pointed out, this is not ideal. The PHY driver
should only set pause bits, if it needs odd pause settings because of
HW limitations. If no bits are set, the core will set both bits.

But this is not a new problem introduced by this patch, so it can be
fixed later, rather than hold up this patchset.

      Andrew

^ permalink raw reply

* Re: [PATCH net-next 4/4] net: phy: Add generic support for 2.5GBaseT and 5GBaseT
From: Andrew Lunn @ 2019-02-12 21:13 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, netdev, linux-kernel, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-5-maxime.chevallier@bootlin.com>

On Mon, Feb 11, 2019 at 03:25:29PM +0100, Maxime Chevallier wrote:
> The 802.3bz specification, based on previous by the NBASET alliance,
> defines the 2.5GBaseT and 5GBaseT link modes for ethernet traffic on
> cat5e, cat6 and cat7 cables.
> 
> These mode integrate with the already defined C45 MDIO PMA/PMD registers
> set that added 10G support, by defining some previously reserved bits,
> and adding a new register (2.5G/5G Extended abilities).
> 
> This commit adds the required definitions in include/uapi/linux/mdio.h
> to support these modes, and detect when a link-partner advertises them.
> 
> It also adds support for these mode in the generic C45 PHY
> infrastructure.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 0/3] Remove getting SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-12 21:35 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, idosch, linux-kernel, devel, bridge, jiri, andrew,
	vivien.didelot
In-Reply-To: <6ff1b9c2-1206-f568-8556-28efe852dd08@gmail.com>

On 2/12/19 9:54 AM, Florian Fainelli wrote:
> On 2/12/19 9:50 AM, David Miller wrote:
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Date: Mon, 11 Feb 2019 13:17:46 -0800
>>
>>> AFAICT there is no code that attempts to get the value of the attribute
>>> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS while it is used with
>>> switchdev_port_attr_set().
>>>
>>> This is effectively no doing anything and it can slow down future work
>>> that tries to make modifications in these areas so remove that.
>>
>> Series applied.
>>
>>> David, there should be no dependency with previous patch series, but
>>> again, feedback from Ido and Jiri would be welcome in case this was
>>> added for a reason.
>>
>> Ok, is there going to be another respin of that switchdev_ops removal
>> series?
> 
> Yes, I will be working on a v5 which addresses Ido's feedback in the
> next hours.

David, looks like I managed to introduce a build failure with
rocker_ofdpa.c, sorry about that, I will send a follow-up immediately.
-- 
Florian

^ permalink raw reply

* [PATCH net-next] net: sched: flower: only return error from hw offload if skip_sw
From: Vlad Buslov @ 2019-02-12 21:39 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, pablo, Vlad Buslov

Recently introduced tc_setup_flow_action() can fail when parsing tcf_exts
on some unsupported action commands. However, this should not affect the
case when user did not explicitly request hw offload by setting skip_sw
flag. Modify tc_setup_flow_action() callers to only propagate the error if
skip_sw flag is set for filter that is being offloaded, and set extack
error message in that case.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Fixes: 3a7b68617de7 ("cls_api: add translator to flow_action representation")
---
 net/sched/cls_flower.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6a341287a527..f695048455a5 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -396,7 +396,11 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
 	if (err) {
 		kfree(cls_flower.rule);
-		return err;
+		if (skip_sw) {
+			NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
+			return err;
+		}
+		return 0;
 	}
 
 	err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
@@ -1499,7 +1503,11 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 						   &f->exts);
 			if (err) {
 				kfree(cls_flower.rule);
-				return err;
+				if (tc_skip_sw(f->flags)) {
+					NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
+					return err;
+				}
+				continue;
 			}
 
 			cls_flower.classid = f->res.classid;
-- 
2.13.6


^ permalink raw reply related

* [PATCH net-next] rocker: Remove port_attr_bridge_flags_get assignment
From: Florian Fainelli @ 2019-02-12 21:39 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Jiri Pirko, David S. Miller, open list

After 610d2b601bba ("rocker: Remove getting PORT_BRIDGE_FLAGS") we no
longer have a port_attr_bridge_flags_get member in the rocker_world_ops
structre, fix that.

Fixes: 610d2b601bba ("rocker: Remove getting PORT_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 9c07f6040720..fa296a7c255d 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2813,7 +2813,6 @@ struct rocker_world_ops rocker_ofdpa_ops = {
 	.port_stop = ofdpa_port_stop,
 	.port_attr_stp_state_set = ofdpa_port_attr_stp_state_set,
 	.port_attr_bridge_flags_set = ofdpa_port_attr_bridge_flags_set,
-	.port_attr_bridge_flags_get = ofdpa_port_attr_bridge_flags_get,
 	.port_attr_bridge_flags_support_get = ofdpa_port_attr_bridge_flags_support_get,
 	.port_attr_bridge_ageing_time_set = ofdpa_port_attr_bridge_ageing_time_set,
 	.port_obj_vlan_add = ofdpa_port_obj_vlan_add,
-- 
2.17.1


^ permalink raw reply related

* [PATCH v4 02/14] linux/net.h: use DYNAMIC_DEBUG_BRANCH in net_dbg_ratelimited
From: Rasmus Villemoes @ 2019-02-12 21:41 UTC (permalink / raw)
  To: Andrew Morton, Jason Baron; +Cc: linux-kernel, Rasmus Villemoes, netdev
In-Reply-To: <20190212214150.4807-1-linux@rasmusvillemoes.dk>

net_dbg_ratelimited tests the dynamic debug descriptor the
old-fashioned way, and doesn't utilize the static key/jump label
implementation when CONFIG_JUMP_LABEL is set. Use the
DYNAMIC_DEBUG_BRANCH which is defined appropriately.

Cc: netdev@vger.kernel.org
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 include/linux/net.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index e0930678c8bf..651fca72286c 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -263,7 +263,7 @@ do {								\
 #define net_dbg_ratelimited(fmt, ...)					\
 do {									\
 	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt);			\
-	if (unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT) &&	\
+	if (DYNAMIC_DEBUG_BRANCH(descriptor) &&				\
 	    net_ratelimit())						\
 		__dynamic_pr_debug(&descriptor, pr_fmt(fmt),		\
 		                   ##__VA_ARGS__);			\
-- 
2.20.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox