* Re: [PATCH net-next 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Tao Ren @ 2019-07-31 5:55 UTC (permalink / raw)
To: Andrew Lunn
Cc: Vladimir Oltean, Florian Fainelli, Heiner Kallweit,
David S . Miller, Arun Parameswaran, Justin Chen, netdev, lkml,
Andrew Jeffery, openbmc@lists.ozlabs.org
In-Reply-To: <20190731023417.GD9523@lunn.ch>
On 7/30/19 7:34 PM, Andrew Lunn wrote:
>> Hi Andrew,
>>
>> The BCM54616S PHY on my machine is connected to a BCM5396 switch chip over backplane (1000Base-KX).
>
> Ah, that is different. So the board is using it for RGMII to 1000Base-KX?
>
> phy-mode is about the MAC-PHY link. So in this case RGMII.
Yes. It's RGMII to 1000Base-KX.
> There is no DT way to configure the PHY-Switch link. However, it
> sounds like you have the PHY strapped so it is doing 1000BaseX on the
> PHY-Switch link. So do you actually need to configure this?
The PHY is strapped in RGMII-Fiber Mode (the term used in datasheet), but besides 1000BaseX, 100Base-FX is also supported in this mode.
The datasheet doesn't say which link type (1000BaseX or 100Base-FX) is active after reset and I cannot find a way to auto-detect the link type, either.
Below are a few more details about 1000Base-X and 100Base-FX in BCM54616S datasheet.
- 1000BaseX:
The 1000BaseX register set (MII registers 00-0F) needs to be enabled by setting bit 0 of Mode Control Register.
Although the register address is the same between 1000BaseX and 1000BaseT/100BaseT/10BaseT registers, some bit fields in 1000BaseX registers are different: for example, speed field is not available in 1000BaseX status register.
- 100Base-FX:
100Base-FX registers need to be enabled by writing 1 to SerDes 100FX Control Register.
100Base-FX Control and Status registers are in shadow register 1Ch, instead of MII register 00h and 01h.
> You report you never see link up? So maybe the problem is actually in
> read_status? When in 1000BaseX mode, do you need to read the link
> status from some other register? The Marvell PHYs use a second page
> for 1000BaseX.
read_status callback needs to be updated to report correct link speed in my case. But as I cannot tell which link type (1000BaseX or 100Base-FX) is active, it becomes hard to access registers in read_status method. Any suggestions?
BTW, link-never-up issue seems to be caused by static/dynamic feature detection. I'm still tracing down the issue and it's being tracked in patch #1 of the series.
Thanks,
Tao
^ permalink raw reply
* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
From: Heiner Kallweit @ 2019-07-31 5:44 UTC (permalink / raw)
To: liuyonglong, andrew, davem
Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang,
shiju.jose
In-Reply-To: <6add4874-fd2b-9b21-cd78-80b6dde4dd53@huawei.com>
On 31.07.2019 05:33, liuyonglong wrote:
>
>
> On 2019/7/31 3:04, Heiner Kallweit wrote:
>> On 30.07.2019 08:35, liuyonglong wrote:
>>> :/sys/kernel/debug/tracing$ cat trace
>>> # tracer: nop
>>> #
>>> # entries-in-buffer/entries-written: 45/45 #P:128
>>> #
>>> # _-----=> irqs-off
>>> # / _----=> need-resched
>>> # | / _---=> hardirq/softirq
>>> # || / _--=> preempt-depth
>>> # ||| / delay
>>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>>> # | | | |||| | |
>>> kworker/64:2-1028 [064] .... 172.295687: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x02 val:0x001c
>>> kworker/64:2-1028 [064] .... 172.295726: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x03 val:0xc916
>>> kworker/64:2-1028 [064] .... 172.296902: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x01 val:0x79ad
>>> kworker/64:2-1028 [064] .... 172.296938: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x0f val:0x2000
>>> kworker/64:2-1028 [064] .... 172.321213: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x00 val:0x1040
>>> kworker/64:2-1028 [064] .... 172.343209: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x02 val:0x001c
>>> kworker/64:2-1028 [064] .... 172.343245: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x03 val:0xc916
>>> kworker/64:2-1028 [064] .... 172.343882: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
>>> kworker/64:2-1028 [064] .... 172.343918: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x0f val:0x2000
>>> kworker/64:2-1028 [064] .... 172.362658: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
>>> kworker/64:2-1028 [064] .... 172.385961: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x02 val:0x001c
>>> kworker/64:2-1028 [064] .... 172.385996: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x03 val:0xc916
>>> kworker/64:2-1028 [064] .... 172.386646: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x01 val:0x79ad
>>> kworker/64:2-1028 [064] .... 172.386681: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x0f val:0x2000
>>> kworker/64:2-1028 [064] .... 172.411286: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x00 val:0x1040
>>> kworker/64:2-1028 [064] .... 172.433225: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x02 val:0x001c
>>> kworker/64:2-1028 [064] .... 172.433260: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x03 val:0xc916
>>> kworker/64:2-1028 [064] .... 172.433887: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x01 val:0x79ad
>>> kworker/64:2-1028 [064] .... 172.433922: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x0f val:0x2000
>>> kworker/64:2-1028 [064] .... 172.452862: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x00 val:0x1040
>>> ifconfig-1324 [011] .... 177.325585: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
>>> kworker/u257:0-8 [012] .... 177.325642: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x04 val:0x01e1
>>> kworker/u257:0-8 [012] .... 177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>>> kworker/u257:0-8 [012] .... 177.325708: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
>>> kworker/u257:0-8 [012] .... 177.325744: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x09 val:0x0200
>>> kworker/u257:0-8 [012] .... 177.325779: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
>>> kworker/u257:0-8 [012] .... 177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>>> kworker/u257:0-8 [012] .... 177.325843: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x798d
>>
>> What I think that happens here:
>> Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
>> the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
>> The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
>> finished. Could you please test whether this works for you?
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 6b5cb87f3..7ddd91df9 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
>> phydev->link = status & BMSR_LSTATUS ? 1 : 0;
>> phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
>>
>> + /* Consider the case that autoneg was started and "aneg complete"
>> + * bit has been reset, but "link up" bit not yet.
>> + */
>> + if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
>> + phydev->link = 0;
>> +
>> return 0;
>> }
>> EXPORT_SYMBOL(genphy_update_link);
>>
>
> This patch can solve the issue! Will it be upstream?
>
I'll check for side effects, but in general: yes.
> So it's nothing to do with the bios, and just the PHY's own behavior,
> the "link up" bit can not reset immediately,right?
>
Yes, it's the PHY's own behavior, and to a certain extent it may depend on speed
of the MDIO bus. At least few network chips require a delay of several microseconds
after each MDIO bus access. This may be sufficient for the PHY to reset the
link-up bit in time.
> ps: It will take 1 second more to make the link up for RTL8211F when 0x798d happend.
>
In polling mode link-up is detected up to 1s after it happened.
You could switch to interrupt mode to reduce the aneg time.
>
>
Heiner
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: phy: broadcom: set features explicitly for BCM54616S
From: Heiner Kallweit @ 2019-07-31 5:53 UTC (permalink / raw)
To: Tao Ren, Andrew Lunn
Cc: Florian Fainelli, David S . Miller, Arun Parameswaran,
Justin Chen, Vladimir Oltean, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Andrew Jeffery,
openbmc@lists.ozlabs.org
In-Reply-To: <f59c2ae9-ef44-1e1b-4ae2-216eb911e92e@fb.com>
On 31.07.2019 02:12, Tao Ren wrote:
> On 7/29/19 11:00 PM, Heiner Kallweit wrote:
>> On 30.07.2019 07:05, Tao Ren wrote:
>>> On 7/29/19 8:35 PM, Andrew Lunn wrote:
>>>> On Mon, Jul 29, 2019 at 05:25:32PM -0700, Tao Ren wrote:
>>>>> BCM54616S feature "PHY_GBIT_FEATURES" was removed by commit dcdecdcfe1fc
>>>>> ("net: phy: switch drivers to use dynamic feature detection"). As dynamic
>>>>> feature detection doesn't work when BCM54616S is working in RGMII-Fiber
>>>>> mode (different sets of MII Control/Status registers being used), let's
>>>>> set "PHY_GBIT_FEATURES" for BCM54616S explicitly.
>>>>
>>>> Hi Tao
>>>>
>>>> What exactly does it get wrong?
>>>>
>>>> Thanks
>>>> Andrew
>>>
>>> Hi Andrew,
>>>
>>> BCM54616S is set to RGMII-Fiber (1000Base-X) mode on my platform, and none of the features (1000BaseT/100BaseT/10BaseT) can be detected by genphy_read_abilities(), because the PHY only reports 1000BaseX_Full|Half ability in this mode.
>>>
>> Are you going to use the PHY in copper or fibre mode?
>> In case you use fibre mode, why do you need the copper modes set as supported?
>> Or does the PHY just start in fibre mode and you want to switch it to copper mode?
>
> Hi Heiner,
>
> The phy starts in fiber mode and that's the mode I want.
> My observation is: phydev->link is always 0 (Link status bit is never set in MII_BMSR) by using dynamic ability detection on my machine. I checked phydev->supported and it's set to "AutoNeg | TP | MII | Pause | Asym_Pause" by dynamic ability detection. Is it normal/expected? Or maybe the fix should go to different places? Thank you for your help.
>
Not sure whether you stated already which kernel version you're using.
There's a brand-new extension to auto-detect 1000BaseX:
f30e33bcdab9 ("net: phy: Add more 1000BaseX support detection")
It's included in the 5.3-rc series.
If a feature can be read from a vendor-specific register only,
then the preferred way is: Implement callback get_features in
the PHY driver, call genphy_read_abilities for the basic features
and complement it with reading the vendor-specific register(s).
>
> Thanks,
>
> Tao
>
Heiner
^ permalink raw reply
* [PATCH net-next v2 4/4] net: ftgmac100: Select ASPEED MDIO driver for the AST2600
From: Andrew Jeffery @ 2019-07-31 5:39 UTC (permalink / raw)
To: netdev
Cc: Andrew Jeffery, davem, robh+dt, mark.rutland, joel, andrew,
f.fainelli, hkallweit1, devicetree, linux-arm-kernel,
linux-aspeed, linux-kernel
In-Reply-To: <20190731053959.16293-1-andrew@aj.id.au>
Ensures we can talk to a PHY via MDIO on the AST2600, as the MDIO
controller is now separate from the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
drivers/net/ethernet/faraday/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/faraday/Kconfig b/drivers/net/ethernet/faraday/Kconfig
index a9b105803fb7..73e4f2648e49 100644
--- a/drivers/net/ethernet/faraday/Kconfig
+++ b/drivers/net/ethernet/faraday/Kconfig
@@ -32,6 +32,7 @@ config FTGMAC100
depends on ARM || NDS32 || COMPILE_TEST
depends on !64BIT || BROKEN
select PHYLIB
+ select MDIO_ASPEED if MACH_ASPEED_G6
---help---
This driver supports the FTGMAC100 Gigabit Ethernet controller
from Faraday. It is used on Faraday A369, Andes AG102 and some
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v2 3/4] net: ftgmac100: Add support for DT phy-handle property
From: Andrew Jeffery @ 2019-07-31 5:39 UTC (permalink / raw)
To: netdev
Cc: Andrew Jeffery, davem, robh+dt, mark.rutland, joel, andrew,
f.fainelli, hkallweit1, devicetree, linux-arm-kernel,
linux-aspeed, linux-kernel
In-Reply-To: <20190731053959.16293-1-andrew@aj.id.au>
phy-handle is necessary for the AST2600 which separates the MDIO
controllers from the MAC.
I've tried to minimise the intrusion of supporting the AST2600 to the
FTGMAC100 by leaving in place the existing MDIO support for the embedded
MDIO interface. The AST2400 and AST2500 continue to be supported this
way, as it avoids breaking/reworking existing devicetrees.
The AST2600 support by contrast requires the presence of the phy-handle
property in the MAC devicetree node to specify the appropriate PHY to
associate with the MAC. In the event that someone wants to specify the
MDIO bus topology under the MAC node on an AST2400 or AST2500, the
current auto-probe approach is done conditional on the absence of an
"mdio" child node of the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
drivers/net/ethernet/faraday/ftgmac100.c | 37 +++++++++++++++++++++---
1 file changed, 33 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 030fed65393e..563384b788ab 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -17,6 +17,7 @@
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/of.h>
+#include <linux/of_mdio.h>
#include <linux/phy.h>
#include <linux/platform_device.h>
#include <linux/property.h>
@@ -1619,8 +1620,13 @@ static int ftgmac100_setup_mdio(struct net_device *netdev)
if (!priv->mii_bus)
return -EIO;
- if (priv->is_aspeed) {
- /* This driver supports the old MDIO interface */
+ if (of_device_is_compatible(np, "aspeed,ast2400-mac") ||
+ of_device_is_compatible(np, "aspeed,ast2500-mac")) {
+ /* The AST2600 has a separate MDIO controller */
+
+ /* For the AST2400 and AST2500 this driver only supports the
+ * old MDIO interface
+ */
reg = ioread32(priv->base + FTGMAC100_OFFSET_REVR);
reg &= ~FTGMAC100_REVR_NEW_MDIO_INTERFACE;
iowrite32(reg, priv->base + FTGMAC100_OFFSET_REVR);
@@ -1797,7 +1803,8 @@ static int ftgmac100_probe(struct platform_device *pdev)
np = pdev->dev.of_node;
if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac") ||
- of_device_is_compatible(np, "aspeed,ast2500-mac"))) {
+ of_device_is_compatible(np, "aspeed,ast2500-mac") ||
+ of_device_is_compatible(np, "aspeed,ast2600-mac"))) {
priv->rxdes0_edorr_mask = BIT(30);
priv->txdes0_edotr_mask = BIT(30);
priv->is_aspeed = true;
@@ -1817,7 +1824,29 @@ static int ftgmac100_probe(struct platform_device *pdev)
priv->ndev = ncsi_register_dev(netdev, ftgmac100_ncsi_handler);
if (!priv->ndev)
goto err_ncsi_dev;
- } else {
+ } else if (np && of_get_property(np, "phy-handle", NULL)) {
+ struct phy_device *phy;
+
+ phy = of_phy_get_and_connect(priv->netdev, np,
+ &ftgmac100_adjust_link);
+ if (!phy) {
+ dev_err(&pdev->dev, "Failed to connect to phy\n");
+ goto err_setup_mdio;
+ }
+
+ /* Indicate that we support PAUSE frames (see comment in
+ * Documentation/networking/phy.txt)
+ */
+ phy_support_asym_pause(phy);
+
+ /* Display what we found */
+ phy_attached_info(phy);
+ } else if (np && !of_get_child_by_name(np, "mdio")) {
+ /* Support legacy ASPEED devicetree descriptions that decribe a
+ * MAC with an embedded MDIO controller but have no "mdio"
+ * child node. Automatically scan the MDIO bus for available
+ * PHYs.
+ */
priv->use_ncsi = false;
err = ftgmac100_setup_mdio(netdev);
if (err)
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v2 2/4] net: phy: Add mdio-aspeed
From: Andrew Jeffery @ 2019-07-31 5:39 UTC (permalink / raw)
To: netdev
Cc: Andrew Jeffery, davem, robh+dt, mark.rutland, joel, andrew,
f.fainelli, hkallweit1, devicetree, linux-arm-kernel,
linux-aspeed, linux-kernel
In-Reply-To: <20190731053959.16293-1-andrew@aj.id.au>
The AST2600 design separates the MDIO controllers from the MAC, which is
where they were placed in the AST2400 and AST2500. Further, the register
interface is reworked again, so now we have three possible different
interface implementations, however this driver only supports the
interface provided by the AST2600. The AST2400 and AST2500 will continue
to be supported by the MDIO support embedded in the FTGMAC100 driver.
The hardware supports both C22 and C45 mode, but for the moment only C22
support is implemented.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
v2:
* Use readl_poll_timeout() instead of open-coded loop
* Error on C45 accesses
---
drivers/net/phy/Kconfig | 13 +++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/mdio-aspeed.c | 157 ++++++++++++++++++++++++++++++++++
3 files changed, 171 insertions(+)
create mode 100644 drivers/net/phy/mdio-aspeed.c
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 20f14c5fbb7e..206d8650ee7f 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -21,6 +21,19 @@ config MDIO_BUS
if MDIO_BUS
+config MDIO_ASPEED
+ tristate "ASPEED MDIO bus controller"
+ depends on ARCH_ASPEED || COMPILE_TEST
+ depends on OF_MDIO && HAS_IOMEM
+ help
+ This module provides a driver for the independent MDIO bus
+ controllers found in the ASPEED AST2600 SoC. This is a driver for the
+ third revision of the ASPEED MDIO register interface - the first two
+ revisions are the "old" and "new" interfaces found in the AST2400 and
+ AST2500, embedded in the MAC. For legacy reasons, FTGMAC100 driver
+ continues to drive the embedded MDIO controller for the AST2400 and
+ AST2500 SoCs, so say N if AST2600 support is not required.
+
config MDIO_BCM_IPROC
tristate "Broadcom iProc MDIO bus controller"
depends on ARCH_BCM_IPROC || COMPILE_TEST
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 839acb292c38..ba07c27e4208 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -22,6 +22,7 @@ libphy-$(CONFIG_LED_TRIGGER_PHY) += phy_led_triggers.o
obj-$(CONFIG_PHYLINK) += phylink.o
obj-$(CONFIG_PHYLIB) += libphy.o
+obj-$(CONFIG_MDIO_ASPEED) += mdio-aspeed.o
obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
obj-$(CONFIG_MDIO_BCM_UNIMAC) += mdio-bcm-unimac.o
obj-$(CONFIG_MDIO_BITBANG) += mdio-bitbang.o
diff --git a/drivers/net/phy/mdio-aspeed.c b/drivers/net/phy/mdio-aspeed.c
new file mode 100644
index 000000000000..cad820568f75
--- /dev/null
+++ b/drivers/net/phy/mdio-aspeed.c
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright (C) 2019 IBM Corp. */
+
+#include <linux/bitfield.h>
+#include <linux/delay.h>
+#include <linux/iopoll.h>
+#include <linux/mdio.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_mdio.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+
+#define DRV_NAME "mdio-aspeed"
+
+#define ASPEED_MDIO_CTRL 0x0
+#define ASPEED_MDIO_CTRL_FIRE BIT(31)
+#define ASPEED_MDIO_CTRL_ST BIT(28)
+#define ASPEED_MDIO_CTRL_ST_C45 0
+#define ASPEED_MDIO_CTRL_ST_C22 1
+#define ASPEED_MDIO_CTRL_OP GENMASK(27, 26)
+#define MDIO_C22_OP_WRITE 0b01
+#define MDIO_C22_OP_READ 0b10
+#define ASPEED_MDIO_CTRL_PHYAD GENMASK(25, 21)
+#define ASPEED_MDIO_CTRL_REGAD GENMASK(20, 16)
+#define ASPEED_MDIO_CTRL_MIIWDATA GENMASK(15, 0)
+
+#define ASPEED_MDIO_DATA 0x4
+#define ASPEED_MDIO_DATA_MDC_THRES GENMASK(31, 24)
+#define ASPEED_MDIO_DATA_MDIO_EDGE BIT(23)
+#define ASPEED_MDIO_DATA_MDIO_LATCH GENMASK(22, 20)
+#define ASPEED_MDIO_DATA_IDLE BIT(16)
+#define ASPEED_MDIO_DATA_MIIRDATA GENMASK(15, 0)
+
+#define ASPEED_MDIO_INTERVAL_US 100
+#define ASPEED_MDIO_TIMEOUT_US (ASPEED_MDIO_INTERVAL_US * 10)
+
+struct aspeed_mdio {
+ void __iomem *base;
+};
+
+static int aspeed_mdio_read(struct mii_bus *bus, int addr, int regnum)
+{
+ struct aspeed_mdio *ctx = bus->priv;
+ u32 ctrl;
+ u32 data;
+ int rc;
+
+ dev_dbg(&bus->dev, "%s: addr: %d, regnum: %d\n", __func__, addr,
+ regnum);
+
+ /* Just clause 22 for the moment */
+ if (regnum & MII_ADDR_C45)
+ return -EOPNOTSUPP;
+
+ ctrl = ASPEED_MDIO_CTRL_FIRE
+ | FIELD_PREP(ASPEED_MDIO_CTRL_ST, ASPEED_MDIO_CTRL_ST_C22)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_OP, MDIO_C22_OP_READ)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_PHYAD, addr)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_REGAD, regnum);
+
+ iowrite32(ctrl, ctx->base + ASPEED_MDIO_CTRL);
+
+ rc = readl_poll_timeout(ctx->base + ASPEED_MDIO_DATA, data,
+ data & ASPEED_MDIO_DATA_IDLE,
+ ASPEED_MDIO_INTERVAL_US,
+ ASPEED_MDIO_TIMEOUT_US);
+ if (rc < 0)
+ return rc;
+
+ return FIELD_GET(ASPEED_MDIO_DATA_MIIRDATA, data);
+}
+
+static int aspeed_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
+{
+ struct aspeed_mdio *ctx = bus->priv;
+ u32 ctrl;
+
+ dev_dbg(&bus->dev, "%s: addr: %d, regnum: %d, val: 0x%x\n",
+ __func__, addr, regnum, val);
+
+ /* Just clause 22 for the moment */
+ if (regnum & MII_ADDR_C45)
+ return -EOPNOTSUPP;
+
+ ctrl = ASPEED_MDIO_CTRL_FIRE
+ | FIELD_PREP(ASPEED_MDIO_CTRL_ST, ASPEED_MDIO_CTRL_ST_C22)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_OP, MDIO_C22_OP_WRITE)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_PHYAD, addr)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_REGAD, regnum)
+ | FIELD_PREP(ASPEED_MDIO_CTRL_MIIWDATA, val);
+
+ iowrite32(ctrl, ctx->base + ASPEED_MDIO_CTRL);
+
+ return readl_poll_timeout(ctx->base + ASPEED_MDIO_CTRL, ctrl,
+ !(ctrl & ASPEED_MDIO_CTRL_FIRE),
+ ASPEED_MDIO_INTERVAL_US,
+ ASPEED_MDIO_TIMEOUT_US);
+}
+
+static int aspeed_mdio_probe(struct platform_device *pdev)
+{
+ struct aspeed_mdio *ctx;
+ struct mii_bus *bus;
+ int rc;
+
+ bus = devm_mdiobus_alloc_size(&pdev->dev, sizeof(*ctx));
+ if (!bus)
+ return -ENOMEM;
+
+ ctx = bus->priv;
+ ctx->base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(ctx->base))
+ return PTR_ERR(ctx->base);
+
+ bus->name = DRV_NAME;
+ snprintf(bus->id, MII_BUS_ID_SIZE, "%s%d", pdev->name, pdev->id);
+ bus->parent = &pdev->dev;
+ bus->read = aspeed_mdio_read;
+ bus->write = aspeed_mdio_write;
+
+ rc = of_mdiobus_register(bus, pdev->dev.of_node);
+ if (rc) {
+ dev_err(&pdev->dev, "Cannot register MDIO bus!\n");
+ return rc;
+ }
+
+ platform_set_drvdata(pdev, bus);
+
+ return 0;
+}
+
+static int aspeed_mdio_remove(struct platform_device *pdev)
+{
+ mdiobus_unregister(platform_get_drvdata(pdev));
+
+ return 0;
+}
+
+static const struct of_device_id aspeed_mdio_of_match[] = {
+ { .compatible = "aspeed,ast2600-mdio", },
+ { },
+};
+
+static struct platform_driver aspeed_mdio_driver = {
+ .driver = {
+ .name = DRV_NAME,
+ .of_match_table = aspeed_mdio_of_match,
+ },
+ .probe = aspeed_mdio_probe,
+ .remove = aspeed_mdio_remove,
+};
+
+module_platform_driver(aspeed_mdio_driver);
+
+MODULE_AUTHOR("Andrew Jeffery <andrew@aj.id.au>");
+MODULE_LICENSE("GPL");
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v2 1/4] dt-bindings: net: Add aspeed,ast2600-mdio binding
From: Andrew Jeffery @ 2019-07-31 5:39 UTC (permalink / raw)
To: netdev
Cc: Andrew Jeffery, davem, robh+dt, mark.rutland, joel, andrew,
f.fainelli, hkallweit1, devicetree, linux-arm-kernel,
linux-aspeed, linux-kernel
In-Reply-To: <20190731053959.16293-1-andrew@aj.id.au>
The AST2600 splits out the MDIO bus controller from the MAC into its own
IP block and rearranges the register layout. Add a new binding to
describe the new hardware.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
v2:
* aspeed: Utilise mdio.yaml
* aspeed: Drop status from example
---
.../bindings/net/aspeed,ast2600-mdio.yaml | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
diff --git a/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
new file mode 100644
index 000000000000..71808e78a495
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
@@ -0,0 +1,45 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/aspeed,ast2600-mdio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ASPEED AST2600 MDIO Controller
+
+maintainers:
+ - Andrew Jeffery <andrew@aj.id.au>
+
+description: |+
+ The ASPEED AST2600 MDIO controller is the third iteration of ASPEED's MDIO
+ bus register interface, this time also separating out the controller from the
+ MAC.
+
+allOf:
+ - $ref: "mdio.yaml#"
+
+properties:
+ compatible:
+ const: aspeed,ast2600-mdio
+ reg:
+ maxItems: 1
+ description: The register range of the MDIO controller instance
+
+required:
+ - compatible
+ - reg
+ - "#address-cells"
+ - "#size-cells"
+
+examples:
+ - |
+ mdio0: mdio@1e650000 {
+ compatible = "aspeed,ast2600-mdio";
+ reg = <0x1e650000 0x8>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethphy0: ethernet-phy@0 {
+ compatible = "ethernet-phy-ieee802.3-c22";
+ reg = <0>;
+ };
+ };
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v2 0/4] net: phy: Add AST2600 MDIO support
From: Andrew Jeffery @ 2019-07-31 5:39 UTC (permalink / raw)
To: netdev
Cc: Andrew Jeffery, davem, robh+dt, mark.rutland, joel, andrew,
f.fainelli, hkallweit1, devicetree, linux-arm-kernel,
linux-aspeed, linux-kernel
Hello,
v2 of the ASPEED MDIO series addresses comments from Rob on the devicetree
bindings and Andrew on the driver itself.
v1 of the series can be found here:
http://patchwork.ozlabs.org/cover/1138140/
Please review!
Andrew
Andrew Jeffery (4):
dt-bindings: net: Add aspeed,ast2600-mdio binding
net: phy: Add mdio-aspeed
net: ftgmac100: Add support for DT phy-handle property
net: ftgmac100: Select ASPEED MDIO driver for the AST2600
.../bindings/net/aspeed,ast2600-mdio.yaml | 45 +++++
drivers/net/ethernet/faraday/Kconfig | 1 +
drivers/net/ethernet/faraday/ftgmac100.c | 37 ++++-
drivers/net/phy/Kconfig | 13 ++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/mdio-aspeed.c | 157 ++++++++++++++++++
6 files changed, 250 insertions(+), 4 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
create mode 100644 drivers/net/phy/mdio-aspeed.c
--
2.20.1
^ permalink raw reply
* Re: [PATCH v2 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm
From: Song Liu @ 2019-07-31 5:19 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, netdev@vger.kernel.org, Alexei Starovoitov,
daniel@iogearbox.net, Yonghong Song, Kernel Team
In-Reply-To: <CAEf4BzYE9xnyFjmN3+-LgkkOomt383OPNXVhSCO4PncAu20wgw@mail.gmail.com>
> On Jul 30, 2019, at 6:00 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Jul 30, 2019 at 5:39 PM Song Liu <songliubraving@fb.com> wrote:
>>
>>
>>
>>> On Jul 30, 2019, at 12:53 PM, Andrii Nakryiko <andriin@fb.com> wrote:
>>>
>>> This patch implements the core logic for BPF CO-RE offsets relocations.
>>> Every instruction that needs to be relocated has corresponding
>>> bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
>>> to match recorded "local" relocation spec against potentially many
>>> compatible "target" types, creating corresponding spec. Details of the
>>> algorithm are noted in corresponding comments in the code.
>>>
>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>> ---
>>> tools/lib/bpf/libbpf.c | 915 ++++++++++++++++++++++++++++++++++++++++-
>>> tools/lib/bpf/libbpf.h | 1 +
>>> 2 files changed, 909 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>
> [...]
>
> Please trim irrelevant parts. It doesn't matter with desktop Gmail,
> but pretty much everywhere else is very hard to work with.
This won't be a problem if the patch is shorter. ;)
>
>>> +
>>> + for (i = 1; i < spec->raw_len; i++) {
>>> + t = skip_mods_and_typedefs(btf, id, &id);
>>> + if (!t)
>>> + return -EINVAL;
>>> +
>>> + access_idx = spec->raw_spec[i];
>>> +
>>> + if (btf_is_composite(t)) {
>>> + const struct btf_member *m = (void *)(t + 1);
>>
>> Why (void *) instead of (const struct btf_member *)? There are a few more
>> in the rest of the patch.
>>
>
> I just picked the most succinct and non-repetitive form. It's
> immediately apparent which type it's implicitly converted to, so I
> felt there is no need to repeat it. Also, just (void *) is much
> shorter. :)
_All_ other code in btf.c converts the pointer to the target type.
In some cases, it is not apparent which type it is converted to,
for example:
+ m = (void *)(targ_type + 1);
I would suggest we do implicit conversion whenever possible.
Thanks,
Song
^ permalink raw reply
* [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Joe Perches @ 2019-07-31 5:04 UTC (permalink / raw)
To: Vlad Yasevich, Neil Horman, Marcelo Ricardo Leitner
Cc: David S. Miller, linux-sctp, netdev, linux-kernel
fallthrough may become a pseudo reserved keyword so this only use of
fallthrough is better renamed to allow it.
Signed-off-by: Joe Perches <joe@perches.com>
---
net/sctp/sm_make_chunk.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 36bd8a6e82df..3fdcaa2fbf12 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
case SCTP_PARAM_SET_PRIMARY:
if (net->sctp.addip_enable)
break;
- goto fallthrough;
+ goto unhandled;
case SCTP_PARAM_HOST_NAME_ADDRESS:
/* Tell the peer, we won't support this param. */
@@ -2163,11 +2163,11 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
case SCTP_PARAM_FWD_TSN_SUPPORT:
if (ep->prsctp_enable)
break;
- goto fallthrough;
+ goto unhandled;
case SCTP_PARAM_RANDOM:
if (!ep->auth_enable)
- goto fallthrough;
+ goto unhandled;
/* SCTP-AUTH: Secion 6.1
* If the random number is not 32 byte long the association
@@ -2184,7 +2184,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
case SCTP_PARAM_CHUNKS:
if (!ep->auth_enable)
- goto fallthrough;
+ goto unhandled;
/* SCTP-AUTH: Section 3.2
* The CHUNKS parameter MUST be included once in the INIT or
@@ -2200,7 +2200,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
case SCTP_PARAM_HMAC_ALGO:
if (!ep->auth_enable)
- goto fallthrough;
+ goto unhandled;
hmacs = (struct sctp_hmac_algo_param *)param.p;
n_elt = (ntohs(param.p->length) -
@@ -2223,7 +2223,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
retval = SCTP_IERROR_ABORT;
}
break;
-fallthrough:
+unhandled:
default:
pr_debug("%s: unrecognized param:%d for chunk:%d\n",
__func__, ntohs(param.p->type), cid);
--
2.15.0
^ permalink raw reply related
* Re: [PATCH 4/4] net: dsa: mv88e6xxx: add PTP support for MV88E6250 family
From: Richard Cochran @ 2019-07-31 4:30 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Hubert Feurstein, netdev, lkml, Andrew Lunn, Vivien Didelot,
Florian Fainelli, David S. Miller, Rasmus Villemoes
In-Reply-To: <CA+h21hqWO=qT6EuQOVgX=J1=m60AWT6EGvQgfzGS=BNNq1cyTg@mail.gmail.com>
On Tue, Jul 30, 2019 at 11:46:51PM +0300, Vladimir Oltean wrote:
> Technically it is not "not true".
[Sigh] The statement was:
The adjfine API clamps ppb between [-32,768,000, 32,768,000]
The adjfine API does NOT clamp to that range. That statement is
simply false.
> And what is the reason for the neg_adj thing? Can you give an example
> of when does the "normal way" of doing signed arithmetics not work?
The detail from years ago escape me ATM, but I needed to use div_u64.
Maybe div_s64 was broken.
But that is not the point. Changing the adjfine() logic for this
driver is out of scope for this series. If someone thinks the logic
needs changing, then that must carefully explained and justified in
the changelog of a patch implementing that _one_ change.
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH bpf-next v2] tools: bpftool: add support for reporting the effective cgroup progs
From: Alexei Starovoitov @ 2019-07-31 4:16 UTC (permalink / raw)
To: Takshak Chahande
Cc: Jakub Kicinski, daniel@iogearbox.net, netdev@vger.kernel.org,
bpf@vger.kernel.org, oss-drivers@netronome.com, Kernel Team,
Quentin Monnet
In-Reply-To: <20190730220053.GA69301@ctakshak-mbp.dhcp.thefacebook.com>
On Tue, Jul 30, 2019 at 3:01 PM Takshak Chahande <ctakshak@fb.com> wrote:
>
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote on Tue [2019-Jul-30 14:03:00 -0700]:
> > Takshak said in the original submission:
> >
> > With different bpf attach_flags available to attach bpf programs specially
> > with BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI, the list of effective
> > bpf-programs available to any sub-cgroups really needs to be available for
> > easy debugging.
> >
> > Using BPF_F_QUERY_EFFECTIVE flag, one can get the list of not only attached
> > bpf-programs to a cgroup but also the inherited ones from parent cgroup.
> >
> > So a new option is introduced to use BPF_F_QUERY_EFFECTIVE query flag here
> > to list all the effective bpf-programs available for execution at a specified
> > cgroup.
> >
> > Reused modified test program test_cgroup_attach from tools/testing/selftests/bpf:
> > # ./test_cgroup_attach
> >
> > With old bpftool:
> >
> > # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/
> > ID AttachType AttachFlags Name
> > 271 egress multi pkt_cntr_1
> > 272 egress multi pkt_cntr_2
> >
> > Attached new program pkt_cntr_4 in cg2 gives following:
> >
> > # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2
> > ID AttachType AttachFlags Name
> > 273 egress override pkt_cntr_4
> >
> > And with new "effective" option it shows all effective programs for cg2:
> >
> > # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 effective
> > ID AttachType AttachFlags Name
> > 273 egress override pkt_cntr_4
> > 271 egress override pkt_cntr_1
> > 272 egress override pkt_cntr_2
> >
> > Compared to original submission use a local flag instead of global
> > option.
> >
> > We need to clear query_flags on every command, in case batch mode
> > wants to use varying settings.
> >
> > v2: (Takshak)
> > - forbid duplicated flags;
> > - fix cgroup path freeing.
> >
> > Signed-off-by: Takshak Chahande <ctakshak@fb.com>
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
...
>
> Reviewed-by: Takshak Chahande <ctakshak@fb.com>
Applied. Thanks
^ permalink raw reply
* Re: [PATCH v2 bpf-next] selftests/bpf: fix clearing buffered output between tests/subtests
From: Alexei Starovoitov @ 2019-07-31 4:14 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Kernel Team
In-Reply-To: <20190730180541.212452-1-andriin@fb.com>
On Tue, Jul 30, 2019 at 11:17 AM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Clear buffered output once test or subtests finishes even if test was
> successful. Not doing this leads to accumulation of output from previous
> tests and on first failed tests lots of irrelevant output will be
> dumped, greatly confusing things.
>
> v1->v2: fix Fixes tag, add more context to patch
>
> Fixes: 3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs")
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Applied. Thanks
^ permalink raw reply
* Re: [bpf-next,v2 0/6] Introduce a BPF helper to generate SYN cookies
From: Alexei Starovoitov @ 2019-07-31 4:12 UTC (permalink / raw)
To: Petar Penkov
Cc: Petar Penkov, Networking, bpf, David S . Miller,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Lorenz Bauer,
Stanislav Fomichev, Toke Høiland-Jørgensen
In-Reply-To: <CAG4SDVV9oBYkXqof=FoD0DeRY=+tSwZo3E1jhqMnF8F8+bVTbg@mail.gmail.com>
On Mon, Jul 29, 2019 at 4:46 PM Petar Penkov <ppenkov@google.com> wrote:
>>
> > What is cpu utilization at this rate?
> > Is it cpu or nic limited if you crank up the syn flood?
> > Original 7M with all cores or single core?
> My receiver was configured with 16rx queues and 16 cores. 7M all cores
> are at 100% so I believe this case is CPU limited. At XDP, all cores
> are at roughly 40%. I couldn't reliably generate higher SYN flood rate
> than that, and the highest numbers I could see for XDP did not go past
> 10.65Mpps with ~42% utilization on each core. I think I am hitting a
> NIC limit here since the CPUs are free.
Applied. Thanks!
^ permalink raw reply
* [PATCH] atm: iphase: Fix Spectre v1 vulnerability
From: Gustavo A. R. Silva @ 2019-07-31 3:21 UTC (permalink / raw)
To: Chas Williams
Cc: linux-atm-general, netdev, linux-kernel, Gustavo A. R. Silva
board is controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/iphase.c:2765 ia_ioctl() warn: potential spectre issue 'ia_dev' [r] (local cap)
drivers/atm/iphase.c:2774 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2782 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2816 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2823 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2830 ia_ioctl() warn: potential spectre issue '_ia_dev' [r] (local cap)
drivers/atm/iphase.c:2845 ia_ioctl() warn: possible spectre second half. 'iadev'
drivers/atm/iphase.c:2856 ia_ioctl() warn: possible spectre second half. 'iadev'
Fix this by sanitizing board before using it to index ia_dev and _ia_dev
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/atm/iphase.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/atm/iphase.c b/drivers/atm/iphase.c
index 302cf0ba1600..8c7a996d1f16 100644
--- a/drivers/atm/iphase.c
+++ b/drivers/atm/iphase.c
@@ -63,6 +63,7 @@
#include <asm/byteorder.h>
#include <linux/vmalloc.h>
#include <linux/jiffies.h>
+#include <linux/nospec.h>
#include "iphase.h"
#include "suni.h"
#define swap_byte_order(x) (((x & 0xff) << 8) | ((x & 0xff00) >> 8))
@@ -2760,8 +2761,11 @@ static int ia_ioctl(struct atm_dev *dev, unsigned int cmd, void __user *arg)
}
if (copy_from_user(&ia_cmds, arg, sizeof ia_cmds)) return -EFAULT;
board = ia_cmds.status;
- if ((board < 0) || (board > iadev_count))
- board = 0;
+
+ if ((board < 0) || (board > iadev_count))
+ board = 0;
+ board = array_index_nospec(board, iadev_count + 1);
+
iadev = ia_dev[board];
switch (ia_cmds.cmd) {
case MEMDUMP:
--
2.22.0
^ permalink raw reply related
* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
From: liuyonglong @ 2019-07-31 3:33 UTC (permalink / raw)
To: Heiner Kallweit, andrew, davem
Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang,
shiju.jose
In-Reply-To: <9a0a8094-42ee-0a18-0e9a-d3ca783d6d4b@gmail.com>
On 2019/7/31 3:04, Heiner Kallweit wrote:
> On 30.07.2019 08:35, liuyonglong wrote:
>> :/sys/kernel/debug/tracing$ cat trace
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 45/45 #P:128
>> #
>> # _-----=> irqs-off
>> # / _----=> need-resched
>> # | / _---=> hardirq/softirq
>> # || / _--=> preempt-depth
>> # ||| / delay
>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>> # | | | |||| | |
>> kworker/64:2-1028 [064] .... 172.295687: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x02 val:0x001c
>> kworker/64:2-1028 [064] .... 172.295726: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x03 val:0xc916
>> kworker/64:2-1028 [064] .... 172.296902: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x01 val:0x79ad
>> kworker/64:2-1028 [064] .... 172.296938: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x0f val:0x2000
>> kworker/64:2-1028 [064] .... 172.321213: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x00 val:0x1040
>> kworker/64:2-1028 [064] .... 172.343209: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x02 val:0x001c
>> kworker/64:2-1028 [064] .... 172.343245: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x03 val:0xc916
>> kworker/64:2-1028 [064] .... 172.343882: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
>> kworker/64:2-1028 [064] .... 172.343918: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x0f val:0x2000
>> kworker/64:2-1028 [064] .... 172.362658: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
>> kworker/64:2-1028 [064] .... 172.385961: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x02 val:0x001c
>> kworker/64:2-1028 [064] .... 172.385996: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x03 val:0xc916
>> kworker/64:2-1028 [064] .... 172.386646: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x01 val:0x79ad
>> kworker/64:2-1028 [064] .... 172.386681: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x0f val:0x2000
>> kworker/64:2-1028 [064] .... 172.411286: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x00 val:0x1040
>> kworker/64:2-1028 [064] .... 172.433225: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x02 val:0x001c
>> kworker/64:2-1028 [064] .... 172.433260: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x03 val:0xc916
>> kworker/64:2-1028 [064] .... 172.433887: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x01 val:0x79ad
>> kworker/64:2-1028 [064] .... 172.433922: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x0f val:0x2000
>> kworker/64:2-1028 [064] .... 172.452862: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x00 val:0x1040
>> ifconfig-1324 [011] .... 177.325585: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
>> kworker/u257:0-8 [012] .... 177.325642: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x04 val:0x01e1
>> kworker/u257:0-8 [012] .... 177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>> kworker/u257:0-8 [012] .... 177.325708: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
>> kworker/u257:0-8 [012] .... 177.325744: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x09 val:0x0200
>> kworker/u257:0-8 [012] .... 177.325779: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
>> kworker/u257:0-8 [012] .... 177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>> kworker/u257:0-8 [012] .... 177.325843: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x798d
>
> What I think that happens here:
> Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
> the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
> The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
> finished. Could you please test whether this works for you?
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 6b5cb87f3..7ddd91df9 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
> phydev->link = status & BMSR_LSTATUS ? 1 : 0;
> phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
>
> + /* Consider the case that autoneg was started and "aneg complete"
> + * bit has been reset, but "link up" bit not yet.
> + */
> + if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
> + phydev->link = 0;
> +
> return 0;
> }
> EXPORT_SYMBOL(genphy_update_link);
>
This patch can solve the issue! Will it be upstream?
So it's nothing to do with the bios, and just the PHY's own behavior,
the "link up" bit can not reset immediately,right?
ps: It will take 1 second more to make the link up for RTL8211F when 0x798d happend.
^ permalink raw reply
* Fwd: net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: Mark Smith @ 2019-07-31 3:32 UTC (permalink / raw)
To: David Ahern, Su Yanjun, netdev
In-Reply-To: <CAO42Z2we930YTmMUkaWXZOqFQVFxH=bd=y+JFXG8V0Y736kzug@mail.gmail.com>
Re-sending in text format due to Gmail preserving the HTML email I
received and vger (quite reasonably) rejecting my response.
---------- Forwarded message ---------
From: Mark Smith <markzzzsmith@gmail.com>
Date: Wed, 31 Jul 2019 at 12:23
Subject: Re: net: ipv6: Fix a bug in ndisc_send_ns when netdev only
has a global address
To: David Ahern <dsahern@gmail.com>
Cc: Su Yanjun <suyj.fnst@cn.fujitsu.com>, <netdev@vger.kernel.org>
Hi David,
On Wed., 31 Jul. 2019, 00:11 David Ahern, <dsahern@gmail.com> wrote:
>
> On 7/30/19 4:28 AM, Mark Smith wrote:
> > Hi Su,
> >
>
> <snip>
>
> >>> This patch is not the correct solution to this issue.
> >>>
> >
> > <snip>
> >
> >> In linux implementation, one interface may have no link local address if
> >> kernel config
> >>
> >> *addr_gen_mode* is set to IN6_ADDR_GEN_MODE_NONE. My patch is to fix
> >> this problem.
> >>
> >
> > So this "IN6_ADDR_GEN_MODE_NONE" behaviour doesn't comply with RFC 4291.
> >
> > As RFC 4291 says,
> >
> > "All interfaces are *required* to have *at least one* Link-Local
> > unicast address."
> >
> > That's not an ambiguous requirement.
>
> Interesting. Going back to the original commit:
>
> commit bc91b0f07ada5535427373a4e2050877bcc12218
> Author: Jiri Pirko <jiri@resnulli.us>
> Date: Fri Jul 11 21:10:18 2014 +0200
>
> ipv6: addrconf: implement address generation modes
>
> This patch introduces a possibility for userspace to set various (so far
> two) modes of generating addresses. This is useful for example for
> NetworkManager because it can set the mode to NONE and take care of link
> local addresses itself. That allow it to have the interface up,
> monitoring carrier but still don't have any addresses on it.
>
> So the intention of IN6_ADDR_GEN_MODE_NONE was for userspace to control
> it. If an LLA is required (4291 says yes, 4861 suggests no) then the
> current behavior is correct and if IN6_ADDR_GEN_MODE_NONE is used by an
> admin some userspace agent is required to add it for IPv6 to work on
> that link.
>
Ok. That seems to be saying that IN6_ADDR_GEN_MODE_NONE means that the
kernel is not going perform any address configuration on the interface
for any prefixes.
That would then place the RFC 4291 burden to generate at least one LL
address for the interface onto the user space application that has
taken over performing IPv6 address configuration on an interface.
> <snip>
> >
> > It is an IPv6 enabled interface, so it requires a link-local address,
> > per RFC 4291. RFC 4291 doesn't exclude any interfaces types from the
> > LL address requirement.
>
> There is no 'link' for loopback, so really no point in generating an LLA
> for it.
>
If your 'link' mean something physical, then I agree, the loopback
virtual interface doesn't have a link.
From IPv6's perspective, there is a link attached, because the
interface is operationally UP and IPv6 can send and receive packets
over it. They just happen to be returned to the sender by the
link-layer below the IPv6 layer. This behaviour is functionally no
different to when a physical loopback cable/plug is plugged into a
physical interface.
IPv6 tries to be fairly generic with definitions such as 'link' and
'interface' to be future proof. Here's the RFC 8200 definitons:
"link a communication facility or medium over which nodes can
communicate at the link layer, i.e., the layer
immediately below IPv6. Examples are Ethernets [...];
and internet-layer or higher-layer "tunnels",
such as tunnels over IPv4 or IPv6 itself.
interface a node's attachment to a link."
The loopback virtual interface is providing both "a communication
facility [...] over which nodes can communicate at the link layer,
i.e., the layer immediately below IPv6" and an "attachment to a link".
So the loopback virtual interface is by definition a interface per the
IPv6 specification, and therefore requires a link-local address to be
compliant.
Regards,
Mark.
^ permalink raw reply
* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Andrew Lunn @ 2019-07-31 3:31 UTC (permalink / raw)
To: Allan W. Nielsen
Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
bridge, netdev, linux-kernel
In-Reply-To: <20190730190000.diacyjw6owqkf7uf@lx-anielsen.microsemi.net>
> Our plan was to implement this in pure SW, and then look at how to HW offload
> it.
Great.
> But this will take some time before we have anything meaning full to show.
>
> > Make it an alternative to the STP code?
> I'm still working on learning the details of DLR, but I actually believe that it
> in some situations may co-exists with STP ;-)
The PDF you linked to suggests this as well. But i think you will need
to make some core changes to the bridge. At the moment, STP is a
bridge level property. But you are going to need it to be a per-port
option. You can then use DLR on the ring ports, and optionally STP on
the other ports.
> But what we are looking at here, is to offload a
> non-aware-(DLR|MRP)-switch which happens to be placed in a network
> with these protocols running.
So we need to think about why we are passing traffic to the CPU port,
and under what conditions can it be blocked.
1) The interface is not part of a bridge. In this case, we only need
the switch to pass to the CPU port MC addresses which have been set
via set_rx_mode().
I think this case does not apply for what you want. You have two ports
bridges together as part of the ring.
2) The interface is part of a bridge. There are a few sub-cases
a) IGMP snooping is being performed. We can block multicast where
there is no interest in the group. But this is limited to IP
multicast.
b) IGMP snooping is not being used and all interfaces in the bridge
are ports of the switch. IP Multicast can be blocked to the CPU.
c) IGMP snooping is not being used and there is a non-switch interface
in the bridge. Multicast needed is needed, so it can be flooded out
this port.
d) set_rx_mode() has been called on the br0 interface, indicating
there is interest in the packets on the host. They must be sent to the
CPU so they can be delivered locally.
e) ????
Does the Multicast MAC address being used by DLR also map to an IP
mmulticast address? 01:21:6C:00:00:0[123] appear to be the MAC
addresses used by DLR. IPv4 multicast MAC addresses are
01:00:5E:XX:XX:XX. IPv6 multicast MAC addresses are 33:33:XX:XX:XX:XX.
So one possibility here is to teach the SW bridge about non-IP
multicast addresses. Initially the switch should forward all MAC
multicast frames to the CPU. If the frame is not an IPv4 or IPv6
frame, and there has not been a call to set_rx_mode() for the MAC
address on the br0 interface, and the bridge only contains switch
ports, switchdev could be used to block the multicast to the CPU
frame, but forward it out all other ports of the bridge.
Andrew
^ permalink raw reply
* Re: Reminder: 99 open syzbot bugs in net subsystem
From: Eric Biggers @ 2019-07-31 2:57 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, dvyukov, netdev, fw, i.maximets, edumazet, dsahern,
linux-kernel, syzkaller-bugs
In-Reply-To: <1e07462d-61e2-9885-edd0-97a82dd7883e@gmail.com>
On Thu, Jul 25, 2019 at 07:04:47AM +0200, Eric Dumazet wrote:
>
>
> On 7/24/19 11:09 PM, Eric Biggers wrote:
> > On Wed, Jul 24, 2019 at 01:09:28PM -0700, David Miller wrote:
> >> From: Eric Biggers <ebiggers@kernel.org>
> >> Date: Wed, 24 Jul 2019 11:37:12 -0700
> >>
> >>> We can argue about what words to use to describe this situation, but
> >>> it doesn't change the situation itself.
> >>
> >> And we should argue about those words because it matters to humans and
> >> effects how they feel, and humans ultimately fix these bugs.
> >>
> >> So please stop with the hyperbole.
> >>
> >> Thank you.
> >
> > Okay, there are 151 bugs that syzbot saw on the mainline Linux kernel in the
> > last 7 days (90.1% with reproducers). Of those, 59 were reported over 3 months
> > ago (89.8% with reproducers). Of those, 12 were reported over a year ago (83.3%
> > with reproducers).
> >
> > No opinion on whether those are small/medium/large numbers, in case it would
> > hurt someone's feelings.
> >
> > These numbers do *not* include bugs that are still valid but weren't seen on
> > mainline in last 7 days, e.g.:
> >
> > - Bugs that are seen only rarely, so by chance weren't seen in last 7 days.
> > - Bugs only in linux-next and/or subsystem branches.
> > - Bugs that were seen in mainline more than 7 days ago, and then only on
> > linux-next or subsystem branch in last 7 days.
> > - Bugs that stopped being seen due to a change in syzkaller.
> > - Bugs that stopped being seen due to a change in kernel config.
> > - Bugs that stopped being seen due to other environment changes such as kernel
> > command line parameters.
> > - Bugs that stopped being seen due to a kernel change that hid the bug but
> > didn't actually fix it, i.e. still reachable in other ways.
> >
>
> We do not doubt syzkaller is an incredible tool.
>
> But netdev@ and lkml@ are mailing lists for humans to interact,
> exchange ideas, send patches and review them.
>
> To me, an issue that was reported to netdev by a real user is _way_ more important
> than potential issues that a bot might have found doing crazy things.
>
> We need to keep optimal S/N on mailing lists, so any bots trying to interact
> with these lists must be very cautious and damn smart.
>
> When I have time to spare and can work on syzbot reports, I am going to a web
> page where I can see them and select the ones it makes sense to fix.
> I hate having to set up email filters.
>
syzbot finds a lot of security bugs, and security bugs are important. And the
bugs are still there regardless of whether they're reported by human or bot.
Also, there *are* bugs being fixed because of these reminders; some subsystem
maintainers have even fixed all the bugs in their subsystem. But I can
understand that for subsystems with a lot of open bug reports it's overwhelming.
What I'll try doing next time (if there *is* a next time; it isn't actually my
job to do any of this, I just care about the security and reliability of
Linux...) is for subsystems with lots of open bug reports, only listing the ones
actually seen in the last week or so, and perhaps also spending some more time
manually checking those bugs. That should cut down the noise a lot.
- Eric
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Andrew Lunn @ 2019-07-31 2:34 UTC (permalink / raw)
To: Tao Ren
Cc: Vladimir Oltean, Florian Fainelli, Heiner Kallweit,
David S . Miller, Arun Parameswaran, Justin Chen, netdev, lkml,
Andrew Jeffery, openbmc@lists.ozlabs.org
In-Reply-To: <885e48dd-df5b-7f08-ef58-557fc2347fa6@fb.com>
> Hi Andrew,
>
> The BCM54616S PHY on my machine is connected to a BCM5396 switch chip over backplane (1000Base-KX).
Ah, that is different. So the board is using it for RGMII to 1000Base-KX?
phy-mode is about the MAC-PHY link. So in this case RGMII.
There is no DT way to configure the PHY-Switch link. However, it
sounds like you have the PHY strapped so it is doing 1000BaseX on the
PHY-Switch link. So do you actually need to configure this?
You report you never see link up? So maybe the problem is actually in
read_status? When in 1000BaseX mode, do you need to read the link
status from some other register? The Marvell PHYs use a second page
for 1000BaseX.
Andrew
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Tao Ren @ 2019-07-31 2:09 UTC (permalink / raw)
To: Andrew Lunn
Cc: Vladimir Oltean, Florian Fainelli, Heiner Kallweit,
David S . Miller, Arun Parameswaran, Justin Chen, netdev, lkml,
Andrew Jeffery, openbmc@lists.ozlabs.org
In-Reply-To: <20190731013636.GC25700@lunn.ch>
On 7/30/19 6:36 PM, Andrew Lunn wrote:
>> The INTF_SEL pins report correct mode (RGMII-Fiber) on my machine,
>> but there are 2 "sub-modes" (1000Base-X and 100Base-FX) and I
>> couldn't find a proper/safe way to auto-detect which "sub-mode" is
>> active. The datasheet just describes instructions to enable a
>> specific mode, but it doesn't say 1000Base-X/100Base-FX mode will be
>> auto-selected. And that's why I came up with the patch to specify
>> 1000Base-X mode.
>
> Fibre does not perform any sort of auto-negotiation. I assume you have
> an SFP connected? When using PHYLINK, the sfp driver will get the
> supported baud rate from SFP EEPROM to determine what speed could be
> used. However, there is currently no mainline support for having a
> chain MAC-PHY-SFP. For that you need Russells out of tree patches.
Hi Andrew,
The BCM54616S PHY on my machine is connected to a BCM5396 switch chip over backplane (1000Base-KX).
Thanks,
Tao
^ permalink raw reply
* [PATCH net v3] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: Su Yanjun @ 2019-07-31 1:52 UTC (permalink / raw)
To: davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel, suyj.fnst
When the egress interface does not have a link local address, it can
not communicate with other hosts.
In RFC4861, 7.2.2 says
"If the source address of the packet prompting the solicitation is the
same as one of the addresses assigned to the outgoing interface, that
address SHOULD be placed in the IP Source Address of the outgoing
solicitation. Otherwise, any one of the addresses assigned to the
interface should be used."
In this patch we try get a global address if we get ll address failed.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
---
Changes since V2:
- Let banned_flags under the scope of its use.
---
include/net/addrconf.h | 2 ++
net/ipv6/addrconf.c | 34 ++++++++++++++++++++++++++++++++++
net/ipv6/ndisc.c | 10 +++++++---
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 269ec27..eae1167 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -107,6 +107,8 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr,
u32 banned_flags);
int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
u32 banned_flags);
+int ipv6_get_addr(struct net_device *dev, struct in6_addr *addr,
+ u32 banned_flags);
bool inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
bool match_wildcard);
bool inet_rcv_saddr_any(const struct sock *sk);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ae17a9..9e537bd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1873,6 +1873,40 @@ int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
return err;
}
+int __ipv6_get_addr(struct inet6_dev *idev, struct in6_addr *addr,
+ u32 banned_flags)
+{
+ struct inet6_ifaddr *ifp;
+ int err = -EADDRNOTAVAIL;
+
+ list_for_each_entry(ifp, &idev->addr_list, if_list) {
+ if (ifp->scope == 0 &&
+ !(ifp->flags & banned_flags)) {
+ *addr = ifp->addr;
+ err = 0;
+ break;
+ }
+ }
+ return err;
+}
+
+int ipv6_get_addr(struct net_device *dev, struct in6_addr *addr,
+ u32 banned_flags)
+{
+ struct inet6_dev *idev;
+ int err = -EADDRNOTAVAIL;
+
+ rcu_read_lock();
+ idev = __in6_dev_get(dev);
+ if (idev) {
+ read_lock_bh(&idev->lock);
+ err = __ipv6_get_addr(idev, addr, banned_flags);
+ read_unlock_bh(&idev->lock);
+ }
+ rcu_read_unlock();
+ return err;
+}
+
static int ipv6_count_addresses(const struct inet6_dev *idev)
{
const struct inet6_ifaddr *ifp;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 659ecf4e..fa58c6e 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -592,9 +592,13 @@ void ndisc_send_ns(struct net_device *dev, const struct in6_addr *solicit,
struct nd_msg *msg;
if (!saddr) {
- if (ipv6_get_lladdr(dev, &addr_buf,
- (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)))
- return;
+ u32 banned_flags = IFA_F_TENTATIVE | IFA_F_OPTIMISTIC;
+
+ if (ipv6_get_lladdr(dev, &addr_buf, banned_flags)) {
+ /* try global address */
+ if (ipv6_get_addr(dev, &addr_buf, banned_flags))
+ return;
+ }
saddr = &addr_buf;
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH bpf-next v10 10/10] landlock: Add user and kernel documentation for Landlock
From: Randy Dunlap @ 2019-07-31 1:53 UTC (permalink / raw)
To: Mickaël Salaün, linux-kernel
Cc: Alexander Viro, Alexei Starovoitov, Andrew Morton,
Andy Lutomirski, Arnaldo Carvalho de Melo, Casey Schaufler,
Daniel Borkmann, David Drysdale, David S . Miller,
Eric W . Biederman, James Morris, Jann Horn, John Johansen,
Jonathan Corbet, Kees Cook, Michael Kerrisk,
Mickaël Salaün, Paul Moore, Sargun Dhillon,
Serge E . Hallyn, Shuah Khan, Stephen Smalley, Tejun Heo,
Tetsuo Handa, Thomas Graf, Tycho Andersen, Will Drewry,
kernel-hardening, linux-api, linux-fsdevel, linux-security-module,
netdev
In-Reply-To: <20190721213116.23476-11-mic@digikod.net>
On 7/21/19 2:31 PM, Mickaël Salaün wrote:
> This documentation can be built with the Sphinx framework.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: James Morris <jmorris@namei.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> ---
>
> Changes since v9:
> * update with expected attach type and expected attach triggers
>
> Changes since v8:
> * remove documentation related to chaining and tagging according to this
> patch series
>
> Changes since v7:
> * update documentation according to the Landlock revamp
>
> Changes since v6:
> * add a check for ctx->event
> * rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
> * rename Landlock version to ABI to better reflect its purpose and add a
> dedicated changelog section
> * update tables
> * relax no_new_privs recommendations
> * remove ABILITY_WRITE related functions
> * reword rule "appending" to "prepending" and explain it
> * cosmetic fixes
>
> Changes since v5:
> * update the rule hierarchy inheritance explanation
> * briefly explain ctx->arg2
> * add ptrace restrictions
> * explain EPERM
> * update example (subtype)
> * use ":manpage:"
> ---
> Documentation/security/index.rst | 1 +
> Documentation/security/landlock/index.rst | 20 +++
> Documentation/security/landlock/kernel.rst | 99 ++++++++++++++
> Documentation/security/landlock/user.rst | 147 +++++++++++++++++++++
> 4 files changed, 267 insertions(+)
> create mode 100644 Documentation/security/landlock/index.rst
> create mode 100644 Documentation/security/landlock/kernel.rst
> create mode 100644 Documentation/security/landlock/user.rst
> diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
> new file mode 100644
> index 000000000000..7d1e06d544bf
> --- /dev/null
> +++ b/Documentation/security/landlock/kernel.rst
> @@ -0,0 +1,99 @@
> +==============================
> +Landlock: kernel documentation
> +==============================
> +
> +eBPF properties
> +===============
> +
> +To get an expressive language while still being safe and small, Landlock is
> +based on eBPF. Landlock should be usable by untrusted processes and must
> +therefore expose a minimal attack surface. The eBPF bytecode is minimal,
> +powerful, widely used and designed to be used by untrusted applications. Thus,
> +reusing the eBPF support in the kernel enables a generic approach while
> +minimizing new code.
> +
> +An eBPF program has access to an eBPF context containing some fields used to
> +inspect the current object. These arguments can be used directly (e.g. cookie)
> +or passed to helper functions according to their types (e.g. inode pointer). It
> +is then possible to do complex access checks without race conditions or
> +inconsistent evaluation (i.e. `incorrect mirroring of the OS code and state
> +<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).
> +
> +A Landlock hook describes a particular access type. For now, there is two
there are two
> +hooks dedicated to filesystem related operations: LANDLOCK_HOOK_FS_PICK and
> +LANDLOCK_HOOK_FS_WALK. A Landlock program is tied to one hook. This makes it
> +possible to statically check context accesses, potentially performed by such
> +program, and hence prevents kernel address leaks and ensure the right use of
ensures
> +hook arguments with eBPF functions. Any user can add multiple Landlock
> +programs per Landlock hook. They are stacked and evaluated one after the
> +other, starting from the most recent program, as seccomp-bpf does with its
> +filters. Underneath, a hook is an abstraction over a set of LSM hooks.
> +
> +
> +Guiding principles
> +==================
> +
> +Unprivileged use
> +----------------
> +
> +* Landlock helpers and context should be usable by any unprivileged and
> + untrusted program while following the system security policy enforced by
> + other access control mechanisms (e.g. DAC, LSM).
> +
> +
> +Landlock hook and context
> +-------------------------
> +
> +* A Landlock hook shall be focused on access control on kernel objects instead
> + of syscall filtering (i.e. syscall arguments), which is the purpose of
> + seccomp-bpf.
> +* A Landlock context provided by a hook shall express the minimal and more
> + generic interface to control an access for a kernel object.
> +* A hook shall guaranty that all the BPF function calls from a program are> + safe. Thus, the related Landlock context arguments shall always be of the
> + same type for a particular hook. For example, a network hook could share
> + helpers with a file hook because of UNIX socket. However, the same helpers
> + may not be compatible for a file system handle and a net handle.
> +* Multiple hooks may use the same context interface.
> +
> +
> +Landlock helpers
> +----------------
> +
> +* Landlock helpers shall be as generic as possible while at the same time being
> + as simple as possible and following the syscall creation principles (cf.
> + *Documentation/adding-syscalls.txt*).
> +* The only behavior change allowed on a helper is to fix a (logical) bug to
> + match the initial semantic.
> +* Helpers shall be reentrant, i.e. only take inputs from arguments (e.g. from
> + the BPF context), to enable a hook to use a cache. Future program options
> + might change this cache behavior.
> +* It is quite easy to add new helpers to extend Landlock. The main concern
> + should be about the possibility to leak information from the kernel that may
> + not be accessible otherwise (i.e. side-channel attack).
> +
> +
> +Questions and answers
> +=====================
> +
> +Why not create a custom hook for each kind of action?
> +-----------------------------------------------------
> +
> +Landlock programs can handle these checks. Adding more exceptions to the
> +kernel code would lead to more code complexity. A decision to ignore a kind of
> +action can and should be done at the beginning of a Landlock program.
> +
> +
> +Why a program does not return an errno or a kill code?
> +------------------------------------------------------
> +
> +seccomp filters can return multiple kind of code, including an errno value or a
kinds
> +kill signal, which may be convenient for access control. Those return codes
> +are hardwired in the userland ABI. Instead, Landlock's approach is to return a
> +boolean to allow or deny an action, which is much simpler and more generic.
> +Moreover, we do not really have a choice because, unlike to seccomp, Landlock
> +programs are not enforced at the syscall entry point but may be executed at any
> +point in the kernel (through LSM hooks) where an errno return code may not make
> +sense. However, with this simple ABI and with the ability to call helpers,
> +Landlock may gain features similar to seccomp-bpf in the future while being
> +compatible with previous programs.
> diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
> new file mode 100644
> index 000000000000..14c4f3b377bd
> --- /dev/null
> +++ b/Documentation/security/landlock/user.rst
> @@ -0,0 +1,147 @@
> +================================
> +Landlock: userland documentation
> +================================
> +
> +Landlock programs
> +=================
> +
> +eBPF programs are used to create security programs. They are contained and can
> +call only a whitelist of dedicated functions. Moreover, they can only loop
> +under strict conditions, which protects from denial of service. More
> +information on BPF can be found in *Documentation/networking/filter.txt*.
> +
> +
> +Writing a program
> +-----------------
> +
> +To enforce a security policy, a thread first needs to create a Landlock program.
> +The easiest way to write an eBPF program depicting a security program is to write
> +it in the C language. As described in *samples/bpf/README.rst*, LLVM can
> +compile such programs. Files *samples/bpf/landlock1_kern.c* and those in
> +*tools/testing/selftests/landlock/* can be used as examples.
> +
> +Once the eBPF program is created, the next step is to create the metadata
> +describing the Landlock program. This metadata includes an expected attach type which
> +contains the hook type to which the program is tied, and expected attach
> +triggers which identify the actions for which the program should be run.
> +
> +A hook is a policy decision point which exposes the same context type for
> +each program evaluation.
> +
> +A Landlock hook describes the kind of kernel object for which a program will be
> +triggered to allow or deny an action. For example, the hook
> +BPF_LANDLOCK_FS_PICK can be triggered every time a landlocked thread performs a
> +set of action related to the filesystem (e.g. open, read, write, mount...).
actions
> +This actions are identified by the `triggers` bitfield.
> +
> +The next step is to fill a :c:type:`struct bpf_load_program_attr
> +<bpf_load_program_attr>` with BPF_PROG_TYPE_LANDLOCK_HOOK, the expected attach
> +type and other BPF program metadata. This bpf_attr must then be passed to the
> +:manpage:`bpf(2)` syscall alongside the BPF_PROG_LOAD command. If everything
> +is deemed correct by the kernel, the thread gets a file descriptor referring to
> +this program.
> +
> +In the following code, the *insn* variable is an array of BPF instructions
> +which can be extracted from an ELF file as is done in bpf_load_file() from
> +*samples/bpf/bpf_load.c*.
A little confusing. Is there a mixup of <insn> and <insns>?
> +
> +.. code-block:: c
> +
> + int prog_fd;
> + struct bpf_load_program_attr load_attr;
> +
> + memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
> + load_attr.prog_type = BPF_PROG_TYPE_LANDLOCK_HOOK;
> + load_attr.expected_attach_type = BPF_LANDLOCK_FS_PICK;
> + load_attr.expected_attach_triggers = LANDLOCK_TRIGGER_FS_PICK_OPEN;
> + load_attr.insns = insns;
> + load_attr.insns_cnt = sizeof(insn) / sizeof(struct bpf_insn);
> + load_attr.license = "GPL";
> +
> + prog_fd = bpf_load_program_xattr(&load_attr, log_buf, log_buf_sz);
> + if (prog_fd == -1)
> + exit(1);
> +
> +
> +Enforcing a program
> +-------------------
> +
> +Once the Landlock program has been created or received (e.g. through a UNIX
> +socket), the thread willing to sandbox itself (and its future children) should
> +perform the following two steps.
> +
> +The thread should first request to never be allowed to get new privileges with a
> +call to :manpage:`prctl(2)` and the PR_SET_NO_NEW_PRIVS option. More
> +information can be found in *Documentation/prctl/no_new_privs.txt*.
> +
> +.. code-block:: c
> +
> + if (prctl(PR_SET_NO_NEW_PRIVS, 1, NULL, 0, 0))
> + exit(1);
> +
> +A thread can apply a program to itself by using the :manpage:`seccomp(2)` syscall.
> +The operation is SECCOMP_PREPEND_LANDLOCK_PROG, the flags must be empty and the
> +*args* argument must point to a valid Landlock program file descriptor.
> +
> +.. code-block:: c
> +
> + if (seccomp(SECCOMP_PREPEND_LANDLOCK_PROG, 0, &fd))
> + exit(1);
> +
> +If the syscall succeeds, the program is now enforced on the calling thread and
> +will be enforced on all its subsequently created children of the thread as
> +well. Once a thread is landlocked, there is no way to remove this security
> +policy, only stacking more restrictions is allowed. The program evaluation is
> +performed from the newest to the oldest.
> +
> +When a syscall ask for an action on a kernel object, if this action is denied,
asks
> +then an EACCES errno code is returned through the syscall.
> +
> +
> +.. _inherited_programs:
> +
> +Inherited programs
> +------------------
> +
> +Every new thread resulting from a :manpage:`clone(2)` inherits Landlock program
> +restrictions from its parent. This is similar to the seccomp inheritance as
> +described in *Documentation/prctl/seccomp_filter.txt*.
> +
> +
> +Ptrace restrictions
> +-------------------
> +
> +A landlocked process has less privileges than a non-landlocked process and must
> +then be subject to additional restrictions when manipulating another process.
> +To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
> +process, a landlocked process must have a subset of the target process programs.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maybe that last statement is correct, but it seems to me that it is missing something.
> +
> +
> +Landlock structures and constants
> +=================================
> +
> +Hook types
> +----------
> +
> +.. kernel-doc:: include/uapi/linux/landlock.h
> + :functions: landlock_hook_type
> +
> +
> +Contexts
> +--------
> +
> +.. kernel-doc:: include/uapi/linux/landlock.h
> + :functions: landlock_ctx_fs_pick landlock_ctx_fs_walk landlock_ctx_fs_get
> +
> +
> +Triggers for fs_pick
> +--------------------
> +
> +.. kernel-doc:: include/uapi/linux/landlock.h
> + :functions: landlock_triggers
> +
> +
> +Additional documentation
> +========================
> +
> +See https://landlock.io
>
--
~Randy
^ permalink raw reply
* Re: [PATCH 0/2] tools: bpftool: add net (un)load command to load XDP
From: Alexei Starovoitov @ 2019-07-31 1:52 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Daniel T. Lee, Stephen Hemminger, David Ahern,
Jesper Dangaard Brouer, John Fastabend, Daniel Borkmann,
Alexei Starovoitov, David Miller, netdev
In-Reply-To: <20190730182144.1355bf50@cakuba.netronome.com>
On Tue, Jul 30, 2019 at 06:21:44PM -0700, Jakub Kicinski wrote:
> > > Duplicating the same features in bpftool will only diminish the
> > > incentive for moving iproute2 to libbpf.
> >
> > not at all. why do you think so?
>
> Because iproute2's BPF has fallen behind so the simplest thing is to
> just contribute to bpftool. But iproute2 is the tool set for Linux
> networking, we can't let it bit rot :(
where were you when a lot of libbpf was copy pasted into iproute2 ?!
Now it diverged a lot and it's difficult to move iproute2 back to the main
train which is libbpf.
Same thing with at least 5 copy-pastes of samples/bpf/bpf_load.c
that spawned a bunch of different bpf loaders.
> IMHO vaguely competent users of Linux networking will know ip link.
> If they are not vaguely competent, they should not attach XDP programs
> to interfaces by hand...
I'm a prime example of moderately competent linux user who
doesn't know iproute2. I'm not joking.
I don't know tc syntax and always use my cheat sheet of ip/tc commands
to do anything.
> >
> > bpftool must be able to introspect every aspect of bpf programming.
> > That includes detaching and attaching anywhere.
> > Anyone doing 'bpftool p s' should be able to switch off particular
> > prog id without learning ten different other tools.
>
> I think the fact that we already have an implementation in iproute2,
> which is at the risk of bit rot is more important to me that the
> hypothetical scenario where everyone knows to just use bpftool (for
> XDP, for TC it's still iproute2 unless there's someone crazy enough
> to reimplement the TC functionality :))
I think you're missing the point that iproute2 is still necessary
to configure it.
bpftool should be able to attach/detach from anything.
But configuring that thing (whether it's cgroup or tc/xdp) is
a job of corresponding apis and tools.
> I'm not sure we can settle our differences over email :)
> I have tremendous respect for all the maintainers I CCed here,
> if nobody steps up to agree with me I'll concede the bpftool net
> battle entirely :)
we can keep arguing forever. Respect to ease-of-use only comes
from the pain of operational experience. I don't think I can
convey that pain in the email either.
^ permalink raw reply
* [PATCH bpf 0/2] bpf: fix bug in x64 JIT
From: Alexei Starovoitov @ 2019-07-31 1:38 UTC (permalink / raw)
To: davem; +Cc: daniel, netdev, bpf, kernel-team
Fix bug in computation of the jump offset to 1st instruction in BPF JIT
and add 2 tests.
Alexei Starovoitov (2):
bpf: fix x64 JIT code generation for jmp to 1st insn
selftests/bpf: tests for jmp to 1st insn
arch/x86/net/bpf_jit_comp.c | 7 +++--
tools/testing/selftests/bpf/verifier/loops1.c | 28 +++++++++++++++++++
2 files changed, 32 insertions(+), 3 deletions(-)
--
2.20.0
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox