Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 2/3] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Hubert Feurstein @ 2019-08-19 17:14 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, lkml, Richard Cochran, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller
In-Reply-To: <20190819132733.GE8981@lunn.ch>

Hi Andrew,

Am Mo., 19. Aug. 2019 um 15:27 Uhr schrieb Andrew Lunn <andrew@lunn.ch>:
>
> > @@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
> >  {
> >       int ret;
> >
> > -     ret = mdiobus_write_nested(chip->bus, dev, reg, data);
> > +     ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
> > +                                    chip->ptp_sts);
> >       if (ret < 0)
> >               return ret;
> >
>
> Please also make a similar change to mv88e6xxx_smi_indirect_write().
> The last write in that function should be timestamped.
Since it is already the last write it should be already ok (The
mv88e6xxx_smi_indirect_write
calls the mv88e6xxx_smi_direct_write which initiates the timestamping).

Hubert

^ permalink raw reply

* Re: What to do when a bridge port gets its pvid deleted?
From: Vladimir Oltean @ 2019-08-19 17:15 UTC (permalink / raw)
  To: Florian Fainelli, Ido Schimmel, roopa, nikolay
  Cc: netdev, Andrew Lunn, Vivien Didelot, stephen
In-Reply-To: <bb99eabb-1410-e7c2-4226-ee6c5fef6880@gmail.com>

On 6/28/19 7:45 PM, Florian Fainelli wrote:
> On 6/28/19 5:37 AM, Vladimir Oltean wrote:
>> On Fri, 28 Jun 2019 at 15:30, Ido Schimmel <idosch@idosch.org> wrote:
>>>
>>> On Tue, Jun 25, 2019 at 11:49:29PM +0300, Vladimir Oltean wrote:
>>>> A number of DSA drivers (BCM53XX, Microchip KSZ94XX, Mediatek MT7530
>>>> at the very least), as well as Mellanox Spectrum (I didn't look at all
>>>> the pure switchdev drivers) try to restore the pvid to a default value
>>>> on .port_vlan_del.
>>>
>>> I don't know about DSA drivers, but that's not what mlxsw is doing. If
>>> the VLAN that is configured as PVID is deleted from the bridge port, the
>>> driver instructs the port to discard untagged and prio-tagged packets.
>>> This is consistent with the bridge driver's behavior.
>>>
>>> We do have a flow the "restores" the PVID, but that's when a port is
>>> unlinked from its bridge master. The PVID we set is 4095 which cannot be
>>> configured by the 8021q / bridge driver. This is due to the way the
>>> underlying hardware works. Even if a port is not bridged and used purely
>>> for routing, packets still do L2 lookup first which sends them directly
>>> to the router block. If PVID is not configured, untagged packets could
>>> not be routed. Obviously, at egress we strip this VLAN.
>>>
>>>> Sure, the port stops receiving traffic when its pvid is a VLAN ID that
>>>> is not installed in its hw filter, but as far as the bridge core is
>>>> concerned, this is to be expected:
>>>>
>>>> # bridge vlan add dev swp2 vid 100 pvid untagged
>>>> # bridge vlan
>>>> port    vlan ids
>>>> swp5     1 PVID Egress Untagged
>>>>
>>>> swp2     1 Egress Untagged
>>>>           100 PVID Egress Untagged
>>>>
>>>> swp3     1 PVID Egress Untagged
>>>>
>>>> swp4     1 PVID Egress Untagged
>>>>
>>>> br0      1 PVID Egress Untagged
>>>> # ping 10.0.0.1
>>>> PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
>>>> 64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.682 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.299 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.251 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.324 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.257 ms
>>>> ^C
>>>> --- 10.0.0.1 ping statistics ---
>>>> 5 packets transmitted, 5 received, 0% packet loss, time 4188ms
>>>> rtt min/avg/max/mdev = 0.251/0.362/0.682/0.163 ms
>>>> # bridge vlan del dev swp2 vid 100
>>>> # bridge vlan
>>>> port    vlan ids
>>>> swp5     1 PVID Egress Untagged
>>>>
>>>> swp2     1 Egress Untagged
>>>>
>>>> swp3     1 PVID Egress Untagged
>>>>
>>>> swp4     1 PVID Egress Untagged
>>>>
>>>> br0      1 PVID Egress Untagged
>>>>
>>>> # ping 10.0.0.1
>>>> PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
>>>> ^C
>>>> --- 10.0.0.1 ping statistics ---
>>>> 8 packets transmitted, 0 received, 100% packet loss, time 7267ms
>>>>
>>>> What is the consensus here? Is there a reason why the bridge driver
>>>> doesn't take care of this?
>>>
>>> Take care of what? :) Always maintaining a PVID on the bridge port? It's
>>> completely OK not to have a PVID.
>>>
>>
>> Yes, I didn't think it through during the first email. I came to the
>> same conclusion in the second one.
>>
>>>> Do switchdev drivers have to restore the pvid to always be
>>>> operational, even if their state becomes inconsistent with the upper
>>>> dev? Is it just 'nice to have'? What if VID 1 isn't in the hw filter
>>>> either (perfectly legal)?
>>>
>>> Are you saying that DSA drivers always maintain a PVID on the bridge
>>> port and allow untagged traffic to ingress regardless of the bridge
>>> driver's configuration? If so, I think this needs to be fixed.
>>
>> Well, not at the DSA core level.
>> But for Microchip:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/dsa/microchip/ksz9477.c#n576
>> For Broadcom:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/dsa/b53/b53_common.c#n1376
>> For Mediatek:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/dsa/mt7530.c#n1196
>>
>> There might be others as well.
> 
> That sounds bogus indeed, and I bet that the two other drivers just
> followed the b53 driver there. I will have to test this again and come
> up with a patch eventually.
> 
> When the port leaves the bridge we do bring it back into a default PVID
> (which is different than the bridge's default PVID) so that part should
> be okay.
> 

Adding a few more networking people.
So my flow is something like this:
- Boot a board with a DSA switch
- Bring all interfaces up
- Enslave all interfaces to br0
- Enable vlan_filtering on br0

What VIDs should be installed into the ports' hw filters, and what 
should the pvid be at this point?
Should the switch ports pass any traffic?
At this point, 'bridge vlan' shows a confusing:
port    vlan ids
eth0     1 PVID Egress Untagged

swp5     1 PVID Egress Untagged

swp2     1 PVID Egress Untagged

swp3     1 PVID Egress Untagged

swp4     1 PVID Egress Untagged

br0      1 PVID Egress Untagged
for all ports, but the .port_vlan_add callback is nowhere to be found.

Whose responsibility is it for the switch to pass traffic without any 
further 'bridge vlan' command? What is the mechanism through which this 
should work?

What if I do:
sudo bridge vlan add vid 100 dev swp2 pvid untagged
echo 0 | sudo tee /sys/class/net/br0/bridge/vlan_filtering
echo 1 | sudo tee /sys/class/net/br0/bridge/vlan_filtering
What pvid should there be on swp2 now?
'bridge vlan' shows:
port    vlan ids
eth0     1 PVID Egress Untagged

swp5     1 PVID Egress Untagged

swp2     1 Egress Untagged
          100 PVID Egress Untagged

swp3     1 PVID Egress Untagged

swp4     1 PVID Egress Untagged

br0      1 PVID Egress Untagged
If the 'bridge vlan' output is correct, whose responsibility is it to 
restore this pvid?

More context: the sja1105 driver is somewhat similar to the mlxsw in 
that VLAN awareness cannot be truly disabled. Arid details aside, in 
both cases, achieving "VLAN-unaware"-like behavior involves manipulating 
the pvid in both cases. But it appears that the bridge core does expect:
(1) that the driver performs a default VLAN initialization which matches 
its own, without them ever communicating. But because switchdev/DSA 
drivers start off in standalone mode, vlan_filtering=0 comes first, 
hence the non-standard pvid. Through what mechanism is the 
bridge-expected pvid supposed to get restored upon flipping vlan_filtering?
(2) that toggling VLAN filtering off and on has no other state upon the 
underlying driver than enabling and disabling VLAN awareness. The VLAN 
hw filter table is assumed to be unchanged. Is this a correct assumption?

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH net-next 2/6] net: dsa: do not enable or disable non user ports
From: Florian Fainelli @ 2019-08-19 17:16 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: marek.behun, davem, andrew
In-Reply-To: <20190818173548.19631-3-vivien.didelot@gmail.com>

On 8/18/19 10:35 AM, Vivien Didelot wrote:
> The .port_enable and .port_disable operations are currently only
> called for user ports, hence assuming they have a slave device. In
> preparation for using these operations for other port types as well,
> simply guard all implementations against non user ports and return
> directly in such case.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> ---

[snip]

>
diff --git a/drivers/net/dsa/b53/b53_common.c
b/drivers/net/dsa/b53/b53_common.c
> index 907af62846ba..1639ea7b7dab 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -510,10 +510,15 @@ EXPORT_SYMBOL(b53_imp_vlan_setup);
>  int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
>  {
>  	struct b53_device *dev = ds->priv;
> -	unsigned int cpu_port = ds->ports[port].cpu_dp->index;
> +	unsigned int cpu_port;
>  	int ret = 0;
>  	u16 pvlan;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return 0;
> +
> +	cpu_port = ds->ports[port].cpu_dp->index;
> +
>  	if (dev->ops->irq_enable)
>  		ret = dev->ops->irq_enable(dev, port);
>  	if (ret)
> @@ -547,6 +552,9 @@ void b53_disable_port(struct dsa_switch *ds, int port)
>  	struct b53_device *dev = ds->priv;
>  	u8 reg;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return;
> +
>  	/* Disable Tx/Rx for the port */
>  	b53_read8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), &reg);
>  	reg |= PORT_CTRL_RX_DISABLE | PORT_CTRL_TX_DISABLE;

For b53, the changes are correct, for bcm_sf2, see comments below:

> diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
> index 49f99436018a..3d06262817bd 100644
> --- a/drivers/net/dsa/bcm_sf2.c
> +++ b/drivers/net/dsa/bcm_sf2.c
> @@ -157,6 +157,9 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
>  	unsigned int i;
>  	u32 reg;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return 0;
> +
>  	/* Clear the memory power down */
>  	reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL);
>  	reg &= ~P_TXQ_PSM_VDD(port);
> @@ -222,6 +225,9 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int port)
>  	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
>  	u32 reg;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return;

in bcm_sf2_sw_suspend(), we do call bcm_sf2_port_disable() against the
user and CPU port.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: enable and disable all ports
From: Florian Fainelli @ 2019-08-19 17:17 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: marek.behun, davem, andrew
In-Reply-To: <20190818173548.19631-4-vivien.didelot@gmail.com>

On 8/18/19 10:35 AM, Vivien Didelot wrote:
> Call the .port_enable and .port_disable functions for all ports,
> not only the user ports, so that drivers may optimize the power
> consumption of all ports after a successful setup.
> 
> Unused ports are now disabled on setup. CPU and DSA ports are now
> enabled on setup and disabled on teardown. User ports were already
> enabled at slave creation and disabled at slave destruction.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: dsa: use a single switch statement for port setup
From: Vivien Didelot @ 2019-08-19 17:20 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, marek.behun, davem, andrew
In-Reply-To: <cb37b6b0-c6c4-6f77-3658-a5cf676fabfe@gmail.com>

Hi Florian,

On Mon, 19 Aug 2019 10:14:24 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 8/18/19 10:35 AM, Vivien Didelot wrote:
> > It is currently difficult to read the different steps involved in the
> > setup and teardown of ports in the DSA code. Keep it simple with a
> > single switch statement for each port type: UNUSED, CPU, DSA, or USER.
> > 
> > Also no need to call devlink_port_unregister from within dsa_port_setup
> > as this step is inconditionally handled by dsa_port_teardown on error.
> > 
> > Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> > ---
> 
> [snip]
> 
> >  	case DSA_PORT_TYPE_CPU:
> > +		memset(dlp, 0, sizeof(*dlp));
> > +		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_CPU,
> > +				       dp->index, false, 0, id, len);
> > +		err = devlink_port_register(dl, dlp, dp->index);
> > +		if (err)
> > +			return err;
> 
> This is shared between all port flavors with the exception that the
> flavor type is different, maybe we should create a helper function and
> factor out even more code. I don't feel great about repeating 3 times t
> the same code without making use of a fall through.

Nah I did not want to use an helper because the code is still too
noodly, if you look at user ports setup, we continue to setup devlink
after the slave creation, so here I prefered to keep things readable
and expose all steps first, before writing yet another helper.


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-19 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jordan Glover, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <alpine.DEB.2.21.1908191103130.1923@nanos.tec.linutronix.de>

On Mon, Aug 19, 2019 at 11:15:11AM +0200, Thomas Gleixner wrote:
> Alexei,
> 
> On Sat, 17 Aug 2019, Alexei Starovoitov wrote:
> > On Fri, Aug 16, 2019 at 10:28:29PM +0200, Thomas Gleixner wrote:
> > > On Fri, 16 Aug 2019, Alexei Starovoitov wrote:
> > > While real usecases are helpful to understand a design decision, the design
> > > needs to be usecase independent.
> > > 
> > > The kernel provides mechanisms, not policies. My impression of this whole
> > > discussion is that it is policy driven. That's the wrong approach.
> > 
> > not sure what you mean by 'policy driven'.
> > Proposed CAP_BPF is a policy?
> 
> I was referring to the discussion as a whole.
>  
> > Can kernel.unprivileged_bpf_disabled=1 be used now?
> > Yes, but it will weaken overall system security because things that
> > use unpriv to load bpf and CAP_NET_ADMIN to attach bpf would need
> > to move to stronger CAP_SYS_ADMIN.
> > 
> > With CAP_BPF both load and attach would happen under CAP_BPF
> > instead of CAP_SYS_ADMIN.
> 
> I'm not arguing against that.
> 
> > > So let's look at the mechanisms which we have at hand:
> > > 
> > >  1) Capabilities
> > >  
> > >  2) SUID and dropping priviledges
> > > 
> > >  3) Seccomp and LSM
> > > 
> > > Now the real interesting questions are:
> > > 
> > >  A) What kind of restrictions does BPF allow? Is it a binary on/off or is
> > >     there a more finegrained control of BPF functionality?
> > > 
> > >     TBH, I can't tell.
> > > 
> > >  B) Depending on the answer to #A what is the control possibility for
> > >     #1/#2/#3 ?
> > 
> > Can any of the mechanisms 1/2/3 address the concern in mds.rst?
> 
> Well, that depends. As with any other security policy which is implemented
> via these mechanisms, the policy can be strict enough to prevent it by not
> allowing certain operations. The more fine-grained the control is, it
> allows the administrator who implements the policy to remove the
> 'dangerous' parts from an untrusted user.
> 
> So really question #A is important for this. Is BPF just providing a binary
> ON/OFF knob or does it allow to disable/enable certain aspects of BPF
> functionality in a more fine grained way? If the latter, then it might be
> possible to control functionality which might be abused for exploits of
> some sorts (including MDS) in a way which allows other parts of BBF to be
> exposed to less priviledged contexts.

I see. So the kernel.unprivileged_bpf_disabled knob is binary and I think it's
the right mechanism to expose to users.
Having N knobs for every map/prog type won't decrease attack surface.
In the other email Andy's quoting seccomp man page...
Today seccomp cannot really look into bpf_attr syscall args, but even
if it could it won't secure the system.
Examples:
1.
spectre v2 is using bpf in-kernel interpreter in speculative way.
The mere presence of interpreter as part of kernel .text makes the exploit
easier to do. That was the reason to do CONFIG_BPF_JIT_ALWAYS_ON.
For this case even kernel.unprivileged_bpf_disabled=1 was hopeless.

2.
var4 doing store hazard. It doesn't matter which program type is used.
load/store instructions are the same across program types.

3.
prog_array was used as part of var1. I guess it was simply more
convenient for Jann to do it this way :) All other map types
have the same out-of-bounds speculation issue.

In general side channels are cpu bugs that are exploited via sequences
of cpu instructions. In that sense bpf infra provides these instructions.
So all program types and all maps have the same level of 'side channel risk'.

> > I believe Andy wants to expand the attack surface when
> > kernel.unprivileged_bpf_disabled=0
> > Before that happens I'd like the community to work on addressing the text above.
> 
> Well, that text above can be removed when the BPF wizards are entirely sure
> that BPF cannot be abused to exploit stuff. 

Myself and Daniel looked at it in detail. I think we understood
MDS mechanism well enough. Right now we're fairly confident that
combination of existing mechanisms we did for var4 and
verifier speculative analysis protect us from MDS.
The thing is that every new cpu bug is looked at through the bpf lenses.
Can it be exploited through bpf? Complexity of side channels
is growing. Can the most recent swapgs be exploited ?
What if we kprobe+bpf somewhere ?
I don't think there is an issue, but we will never be 'entirely sure'.
Even if myself and Daniel are sure the concern will stay.
Unprivileged bpf as a whole is the concern due to side channels.
The number of them are not yet disclosed. Who is going to analyze them?
imo the only answer to that is kernel.unprivileged_bpf_disabled=1
which together with CONFIG_BPF_JIT_ALWAYS_ON is secure enough.
The other option is to sprinkle every bpf load/store with lfence
which will make execution so slow that it will be unusable.
Which is effectively the same as unprivileged_bpf_disabled=1.

There are other things we can do. Like kasan-style shadow memory
for bpf execution. Auto re-JITing the code after it's running.
We can do lfences everywhere for some time then re-JIT
when kasan-ed shadow memory shows only clean memory accesses.
The beauty of BPF that it is analyze-able and JIT-able instruction set.
The verifier speculative analysis is an example that the kernel
can analyze the speculative execution path that cpu will
take before the code starts executing.
Unprivileged bpf can made absolutely secure. It can be
made more secure than the rest of the kernel.
But today we should just go with unprivileged_bpf_disabled=1
and CAP_BPF.


^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Naveen N. Rao @ 2019-08-19 17:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: andrii.nakryiko, David S . Miller, Michael Holzheu,
	John Fastabend, kernel-team, Michal Rostecki, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>

Andrii Nakryiko wrote:
> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.

Quite late, but FWIW:
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>


- Naveen


^ permalink raw reply

* [PATCH net-next v2 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

The slow MDIO access introduces quite a big offset (~13us) to the PTP
system time synchronisation. With this patch the driver has the possibility
to set the correct offset which can then be compensated.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/phy/mdio_bus.c | 12 ++++++++++++
 include/linux/phy.h        |  8 ++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 4dba2714495e..50a37cf46f96 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -739,6 +739,18 @@ int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
 	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
 		ptp_read_system_postts(sts);
 
+	/* PTP offset compensation:
+	 * After the MDIO access is completed (from the chip perspective), the
+	 * switch chip will snapshot the PHC timestamp. To make sure our system
+	 * timestamp corresponds to the PHC timestamp, we have to add the
+	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
+	 * takes the average of pre_ts and post_ts to calculate the final
+	 * system timestamp. With this in mind, we have to add ptp_sts_offset
+	 * twice to post_ts, in order to not introduce an constant time offset.
+	 */
+	if (sts)
+		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
+
 	return retval;
 }
 EXPORT_SYMBOL(__mdiobus_write_sts);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 0b33662e0320..615df9c7f2c3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -283,8 +283,16 @@ struct mii_bus {
 	 * The ptp_read_system_*ts functions already check the ptp_sts pointer.
 	 * The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
 	 * MDIO bus driver takes the timestamps as described above.
+	 *
+	 * @ptp_sts_offset: This is the compensation offset for the system
+	 * timestamp which is introduced by the slow MDIO access duration. An
+	 * MDIO access consists of 32 clock cycles. Usually the MDIO bus runs
+	 * at ~2.5MHz, so we have to compensate ~12800ns offset.
+	 * Set the ptp_sts_offset to the exact duration of one MDIO frame
+	 * (= 32 * clock-period) in nano-seconds.
 	 */
 	struct ptp_system_timestamp *ptp_sts;
+	u32 ptp_sts_offset;
 };
 
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 4/4] net: fec: add support for PTP system timestamping for MDIO devices
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Fugang Duan,
	Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.

The ptp_read_system_*ts functions already check the ptp_sts pointer.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index c01d3ec3e9af..dd1253683ac0 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1815,10 +1815,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	reinit_completion(&fep->mdio_done);
 
 	/* start a write op */
+	ptp_read_system_prets(bus->ptp_sts);
 	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
 		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
 		FEC_MMFR_TA | FEC_MMFR_DATA(value),
 		fep->hwp + FEC_MII_DATA);
+	ptp_read_system_postts(bus->ptp_sts);
 
 	/* wait for end of transfer */
 	time_left = wait_for_completion_timeout(&fep->mdio_done,
@@ -1956,7 +1958,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	struct device_node *node;
 	int err = -ENXIO;
-	u32 mii_speed, holdtime;
+	u32 mii_speed, mii_period, holdtime;
 
 	/*
 	 * The i.MX28 dual fec interfaces are not equal.
@@ -1993,6 +1995,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	 * document.
 	 */
 	mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
+	mii_period = div_u64((u64)mii_speed * 2 * NSEC_PER_SEC, clk_get_rate(fep->clk_ipg));
 	if (fep->quirks & FEC_QUIRK_ENET_MAC)
 		mii_speed--;
 	if (mii_speed > 63) {
@@ -2034,6 +2037,8 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 		pdev->name, fep->dev_id + 1);
 	fep->mii_bus->priv = fep;
 	fep->mii_bus->parent = &pdev->dev;
+	fep->mii_bus->flags = MII_BUS_F_PTP_STS_SUPPORTED;
+	fep->mii_bus->ptp_sts_offset = 32 * mii_period;
 
 	node = of_get_child_by_name(pdev->dev.of_node, "mdio");
 	err = of_mdiobus_register(fep->mii_bus, node);
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 1/4] net: mdio: add support for passing a PTP system timestamp to the mii_bus driver
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.

This patch adds the required infrastructure to solve the problem described
above.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
Changes in v2:
 - Removed mdiobus_write_sts as there was no user
 - Removed ptp_sts_supported-boolean and introduced flags instead

 drivers/net/phy/mdio_bus.c | 76 ++++++++++++++++++++++++++++++++++++++
 include/linux/mdio.h       |  5 +++
 include/linux/phy.h        | 34 +++++++++++++++++
 3 files changed, 115 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index bd04fe762056..4dba2714495e 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -34,6 +34,7 @@
 #include <linux/phy.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/ptp_clock_kernel.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/mdio.h>
@@ -697,6 +698,81 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 }
 EXPORT_SYMBOL(mdiobus_write);
 
+/**
+ * __mdiobus_write_sts - Unlocked version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * Write a MDIO bus register and request the MDIO bus driver to take the
+ * system timestamps when sts-pointer is valid. When the bus driver doesn't
+ * support this, the timestamps are taken in this function instead.
+ *
+ * In order to improve the synchronisation precision of phc2sys (from
+ * the linuxptp project) for devices like switches which are attached
+ * to the MDIO bus, it is necessary the get the system timestamps as
+ * close as possible to the access which causes the PTP timestamp
+ * register to be snapshotted in the switch hardware. Usually this is
+ * triggered by an MDIO write access, the snapshotted timestamp is then
+ * transferred by several MDIO reads.
+ *
+ * Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	WARN_ON_ONCE(!mutex_is_locked(&bus->mdio_lock));
+
+	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
+		ptp_read_system_prets(sts);
+
+	bus->ptp_sts = sts;
+	retval = __mdiobus_write(bus, addr, regnum, val);
+	bus->ptp_sts = NULL;
+
+	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
+		ptp_read_system_postts(sts);
+
+	return retval;
+}
+EXPORT_SYMBOL(__mdiobus_write_sts);
+
+/**
+ * mdiobus_write_sts_nested - Nested version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * In case of nested MDIO bus access avoid lockdep false positives by
+ * using mutex_lock_nested().
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			     struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	BUG_ON(in_interrupt());
+
+	mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
+	retval = __mdiobus_write_sts(bus, addr, regnum, val, sts);
+	mutex_unlock(&bus->mdio_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(mdiobus_write_sts_nested);
+
 /**
  * mdio_bus_match - determine if given MDIO driver supports the given
  *		    MDIO device
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad88c81..482388341c7b 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -9,6 +9,7 @@
 #include <uapi/linux/mdio.h>
 #include <linux/mod_devicetable.h>
 
+struct ptp_system_timestamp;
 struct gpio_desc;
 struct mii_bus;
 
@@ -305,11 +306,15 @@ static inline void mii_10gbt_stat_mod_linkmode_lpa_t(unsigned long *advertising,
 
 int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			struct ptp_system_timestamp *sts);
 
 int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
 int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			     struct ptp_system_timestamp *sts);
 
 int mdiobus_register_device(struct mdio_device *mdiodev);
 int mdiobus_unregister_device(struct mdio_device *mdiodev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index d26779f1fb6b..0b33662e0320 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -205,6 +205,13 @@ struct device;
 struct phylink;
 struct sk_buff;
 
+/* MII-bus flags:
+ * @MII_BUS_F_PTP_STS_SUPPORTED: The driver supports PTP system timestamping
+ */
+enum mii_bus_flags {
+	MII_BUS_F_PTP_STS_SUPPORTED = BIT(0)
+};
+
 /*
  * The Bus class for PHYs.  Devices which provide access to
  * PHYs should register using this structure
@@ -252,7 +259,34 @@ struct mii_bus {
 	int reset_delay_us;
 	/* RESET GPIO descriptor pointer */
 	struct gpio_desc *reset_gpiod;
+
+	/* Feature flags */
+	u32 flags;
+
+	/* PTP system timestamping support
+	 *
+	 * In order to improve the synchronisation precision of phc2sys (from
+	 * the linuxptp project) for devices like switches which are attached
+	 * to the MDIO bus, it is necessary the get the system timestamps as
+	 * close as possible to the access which causes the PTP timestamp
+	 * register to be snapshotted in the switch hardware. Usually this is
+	 * triggered by an MDIO write access, the snapshotted timestamp is then
+	 * transferred by several MDIO reads.
+	 *
+	 * The switch driver can use mdio_write_sts*() to pass through the
+	 * system timestamp pointer @ptp_sts to the MDIO bus driver. The bus
+	 * driver simply has to do the following calls in its write handler:
+	 *	ptp_read_system_prets(bus->ptp_sts);
+	 *	writel(value, mdio-register)
+	 *	ptp_read_system_postts(bus->ptp_sts);
+	 *
+	 * The ptp_read_system_*ts functions already check the ptp_sts pointer.
+	 * The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
+	 * MDIO bus driver takes the timestamps as described above.
+	 */
+	struct ptp_system_timestamp *ptp_sts;
 };
+
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
 
 struct mii_bus *mdiobus_alloc_size(size_t);
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 3/4] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.h |  2 ++
 drivers/net/dsa/mv88e6xxx/ptp.c  | 11 +++++++----
 drivers/net/dsa/mv88e6xxx/smi.c  |  3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index a406be2f5652..1bfde0d8a5a3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -275,6 +275,8 @@ struct mv88e6xxx_chip {
 	struct ptp_clock_info	ptp_clock_info;
 	struct delayed_work	tai_event_work;
 	struct ptp_pin_desc	pin_config[MV88E6XXX_MAX_GPIO];
+	struct ptp_system_timestamp *ptp_sts;
+
 	u16 trig_config;
 	u16 evcap_config;
 	u16 enable_count;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 073cbd0bb91b..cf6e52ee9e0a 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 	return 0;
 }
 
-static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
-				 struct timespec64 *ts)
+static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
+				  struct timespec64 *ts,
+				  struct ptp_system_timestamp *sts)
 {
 	struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
 	u64 ns;
 
 	mv88e6xxx_reg_lock(chip);
+	chip->ptp_sts = sts;
 	ns = timecounter_read(&chip->tstamp_tc);
+	chip->ptp_sts = NULL;
 	mv88e6xxx_reg_unlock(chip);
 
 	*ts = ns_to_timespec64(ns);
@@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
 	struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
 	struct timespec64 ts;
 
-	mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
+	mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
 
 	schedule_delayed_work(&chip->overflow_work,
 			      MV88E6XXX_TAI_OVERFLOW_PERIOD);
@@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	chip->ptp_clock_info.max_adj    = MV88E6XXX_MAX_ADJ_PPB;
 	chip->ptp_clock_info.adjfine	= mv88e6xxx_ptp_adjfine;
 	chip->ptp_clock_info.adjtime	= mv88e6xxx_ptp_adjtime;
-	chip->ptp_clock_info.gettime64	= mv88e6xxx_ptp_gettime;
+	chip->ptp_clock_info.gettimex64	= mv88e6xxx_ptp_gettimex;
 	chip->ptp_clock_info.settime64	= mv88e6xxx_ptp_settime;
 	chip->ptp_clock_info.enable	= ptp_ops->ptp_enable;
 	chip->ptp_clock_info.verify	= ptp_ops->ptp_verify;
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 282fe08db050..abedd04ff2ae 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
 {
 	int ret;
 
-	ret = mdiobus_write_nested(chip->bus, dev, reg, data);
+	ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
+				       chip->ptp_sts);
 	if (ret < 0)
 		return ret;
 
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 0/4] Improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Vivien Didelot,
	Florian Fainelli, Heiner Kallweit, Vladimir Oltean, Fugang Duan,
	David S. Miller

With this patchset the phc2sys synchronisation precision improved to +/-555ns on
an IMX6DL with an MV88E6220 switch attached.

This patchset takes into account the comments from the following discussions:
- https://lkml.org/lkml/2019/8/2/1364
- https://lkml.org/lkml/2019/8/5/169

Patch 01 adds the required infrastructure in the MDIO layer.
Patch 02 adds additional PTP offset compensation
Patch 03 adds support for the PTP_SYS_OFFSET_EXTENDED ioctl in the mv88e6xxx driver.
Patch 04 adds support for the PTP system timestamping in the imx-fec driver.

The following tests show the improvement caused by each patch. The system clock 
precision was set to 15ns instead of 333ns (as described in https://lkml.org/lkml/2019/8/2/1364).

Without this patchset applied, the phc2sys synchronisation performance was very 
poor:

  offset: min -27120 max 28840 mean 2.44 stddev 8040.78 count 1236
  delay:  min 282103 max 386385 mean 352568.03 stddev 27814.27 count 1236
  (test runtime 20 minutes)

Results after appling patch 01-03:

  offset: min -12316 max 13314 mean -9.38 stddev 4274.82 count 1022
  delay:  min 69977 max 96266 mean 87939.04 stddev 6466.17 count 1022
  (test runtime 16 minutes)

Results after appling patch 03:

  offset: min -788 max 528 mean -0.06 stddev 185.02 count 7171
  delay:  min 1773 max 2031 mean 1909.43 stddev 33.74 count 7171
  (test runtime 119 minutes)

Hubert Feurstein (4):
  net: mdio: add support for passing a PTP system timestamp to the
    mii_bus driver
  net: mdio: add PTP offset compensation to mdiobus_write_sts
  net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
  net: fec: add support for PTP system timestamping for MDIO devices

 drivers/net/dsa/mv88e6xxx/chip.h          |  2 +
 drivers/net/dsa/mv88e6xxx/ptp.c           | 11 +--
 drivers/net/dsa/mv88e6xxx/smi.c           |  3 +-
 drivers/net/ethernet/freescale/fec_main.c |  7 +-
 drivers/net/phy/mdio_bus.c                | 88 +++++++++++++++++++++++
 include/linux/mdio.h                      |  5 ++
 include/linux/phy.h                       | 42 +++++++++++
 7 files changed, 152 insertions(+), 6 deletions(-)

-- 
2.22.1


^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: enable and disable all ports
From: Marek Behun @ 2019-08-19 17:32 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, davem, f.fainelli, andrew
In-Reply-To: <20190818173548.19631-4-vivien.didelot@gmail.com>

On Sun, 18 Aug 2019 13:35:45 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> Call the .port_enable and .port_disable functions for all ports,
> not only the user ports, so that drivers may optimize the power
> consumption of all ports after a successful setup.
> 
> Unused ports are now disabled on setup. CPU and DSA ports are now
> enabled on setup and disabled on teardown. User ports were already
> enabled at slave creation and disabled at slave destruction.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

My original reason for enabling CPU and DSA ports is that enabling
serdes irq could not be done in .setup in mv88e6xxx, since the required
phylink structures did not yet exists for those ports.

The case after this patch would be that .port_enable is called for
CPU/DSA ports right after these required phylink structures are created
for this port. A thought came to me while reading this that some driver
in the future can expect, in their implementation of
port_enable/port_disable, that phylink structures already exist for all
ports, not just the one being currently enabled/disabled.

Wouldn't it be safer if CPU/DSA ports were enabled in setup after all
ports are registered, and disabled in teardown before ports are
unregistered?

Current:
  ->setup()
  for each port
      dsa_port_link_register_of()
      dsa_port_enable()

Proposed:
  ->setup()
  for each port
      dsa_port_link_register_of()
  for each port
      dsa_port_enable()

Marek

^ permalink raw reply

* Re: [PATCH net-next 2/3] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Andrew Lunn @ 2019-08-19 17:34 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, lkml, Richard Cochran, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller
In-Reply-To: <CAFfN3gUNrnjdOt8bW2EugzjSZMb_vvdEaLN0yOv_06=roqcJYQ@mail.gmail.com>

On Mon, Aug 19, 2019 at 07:14:25PM +0200, Hubert Feurstein wrote:
> Hi Andrew,
> 
> Am Mo., 19. Aug. 2019 um 15:27 Uhr schrieb Andrew Lunn <andrew@lunn.ch>:
> >
> > > @@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
> > >  {
> > >       int ret;
> > >
> > > -     ret = mdiobus_write_nested(chip->bus, dev, reg, data);
> > > +     ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
> > > +                                    chip->ptp_sts);
> > >       if (ret < 0)
> > >               return ret;
> > >
> >
> > Please also make a similar change to mv88e6xxx_smi_indirect_write().
> > The last write in that function should be timestamped.
> Since it is already the last write it should be already ok (The
> mv88e6xxx_smi_indirect_write
> calls the mv88e6xxx_smi_direct_write which initiates the timestamping).

Hi Hubert

But you are also time stamping the first write as well. And it seems
like it is not completely for free. So it would be nice to limit it to
the write which actually matters.

    Andrew

^ permalink raw reply

* Re: Help needed - Kernel lockup while running ipsec
From: Florian Westphal @ 2019-08-19 17:38 UTC (permalink / raw)
  To: Vakul Garg; +Cc: netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB4620CD9AFFAFF8678F803DCE8BA80@DB7PR04MB4620.eurprd04.prod.outlook.com>

Vakul Garg <vakul.garg@nxp.com> wrote:
> Hi
> 
> With kernel 4.14.122, I am getting a kernel softlockup while running single static ipsec tunnel.
> The problem reproduces mostly after running 8-10 hours of ipsec encap test (on my dual core arm board).
> 
> I found that in function xfrm_policy_lookup_bytype(), the policy in variable 'ret' shows refcnt=0 under problem situation.
> This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the lockup.
> 
> Can some body please provide me pointers about 'refcnt'?
> Is it legitimate for 'refcnt' to become '0'? Under what condition can it become '0'?

Yes, when policy is destroyed and the last user calls
xfrm_pol_put() which will invoke call_rcu to free the structure.

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-19 17:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Gleixner, Jordan Glover, Andy Lutomirski,
	Daniel Colascione, Song Liu, Kees Cook, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190819172718.jwnvvotssxwhc7m6@ast-mbp.dhcp.thefacebook.com>



> On Aug 19, 2019, at 10:27 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
>> On Mon, Aug 19, 2019 at 11:15:11AM +0200, Thomas Gleixner wrote:
>> Alexei,
>> 
>>> On Sat, 17 Aug 2019, Alexei Starovoitov wrote:
>>>> On Fri, Aug 16, 2019 at 10:28:29PM +0200, Thomas Gleixner wrote:
>>>> On Fri, 16 Aug 2019, Alexei Starovoitov wrote:
>>>> While real usecases are helpful to understand a design decision, the design
>>>> needs to be usecase independent.
>>>> 
>>>> The kernel provides mechanisms, not policies. My impression of this whole
>>>> discussion is that it is policy driven. That's the wrong approach.
>>> 
>>> not sure what you mean by 'policy driven'.
>>> Proposed CAP_BPF is a policy?
>> 
>> I was referring to the discussion as a whole.
>> 
>>> Can kernel.unprivileged_bpf_disabled=1 be used now?
>>> Yes, but it will weaken overall system security because things that
>>> use unpriv to load bpf and CAP_NET_ADMIN to attach bpf would need
>>> to move to stronger CAP_SYS_ADMIN.
>>> 
>>> With CAP_BPF both load and attach would happen under CAP_BPF
>>> instead of CAP_SYS_ADMIN.
>> 
>> I'm not arguing against that.
>> 
>>>> So let's look at the mechanisms which we have at hand:
>>>> 
>>>> 1) Capabilities
>>>> 
>>>> 2) SUID and dropping priviledges
>>>> 
>>>> 3) Seccomp and LSM
>>>> 
>>>> Now the real interesting questions are:
>>>> 
>>>> A) What kind of restrictions does BPF allow? Is it a binary on/off or is
>>>>    there a more finegrained control of BPF functionality?
>>>> 
>>>>    TBH, I can't tell.
>>>> 
>>>> B) Depending on the answer to #A what is the control possibility for
>>>>    #1/#2/#3 ?
>>> 
>>> Can any of the mechanisms 1/2/3 address the concern in mds.rst?
>> 
>> Well, that depends. As with any other security policy which is implemented
>> via these mechanisms, the policy can be strict enough to prevent it by not
>> allowing certain operations. The more fine-grained the control is, it
>> allows the administrator who implements the policy to remove the
>> 'dangerous' parts from an untrusted user.
>> 
>> So really question #A is important for this. Is BPF just providing a binary
>> ON/OFF knob or does it allow to disable/enable certain aspects of BPF
>> functionality in a more fine grained way? If the latter, then it might be
>> possible to control functionality which might be abused for exploits of
>> some sorts (including MDS) in a way which allows other parts of BBF to be
>> exposed to less priviledged contexts.
> 
> I see. So the kernel.unprivileged_bpf_disabled knob is binary and I think it's
> the right mechanism to expose to users.
> Having N knobs for every map/prog type won't decrease attack surface.
> In the other email Andy's quoting seccomp man page...
> Today seccomp cannot really look into bpf_attr syscall args, but even
> if it could it won't secure the system.
> Examples:
> 1.
> spectre v2 is using bpf in-kernel interpreter in speculative way.
> The mere presence of interpreter as part of kernel .text makes the exploit
> easier to do. That was the reason to do CONFIG_BPF_JIT_ALWAYS_ON.
> For this case even kernel.unprivileged_bpf_disabled=1 was hopeless.
> 
> 2.
> var4 doing store hazard. It doesn't matter which program type is used.
> load/store instructions are the same across program types.
> 
> 3.
> prog_array was used as part of var1. I guess it was simply more
> convenient for Jann to do it this way :) All other map types
> have the same out-of-bounds speculation issue.
> 
> In general side channels are cpu bugs that are exploited via sequences
> of cpu instructions. In that sense bpf infra provides these instructions.
> So all program types and all maps have the same level of 'side channel risk'.
> 
>>> I believe Andy wants to expand the attack surface when
>>> kernel.unprivileged_bpf_disabled=0
>>> Before that happens I'd like the community to work on addressing the text above.
>> 
>> Well, that text above can be removed when the BPF wizards are entirely sure
>> that BPF cannot be abused to exploit stuff. 
> 
> Myself and Daniel looked at it in detail. I think we understood
> MDS mechanism well enough. Right now we're fairly confident that
> combination of existing mechanisms we did for var4 and
> verifier speculative analysis protect us from MDS.
> The thing is that every new cpu bug is looked at through the bpf lenses.
> Can it be exploited through bpf? Complexity of side channels
> is growing. Can the most recent swapgs be exploited ?
> What if we kprobe+bpf somewhere ?
> I don't think there is an issue, but we will never be 'entirely sure'.
> Even if myself and Daniel are sure the concern will stay.
> Unprivileged bpf as a whole is the concern due to side channels.
> The number of them are not yet disclosed. Who is going to analyze them?
> imo the only answer to that is kernel.unprivileged_bpf_disabled=1
> which together with CONFIG_BPF_JIT_ALWAYS_ON is secure enough.
> The other option is to sprinkle every bpf load/store with lfence
> which will make execution so slow that it will be unusable.
> Which is effectively the same as unprivileged_bpf_disabled=1.
> 
> There are other things we can do. Like kasan-style shadow memory
> for bpf execution. Auto re-JITing the code after it's running.
> We can do lfences everywhere for some time then re-JIT
> when kasan-ed shadow memory shows only clean memory accesses.
> The beauty of BPF that it is analyze-able and JIT-able instruction set.
> The verifier speculative analysis is an example that the kernel
> can analyze the speculative execution path that cpu will
> take before the code starts executing.
> Unprivileged bpf can made absolutely secure. It can be
> made more secure than the rest of the kernel.
> But today we should just go with unprivileged_bpf_disabled=1

I’m still okay with this.

> and CAP_BPF.
> 

I think this needs more design work.  I’m halfway through writing up an actual proposal. I’ll send it soon.

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain
From: Marcelo Ricardo Leitner @ 2019-08-19 17:42 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Pravin B Shelar, netdev, David S. Miller, Justin Pettit,
	Simon Horman, Vlad Buslov, Jiri Pirko, Roi Dayan, Yossi Kuperman,
	Rony Efraim, Oz Shlomo
In-Reply-To: <1566144059-8247-1-git-send-email-paulb@mellanox.com>

On Sun, Aug 18, 2019 at 07:00:59PM +0300, Paul Blakey wrote:
> What do you guys say about the following diff on top of the last one?
> Use static key, and also have OVS_DP_CMD_SET command probe/enable the feature.
> 
> This will allow userspace to probe the feature, and selectivly enable it via the
> OVS_DP_CMD_SET command.

I'm not convinced yet that we need something like this. Been
wondering, skb_ext_find() below is not that expensive if not in use.
It's just a bit check and that's it, it returns NULL.

And drivers will only be setting this if they have tc-offloading
enabled (assuming they won't be seeing it for chain 0 all the time).
On which case, with tc offloading, we need this in order to work
properly.

Is the bit checking really that worrysome?

> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -853,8 +853,10 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  	key->mac_proto = res;
>  
>  #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> -	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> -	key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +	if (static_branch_unlikely(&tc_recirc_sharing_support)) {
> +		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> +		key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +	}
>  #else
>  	key->recirc_id = 0;
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [GIT] Networking
From: pr-tracker-bot @ 2019-08-19 17:45 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20190818.194615.2174476213333990592.davem@davemloft.net>

The pull request you sent on Sun, 18 Aug 2019 19:46:15 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/06821504fd47a5e5b641aeeb638a0ae10a216ef8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [PATCH 13/16] include/asm-generic: prefer __section from compiler_attributes.h
From: Naveen N. Rao @ 2019-08-19 17:52 UTC (permalink / raw)
  To: akpm, Nick Desaulniers
  Cc: Anil S Keshavamurthy, Arnd Bergmann, Alexei Starovoitov, bpf,
	clang-built-linux, Daniel Borkmann, David S. Miller, jpoimboe,
	Martin KaFai Lau, linux-arch, linux-kernel, Masami Hiramatsu,
	miguel.ojeda.sandonis, netdev, sedat.dilek, Song Liu, yhs
In-Reply-To: <20190812215052.71840-13-ndesaulniers@google.com>

Nick Desaulniers wrote:
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

- Naveen


^ permalink raw reply

* [PATCH net-next 0/1] Add genphy_c45_config_aneg() function to phy-c45.c
From: Marco Hartmann @ 2019-08-19 17:52 UTC (permalink / raw)
  To: Marco Hartmann, Christian Herber, andrew@lunn.ch,
	f.fainelli@gmail.com, hkallweit1@gmail.com, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

Currently, using a Clause 45 Phy with the "Generic Clause 45 PHY"
driver leads to a warning, similar to the one below, as soon as the interface
is brought up.


  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 146 at drivers/net/phy/phy.c:736 phy_error+0x2c/0x64
  Modules linked in: fec
  CPU: 2 PID: 146 Comm: kworker/2:1 Not tainted 5.3.0-rc3-NETNEXT-00816-g48e924c73178 #20
  Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
  Workqueue: events_power_efficient phy_state_machine
  Backtrace: 
  ...

This happens, because the Genphy driver does not provide a config_aneg() func,
so that phy_start_aneg() ultimately fails such that phy_error() is called,
producing the above warning.

This patch adds the function genphy_c45_config_aneg(), which allows
phy_start_aneg() to work correctly for C45 phys.


Marco Hartmann (1):
  Add genphy_c45_config_aneg() function to phy-c45.c

 drivers/net/phy/phy-c45.c | 26 ++++++++++++++++++++++++++
 drivers/net/phy/phy.c     |  2 +-
 include/linux/phy.h       |  1 +
 3 files changed, 28 insertions(+), 1 deletion(-)

-- 
2.7.4


^ permalink raw reply

* [PATCH net-next 1/1] Add genphy_c45_config_aneg() function to phy-c45.c
From: Marco Hartmann @ 2019-08-19 17:52 UTC (permalink / raw)
  To: Marco Hartmann, Christian Herber, andrew@lunn.ch,
	f.fainelli@gmail.com, hkallweit1@gmail.com, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1566237157-9054-1-git-send-email-marco.hartmann@nxp.com>

and call it from phy_config_aneg().

commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from
calling genphy_config_aneg") introduced a check that aborts
phy_config_aneg() if the phy is a C45 phy.
This causes phy_state_machine() to call phy_error() so that the phy
ends up in PHY_HALTED state.

Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
(analogous to the C22 case) so that the state machine can run
correctly.

genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
in drivers/net/phy/marvell10g.c, excluding vendor specific
configurations for 1000BaseT.

Fixes: 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from
calling genphy_config_aneg")

Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
---
 drivers/net/phy/phy-c45.c | 26 ++++++++++++++++++++++++++
 drivers/net/phy/phy.c     |  2 +-
 include/linux/phy.h       |  1 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index b9d4145781ca..fa9062fd9122 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -509,6 +509,32 @@ int genphy_c45_read_status(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(genphy_c45_read_status);
 
+/**
+ * genphy_c45_config_aneg - restart auto-negotiation or forced setup
+ * @phydev: target phy_device struct
+ *
+ * Description: If auto-negotiation is enabled, we configure the
+ *   advertising, and then restart auto-negotiation.  If it is not
+ *   enabled, then we force a configuration.
+ */
+int genphy_c45_config_aneg(struct phy_device *phydev)
+{
+	int ret;
+	bool changed = false;
+
+	if (phydev->autoneg == AUTONEG_DISABLE)
+		return genphy_c45_pma_setup_forced(phydev);
+
+	ret = genphy_c45_an_config_aneg(phydev);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		changed = true;
+
+	return genphy_c45_check_and_restart_aneg(phydev, changed);
+}
+EXPORT_SYMBOL_GPL(genphy_c45_config_aneg);
+
 /* The gen10g_* functions are the old Clause 45 stub */
 
 int gen10g_config_aneg(struct phy_device *phydev)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f3adea9ef400..74c4e15ebe52 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -507,7 +507,7 @@ static int phy_config_aneg(struct phy_device *phydev)
 	 * allowed to call genphy_config_aneg()
 	 */
 	if (phydev->is_c45 && !(phydev->c45_ids.devices_in_package & BIT(0)))
-		return -EOPNOTSUPP;
+		return genphy_c45_config_aneg(phydev);
 
 	return genphy_config_aneg(phydev);
 }
diff --git a/include/linux/phy.h b/include/linux/phy.h
index d26779f1fb6b..a7ecbe0e55aa 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1117,6 +1117,7 @@ int genphy_c45_an_disable_aneg(struct phy_device *phydev);
 int genphy_c45_read_mdix(struct phy_device *phydev);
 int genphy_c45_pma_read_abilities(struct phy_device *phydev);
 int genphy_c45_read_status(struct phy_device *phydev);
+int genphy_c45_config_aneg(struct phy_device *phydev);
 
 /* The gen10g_* functions are the old Clause 45 stub */
 int gen10g_config_aneg(struct phy_device *phydev);
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH 13/16] include/asm-generic: prefer __section from compiler_attributes.h
From: Sedat Dilek @ 2019-08-19 17:56 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: akpm, Nick Desaulniers, Anil S Keshavamurthy, Arnd Bergmann,
	Alexei Starovoitov, bpf, Clang-Built-Linux ML, Daniel Borkmann,
	David S. Miller, jpoimboe, Martin KaFai Lau, linux-arch,
	linux-kernel, Masami Hiramatsu, Miguel Ojeda, netdev, Song Liu,
	yhs
In-Reply-To: <1566237106.8670clhnrk.naveen@linux.ibm.com>

On Mon, Aug 19, 2019 at 7:52 PM Naveen N. Rao
<naveen.n.rao@linux.ibm.com> wrote:
>
> Nick Desaulniers wrote:
> > Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> > Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > ---
>
> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>

Tested-by: Sedat Dilek <sedat.dilek@gmail.com> [ Linux v5.3-rc5 ]

Patchset "for-5.3/x86-section-name-escaping":

include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h

- Sedat -

^ permalink raw reply

* Re: [PATCH net-next 0/8] sctp: support per endpoint auth and asconf flags
From: Marcelo Ricardo Leitner @ 2019-08-19 17:58 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, davem
In-Reply-To: <cover.1566223325.git.lucien.xin@gmail.com>

On Mon, Aug 19, 2019 at 10:02:42PM +0800, Xin Long wrote:
> This patchset mostly does 3 things:
> 
>   1. add per endpint asconf flag and use asconf flag properly
>      and add SCTP_ASCONF_SUPPORTED sockopt.
>   2. use auth flag properly and add SCTP_AUTH_SUPPORTED sockopt.
>   3. remove the 'global feature switch' to discard chunks.

What about net->sctp.addip_noauth, why wasn't it included as well?
I'm thinking that's because it's more of a compatibility knob and
that there isn't much value on having it more fine-grained than
per-netns. Just feels a bit odd to have the other two but not this
one, which is directly related, but yeah.

Otherwise, changes LGTM.

^ permalink raw reply

* Re: [PATCH 11/16] x86: prefer __section from compiler_attributes.h
From: Sedat Dilek @ 2019-08-19 17:59 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: akpm, jpoimboe, yhs, Miguel Ojeda, Clang-Built-Linux ML,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Armijn Hemel, Greg Kroah-Hartman, Allison Randal,
	Juergen Gross, Frederic Weisbecker, Brijesh Singh, Enrico Weigelt,
	Kate Stewart, Hannes Reinecke, Sean Christopherson,
	Rafael J. Wysocki, Pu Wen, linux-kernel, netdev, bpf
In-Reply-To: <20190812215052.71840-11-ndesaulniers@google.com>

On Mon, Aug 12, 2019 at 11:52 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Tested-by: Sedat Dilek <sedat.dilek@gmail.com> [ Linux v5.3-rc5 ]

Patchset "for-5.3/x86-section-name-escaping":

include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h

Thanks.

- Sedat -

> ---
>  arch/x86/include/asm/cache.h       | 2 +-
>  arch/x86/include/asm/intel-mid.h   | 2 +-
>  arch/x86/include/asm/iommu_table.h | 5 ++---
>  arch/x86/include/asm/irqflags.h    | 2 +-
>  arch/x86/include/asm/mem_encrypt.h | 2 +-
>  arch/x86/kernel/cpu/cpu.h          | 3 +--
>  6 files changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/cache.h b/arch/x86/include/asm/cache.h
> index abe08690a887..bb9f4bf4ec02 100644
> --- a/arch/x86/include/asm/cache.h
> +++ b/arch/x86/include/asm/cache.h
> @@ -8,7 +8,7 @@
>  #define L1_CACHE_SHIFT (CONFIG_X86_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
> -#define __read_mostly __attribute__((__section__(".data..read_mostly")))
> +#define __read_mostly __section(.data..read_mostly)
>
>  #define INTERNODE_CACHE_SHIFT CONFIG_X86_INTERNODE_CACHE_SHIFT
>  #define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)
> diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h
> index 8e5af119dc2d..f51f04aefe1b 100644
> --- a/arch/x86/include/asm/intel-mid.h
> +++ b/arch/x86/include/asm/intel-mid.h
> @@ -43,7 +43,7 @@ struct devs_id {
>
>  #define sfi_device(i)                                                          \
>         static const struct devs_id *const __intel_mid_sfi_##i##_dev __used     \
> -       __attribute__((__section__(".x86_intel_mid_dev.init"))) = &i
> +       __section(.x86_intel_mid_dev.init) = &i
>
>  /**
>  * struct mid_sd_board_info - template for SD device creation
> diff --git a/arch/x86/include/asm/iommu_table.h b/arch/x86/include/asm/iommu_table.h
> index 1fb3fd1a83c2..7d190710eb92 100644
> --- a/arch/x86/include/asm/iommu_table.h
> +++ b/arch/x86/include/asm/iommu_table.h
> @@ -50,9 +50,8 @@ struct iommu_table_entry {
>
>  #define __IOMMU_INIT(_detect, _depend, _early_init, _late_init, _finish)\
>         static const struct iommu_table_entry                           \
> -               __iommu_entry_##_detect __used                          \
> -       __attribute__ ((unused, __section__(".iommu_table"),            \
> -                       aligned((sizeof(void *)))))     \
> +               __iommu_entry_##_detect __used __section(.iommu_table)  \
> +               __aligned((sizeof(void *)))                             \
>         = {_detect, _depend, _early_init, _late_init,                   \
>            _finish ? IOMMU_FINISH_IF_DETECTED : 0}
>  /*
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index 8a0e56e1dcc9..68db90bca813 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -9,7 +9,7 @@
>  #include <asm/nospec-branch.h>
>
>  /* Provide __cpuidle; we can't safely include <linux/cpu.h> */
> -#define __cpuidle __attribute__((__section__(".cpuidle.text")))
> +#define __cpuidle __section(.cpuidle.text)
>
>  /*
>   * Interrupt control:
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 0c196c47d621..db2cd3709148 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -50,7 +50,7 @@ void __init mem_encrypt_free_decrypted_mem(void);
>  bool sme_active(void);
>  bool sev_active(void);
>
> -#define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
> +#define __bss_decrypted __section(.bss..decrypted)
>
>  #else  /* !CONFIG_AMD_MEM_ENCRYPT */
>
> diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
> index c0e2407abdd6..7ff9dc41a603 100644
> --- a/arch/x86/kernel/cpu/cpu.h
> +++ b/arch/x86/kernel/cpu/cpu.h
> @@ -38,8 +38,7 @@ struct _tlb_table {
>
>  #define cpu_dev_register(cpu_devX) \
>         static const struct cpu_dev *const __cpu_dev_##cpu_devX __used \
> -       __attribute__((__section__(".x86_cpu_dev.init"))) = \
> -       &cpu_devX;
> +       __section(.x86_cpu_dev.init) = &cpu_devX;
>
>  extern const struct cpu_dev *const __x86_cpu_dev_start[],
>                             *const __x86_cpu_dev_end[];
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>

^ permalink raw reply

* Re: [PATCH 16/16] compiler_attributes.h: add note about __section
From: Sedat Dilek @ 2019-08-19 18:01 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: akpm, jpoimboe, yhs, Miguel Ojeda, Clang-Built-Linux ML,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	linux-kernel, netdev, bpf
In-Reply-To: <20190812215052.71840-16-ndesaulniers@google.com>

On Mon, Aug 12, 2019 at 11:53 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> The antipattern described can be found with:
> $ grep -e __section\(\" -r -e __section__\(\"
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42950
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Tested-by: Sedat Dilek <sedat.dilek@gmail.com> [ Linux v5.3-rc5 ]

Patchset "for-5.3/x86-section-name-escaping":

include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h

Thanks.

- Sedat -

> ---
>  include/linux/compiler_attributes.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
> index 6b318efd8a74..f8c008d7f616 100644
> --- a/include/linux/compiler_attributes.h
> +++ b/include/linux/compiler_attributes.h
> @@ -225,6 +225,16 @@
>  #define __pure                          __attribute__((__pure__))
>
>  /*
> + *  Note: Since this macro makes use of the "stringification operator" `#`, a
> + *        quoted string literal should not be passed to it. eg.
> + *        prefer:
> + *        __section(.foo)
> + *        to:
> + *        __section(".foo")
> + *        unless the section name is dynamically built up, in which case the
> + *        verbose __attribute__((__section__(".foo" x))) should be preferred.
> + *        See also: https://bugs.llvm.org/show_bug.cgi?id=42950
> + *
>   *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute
>   *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-section-variable-attribute
>   * clang: https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox